Monday, December 8, 2008

Problem of Gradient Well-formedness

http://en.wikipedia.org/wiki/Gradient_well-formedness

This problem, I think, is important for our research. It is one of the unsolved problems in linguistics, and I found a surprising similarity between its internal structure and that of our own quest.

The problem is basically as follows: if an expression's degree of well-formedness varies, how can we categorize it as either well-formed or ill-formed? Now replace "well-formed" with "formal" and "ill-formed" with "colloquial", and it becomes very similar to our problem. Even more important, I have so far found no previous work on colloquialism detection at all, whereas this problem seems to have held the attention of linguists for a number of years. The link above mentions 2 papers, each proposing a new way of solving (or attempting to solve) the problem. Both hinge on the methods and ideas of Optimality Theory (http://en.wikipedia.org/wiki/Optimality_theory) in some way or other.

The first one (http://www.sfb441.uni-tuebingen.de/~sam/papers/DGfS04.handout.pdf) discusses the Decathlon Model, which has two modules: constraint application (blind, cumulative) and output selection (competitive, probabilistic).
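To make the two-module split concrete for myself, here is a minimal sketch in Python. It is not taken from the handout: the weights, the violation table, the candidate names and the softmax-style selection rule are all my own assumptions, chosen only to show "blind, cumulative" scoring followed by "competitive, probabilistic" selection.

```python
import math
from typing import Dict, List


def cumulative_cost(violations: List[int], weights: List[float]) -> float:
    """Module 1 (blind, cumulative): add up weighted violations for one
    candidate, without looking at any competing candidate."""
    return sum(w * v for w, v in zip(weights, violations))


def output_probabilities(costs: Dict[str, float]) -> Dict[str, float]:
    """Module 2 (competitive, probabilistic): candidates compete, and a
    lower cost gives a higher chance of being produced.
    ASSUMPTION: the exponential (softmax-like) rule is only an illustration."""
    best = min(costs.values())
    scores = {name: math.exp(-(cost - best)) for name, cost in costs.items()}
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


# Hypothetical data: two constraints, three candidate structures.
weights = [2.0, 1.0]
violation_table = {"A": [0, 1], "B": [1, 0], "C": [1, 2]}

costs = {name: cumulative_cost(v, weights) for name, v in violation_table.items()}
probs = output_probabilities(costs)

for name in sorted(probs, key=probs.get, reverse=True):
    print(name, costs[name], round(probs[name], 3))
```

The point of the sketch is only that every candidate keeps a graded score after module 1, so gradience is available before any selection happens.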

The second one (http://www.linguistics.ucla.edu/people/hayes/gradient.pdf) modifies Optimality Theory itself in a very subtle fashion to produce a working model of the problem. It deals with the ideas of RANKING, CONSTRAINTS and STRICTNESS in linguistics.

Optimality Theory has 3 components: the generator GEN, which produces candidate outputs; the constraint set CON; and the evaluator EVAL, which picks the best candidate according to the ranked constraints. This second approach modifies CON and EVAL in subtle ways to model the gradience. However, the problem with both these papers is that they have tested their models on phonetic data, not on texts. Also, Optimality Theory was mainly developed in phonology. But the well-formedness problem is much more general, I think, so the solutions should in principle carry over.
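As a reminder of how GEN, CON and EVAL fit together, here is a toy sketch in Python of classic, strict-ranking Optimality Theory. It is not the modified model from the second paper. The candidate generator, the three constraints (no_coda, max_io, dep_io) and the input "pat" are simplified textbook-style assumptions of mine.

```python
from typing import Callable, List, Sequence

VOWELS = set("aeiou")
Constraint = Callable[[str, str], int]  # (input, candidate) -> violation count


def gen(underlying: str) -> List[str]:
    """Toy GEN: faithful form, final-segment deletion, final-vowel insertion.
    ASSUMPTION: a hand-picked candidate set; "." marks a syllable boundary."""
    return [underlying,
            underlying[:-1],
            underlying[:-1] + "." + underlying[-1] + "a"]


def segments(candidate: str) -> str:
    return candidate.replace(".", "")


def no_coda(underlying: str, candidate: str) -> int:
    """One violation per syllable that ends in a consonant."""
    return sum(1 for syl in candidate.split(".") if syl and syl[-1] not in VOWELS)


def max_io(underlying: str, candidate: str) -> int:
    """One violation per deleted segment."""
    return max(0, len(underlying) - len(segments(candidate)))


def dep_io(underlying: str, candidate: str) -> int:
    """One violation per inserted segment."""
    return max(0, len(segments(candidate)) - len(underlying))


def eval_ot(underlying: str, ranking: Sequence[Constraint]) -> str:
    """EVAL under strict domination: compare violation profiles
    lexicographically, highest-ranked constraint first."""
    def profile(candidate: str):
        return tuple(con(underlying, candidate) for con in ranking)
    return min(gen(underlying), key=profile)


print(eval_ot("pat", [no_coda, dep_io, max_io]))  # deletion wins
print(eval_ot("pat", [max_io, dep_io, no_coda]))  # faithful form wins
```

Note that strict ranking always returns a single winner, which is exactly why plain Optimality Theory has no room for intermediate degrees of well-formedness and why CON and EVAL have to be changed to get gradience.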

We'll have to test these approaches from our own viewpoint.
