Thursday, November 13, 2008

Difference!

Well, we already noted the difference between document classification and word classification. For document classification, words (and also expressions) are features. For word classification, several probability metrics are defined which act as features. These probability metrics hinge upon the existing dependency between successive words and expressions.

Now we refine and extend our ideas to the more intricate realm of expression classification, something closely resembling our aim. We are classifying expressions binarily - colloquial or non-colloquial. It has a distinct flavor. While in word classification we rely on dependency models (say, e.g., in part-of-speech (POS) tagging, we consider graphical models like HMM, MEMM and CRF), here we can't apply the same idea. There are two principal reasons. First, the dependencies here are not as strong as in POS tagging; sometimes there are no dependencies at all. Second, we have not been able to find more than two features (presence in a lexicon and non-conformance to certain grammar rules), which makes the problem a poor fit for a classification mechanism. The "novelty" of an expression could have been a third feature, but there, too, we see that many (if not most) novelties result in non-colloquial forms and terminologies rather than colloquialisms.

If classification were to be used, we could have employed algorithms like Naive Bayes or Maximum Entropy. But it is gradually becoming apparent that we should instead go for a rule-based (or grammar-based) approach.
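Either way, the two features we do have can be wired into a simple rule-based check; a minimal sketch, in which the lexicon contents and the grammar test are hypothetical placeholders:

```python
# Minimal rule-based sketch for binary expression classification.
# The lexicon contents and the grammar check below are hypothetical
# placeholders, not the actual resources.

COLLOQUIAL_LEXICON = {"gonna", "wanna", "ain't", "asap"}

def in_lexicon(expression):
    # Feature 1: the whole expression appears in the colloquial lexicon.
    return expression.lower() in COLLOQUIAL_LEXICON

def violates_grammar_rules(expression):
    # Feature 2 (placeholder): a real check would parse the expression
    # and test it against formal-grammar productions. Here we just look
    # for colloquial words inside it.
    words = expression.lower().split()
    return any(w in COLLOQUIAL_LEXICON for w in words)

def is_colloquial(expression):
    # Rule-based decision: either feature firing marks the expression
    # as colloquial.
    return in_lexicon(expression) or violates_grammar_rules(expression)

print(is_colloquial("gonna"))        # lexicon hit
print(is_colloquial("hello world"))  # neither rule fires
```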

Four key issues revisited and correlated

1> Identifying colloquialisms
2> Identifying the contexts
3> Translating to non-colloquial forms w.r.t. contexts
4> Extracting (possibly) new information from the non-colloquial forms

Extension: conversion from non-colloquial to colloquial forms.

Tasks 1 and 2 are related in the sense that an expression may turn out to be colloquial or not depending on the context. Tasks 2 and 3 are related because you can't do the translation unless and until you know the context! Similarly, tasks 3 and 4 are related, for "new information" definitely implies something that was not there in the context, or that "enhanced" the context in some way; so it depends on the previous translation step. Any discrepancy in the previous step will confound this step.
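The chaining among the four tasks can be made explicit as a pipeline; a sketch in which every function body is a hypothetical stub:

```python
# Hypothetical stubs showing how the four tasks chain together; each
# stage consumes the output of the previous one.

def identify_context(text):
    # Task 2 (placeholder context model).
    return {"register": "informal"}

def identify_colloquialisms(text, context):
    # Task 1: depends on context, since the same expression may or may
    # not be colloquial depending on where it occurs.
    return [w for w in text.split() if w in {"gonna", "wanna"}]

def translate(colloquialisms, context):
    # Task 3: translation is only possible once the context is known.
    mapping = {"gonna": "going to", "wanna": "want to"}
    return {c: mapping.get(c, c) for c in colloquialisms}

def extract_new_information(translations, context):
    # Task 4: keep whatever the translation adds beyond the context.
    return [v for v in translations.values() if v not in context.values()]

text = "we gonna leave"
ctx = identify_context(text)
colloq = identify_colloquialisms(text, ctx)
print(extract_new_information(translate(colloq, ctx), ctx))
```

An error in any stage propagates forward, which is exactly the coupling described above.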

Thursday, November 6, 2008

Some Papers on Word Classification

Our task seems somewhat related to the problem of word classification. So here are some pointers:

1> http://portal.acm.org/citation.cfm?id=1036123

2> http://www.aifb.uni-karlsruhe.de/~sst/Research/Publications/eacl2003-pekar-staab.pdf

3> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.247

4> http://cat.inist.fr/?aModele=afficheN&cpsidt=2457953

5> http://www2.computer.org/portal/web/csdl/doi/10.1109/ICASSP.1993.319224

6> http://slp.csie.ntnu.edu.tw/ppt%5C20080717_ytlo_MWCE.ppt

7> http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=01227767

8> http://www.springerlink.com/content/e752355351t15137/ (very important)

9> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.7661

10> http://arxiv.org/abs/cmp-lg/9405029

11> http://sciencelinks.jp/j-east/article/200013/000020001300A0434831.php

12> http://home.wlv.ac.uk/~in8113/papers/coling04EEDWorkshop_pekar.pdf

13> http://arxiv.org/abs/cmp-lg/9503011

14> http://arxiv.org/abs/cmp-lg/9610004

15> http://www.aclweb.org/anthology-new/C/C69/C69-2201.pdf

16> http://www.ittc.ku.edu/publications/documents/Futrelle1993_rawc.pdf

17> https://eprints.kfupm.edu.sa/68942/

18> http://mail.udgvirtual.udg.mx/biblioteca/bitstream/123456789/1482/1/Word_classification.pdf

19> http://www.scribd.com/doc/100387/A-Classification-Approach-to-Word-Prediction

20> http://www.cse.ust.hk/~dekai/library/WU_Dekai/CarpuatSuWu_Senseval3.pdf

21> http://sig.media.eng.hokudai.ac.jp/~araki/2002/1996-D-8.pdf

22> http://www.loa-cnr.it/Papers/ijcai03.pdf

23> http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-100/Preslav_Nakov.pdf

24> http://www.corpus.bham.ac.uk/PCLC/asmussen_paper.pdf

A Note on Features

The classifiers that we examined so far (SemiL, ISDA, SVMLight, Mallet) put documents into two groups - positive or negative - depending on which side of the decision boundary they fall on. The boundary is defined by kernel functions, which can be linear, quadratic, user-defined, etc.

The problem is that for document classification, words and expressions are used as features. These are smaller units and can be handled easily. But our task is subtler in the sense that we'd like to classify expressions themselves. So the fundamental question is: are there still smaller units which can be used as features? Obviously, individual letters are not a good choice. Perhaps the words which form the expression? I'm not sure...

If we look at the problem from another viewpoint, expressions conform to certain grammar productions. So can we use those productions as our features? Need to consider this further...
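If productions were available from a parser, building the feature vector would be straightforward; a sketch in which parse() is a hypothetical stand-in for a real parser:

```python
# Sketch: turning grammar productions into a binary feature vector.
# parse() is a hypothetical stand-in; a real implementation would call
# an actual parser and collect the productions used in the derivation.

def parse(expression):
    fake_parses = {
        "going to leave": {"VP -> AUX VP", "VP -> V"},
        "gonna leave":    {"VP -> COLLOQ VP", "VP -> V"},
    }
    return fake_parses.get(expression, set())

# Fixed, ordered inventory of productions: one feature per production.
ALL_PRODUCTIONS = sorted({"VP -> AUX VP", "VP -> V", "VP -> COLLOQ VP"})

def production_features(expression):
    # 1 if the production was used to derive the expression, else 0.
    used = parse(expression)
    return [1 if p in used else 0 for p in ALL_PRODUCTIONS]

print(production_features("gonna leave"))
print(production_features("going to leave"))
```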

Another (maybe minor) problem that we encountered while dealing with the classifiers is that they treat the test input files as separate documents. So for our purposes, should we encapsulate one expression in one file and feed the files as input? Or are there better methods?
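If one-expression-per-file turns out to be the only option, the wrapping is at least mechanical; a sketch:

```python
# Sketch: write each expression into its own file so that classifiers
# expecting one document per file can consume them.
import os
import tempfile

def expressions_to_files(expressions, out_dir):
    paths = []
    for i, expr in enumerate(expressions):
        path = os.path.join(out_dir, "expr_%04d.txt" % i)
        with open(path, "w") as f:
            f.write(expr + "\n")
        paths.append(path)
    return paths

out_dir = tempfile.mkdtemp()
paths = expressions_to_files(["gonna leave", "going to leave"], out_dir)
print(len(paths))  # one file per expression
```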

Tuesday, November 4, 2008

Papers

1> Rohit J. Kate and Raymond J. Mooney. 2007. Semi-Supervised Learning for Semantic Parsing using Support Vector Machines. This paper shows a semi-supervised SVM approach to semantic parsing. The authors modified the KRISP method of supervised semantic parsing by capping it with a transductive SVM (Chen et al.'s), and termed the resulting system SEMISUP-KRISP. It may be of use to us if we want to do semantic parsing. It also clarifies how transduction and semi-supervised learning can be integrated with SVMs.

2> Sugato Basu, Mikhail Bilenko, Raymond J. Mooney. 2004. A Probabilistic Framework for Semi-Supervised Clustering. This paper proposes a probabilistic model for semi-supervised clustering based on Hidden Markov Random Fields (HMRFs), which works well in the case of prototype-based clustering. The authors devised a K-Means-style algorithm (which they termed HMRF-KMeans) that gives superior performance on small datasets in higher dimensions.

3> Sugato Basu, Arindam Banerjee, Raymond J. Mooney. 2004. Active Semi-Supervision for Pairwise Constrained Clustering. This paper gives 3 algorithms - PCKMeans, Explore and Consolidate - which work in concert to yield a pairwise-constrained clustering of data. Pairwise-constrained means that there are must-link and cannot-link relationships among unlabeled data. For our purpose, we can view this approach as a refinement over pure semi-supervised learning, for here we are supplying, along with the unlabeled data, the above-mentioned must-link and cannot-link relationships.

4> Arindam Banerjee, Chase Krumpelman, Joydeep Ghosh, Sugato Basu, Raymond J. Mooney. 2005. Model-based Overlapping Clustering. It has been shown that many recent datasets actually exhibit overlapping cluster property in the sense that a single piece of data can be assigned to multiple clusters. This paper generalizes Segal et al's probabilistic relational model (PRM) and additionally offers several algorithmic modifications that improve both the performance and applicability of the model. In our case, this paper is only useful if we really hit upon a dataset which exhibits cluster overlaps.

5> Sugato Basu, Arindam Banerjee, Raymond Mooney. 2002. Semi-supervised Clustering by Seeding. Although a bit old, this paper ushers in the idea of merging KMeans with semi-supervised models as well as EM-style algorithms.

6> Brian Kulis, Sugato Basu, Inderjit Dhillon, Raymond Mooney. 2005. Semi-supervised Graph Clustering: A Kernel Approach. This paper has 6 important contributions enumerated in pages 1-2. Apart from combining Graph clustering and Kernel KMeans and capping it with semi-supervised approach, authors have shown that their method generalizes Spectral Learning, significantly outperforms HMRF-KMeans, and can detect clusters with non-linear boundaries (thanks to the connection between graph-based and vector-based clustering).

7> Hendrik Kück, Peter Carbonetto, Nando de Freitas. A Constrained Semi-Supervised Learning Approach to Data Association. Data association is an important problem in computer vision. The authors show a constrained semi-supervised learning approach for tackling this problem, which outperforms other existing approaches, e.g., EM algorithms and Markov Chain Monte Carlo (MCMC) methods. If we use data association at some point, then this paper will be of use.

8> Rie Kubota Ando, Tong Zhang. A High-Performance Semi-Supervised Learning Method for Text Chunking. This paper discusses a structural learning approach that is very effective in bringing out the semi-supervised aspect of text chunking.

9> David Nadeau, Peter D. Turney. A Supervised Learning Approach to Acronym Identification. Acronym identification is very important for our task, because there are several acronyms, both standard and non-standard, which fall under the realm of colloquialism. Examples are ASAP (as soon as possible), AFAIK (as far as I know), etc. An acronym may be either a colloquialism or a formal expression depending on the context. For example, WS in a particular context may refer either to William Shakespeare (colloquialism) or to Western Samoa (formal expression). However, I can't say how much the idea of acronym identification will be useful to us.

10> Ion Muslea, Steven Minton, Craig A. Knoblock. Active + Semi-Supervised Learning = Robust Multi-View Learning. Multi-view learning is important where features can be grouped into subsets. So we'll reap benefit from this paper if we see subsets in the feature space. Robust multi-view learning is a semi-supervised approach to tackle this problem.

11> Rich Caruana, Nikos Karampatziakis, Ainur Yessenalina. 2008. An Empirical Evaluation of Supervised Learning in High Dimensions. Since high dimensionality in NLP datasets is very common, there must be a suitable evaluation of how supervised learning performs in such cases. This paper presents exactly such an empirical study.

12> Gokhan Tur, Dilek Hakkani-Tür, Robert E. Schapire. 2004. Combining active and semi-supervised learning for spoken language understanding. Although written primarily for implementing a goal-oriented call routing system, this paper moves beyond by providing means for understanding spoken language (which includes colloquialisms) in an active, semi-supervised manner.

13> Nuanwan Soonthornphisaj, Boonserm Kijsirikul. 2005. Combining ILP with Semi-supervised Learning for Web Page Categorization. Although web page categorization is not our task, we can gain important insight from this paper regarding how ILP (inductive logic programming) and ICT (iterative-cross training) can be applied to the problem of semi-supervised learning.

14> Steven M. Beitzel, Eric C. Jensen, Ophir Frieder, David D. Lewis, Abdur Chowdhury, Aleksander Kołcz. Improving Automatic Query Classification via Semi-supervised Learning. Automatic query classification is a very important aspect of our problem, because it allows us to differentiate between queries for colloquial and formal terms. This paper gives a semi-supervised method for improving the process.

15> Junhui Wang, Xiaotong Shen. Large Margin Semi-supervised Learning. This paper discusses the theory of semi-supervised learning and how it can be improved to obtain larger margins.

16> Te Ming Huang, Vojislav Kecman. 2005. Performance Comparisons of Semi-Supervised Learning Algorithms. Authors present a detailed and painstaking study of several semi-supervised algorithms, and compare their performances.

17> Ke Chen, Shihai Wang. Regularized Boost for Semi-Supervised Learning. This paper introduces the important idea of boosting algorithms and goes on to detail a regularized boosting method.

18> J. Andrew Bagnell. 2005. Robust Supervised Learning.

19> Ashish Kapoor, Eric Horvitz, Sumit Basu. Selective Supervision: Guiding Supervised Learning with Decision-Theoretic Active Learning.

20> Yves Grandvalet, Yoshua Bengio. Semi-supervised Learning by Entropy Minimization. This paper gives a new approach to semi-supervised learning.

21> Te Ming Huang, Vojislav Kecman. 2004. Semi-supervised Learning from Unbalanced Labeled Data – An Improvement. This paper discusses an improved semi-supervised learning method which works even when the amount of labeled data is very small.

22> Avleen S. Bijral, Manuel E. Lladser, Gregory Grudic. Semi-supervised Learning of a Markovian Metric. This paper discusses important concepts of distances and metrics, and goes on to detail a semi-supervised learning method on a Markovian metric.

23> Marc'Aurelio Ranzato, Martin Szummer. 2008. Semi-supervised Learning of Compact Document Representations with Deep Networks. Compact document representations are very useful forms of data visualization. This paper discusses a semi-supervised "deep-network" approach for building compact document representations.

24> Gideon S. Mann, Andrew McCallum. 2007. Simple, Robust, Scalable Semi-supervised Learning via Expectation Regularization. This paper gives another approach to semi-supervised learning - expectation regularization.

25> Dina Goren-Bar, Tsvi Kuflik, Dror Lev. Supervised Learning for Automatic Classification of Documents using Self-Organizing Maps. This paper gives a supervised approach to document classification based on self-organizing maps.

26> Irena Spasić, Goran Nenadić, Kostas Manios, Sophia Ananiadou. 2002. Supervised Learning of Term Similarities. This paper brings in the idea that term similarities may also be discovered with the help of supervised learning.

27> Thanh Phong Pham, Hwee Tou Ng, Wee Sun Lee. 2005. Word Sense Disambiguation with Semi-Supervised Learning. Word sense disambiguation is important for our purpose, and this paper gives a semi-supervised approach for tackling this problem.
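As an aside, the must-link / cannot-link constraints from paper 3 above are easy to represent and check; a minimal sketch of validating a cluster assignment against such constraints:

```python
# Sketch: representing must-link / cannot-link constraints and checking
# whether a clustering assignment satisfies them. Items and clusters
# here are illustrative placeholders.

must_link = [("gonna", "wanna")]        # must share a cluster
cannot_link = [("gonna", "therefore")]  # must be in different clusters

def satisfies(assignment, must_link, cannot_link):
    # assignment maps each item to a cluster id.
    for a, b in must_link:
        if assignment[a] != assignment[b]:
            return False
    for a, b in cannot_link:
        if assignment[a] == assignment[b]:
            return False
    return True

assignment = {"gonna": 0, "wanna": 0, "therefore": 1}
print(satisfies(assignment, must_link, cannot_link))  # True
```

A constrained clusterer like PCKMeans folds such checks into its objective rather than applying them after the fact.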

Tools and Lexicons

1> http://www.comp.nus.edu.sg/~rpnlpir/
2> http://www.cs.cofc.edu/~manaris/ai-education-repository/nlp-tools.html
3> http://nlp.stanford.edu/links/statnlp.html