Monday, December 10, 2012

Word Sense and Subjectivity

Authors: Janyce Wiebe, Rada Mihalcea



Venue: ACL 2006



Research questions:

1> Can subjectivity labels be assigned to word senses? Yes.

2> Can automatic subjectivity analysis be used to improve word sense disambiguation? Yes.

3> Subjectivity annotation of word senses instead of words, sentences or clauses/phrases.



Methods:

For research question 1>

(a) Measuring agreement between annotators who assign subjectivity labels ("subjective", "objective", "both", "uncertain") to word senses.

(b) Designing a "subjectivity score" for WordNet synsets.

For research question 2>

The output of a subjectivity sentence classifier is input to a word-sense disambiguation system, which is in turn evaluated on the nouns from the SENSEVAL-3 English lexical sample task.



Background:

1> Subjective expressions are of three types:

(a) references to private states

(b) references to speech (or writing) events expressing private states

(c) expressive subjective elements

2> Subjectivity analysis:

(a) identifying subjective words and phrases

(b) subjectivity classification of sentences, words or phrases/clauses in context

(c) applications: review classification, text mining for product reviews, summarization, information extraction, question answering



Inter-annotator agreement:

Judge 1 (a co-author) tagged 354 synsets (64 words). Judge 2 (not a co-author) tagged 138 synsets (32 words) independently.

Overall agreement 85.5%, kappa value 0.74. Authors tend to highlight high agreement values and kappa values.
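To make the two agreement figures concrete, here is a minimal sketch of how observed agreement and kappa are computed from two annotators' labels. It assumes the reported kappa is Cohen's kappa for two annotators; the function names and the toy labels are mine, not the paper's.

```python
from collections import Counter


def observed_agreement(labels_a, labels_b):
    """Fraction of items on which the two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(labels_a)
    p_o = observed_agreement(labels_a, labels_b)
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same
    # label if each labels at random according to their own label frequencies.
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_o - p_e) / (1.0 - p_e)


# Toy example (made-up labels, not data from the paper).
judge1 = ["S", "S", "O", "O"]
judge2 = ["S", "O", "O", "O"]
agreement = observed_agreement(judge1, judge2)
kappa = cohen_kappa(judge1, judge2)
```

Kappa is lower than raw agreement because part of the raw agreement would be expected by chance alone, which is why both numbers are usually reported together.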

Causes of uncertainty:

(a) subjective senses are missing in the dictionary

(b) the hypernym may have subjective senses that interfere with the judgment of the current synset



Subjectivity scoring:

1> Find distributionally similar words (DSW).

2> Determine similarity of a word-sense with each DSW. Let us call it "sim". So, for k word senses and p DSWs, we get a k-by-p matrix of "sim" scores.

3> For each instance of a DSW in the MPQA corpus: if the instance appears in a subjective context, add the DSW's "sim" score to the subjectivity score, "subj"; if it appears in a non-subjective context, subtract the "sim" score from "subj".

In either case, add the "sim" score to the total subjectivity score, "totsubj".

4> Do step 3 for all DSWs.

5> Divide "subj" by "totsubj" to obtain the final subjectivity score for a word sense.
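The scoring steps above can be sketched in a few lines of Python. This is only my reading of the steps in these notes, not the authors' code: the input format (a dict of "sim" scores and a dict of per-instance subjective/non-subjective flags), the function name, and the toy numbers are all assumptions for illustration.

```python
def subjectivity_score(sim_by_dsw, instances_by_dsw):
    """Compute subj / totsubj for one word sense.

    sim_by_dsw: dict mapping each DSW to its "sim" score with the word sense.
    instances_by_dsw: dict mapping each DSW to a list of booleans, one per
        corpus instance, True if that instance is in a subjective context.
    """
    subj = 0.0
    totsubj = 0.0
    for dsw, sim in sim_by_dsw.items():
        for in_subjective_context in instances_by_dsw.get(dsw, []):
            # Add sim for a subjective instance, subtract it otherwise.
            subj += sim if in_subjective_context else -sim
            # The total grows by sim regardless of the context.
            totsubj += sim
    if totsubj == 0.0:
        return 0.0  # assumption: no evidence at all is treated as neutral
    return subj / totsubj


# Toy example with made-up similarity scores and corpus instances.
sims = {"alarm": 0.5, "fear": 0.25}
instances = {"alarm": [True, True, False], "fear": [True]}
score = subjectivity_score(sims, instances)  # 0.75 / 1.75
```

With non-negative "sim" scores the result lies between -1 (all instances non-subjective) and +1 (all instances subjective), which is what makes a single threshold on the score meaningful.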



Subjectivity scoring evaluation:

Evaluated on 354 word senses.

For each sense, a subjectivity score is determined first. The score is then thresholded to obtain a "subjectivity label" (+1, -1). Different thresholds were tried and compared on precision and recall.

Informed random baseline: fixed precision, maximum recall of one.

DSW choice - similarity_all, similarity_selected.

Criteria - precision, recall, break-even point (where precision == recall).
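For my own reference, a small sketch of how thresholding and the break-even point fit together. It treats "subjective" as the positive class and searches the observed scores for the threshold where precision and recall are closest; the search strategy, the function names, and the toy data are assumptions of mine, not the paper's evaluation code.

```python
def precision_recall(scores, gold, threshold):
    """Label a sense subjective when its score >= threshold, then evaluate.

    scores: list of subjectivity scores, one per word sense.
    gold: list of booleans, True if the sense is annotated as subjective.
    """
    predicted = [s >= threshold for s in scores]
    true_pos = sum(p and g for p, g in zip(predicted, gold))
    n_predicted = sum(predicted)
    n_gold = sum(gold)
    # Convention (assumption): precision is 0 when nothing is predicted.
    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    return precision, recall


def break_even_point(scores, gold):
    """Return (threshold, precision, recall) where |precision - recall| is smallest."""
    best = None
    for threshold in sorted(set(scores), reverse=True):
        p, r = precision_recall(scores, gold, threshold)
        gap = abs(p - r)
        if best is None or gap < best[0]:
            best = (gap, threshold, p, r)
    _, threshold, p, r = best
    return threshold, p, r


# Toy example with made-up scores and gold labels.
toy_scores = [0.9, 0.6, 0.2, -0.4]
toy_gold = [True, False, True, False]
threshold, precision, recall = break_even_point(toy_scores, toy_gold)
```

Lowering the threshold labels more senses as subjective, so recall can only rise while precision typically falls; the break-even point summarizes that trade-off in a single number.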



Subjectivity for WSD:

Make an existing WSD system subjectivity-aware (by assigning subjectivity scores to sentences containing ambiguous words), and compare its performance with that of the original system.

Hypothesis: instances of subjective senses are more likely to be in subjective sentences.
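One way to picture "subjectivity-aware" WSD is as an extra feature added to whatever features the WSD system already extracts for an ambiguous word. The sketch below is purely illustrative: the feature names, the label set ("subjective" / "objective"), and the dict-based feature representation are hypothetical and not taken from the paper or from any particular WSD system.

```python
def add_subjectivity_feature(wsd_features, sentence_label):
    """Return a copy of the WSD feature dict with a sentence-level subjectivity feature.

    wsd_features: dict of existing features for one instance of an ambiguous word.
    sentence_label: output of a sentence subjectivity classifier
        (hypothetical label set: "subjective" or "objective").
    """
    allowed = {"subjective", "objective"}
    if sentence_label not in allowed:
        raise ValueError(f"unexpected subjectivity label: {sentence_label!r}")
    augmented = dict(wsd_features)  # leave the original features untouched
    augmented["sentence_subjectivity"] = sentence_label
    return augmented


# Hypothetical features for one instance of an ambiguous noun.
original = {"prev_word": "false", "next_word": "went", "pos": "NN"}
augmented = add_subjectivity_feature(original, "subjective")
```

The comparison in the paper then amounts to training and testing the same system with and without this extra signal.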



Data/corpus:

MPQA

SENSEVAL-3



Open research questions:

1> The converse of research question 2: can word sense disambiguation help automatic subjectivity analysis?

2> Assign subjectivity labels to WordNet entries, enabling subjectivity-aware word search and traversal along "subjectivity trails".



Cited papers that might be relevant to my research:

1> D. McCarthy, R. Koeling, J. Weeds, and J. Carroll. 2004. Finding predominant senses in untagged text. In Proc. ACL 2004.

2> D. Lin. 1998. Automatic retrieval and clustering of similar words. In Proceedings of COLING-ACL, Montreal, Canada.

3> J. Jiang and D. Conrath. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the International Conference on Research in Computational Linguistics, Taiwan.

4> J. Wiebe. 2000. Learning subjective adjectives from corpora. In Proc. AAAI 2000.

5> Determining the sentiment of opinions. COLING 2004.

6> Words with attitude. Kamps and Marx. 2002.

7> Mining and summarizing customer reviews. KDD 2004.

8> Towards answering opinion questions: separating facts from opinions and identifying the polarity of opinion sentences. EMNLP 2003.

9> Determining the semantic orientation of terms through gloss analysis. CIKM 2005.



Citing papers that might be relevant to my research:

-- Title of the paper
-- Authors
-- Topic



Bibtex entry:

@inproceedings{Wiebe:2006:WSS:1220175.1220309,
author = {Wiebe, Janyce and Mihalcea, Rada},
title = {Word sense and subjectivity},
booktitle = {Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics},
series = {ACL-44},
year = {2006},
location = {Sydney, Australia},
pages = {1065--1072},
numpages = {8},
url = {http://dx.doi.org/10.3115/1220175.1220309},
doi = {10.3115/1220175.1220309},
acmid = {1220309},
publisher = {Association for Computational Linguistics},
address = {Stroudsburg, PA, USA},
}
