Authors: Francis Heylighen, Jean-Marc Dewaele
Research question: quantifying formality
Data/corpus:
1> Two speech styles and one written style: (a) informal conversation among students, (b) oral examination, (c) essay produced in a written test
2> Frequency dictionaries of Italian and Dutch (for measuring frequencies of deictic and non-deictic words)
3> French interlanguage data
Findings/conclusions:
1> The most fundamental purpose of language production is communication: making oneself understood by someone else.
2> Surface formality - formality for its own sake
3> Deep formality - formality to express meaning more clearly and completely
4> There is a close parallel between natural languages and artificial (e.g., programming) languages. The reason programming languages are known as "formal languages" is that they all show a VERY high degree of (deep) formality. So, natural languages, the authors posit, will also show programming-language-type formality, if formalized ad nauseam.
[Note: A very similar idea is Lotfi Zadeh's PNL (precisiated natural language). "Formalization" is equivalent to Zadeh's "precisiation".]
5> Fuzziness - situation where the reference of an expression is not unambiguously determined (e.g., "It is hot" (how hot?) or "I am in love" (love or infatuation or fling?))
[Note: See Fuzzy Logic and Fuzzy Set Theory]
6> Expressions can be both fuzzy and context-dependent (a "tall" building in NYC is not the same as a "tall" building in State College). In fact, it is difficult to clearly separate fuzziness and context-dependence.
7> Formal styles tend to avoid not only context-dependent expressions, but also fuzzy ones. In practice, formal speakers will tend to choose the least fuzzy expressions that can be applied without too much effort. But since the information necessary to resolve fuzziness is by definition not completely under the control of the communicator, while the information specifying the context is, we should expect much more variation between formal and informal styles on the level of contextuality than on the level of fuzziness.
8> Spectrum of fuzziness and context-dependence. High fuzziness, low context-dependence - politician's speech. Low fuzziness, high context-dependence - poetry.
Variation along the expressivity axis is less natural in the sense that it will always to some degree flout Grice's (1975) maxims of informativeness and avoidance of ambiguity, in the case of poetry in order to create unique artistic effects, in the case of the politician beating around the bush in order to simply avoid communication.
9> More formal messages have less chance to be misinterpreted by others who do not share the same context as the sender. This is clearly exemplified by written language, where there is no direct contact between sender and receiver, and hence a much smaller sharing of context than in speech. We should thus expect written language in general to be more formal than spoken language. The definition also implies that validity or comprehensibility of formal messages will extend over wider contexts: more people, longer time spans, more diverse circumstances, etc.
10> Formality is rigid; meanings don't shift. Informality is flexible; meanings can shift over time, place, person or discourse.
11> Formal style is detached, impersonal, less direct and more objective. Informal style is interactive and more involved.
12> Four kinds of deixis: time ("now", "then"), place ("here", "there"), person ("he", "she") and discourse ("yes", "no", "notwithstanding", "therefore", "however"). Other examples of discourse deixis are anaphora and interjections.
13> Nouns, adjectives, articles and prepositions are non-deictic. Pronouns, adverbs, verbs and interjections are deictic. Conjunctions are deixis-neutral.
14> F = (noun frequency + adjective freq. + preposition freq. + article freq. – pronoun freq. – verb freq. – adverb freq. – interjection freq. + 100)/2
Each frequency is the percentage of all words in the sample that belong to that category, so F ranges from 0 (only deictic categories) to 100 (only non-deictic categories).
15> PCA on word frequencies (actually proportions) yielded formality as the most important dimension of writing style variation.
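The F-score formula in item 14 can be sketched in a few lines of Python. This is only an illustration, not the authors' code: it assumes the text has already been tagged with coarse part-of-speech labels, and the tag names (`"noun"`, `"verb"`, and so on) and the function name `f_score` are made up for this example.

```python
from collections import Counter

# Categories that raise the score (non-deictic) and lower it (deictic),
# as listed in item 13. The tag names themselves are an assumption.
FORMAL_TAGS = ("noun", "adjective", "preposition", "article")
DEICTIC_TAGS = ("pronoun", "verb", "adverb", "interjection")


def f_score(pos_tags):
    """Return the F-score for a sequence of coarse part-of-speech tags."""
    if not pos_tags:
        raise ValueError("need at least one tagged word")
    counts = Counter(pos_tags)
    total = len(pos_tags)

    def pct(tag):
        # Percentage of ALL words, including neutral ones such as conjunctions.
        return 100.0 * counts[tag] / total

    formal = sum(pct(tag) for tag in FORMAL_TAGS)
    deictic = sum(pct(tag) for tag in DEICTIC_TAGS)
    return (formal - deictic + 100.0) / 2.0


# Hand-tagged toy sentence: "The cat sat on the mat"
tags = ["article", "noun", "verb", "preposition", "article", "noun"]
print(f_score(tags))  # roughly 83.3
```

Conjunctions and any other untagged categories count towards the total but contribute to neither sum, so a text made only of such words would score 50.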
Psychology-related findings:
1> Formality depends on situation. First, formality will be highest in those situations where accurate understanding is essential, such as contracts, laws, or international treaties. Second, formality will be higher when correct interpretation is more difficult to achieve. This is the case with feedback-less conversations. For example, phone conversations are more formal than face-to-face conversations, and mails are less formal than books or articles.
2> Expression (E) + Context (C) -> Interpretation (I)
The more context C the communicators share, the less formal E needs to be. The less context they share, the more formal E has to be.
For example, the larger the difference (or distance) between two communicators (in terms of psychology, culture, age, class, social rank, nationality or education), the more formal their communication will be.
People who are psychologically close, such as siblings, spouses or intimate friends, will tend to be minimally formal in their exchanges. We would venture that the highest degree of informality will be found among identical twins that were raised together, who completely share their cultural, social and even biological backgrounds.
3> Audience size. All other things being equal, the larger the audience, the less the different receivers and the sender will have in common, and thus the smaller the shared context. => Higher formality.
Moreover, the larger the audience, in general, the more important it will be to secure accurate understanding. Therefore, we may expect that speeches or texts directed to a large audience will be more formal than comments addressed to one or a few persons.
4> The longer the time span between sending and receiving, the less will remain of the original context in which the expression was produced. For example, reports written for archiving purposes will be more formal than notes taken to remember tomorrow’s agenda. This may also in part explain why spontaneous speeches, produced on the spot, have a much lower formality than speeches prepared at an earlier moment. Another way to test this proposition empirically might consist in measuring the formality of messages sent through fast media (e.g. fax or electronic mail) versus slow media (e.g. postal mail). A message that can be expected to reach the addressee the same day should on average be less formal than a message that takes several days to get through.
5> Finally, the factor of discourse deixis suggests that formality would be higher at the beginning of a conversation or text, because there is not any previous discourse to refer to as yet. Testing this hypothesis is straightforward: it suffices to collect a range of opening sentences or opening paragraphs from articles, speeches or conversations and compare their average formality with the formality of sentences from the middle of the same language sample.
6> Gender. Women’s speech is more formal in the "surface" sense, but less formal in the "deep" sense. It appears that women tend in general to be more intimate or involved in conversations ("rapport talk"), whereas men remain more distant or detached towards their conversation partners ("report talk").
7> Introversion. Introverts use higher (deep) formality, extroverts use lower (deep) formality.
8> Level of education. Highly educated people use more deeply formal language than less highly educated people.
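The test proposed in item 5 (opening sentences versus sentences from the middle of the same sample) could be set up roughly as follows. This is a sketch under stated assumptions: each document is taken to be a list of sentences that are already part-of-speech tagged, the tag names and the helper `opening_vs_middle` are hypothetical, and the toy document is hand-made for illustration rather than real data.

```python
from collections import Counter

FORMAL_TAGS = ("noun", "adjective", "preposition", "article")
DEICTIC_TAGS = ("pronoun", "verb", "adverb", "interjection")


def f_score(pos_tags):
    """F-score of a tag sequence, following the formula in the notes above."""
    counts = Counter(pos_tags)
    formal = sum(counts[tag] for tag in FORMAL_TAGS)
    deictic = sum(counts[tag] for tag in DEICTIC_TAGS)
    return (100.0 * (formal - deictic) / len(pos_tags) + 100.0) / 2.0


def opening_vs_middle(documents):
    """Pool the first and the middle sentence of every document.

    documents: list of documents; each document is a list of sentences;
    each sentence is a list of tags.
    Returns (F of pooled openings, F of pooled middle sentences).
    """
    opening, middle = [], []
    for sentences in documents:
        if len(sentences) < 3:
            continue  # too short to have a distinct middle sentence
        opening.extend(sentences[0])
        middle.extend(sentences[len(sentences) // 2])
    if not opening or not middle:
        raise ValueError("no document with at least three sentences")
    return f_score(opening), f_score(middle)


# Hand-made toy document, for illustration only.
doc = [
    ["article", "noun", "verb", "preposition", "article", "noun"],
    ["pronoun", "verb", "adverb"],
    ["pronoun", "verb", "pronoun", "adverb"],
]
opening_f, middle_f = opening_vs_middle([doc])
print(opening_f > middle_f)  # True for this toy document
```

Pooling the tags before scoring, rather than averaging per-sentence scores, is a design choice made here because single sentences are short and their individual scores would be noisy.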
Open research questions:
1> Is evoked contextuality (deictic and anaphoric) a good measure of overall contextuality, and thus of formality?
2> Instead of PCA, how about doing an LDA on word frequencies? Or maybe a manifold regularization (or kernel method)? Will that impart additional insight into formality (or any other hitherto unknown dimension of style variation)?
3> Extension of F-score. The F-score reportedly works well across the languages the authors examined. Does it work well for Bengali as well? We can do a PCA on Bengali word (POS) frequencies, and see if nouns, adjectives and prepositions get positive loadings, whereas verbs, adverbs and interjections get negative loadings.
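The loading check described in question 3 can be sketched without any external library by extracting only the first principal component with power iteration on the covariance matrix. Everything below is an assumption made for illustration: the function name `leading_component`, the tag order, and above all the numbers in `rows`, which are made-up placeholder percentages and not Bengali (or any real) corpus data.

```python
import math

TAGS = ["noun", "adjective", "preposition", "verb", "adverb", "interjection"]


def leading_component(rows, iterations=200):
    """First principal component of `rows` via power iteration.

    rows: one list of POS percentages per text, in the order of TAGS.
    Returns a unit-length loading vector, oriented so the first entry
    (the noun loading) is non-negative.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Deliberately asymmetric start vector, so that it is unlikely to be
    # orthogonal to the component we are looking for.
    v = [float(j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("start vector is orthogonal to all variation")
        v = [x / norm for x in w]
    # The sign of a principal component is arbitrary; fix it for readability.
    if v[0] < 0:
        v = [-x for x in v]
    return v


# Placeholder percentages for four hypothetical texts (NOT real data).
rows = [
    [30, 10, 15, 12, 4, 0],
    [26, 8, 13, 15, 6, 1],
    [20, 6, 10, 20, 9, 3],
    [16, 4, 8, 23, 11, 5],
]
loadings = dict(zip(TAGS, leading_component(rows)))
for tag, value in loadings.items():
    print(f"{tag:13s} {value:+.2f}")
```

With real tagged texts, the question is whether the signs of the loadings split the same way as in the formula for F. A full PCA routine from a numerical library would also give the remaining components and the share of variance each one explains.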
Related work:
1> Hasan, R. (1984) Ways of saying: ways of meaning. in: R. P. Fawcett, M.A.K. Halliday, S.M. Lamb, A. Makkai (eds.), The semiotics of Culture and Language. Vol. 1 Language as Social Semiotic (pp. 105-162) London & Dover: Pinter.
Explicit and implicit styles
2> Lexical density
Other related work:
1> Klir, G. & Folger, T. (1987) Fuzzy Sets, Uncertainty, and Information. Prentice Hall, Englewood Cliffs, NJ.
2> van Brakel, J. (1992) The Complete Description of the Frame Problem, Psycoloquy 3 (60) frameproblem 2.
3> Givón, T. Function, structure and language acquisition, in: The crosslinguistic study of language acquisition: Vol. 1, D.I. Slobin (ed.), Hillsdale, Lawrence Erlbaum, 1008-1025.
4> Leckie-Tarry, H. (1995) Language and context. A functional linguistic theory of register. (edited by David Birch), London-New York: Pinter.
5> Halliday, M.A.K. (1985) Spoken and written language. Oxford: Oxford University Press.
6> Uit Den Boogaert, P.C. (1975) Woordfrekwenties. In geschreven en gesproken Nederlands. Oosthoek, Scheltema & Holkema, Utrecht.
7> Besnier, N. (1988) The Linguistic Relationships of Spoken and Written Nukulaelae. Language 64, 707-736.
8> Biber, D. & Hared, M. (1992) Dimensions of Register Variation in Somali. Language Variation and Change, 4, 41-75.
More labels/tags for this post: context-dependent, context, anaphora, formality continuum, expressivity, observer's paradox, linguistic complementarity principle, HyperTalk, nominalization, verbalization
Bibtex entry:
@TECHREPORT{Heylighen99formalityof,
author = {Francis Heylighen and Jean-Marc Dewaele},
title = {Formality of Language: definition, measurement and behavioral determinants},
institution = {},
year = {1999}
}