Tuesday, August 19, 2008

Real Examples

1> Example not directly related to important chemistry facts

From http://cultureofchemistry.blogspot.com/2008/07/protecting-groups.html

"The boys had the tent next door to ours. I came back from dinner one night to find a very happy squirrel just making off with a chip container from the kids tent. At which point I remembered the dried fruit I'd left in my pack after the morning hike. Whew...it was still there. The rodents had been attracted to the far more tasty snack leavings next door. The boys tent is serving as (a chemist would say) a protecting group."

There is a lot to consider here. First, look at the boldfaced expressions. Each is either a colloquial expression, an incorrect word form probably written in haste, or a pithy clause like the last one.

"Boys" and "kids" should really be "boys' " and "kids' ". "Whew" is an expression suggesting pure fantasy or amazement on the author's part. "A chemist would say" alludes to the fact that a protecting group behaves in much the same way as the boys' tent did toward the squirrels.

But these expressions probably do not bring out any important chemistry fact.

2> Example directly related to an important chemistry fact

Let's look at another paragraph from
http://cultureofchemistry.blogspot.com/2008/06/weird-words-of-science-isotope.html

"Each atom of an element has a characteristic number of protons - positively charged particles - in their nucleus. An atom with five protons is boron. One with 82? Lead."

Here, "One with 82?" is most definitely a colloquial expression and is by no means a complete sentence. It is a question: "What is the atom that has 82 protons in it?" The answer, "Lead", is also not a full sentence, but it stands for the sentence "The atom that has 82 protons in it is lead." So these colloquial expressions actually refer to important chemical details.

9 comments:

Unknown said...

Instead of linking to the entire blog, please click on "links to the post" and link that URL so that we know the context of the lines.

Unknown said...

I guess the first problem is identifying colloquial speech in text. To identify colloquial speech, we need to first establish: "What is colloquial speech?"

For example, in

"The boys had the tent next door to ours. I came back from dinner one night to find a very happy squirrel just making off with a chip container from the kids tent. At which point I remembered the dried fruit I'd left in my pack after the morning hike. Whew...it was still there. The rodents had been attracted to the far more tasty snack leavings next door. The boys tent is serving as (a chemist would say) a protecting group."

Is "kids" a colloquialism because it actually should be "kids'" and the apostrophe is missing, or because a kid really is a baby goat and the word is not being used in that sense? The greater issue is when a word is considered part of formal English and when it is not. The meaning "child" for the term "kid" is gradually gaining acceptance. So, when does it get accepted into English?

What about "I'd"?

I would not think that "a chemist would say" is colloquial speech. Can you explain?

So, as you see, there can be differences of opinion. Now, assuming we have a program to identify colloquial speech, how do we get a gold standard to compare with when human beings themselves may not agree fully on what is colloquial speech?

Unknown said...

In the previous comment, I also meant to say that perhaps "making off" should also count as a colloquial term, no?

Shibamouli Lahiri said...

I felt something similar even before I read the comments. It seemed to me that there are some expressions which we may term "colloquial expressions", but which appear all too often in formal situations. Again, as you mentioned, there are differences of opinion about whether an expression is really colloquial or not. For example, "I'd"-like things are now really part of standard English.

Again there are meaning-related ambiguities, e.g., between "kids" and "kids'".

I think "A chemist would say" is more an aside, than a colloquialism. It also struck me first, so you'll see that in my post I didn't brand it as a colloquialism, rather I wrote "or a pithy clause like the last one." So I really didn't mark that clause as a colloquialism.

I think there can never be any "Gold Standard" as such, for each moment we speak newer and newer things. So each moment generates newer and newer colloquialisms. However, we can amass the accepted colloquialisms of a particular time-span (in a vocabulary or a dictionary), and mark it to serve as our Gold Standard.

Unknown said...

1. "I'd" is acceptable in spoken English but not in written English. So, do you consider it a colloquialism?

2. Assuming we can come up with an automated method for detecting colloquialisms, how will you evaluate how well your system does?

Shibamouli Lahiri said...

1> In my opinion, "I'd"-type expressions should be considered colloquialisms, at least for the reason that we can expand them to "I would"-like things.

2> First we need to construct or obtain a unanimous vocab (or dictionary) of colloquialisms. Then the criterion of "goodness" is whether the automaton comes up with exactly those expressions that are assumed to be colloquialisms - no more and no less. That is, a) it doesn't detect expressions that are not in the vocab, and b) it doesn't skip expressions that are in the vocab.

This vocab-checking method might be naive and counter-intuitive. But it's easy to construct.
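The "no more and no less" criterion can be made concrete as a rough sketch. Everything below is an assumption for illustration: the tiny vocab, the function names `detect` and `precision_recall`, and the choice to score with precision (condition a) and recall (condition b) are not from any existing tool.

```python
import re

# Hypothetical mini-vocab of colloquialisms; a real one would be far larger.
COLLOQUIAL_VOCAB = {"whew", "making off", "kids", "i'd"}

def detect(text, vocab=COLLOQUIAL_VOCAB):
    """Return the vocab entries that occur in text as whole words or phrases."""
    lowered = text.lower()
    found = set()
    for entry in vocab:
        # The lookarounds stop an entry from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(entry) + r"(?!\w)"
        if re.search(pattern, lowered):
            found.add(entry)
    return found

def precision_recall(detected, gold):
    """Score a detector's output against a gold set of expressions.

    Precision penalizes expressions detected but not in the gold set (a);
    recall penalizes gold expressions that were skipped (b).
    """
    true_positives = len(detected & gold)
    precision = true_positives / len(detected) if detected else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    return precision, recall

# Made-up example: the system found one wrong expression and missed one.
gold = {"whew", "making off", "kids"}
system_output = {"whew", "kids", "a chemist would say"}
p, r = precision_recall(system_output, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

A system that meets both conditions exactly scores 1.0 on both numbers; anything lower shows which of the two conditions it is failing.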

If we consider a grammar checker instead, we'll need to have a set of grammar rules for written or colloquial English first. Then we'll have to check which expressions conform to those rules, which do not, and mark them accordingly.

In this case, "goodness" will be determined by whether the automaton is checking all possible derivations or not, because checking all possible derivations may ensure that no colloquialisms are skipped, and no formal expressions are taken as colloquialisms. But checking all possible derivations is a brute-force idea, so again we need to prune the search space.

There is still another problem. Even if we check all derivations, the inherent ambiguity of the English language (and of any other language) will prevent us from determining beyond doubt whether an expression is really a colloquialism or not. For example, a rule for generating colloquialisms might actually generate a formal expression in some of its derivations, or vice versa. In these cases, we may look to the context to clarify matters.

Unknown said...

1. You will never get a "unanimous" vocabulary. We should make do with ones of acceptable quality.

2. Instead of writing our own grammar rules, we can simply reuse some of the available grammar checkers.

Shibamouli Lahiri said...

It's OK to use vocabs of acceptable quality, but in reusing the extant grammar rules, we'll have to make sure that they do not skip any of the colloquialisms we have in mind, i.e., they should not fail to detect expressions that could be detected manually. If existing grammar rules cannot ensure this, we'll have to augment them with our own rules.

Unknown said...

I was thinking that we would identify formal language using grammar rules. For the text classified as not formal, we would then have to write our own rules to detect colloquialisms. What do you think?
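A minimal sketch of that two-stage idea, under loud assumptions: `is_formal` is only a toy stand-in for an existing grammar checker, and the regexes, the marker names, and the interjection list are all made up for illustration. It also assumes straight apostrophes in the input.

```python
import re

# Hypothetical hand-written rules for stage 2.
CONTRACTION = re.compile(r"\b\w+'(?:d|ll|re|ve|m|t)\b", re.IGNORECASE)
INTERJECTION = re.compile(r"\b(?:whew|wow|oops)\b", re.IGNORECASE)

def is_formal(sentence):
    """Stage 1: toy stand-in for a reusable grammar checker (an assumption)."""
    s = sentence.strip()
    return (
        len(s.split()) >= 3          # not a bare fragment
        and s[0].isupper()           # starts like a sentence
        and s.endswith(".")          # ends as a statement
        and "..." not in s           # no trailing-off ellipsis
        and not CONTRACTION.search(s)
    )

def colloquial_markers(sentence):
    """Stage 2: our own rules, applied only to text that failed stage 1."""
    markers = []
    if CONTRACTION.search(sentence):
        markers.append("contraction")
    if INTERJECTION.search(sentence):
        markers.append("interjection")
    if len(sentence.split()) < 3:
        markers.append("fragment")
    return markers

def classify(sentence):
    if is_formal(sentence):
        return "formal", []
    markers = colloquial_markers(sentence)
    label = "colloquial" if markers else "not formal, unexplained"
    return label, markers

for s in ["An atom with five protons is boron.",
          "Whew...it was still there.",
          "One with 82?",
          "Lead."]:
    print(s, "->", classify(s))
```

Sentences that fail stage 1 but trigger no stage 2 rule end up in the "not formal, unexplained" bucket ("One with 82?" lands there with these toy rules), which is where new rules would need to be written.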