Types and tokens (was: What the ...)



la mark. clsn. cusku di'e

> I got a little lost with this type/token stuff you're using here, and I
> thought I sort of understood how texts worked in Lojban.  Do you mean
> "type" and "token" sort of like non-terminal and terminal in a formal
> grammar?  Or like a terminal and the specific instance? (e.g. KOhA-type,
> with the token "da" instantiating it)  Or something else entirely?  Just
> trying to keep up with the Rostas...

Something else.  Tokens are actual instances of things, and types
are classes whose membership criterion is equality.  Usually the
terms are only applied to linguistic objects, or rather the
graphical instances thereof.

Thus in "The cat sat on the mat", there are 6 tokens at the word level
and 22 tokens at the letter level (counting the 5 spaces), but only 5
word types (<the>, <cat>, <sat>, <mat>, and <on>) and 10 letter types
ignoring case (space, <a>, <c>, <e>, <h>, <m>, <n>, <o>, <s>, <t>).
For Unix weenies, the command "wc -w" counts word tokens, and the command
"tr A-Z a-z | tr -cs a-z '\n' | sort -u | wc -l" counts word types.

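For those who would rather check the arithmetic than trust it, here is
a minimal Python sketch of the same counts.  The variable names are my
own, and "word" is assumed to mean a maximal run of the letters a-z
after lowercasing, which is the same assumption the pipeline above makes.

```python
import re

sentence = "The cat sat on the mat"

# Word level: every occurrence is a token; distinct spellings are types.
word_tokens = re.findall(r"[a-z]+", sentence.lower())
word_types = set(word_tokens)

# Letter level: spaces are counted as characters, case is ignored.
letter_tokens = list(sentence.lower())
letter_types = set(letter_tokens)

print(len(word_tokens), len(word_types))      # 6 5
print(len(letter_tokens), len(letter_types))  # 22 10
```

Collapsing a list into a set is exactly the step that turns tokens into
types: membership is decided by equality and nothing else.
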
I was pointing out that you can consider <The cat sat on the mat>
to be a type too, a sentence type.  Are sentence types composed
of word types?  That seems intuitive, since sentence tokens are
obviously composed of word tokens.  But it leads to a nasty problem:
are there five or six word types in <The cat sat on the mat>?
If five, then it seems to be the same type as <The cat sat on mat>,
which is obviously false; if six, then there are two distinct
<the>-types, which contradicts the definition of "type".

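The two horns of that dilemma can be sketched in Python.  This is only
an illustration under my own assumptions, not a resolution: as_set and
as_sequence are hypothetical helpers that model a sentence either as
the set of its word types or as an ordered sequence of words.

```python
def as_set(sentence):
    """Model a sentence as the set of its word types (five for the cat)."""
    return set(sentence.lower().split())

def as_sequence(sentence):
    """Model a sentence as an ordered sequence of words (six positions)."""
    return tuple(sentence.lower().split())

a = "The cat sat on the mat"
b = "The cat sat on mat"

# Horn one: with five types, the two sentences collapse into one.
print(as_set(a) == as_set(b))            # True

# Horn two: the sequence tells them apart, but only by having six
# slots, two of which hold the very same <the>.
print(as_sequence(a) == as_sequence(b))  # False
print(len(as_sequence(a)))               # 6
```
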
(Bonus: the lovely phrase "hapax legomenon" means a type with only
a single token in a given body of writing, typically all the
writing that exists in a particular dead language.)
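
A minimal sketch of finding the hapax legomena of a text, assuming
again that a word is a run of the letters a-z after lowercasing; the
function name is my own.

```python
import re
from collections import Counter

def hapax_legomena(text):
    """Return, sorted, the word types that have exactly one token in text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return sorted(word for word, n in counts.items() if n == 1)

print(hapax_legomena("The cat sat on the mat"))
# ['cat', 'mat', 'on', 'sat']
```

In the cat sentence every type except <the> is a hapax.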

-- 
John Cowan	http://www.ccil.org/~cowan		cowan@ccil.org
			e'osai ko sarji la lojban