
rationale for Lojban's short allomorphs (rafsi); conlang implications



On conlang, Rick Harrison writes:
>The vast majority of constructed language enthusiasts agree that a
>planned language should have no allomorphs, i.e. each root-word should
>have only one form which should not change due to conjugation,
>declension, compounding, or other grammatical processes.  Allomorphs
>increase the difficulty of memorizing a vocabulary and give no benefit
>in return.
>
>It appears that Loglan and Lojban suffer from rampant allomorphy.  Any
>given 5-letter predicate might have 0, 1, 2, or 3 triliteral allomorphs
>to be used in compound words.  Unless I am mistaken, there's no way to
>predict whether a given predicate has allomorphs, and if so, what those
>allomorphs might be; each predicate's allomorphs must be memorized.  For
>example, in Loglan we find...

A two-part answer.  First comments on our rationale, and then a critique
of two hidden assumptions that Rick makes that are both false, and both
important to the design of any conlang.

The basic rationale, as John Cowan explained, is that the result is a
shorter word.  Shorter compounds are much easier to learn and memorize
(as well as to say) than longer words.  Lojban allows an unreduced form,
formed by substituting a y for the final vowel in all but the last term
of a compound - the 'y' serves as a 'glue'.  Following is a long
compound that has appeared in Lojban text or in commentary on that text:

nolraitruti'u           noble+superlative+govern+daughter
(princess - specifically the daughter of a king/queen, as opposed to
Princess Di of UK)

(5 syllables)

If there were no short forms, this word would have to be

noblytrajyturnytixnu    (8 syllables)
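
As a rough check of the two syllable counts, here is a minimal Python
sketch of the y-glue rule described above.  The root spellings (nobli,
traji, turni, tixnu) and the treatment of ai/au/ei/oi as one-syllable
diphthongs are assumptions on my part; the text gives only the glosses
and the finished long form.

```python
import re

# Assumed root words behind the compound; the text gives only their
# glosses and the resulting long form.
ROOTS = ["nobli", "traji", "turni", "tixnu"]

def long_form(roots):
    """Replace the final vowel of every root but the last with 'y', then join."""
    return "".join(r[:-1] + "y" for r in roots[:-1]) + roots[-1]

# One syllable per vowel nucleus.  'ai', 'au', 'ei', 'oi' are assumed to be
# single-syllable diphthongs; an apostrophe separates two syllables.
NUCLEUS = re.compile(r"ai|au|ei|oi|[aeiouy]")

def syllables(word):
    """Count vowel nuclei in a word."""
    return len(NUCLEUS.findall(word))

unreduced = long_form(ROOTS)
print(unreduced, syllables(unreduced))
print("nolraitruti'u", syllables("nolraitruti'u"))
```

The counter is deliberately crude: it only counts vowel nuclei, which is
enough to compare the short and long forms of a compound.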

Given that you are expected to memorize the Lojban word, and eventually
to learn it as a unitary word rather than puzzling it together every
time from its components, it should be obvious that the shorter word is
better than the longer one.  If you lived in a country
with royalty such as the UK that had such a princess (as Elizabeth was
before she became queen) and were prone to reading, writing, and talking
about such a princess a lot, which word would you prefer to say, or
write (or type several times in a net message)?

And "princess" is not that infrequent a concept, certainly deserving of
a single word.  And the British, so I understand, make a distinction
between the various types of princess, at least in terms of how they are
titled, so that the distinction is socially and linguistically
important.  Lojban must have separate words if there are clearly two
separate concepts, as there are in this case (the 'Di' variety of
princess might be 5 terms:  noble-superlative-governor-son-spouse).

As Cowan said, the long form IS permitted as an alternative to the short
form, and might be used either in noisy environments where the longer
word has all those extra sounds as redundancy checks, or by beginners
who have not yet memorized the short rafsi or the compound, and are
creating the compound on the fly (as this word has been created
every time it has been used thus far, since we have no dictionary nor
people who have memorized such words).  The long forms are of course
needed when the words are not compounded, or you would not be able to
tell a compound from a root from a structure word.

Loglan/Lojban has reached what I believe is an optimal tradeoff between
redundancy and brevity, ease of learning and unambiguity of the
morphology.  There may be other solutions, but few if any would meet all
the goals for the language.

(The resolvable affixes of Loglan and Lojban were developed after
several years of debate.  The question first arose in Zwicky's critique of
Loglan in 1969 (if not earlier).  Early versions of Loglan made
compounds by just mushing the words together in any way that worked, as
long as the resulting compound was 2 mod 3 characters long.  The problem
proved severe when people actually tried to both learn the existing
compounds and to make new compounds, after the first printed dictionary
came out in 1975.  The specific solution embedded in Lojban took 5 years
to develop (1978-82), with experimentation at several steps along the
way (involving many people, though unfortunately almost all native
English speakers).  The design you see today was NOT adopted lightly.)



Now for the more important issues - which affect other conlangs' designs
as much as Loglan/Lojban's.

I'm afraid that there are some wrong hidden assumptions that Rick makes
here, assumptions that are vital in making decisions on a conlang:  1)
that memorizing allomorphs is difficult, and 2) that there is a way of
reducing the amount of memorization needed to gain fluency in a conlang
below some arbitrary minimum.

I'll handle the first assumption by giving some details about the Lojban
system that show its viability and relative ease of learning.  But
first, briefly, my argument against the second point.

Assuming that the set of thoughts that might be expressed linguistically
should be about the same, regardless of the language, there are only so
many options available for expressing those thoughts.  If there is 'one
word per concept', then a speaker must have memorized a separate word
for each concept in order to achieve fluency.  If polysemy exists, then
the speaker has an added burden:  to memorize a somewhat smaller set of
words, but to also memorize the multiple meanings of those words
(including meanings he may rarely use) AND some means of pragmatically
distinguishing which meaning is intended.

THERE IS NO WAY AROUND THIS.  Fluent speakers DO NOT often invent words
or even derive new prefix/suffix formations when conversing.  Productive
language formation takes time to think, and taking that time in the
middle of a conversation breaks up fluency.  There is some minimum
amount that MUST be learned, even in the most regular of conlangs, and
no design trick can reduce this.

For a given language, for each concept you expect to talk or hear about,
you must learn 1) at least one word for the concept, 2) the association
of that word with that specific concept, and not to other concepts
(including false friends from the native language), 3) any other
meanings or usages associated with that word, including both polysemy,
and also pragmatic considerations, what phrases may be appended to
sentences using that word, etc.  (If you stick an object on an
intransitive verb "*I sit the store", or attach certain prepositional
phrases to a word that doesn't expect them "*I give from Mary across the
store" you get nonsense IN ANY LANGUAGE, ungrammatical garbage in most
of them.)  It takes memorization to turn WORDS into SENSE.

The ONLY thing you can do is ease the memorization PROCESS - to make it
easier to do that required memorization, to get from novice to fluency.
One way - the most frequent - is to build lots of memory hooks to some
natlang(s), but you risk semantic transfers that make your conlang not
truly an independent language (you can't avoid this, but you can
minimize it through other methods of aiding the learning process).
Another way, much used by Esperanto, is the use of affixes that modify
meanings of words in certain semi-regular ways.  Thus, learning a few
words and a few productive affixes, you multiply the vocabulary that
results from memorization.  New people then learn from seeing words that
they can easily decompose - after seeing these words over and over, they
suddenly find that they know the word-formation rules, the affixes,
and the compounds.

Lojban carries the Esperanto technique to the ultimate extreme.  Rather
than a couple dozen short affixes (which do not in any way resemble a
root that has the same meaning), we allow EVERY root to be used as an
affix, and make those affixes resemble the roots in very regular ways.
Thus, as I show below, they are EASY to learn.  But the beginner need
learn only to RECOGNIZE a compound, and break it down, a process that
has been experimentally verified to be much easier than RECALLing the
word/affix for a concept with no hints.  A beginner can use the
long-form words as metaphors or long-form compounds while learning with
no stigma to communication - just a few more syllables - syllables that
also help when imperfect pronunciation leads to a need for greater
redundancy.  But skilled speakers sacrifice some of that redundancy for
the brevity needed to make the language speakable at a fluent rate of
speed.

Returning to the first assumption, I assert that:

  A very regular conlang CAN have allomorphs that are easy to memorize and
  Lojban has such a system that actually makes the words MORE learnable
  than they might otherwise be.

This is easy to demonstrate for Lojban.  I'll use examples to show real
Lojban words and their "rafsi", our word for the short allomorphs, how
they are derived, and how a speaker can easily learn them.  We have
three types of rafsi, of form CVV, CVC, and CCV (where CC must be a
permissible initial).  It turns out that for any given word, it is
possible that it might have one of each type, but no more than one.  It
also turns out that there are only a couple of possibilities to choose
from for a given word.


             rafsi
>predicate  "djifoa"      pattern
>---------  --------      -------
   badna    (none)        no allomorphs             (banana)

The only possible rafsi for this word are bad (123), ban (124) and ba'a
(125) because a root of the form C1V1C2C3V2 must have rafsi of the form
C1V1C2, C1V1C3, C1V1V2, C2C3V2, or C1C2V1; the latter two would give dna
and bda, which are not among the set of permitted rafsi, since they
cannot occur at the start of a word.
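
The rule just stated can be sketched in Python.  This is only an
illustration, not the procedure we actually used:  the function name is
made up, the set of permissible initials is passed in rather than
reproduced (the example supplies only "br"), and the list of vowel pairs
that may drop the apostrophe is an assumption.

```python
# Vowel pairs assumed to be diphthongs that may be written without the
# apostrophe (an assumption, not stated in the text).
DIPHTHONGS = {"ai", "au", "ei", "oi"}

def candidate_rafsi(root, initials):
    """List the possible short rafsi for a root of the form C1V1C2C3V2.

    `initials` is the set of consonant pairs allowed at the start of a word.
    """
    c1, v1, c2, c3, v2 = root
    out = [
        c1 + v1 + c2,        # letters 1, 2, 3
        c1 + v1 + c3,        # letters 1, 2, 4
        c1 + v1 + "'" + v2,  # letters 1, 2, 5, with the apostrophe
    ]
    if v1 + v2 in DIPHTHONGS:
        out.append(c1 + v1 + v2)   # letters 1, 2, 5 as a diphthong
    if c2 + c3 in initials:
        out.append(c2 + c3 + v2)   # letters 3, 4, 5
    if c1 + c2 in initials:
        out.append(c1 + c2 + v1)   # letters 1, 3, 2
    return out

# Only "br" is supplied here; the full list of permissible initials is
# not reproduced in this sketch.
INITIALS = {"br"}
for root in ["badna", "barna"]:
    print(root, candidate_rafsi(root, INITIALS))
```

For badna the sketch yields bad, ban, and ba'a, since neither "dn" nor
"bd" is in the set of initials it was given.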

Thus assuming that you know the root 'badna', there are only 3 possible
short rafsi.  Let us look at the words that actually have these rafsi
next.

   barna    ba'a          C1V1V2 (CVV)               (mark, spot)

This word has only the possibilities bar (123) ban (124) and ba'a (125)
and also bra (132) which is dispreferred because of the order reversal
(there are only a couple of words that use that rafsi form).

   bangu    ban bau       C1V1C2 (CVC) C1V1V2 (CVV)  (language)

The possibilities are ban (123), bag (124), and either bau or ba'u (125).

   barda    bad           C1V1C3 (CVC)               (big, large)

The possibilities are bar bad ba'a and bra.  Note the overlap with previous
words.

   bandu    ba'u          C1V1V2 (CVV)               (defend, protect)

The possibilities are ban, bad, bau, and ba'u.  Again more overlap.

   bargu    bag           C1V1C3 (CVC)               (arch, curve over)

Possibilities are bar, bag, bra, bau, and ba'u.

and finally, to close off the one CCV that has emerged so far

   cabra    bra           C2C3V2 (CCV)               (apparatus)

and to close off bar

   banro    bar           C1V1C3 (CVC)               (grow)

cabra could have had one of cab, car, ca'a, or bra.  I won't show the
comparisons in the c---- words. banro could have ban, bar, or ba'o.  For
any given word, there is only a small set of possibilities, but the net
of possibilities obviously extends to fill almost all of the approximately
2000 possible rafsi.

We will presume, to simplify things, that we had to assign bra to cabra
due to similar conflicts in the c initial rafsi, without explaining
those choices.  Without considering other words (there are only a couple
more for each rafsi - after all, some 1400 words with about 4 choices
apiece map to 2000 rafsi, so there cannot be too much overlap), we have

bra   cabra (apparatus) barna (mark) barda (large) bargu (arch)
bag   bangu (language) bargu (arch)
bad   badna (banana) barda (large) bandu (defend)
ban   badna (banana) barna (mark) bangu (language) bandu (defend) banro (grow)
ba'a  badna (banana) barna (mark) barda (large)
bar   barna (mark) barda (large) bargu (arch) banro (grow)
bau   bangu (language) bandu (defend) bargu (arch)
ba'u  bangu (language) bandu (defend) bargu (arch)
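
A listing like this is just the per-word possibilities turned inside
out.  A small Python sketch of that inversion, with the possibilities
copied from the lists given earlier (the names POSSIBLE and competitors
are mine, for illustration only):

```python
from collections import defaultdict

# Possible rafsi for each root, copied from the lists given earlier.
POSSIBLE = {
    "cabra": ["cab", "car", "ca'a", "bra"],
    "badna": ["bad", "ban", "ba'a"],
    "barna": ["bar", "ban", "ba'a", "bra"],
    "bangu": ["ban", "bag", "bau", "ba'u"],
    "barda": ["bar", "bad", "ba'a", "bra"],
    "bandu": ["ban", "bad", "bau", "ba'u"],
    "bargu": ["bar", "bag", "bra", "bau", "ba'u"],
    "banro": ["ban", "bar", "ba'o"],
}

def competitors(possible):
    """Invert root -> rafsi into rafsi -> roots that could claim it."""
    table = defaultdict(list)
    for root, forms in possible.items():
        for form in forms:
            table[form].append(root)
    return dict(table)

table = competitors(POSSIBLE)
for form in ["bra", "bag", "bad", "ban", "ba'a", "bar", "bau", "ba'u"]:
    print(f"{form:5} {' '.join(table[form])}")
```

Forms such as cab, car, ca'a, and ba'o also come out of the inversion;
they are left unprinted because their other competitors are not among
the eight words considered here.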

Now I contend that given such a small number of possible meanings for
each rafsi, it becomes easy to learn which one is associated with which
root word.  If you know that bra is cabra, it can't be any of the other
three. If you learn that banro (our other example-limiter) is 'bar', then
bar cannot be used for the other three.
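
That elimination step is mechanical enough to sketch in Python.  This
is a hypothetical helper, not a tool we distribute; the possibilities
for barda and the known assignments are taken from the text.

```python
def remaining(root, possible, known):
    """Return the candidate rafsi for `root` that are not already known
    to belong to some other root."""
    return [form for form in possible[root] if known.get(form, root) == root]

# Possibilities for barda (large), as listed in the text.
POSSIBLE = {"barda": ["bar", "bad", "ba'a", "bra"]}

# What the learner already knows: bra is cabra, bar is banro.
known = {"bra": "cabra", "bar": "banro"}
print(remaining("barda", POSSIBLE, known))

# Learning one more assignment (ba'a is barna) narrows it further.
known["ba'a"] = "barna"
print(remaining("barda", POSSIBLE, known))
```

With two assignments known, barda is down to two choices; with a third,
only bad is left, which is in fact its rafsi.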

bargu (arch) competes against bangu (language) for both bag and bau.
But language is a much more important concept, and furthermore will be
found in final position in compounds (french-language, english-language,
etc. for all nationalities), a position that in Lojban requires a vowel
at the end of the rafsi. bandu (defend) is the other competitor for bau
and ba'u.  Language, as the most used of these three concepts in final
position of compounds gets bau; then bargu can have 'bag' and bandu
ba'u.

Since bandu now has a rafsi, this allows us to use bad for barda, since
clearly 'large' will be used in compounds more than 'banana'.  This
makes sense, since 'large' will be likely in a modifier position rather
than final.  It doesn't need the CVV ba'a, which can be used for barna
(mark), a heavy final position word.

This leaves only ban, which by elimination of others that got a CVC
already must either be barna (mark), bangu (language), or badna
(banana).  We chose bangu, even at the expense of poor badna, since the
number of banana compounds I can think of is quite small, while there
are many places where, even with bau assigned to bangu, a shorter word
would result when language is the modifier rather than the modified.

I've done a little handwaving here that won't make much sense to those
not familiar with our morphology.  Please forgive - the proof is in the
results.  I had actual lists of thousands of proposed compounds using
these words when I matched roots to rafsi, trying to minimize the number
of syllables in the set of all compounds.  The results are given above
for each of the words.

To a limited extent, the Lojban learner can recreate some of those
decisions on the fly.  You are looking for a rafsi because you have a
compound word that needs it.  You think of the possibilities, and
usually the best rafsi for the position is the correct one.  At most,
you are typically picking from among two or three possibilities.  It
becomes very easy to learn to guess correctly.  I know, because I
usually do.

(That I can speak the language as well as anyone, yet have not
'memorized' the rafsi, shows that the memorization process is NOT a
critical path step in learning, and that ad hoc guessing usually
suffices.  You learn rafsi, like place structures, as you need them.  We
have a LogFlash version to aid in rapid rafsi memorization, but I'm the
only one who has significantly used it that I know of, and I never
finished - I got into working on the textbook.)

It takes no hand-waving to see that the more rafsi you actually DO know,
the easier it becomes to learn the rest.  You have a closed set of
three-letter forms, nearly all of which have a meaning.  By the time you
know a third of them, that elimination process above goes from a 1/4
guess to a 1/2 guess.  By the time you know 2/3 of them, you probably
actually know 90% because you can determine so many by elimination of
alternatives.


Finally, I said that learning the rafsi helps you cement in your
knowledge of the root words themselves.  If you know 'bau' is a rafsi
for bangu, you know that C1 is b, V1 is a, and V2 is u. This rather
reduces the burden of learning the other two letters.  If you know the
other rafsi is "ban", then you know that either C2 or C3 is 'n', and
you can almost certainly guess by now.  (In speech you can probably just
slur over the other consonant and the listener will guess what word
you wanted from context. %^)

A long answer - but I hope a good one.  Our system is carefully thought
out to maximize learnability while keeping all of the other important
features of Loglan design.  But most important:  any other conlang
system that attempts to do better will make some tradeoffs - tradeoffs
that themselves will create or add to learnability problems.
----
lojbab                                                      lojbab@grebyn.com
Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA                        703-385-0273

For information about the artificial language Loglan/Lojban, please
provide a paper-mail address to me via mail or phone.  We also have
limited introductory information available electronically.  The LLG is
funded solely by your contributions, which are encouraged for the purpose
of defraying our costs (for both electronic and paper distribution.)