



06/01/93 Lojban baseline rafsi list part 5 of 5

This list is in the public domain.  However, we ask that this header be
retained on all distributed copies, so that people have some idea what
they are looking at and how to get more information.

For information about the artificial language Loglan/Lojban, please
contact The Logical Language Group, Inc.  (LLG).  We ask that you
provide a paper-mail address as well as an email address if appropriate.
The LLG is funded solely by your contributions, which are encouraged for
the purpose of defraying our costs (for both electronic and paper
distribution).

Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA                        703-385-0273
email:  lojbab@grebyn.com

                      THE lujvo-MAKING ALGORITHM

The following is the official algorithm for generating Lojban lujvo
(complex brivla, or predicate words), given a known tanru (metaphor) and
a complete list of gismu (Lojban primitive roots) and their assigned
rafsi (affixes).  Note that Lojban does not require use of the optimal,
or "best" form of a word.  Poetic usage allows any of the valid word
forms created by this algorithm to be used under appropriate
circumstances.

Given an n-term tanru and the instruction to find the highest-scoring
lujvo:

1) For all terms except the final term, look up or generate all of the
rafsi (3- and 4-letter forms).  Three-letter forms will have the
structure CVC, CCV, CVV, or CV'V (the apostrophe is not counted as a
letter in any Lojban rule).  A standard gismu list gives the three-letter
rafsi for each gismu and for each cmavo with an assigned rafsi.
You can also memorize the list.  This is not difficult if you use the
language often:  the set of possible rafsi for each word is limited, and
because almost all possible rafsi have an assigned meaning, the more you
know, the easier it is to learn the rest by elimination.

- Given a CCVCV gismu C1C2V1C3V2, the CVC rafsi, if any, will be C1V1C3
or C2V1C3.  The CVV/CV'V rafsi, if any, will be C1V1(')V2 or C2V1(')V2.
The CCV rafsi, if any, will be C1C2V1.  Very few gismu have both a CCV
and a CVV/CV'V rafsi assigned.

- Given a CVCCV gismu C1V1C2C3V2, the CVC rafsi, if any, will be C1V1C2.
The CVV/CV'V rafsi, if any, will be C1V1(')V2.  The CCV rafsi, if any,
will be C1C2V2, or rarely, C1C2V1.

- The rafsi for cmavo are assigned more arbitrarily.  A CVV/CV'V form
cmavo will often be its own rafsi, but when this isn't possible, the
final letter is changed.  A single letter, usually an arbitrary
consonant, is added to a CV cmavo to make its rafsi.

- The four-letter rafsi form for any gismu is formed by dropping the
final vowel from the gismu (which is then effectively replaced by "y" in
the lujvo).
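The shape rules in step 1 can be sketched in Python as below.  This is a
minimal illustration under stated assumptions: `pattern` and
`candidate_rafsi` are hypothetical helpers, they list only the forms a
gismu's rafsi *could* take, and whether any of them is actually assigned
must still be checked against the gismu list.

```python
VOWELS = set("aeiou")


def pattern(word):
    """Map a word to its C/V shape, e.g. "mabru" -> "CVCCV"."""
    return "".join("V" if ch in VOWELS else "C" for ch in word)


def candidate_rafsi(gismu):
    """List the forms a gismu's rafsi may take (not which ones are assigned)."""
    shape = pattern(gismu)
    if shape == "CCVCV":
        c1, c2, v1, c3, v2 = gismu
        cvc = [c1 + v1 + c3, c2 + v1 + c3]
        cvv = [c1 + v1 + v2, c1 + v1 + "'" + v2,
               c2 + v1 + v2, c2 + v1 + "'" + v2]
        ccv = [c1 + c2 + v1]
    elif shape == "CVCCV":
        c1, v1, c2, c3, v2 = gismu
        cvc = [c1 + v1 + c2]
        cvv = [c1 + v1 + v2, c1 + v1 + "'" + v2]
        ccv = [c1 + c2 + v2, c1 + c2 + v1]  # the second form is rare
    else:
        raise ValueError(f"not a gismu shape: {gismu!r}")
    # The 4-letter form drops the final vowel; a 'y' follows it in a lujvo.
    return {"CVC": cvc, "CVV/CV'V": cvv, "CCV": ccv, "4-letter": [gismu[:-1]]}
```

For example, the candidates for zbasu include zba and zbas, and those for
sarji include sar and sai, the forms used in the scoring examples below.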

2) For the final term, look up or generate all of the three-letter
rafsi, omitting any CVC-form rafsi since a lujvo cannot end in a
consonant.  Then, for this position only, add in the full gismu itself
as a '5-letter rafsi'.
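Step 2 amounts to a small filter.  A sketch, assuming the three-letter
rafsi for the final term have already been looked up; `final_term_options`
is a hypothetical name, and passing no gismu models a cmavo, which has no
5-letter form (see step 3).

```python
VOWELS = set("aeiou")


def is_cvc(rafsi):
    """True for a CVC rafsi such as "sar"."""
    return (len(rafsi) == 3 and rafsi[0] not in VOWELS
            and rafsi[1] in VOWELS and rafsi[2] not in VOWELS)


def final_term_options(three_letter_rafsi, gismu=None):
    """Rafsi allowed in final position: no CVC forms, plus the full gismu."""
    # A lujvo cannot end in a consonant, so CVC rafsi are dropped here.
    options = [r for r in three_letter_rafsi if not is_cvc(r)]
    if gismu is not None:
        options.append(gismu)  # the gismu itself acts as a '5-letter rafsi'
    return options
```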

3) Since most cmavo with rafsi have CVC rafsi and none has a 5-letter
form, few cmavo can occur in the final position of a tanru used as the
basis of a lujvo.  cmavo are rare in that position anyway, the
exceptions being PA+MOI numbers.  If a cmavo in any position has no
rafsi, then it cannot be incorporated into the lujvo.  Consider
rephrasing or using zei to form an 'any-word' compound.

4) Form all ordered combinations of these rafsi, one rafsi per term,
in the same order as the terms of the tanru.
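Step 4 is a Cartesian product over the per-term option lists.  A minimal
sketch using Python's `itertools.product`; `rafsi_combinations` is a
hypothetical name, the option lists below are illustrative only, and the
raw combinations still need the hyphen checks of step 5 before they are
valid words.

```python
from itertools import product


def rafsi_combinations(per_term_options):
    """Every ordered choice of one rafsi per term, in tanru order."""
    return [list(combo) for combo in product(*per_term_options)]


# Illustrative options for a hypothetical two-term tanru "sarji zbasu":
# the non-final term may use any rafsi (including the 4-letter form);
# the final term uses the output of step 2.
combos = rafsi_combinations([["sar", "sai", "sarj"], ["zba", "zbasu"]])
for combo in combos:
    print(combo)
```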

5) Audible 'hyphens' may be necessary between some adjacent rafsi to
make the word pronounceable, understandable, well-formed, and not prone
to breaking up into two or more smaller words.  Hyphens are never
optional; they are not permitted between rafsi unless they are
required.  Right-to-left testing is recommended for reasons discussed
below:

  a) If there are more than two terms, an initial CVV or CV'V rafsi
  will fall off and be heard as a separate cmavo.  It must therefore
  be glued on with the letter 'r', which nominally stands in a
  syllable by itself.
    For example sai + zba + ta'u becomes sairzbata'u (syllabized as
    sai,r,zba,TA'u).  If the initial rafsi is a CV'V, the 'r' may be
    joined onto the second syllable.  Thus sa'i + zba + ta'u becomes
    sa'irzbata'u (syllabized as sa,'ir,zba,TA'u).  If the first
    consonant of the second syllable is an 'r', the gluing 'hyphen'
    must be the letter 'n', instead of 'r' because doubled consonants
    are not permitted in Lojban.  Thus sai + rai + ta'u becomes
    sainraita'u (syllabized as sai,n,rai,TA'u and NOT sain,rai,TA'u).
    'n' is NOT permitted unless the adjacent 'r' forces it.

  If there are exactly two terms, and the initial term is a CVV or
  CV'V rafsi AND the final term is a 5-letter rafsi, an 'r' hyphen is
  needed as described above to prevent the initial rafsi from falling
  off into a separate CVV or CV'V cmavo.  As above, an 'n' is used as
  glue if and only if an 'r' cannot be used.
    Thus sai + taxfu needs hyphen 'r' to become sairtaxfu
    (sai,r,TAX,fu).  sai + ranji needs hyphen 'n' to become sainranji
    (sai,n,RAN,ji).

  If there are exactly two terms, and the initial term is a CVV or
  CV'V rafsi AND the final term is a CVV or CV'V rafsi, an 'r' hyphen
  is needed, because the lujvo is not well-formed, lacking a consonant
  cluster, and will fall apart into two CVV or CV'V cmavo.  As above,
  an 'n' is used as glue if and only if an 'r' cannot be used.
    Thus sai + ta'u needs hyphen 'r' to become sairta'u (sai,r,TA,'u).
    sai + rai needs hyphen 'n' to become sainrai (SAI,n,rai).  Note
    that a hyphen in a syllable by itself is not counted in determining
    penultimate stress.  However, if the hyphen is joined onto a vowel
    syllable, as when ta'u + sai forms ta'ursai, the resulting syllable
    is counted and is stressed if penultimate (ta,'UR,sai).

  If there are exactly two terms, and the initial term is a CVV or CV'V
  rafsi AND the final term is a CCV rafsi, no hyphen is needed, because
  the lujvo is well-formed, having a consonant cluster, and penultimate
  stress falls on part of the CVV/CV'V rafsi, preventing it from falling
  off into a separate word.
    Thus sai + zba needs no hyphen and forms saizba.
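The r/n decision in rule a) can be sketched as follows.  This is a
minimal illustration, assuming the rafsi have already been chosen and
that the 'y' rules (b through e) are handled separately; `is_cvv`,
`needs_r_hyphen` and `glue` are hypothetical names.

```python
VOWELS = set("aeiou")


def is_cvv(rafsi):
    """True for a CVV (sai) or CV'V (ta'u) rafsi."""
    shape = "".join("V" if ch in VOWELS else "'" if ch == "'" else "C"
                    for ch in rafsi)
    return shape in ("CVV", "CV'V")


def needs_r_hyphen(rafsi):
    """Rule a): must an initial CVV/CV'V rafsi be glued on with r or n?"""
    if len(rafsi) < 2 or not is_cvv(rafsi[0]):
        return False
    if len(rafsi) > 2:
        return True  # it would otherwise fall off as a separate cmavo
    final = rafsi[1]
    # Two terms: glue before a 5-letter gismu or a CVV/CV'V rafsi,
    # but not before a CCV rafsi, whose consonant cluster holds the word.
    return len(final) == 5 or is_cvv(final)


def glue(rafsi):
    """Concatenate rafsi, adding the r (or n, before an r) hyphen of rule a)."""
    if needs_r_hyphen(rafsi):
        hyphen = "n" if rafsi[1][0] == "r" else "r"
        return rafsi[0] + hyphen + "".join(rafsi[1:])
    return "".join(rafsi)
```

Applied to the examples above, `glue(["sai", "zba", "ta'u"])` gives
sairzbata'u and `glue(["sai", "zba"])` gives saizba.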

  b) Put y after any 4-letter rafsi form (e.g. zbasysai).  Do not
  count a syllable centered on this hyphen in determining penultimate
  stress.  (e.g. ZBAS,y,sai or ZBA,sy,sai).

  c) Put y at any proscribed C/C joint (impermissible medial consonant
  pair, e.g. nunynau).  The following rules summarize the proscribed
  medials:

     Let the consonant pair be C1C2, where b, d, g, j, v, and z are
     voiced consonants; c, f, k, p, s, t, and x are unvoiced
     consonants; and l, m, n, and r are nasal/liquid consonants.

     1. C1 cannot be the same as C2.                           e.g. *kk
     2. If C1 is voiced, then C2 must be either voiced or
          nasal/liquid.  If C1 is unvoiced, then C2 must be either
          unvoiced or nasal/liquid.                                 *bf
     3. C1 and C2 cannot both be among c, j, s, or z.               *cs
     4. *cx, *kx, *xc, *xk, and *mz are not permitted.


  Do not count a syllable centered on this hyphen in determining
  penultimate stress.  (e.g. NUN,y,nau or NU,ny,nau).

  d) Put y at any proscribed C/CC joint (e.g. nunydji).  The following
  are the rules for proscribed triples:

    The first two consonants of a consonant triple in a Lojban brivla
    must form a permissible medial consonant pair, as defined above.
    The second pair within the triple must be a permissible initial
    consonant pair.  Since you cannot get a triple in a lujvo unless
    the latter two consonants are part of a CCV rafsi, testing the
    first two consonants per c) is sufficient for this part of the
    test.  In addition, there are a few triples that meet the above
    conditions but are still not pronounceable in a way that is easily
    and uniquely resolvable from other combinations.  Hence they are
    also not permitted, and require a hyphen.  These triples are:

                      n,dj   n,dz   n,tc   n,ts

  Do not count a syllable centered on this hyphen in determining
  penultimate stress.  (e.g. NUN,y,dji or NU,ny,dji).
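Rules b), c) and d) can be combined into one check on each rafsi joint.
A minimal sketch, assuming rafsi are passed one pair at a time and that
the r/n rule a) and the "tosmabru" test e) are applied separately;
`is_permissible_medial` and `join_with_y` are hypothetical names.

```python
VOWELS = set("aeiou")
VOICED = set("bdgjvz")
UNVOICED = set("cfkpstx")
CONSONANTS = VOICED | UNVOICED | set("lmnr")
SIBILANTS = set("cjsz")
FORBIDDEN_PAIRS = {"cx", "kx", "xc", "xk", "mz"}
FORBIDDEN_TRIPLES = {"ndj", "ndz", "ntc", "nts"}


def shape(rafsi):
    """C/V shape keeping apostrophes, e.g. "zbas" -> "CCVC"."""
    return "".join("V" if ch in VOWELS else "'" if ch == "'" else "C"
                   for ch in rafsi)


def is_permissible_medial(pair):
    """Rules 1-4 of c) for a consonant pair C1C2."""
    c1, c2 = pair
    if c1 == c2:                                      # rule 1
        return False
    if (c1 in VOICED and c2 in UNVOICED) or (c1 in UNVOICED and c2 in VOICED):
        return False                                  # rule 2
    if c1 in SIBILANTS and c2 in SIBILANTS:           # rule 3
        return False
    return pair not in FORBIDDEN_PAIRS                # rule 4


def join_with_y(left, right):
    """Join two adjacent rafsi, inserting 'y' where b), c) or d) requires it."""
    if shape(left) in ("CCVC", "CVCC"):               # b) 4-letter rafsi
        return left + "y" + right
    if left[-1] in CONSONANTS and right[0] in CONSONANTS:
        if not is_permissible_medial(left[-1] + right[0]):
            return left + "y" + right                 # c), and first pair of d)
        if left[-1] + right[:2] in FORBIDDEN_TRIPLES:
            return left + "y" + right                 # d) special triples
    return left + right
```

Note that `join_with_y("tos", "mabru")` returns tosmabru: these local
checks all pass, and only the "tosmabru" test in e) catches the break-up.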

  e) Test all forms starting with a series of CVC rafsi for "tosmabru
  failure", which means that the first CV will fall off into a
  separate cmavo, leaving the rest a valid lujvo.  ("*tosmabru" was a
  trial word that was found to break up in this way, and is used as
  the archetypal example of an invalid lujvo according to this rule.)
  This is a tricky rule, but the circumstance is not common, because
  the CV falls off only if a valid lujvo remains.  The following is a
  set of simple shortcuts to test for and correct all "tosmabru"
  situations.  (The same situation with an apparent le'avla form
  remaining does not break up, simply because such forms are forbidden
  to le'avla.  This is the so-called "*slinku'i" rule for le'avla:  if
  you stick a CV cmavo on the front of a le'avla and it forms a valid
  lujvo, then the le'avla is NOT valid.)

  If a series of rafsi has the pattern 'CVC ... CVC + X', where no
  'y' hyphens have been installed between any two of the CVC rafsi,
  there may be a "tosmabru" problem.
  - If X is a CVCCV long rafsi whose consonant cluster is a
  permissible initial, then even a single CVC rafsi on the front
  requires a "tosmabru test" (as in tos + mabru, which would break up
  into to + smabru).  You are specifically testing here to ensure that
  the CV on the front does not fall off, leaving a lujvo composed of a
  series of CCV rafsi.
  - If X is any rafsi or partial-lujvo that causes a y hyphen to be
  installed between the previous CVC and itself by one of the above
  rules, and there are at least two CVC rafsi preceding, you must also
  test for "tosmabru" break up (as in tos + mab + bai which would have
  added a 'y' hyphen between the last two terms, and would break up
  into to + smabybai, where "smab" is a hypothetical 4-letter rafsi
  form).  You are testing here to avoid the initial CV falling off to
  leave a lujvo with a spurious CCVC 4-letter rafsi form just before
  the X component.
  NOTE THAT THE RULES DO NOT DEPEND ON THERE ACTUALLY BEING RAFSI THAT
  WOULD MAKE THE BROKEN-UP WORD POSSIBLE (smab- is not the 4-letter
  form of any currently assigned gismu, but the rules do not presume
  that the listener knows which rafsi are real; they are based ONLY
  on the forms of the words.)

  The "tosmabru" test is:

    Examine all the C/C joints between the CVC rafsi, and between the
    last CVC and the X term.

    If ALL of those C/C joints, as well as the CC in X (if we are
    dealing with the CVCCV case for X), are "bridged" by permissible
    initials, listed in Section III or the back of the gismu list,
    then the trial word will break up into a cmavo and a shorter
    brivla ("tosmaktu" would thus be valid, unlike "tosmabru").

    If any C/C joint is unbridged, i.e., is impermissible as an
    initial CC, the trial word will not break up.  It has passed the
    "tosmabru test".

    Making only the first joint in a trial word unbridged is enough to
    ensure resolvability.  Thus, if the "tosmabru" test fails, install
    y as a hyphen at the first bridged joint (e.g. tosymabru).

    The 'lazy Lojbanist' "tosmabru test" is to add a hyphen any time
    you have a CVC rafsi followed by a CV... of 5-or-more letters,
    where the first C/C joint forms a permissible initial.  This is
    NOT a correct algorithm: it will put in unnecessary hyphens,
    resulting in words that are technically invalid.  However, for
    nonce lujvo-making, if an unnecessary hyphen is present, the word
    can still be analyzed successfully and unambiguously.

    If a "tosmabru" hyphen is omitted, the word is likely to be
    analyzed incorrectly.

  Note that the "tosmabru test" requires all hyphens based on other
  rules to have been determined before conducting the test.  This is
  why this step occurs last.
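The "tosmabru" test can be sketched as follows.  This is a simplified
illustration under stated assumptions: `tosmabru_fix` is a hypothetical
helper; the caller passes the leading CVC rafsi (at least one, with no
'y' between them), the following term X, and whether rules b) to d)
already put a 'y' between the last CVC and X.  All other hyphens are
assumed to be settled first.  The set below is the standard list of 48
permissible initial consonant pairs.

```python
PERMISSIBLE_INITIALS = {
    "bl", "br", "cf", "ck", "cl", "cm", "cn", "cp", "cr", "ct",
    "dj", "dr", "dz", "fl", "fr", "gl", "gr", "jb", "jd", "jg",
    "jm", "jv", "kl", "kr", "ml", "mr", "pl", "pr", "sf", "sk",
    "sl", "sm", "sn", "sp", "sr", "st", "tc", "tr", "ts", "vl",
    "vr", "xl", "xr", "zb", "zd", "zg", "zm", "zv",
}
VOWELS = set("aeiou")


def shape(word):
    """C/V shape of a word, e.g. "mabru" -> "CVCCV"."""
    return "".join("V" if ch in VOWELS else "C" for ch in word)


def tosmabru_fix(cvcs, x, y_before_x):
    """Assemble CVC...CVC + X, adding 'y' at the first joint if it would break up."""
    joints = [a[-1] + b[0] for a, b in zip(cvcs, cvcs[1:])]
    if y_before_x:
        # Second case: a 'y' already separates X, so only the CVC joints
        # matter, and at least two CVC rafsi are needed for a break-up.
        at_risk = len(cvcs) >= 2
    else:
        # First case: X is a CVCCV form whose cluster is a permissible initial;
        # the joint between the last CVC and X must be examined too.
        at_risk = shape(x) == "CVCCV" and x[2:4] in PERMISSIBLE_INITIALS
        joints.append(cvcs[-1][-1] + x[0])
    tail = ("y" if y_before_x else "") + x
    if at_risk and all(j in PERMISSIBLE_INITIALS for j in joints):
        # Every joint is bridged, so hyphenate at the first one.
        return cvcs[0] + "y" + "".join(cvcs[1:]) + tail
    return "".join(cvcs) + tail
```

With the examples above, `tosmabru_fix(["tos"], "mabru", False)` yields
tosymabru, while `tosmabru_fix(["tos"], "maktu", False)` leaves tosmaktu
unchanged because kt is not a permissible initial.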

6) Evaluate all combinations and select the word with the highest
score, using a scoring algorithm such as the one below.


                          SCORING ALGORITHM

This algorithm was devised by Bob and Nora LeChevalier in 1989.  It is
not the only permitted algorithm, but it usually gives a choice that
people find preferable.  This is the algorithm encoded in the lujvo-
making program sold by la lojbangirz.  The algorithm may be changed in
the future.  Note that the algorithm basically encodes a hierarchy of
priorities, preferring short words (counting an apostrophe as half a
letter), then words with fewer hyphens, then words with fewer
syllables and/or more vowels.

Values are attached to various properties of the lujvo.  The score is
the sum of these values.

1. Count the number of hyphens (h), including 'y', 'r', or 'n'.
2. Count the number of vowels (v) not including 'y'.
3. Count the number of apostrophes (a).
4. Count the total number of characters including hyphens and
   apostrophes (l).
5. For each rafsi component, find the value in the following list.
     Sum this total (r):

          CVV (sai)      8                  CCVC (zbas)     4
          CCV (zba)      7                  -CCVCV (-zbasu) 3
          CV'V (ta'u)    6                  CVCC (sarj)     2
          CVC (nun)      5                  -CVCCV (-sarji) 1

The score is then

    32500 - (1000 * l) + (500 * a) - (100 * h) + (10 * r) + v

In case of ties, there is no preference.  This should be rare.

The following examples use the rafsi:

CVC = nun   CCV = zba   CVV = nau, sai
CVCCV = sarji    CCVC- = zbas-     CV'V = ta'u

Stress is shown explicitly using capitalization in these examples.
Being algorithmic (always penultimate), it does not have to be
explicitly shown when these words are actually used.

    zba + sai                        ZBAsai
32500 - (1000 * 6) + (500 * 0) - (100 * 0) + (10 * 15) + 3 = 26653
    nun + y + nau                    NUNynau
32500 - (1000 * 7) + (500 * 0) - (100 * 1) + (10 * 13) + 3 = 25533
    sai + r + zba + ta'u             sairzbaTA'u
32500 - (1000 * 11) + (500 * 1) - (100 * 1) + (10 * 21) + 5 = 22115
    zba + zbas + y + sarji           zbazbasySARji
32500 - (1000 * 13) + (500 * 0) - (100 * 1) + (10 * 12) + 4 = 19524
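The scoring formula can be checked mechanically.  A minimal sketch,
assuming each lujvo is supplied as the list of its rafsi and hyphens (so
hyphen letters are not confused with rafsi letters); `lujvo_score` is a
hypothetical name, and rafsi values are looked up by C/V shape, which
assumes the final-only shapes (-CCVCV, -CVCCV) appear only at the end.
It reproduces the four worked examples above, and step 6 then reduces to
taking the maximum over the candidates.

```python
VOWELS = set("aeiou")
HYPHENS = {"y", "r", "n"}
RAFSI_VALUES = {
    "CVV": 8, "CCV": 7, "CV'V": 6, "CVC": 5,
    "CCVC": 4, "CCVCV": 3, "CVCC": 2, "CVCCV": 1,
}


def shape(rafsi):
    """C/V shape keeping apostrophes, e.g. "ta'u" -> "CV'V"."""
    return "".join("V" if ch in VOWELS else "'" if ch == "'" else "C"
                   for ch in rafsi)


def lujvo_score(parts):
    """Score a lujvo given as rafsi and hyphens, e.g. ["nun", "y", "nau"]."""
    word = "".join(parts)
    l = len(word)                                # all characters, incl. hyphens and '
    a = word.count("'")                          # apostrophes
    h = sum(1 for p in parts if p in HYPHENS)    # y, r and n hyphens
    v = sum(1 for ch in word if ch in VOWELS)    # vowels, not counting y
    r = sum(RAFSI_VALUES[shape(p)] for p in parts if p not in HYPHENS)
    return 32500 - 1000 * l + 500 * a - 100 * h + 10 * r + v


# Step 6 with two illustrative candidates: keep the highest score.
candidates = [["sar", "zba"], ["sai", "zba"]]
best = max(candidates, key=lujvo_score)
print("".join(best), lujvo_score(best))
```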