
sample KWIC index for Lojban dictionary



Chris Handley made good suggestions for a KWIC index for the Lojban
dictionary.  Different separators are especially helpful since they
make it trivial to write formatting functions that format in many
different ways.

Here is a sample of what Chris suggests:

abdomen = betfu (bef, be'u) = x1 is a/the abdomen/belly/lower trunk of x2

Using Chris's suggestion, it is easy to devise several different
formatting strategies for different media or different preferences:
for example, you could format keywords so they are justified in a
column in the middle of the page or with keywords on the left.  Also,
Chris's suggestion makes it easy to prepare the text for typesetting
and hard-copy printing.  Indeed, it would be easy to write an automatic
line-breaking algorithm that would handle most narrow columns.  Manual
editing would be minimal.

Actually, what I am really saying is that the electronic master for
the dictionary should be written in a manner such that it is really
easy to create different output formats.

One additional suggestion: list the rafsi in order cvc, ccv, cv'v
with an empty slot marked by a comma so that a person who wants to put
rafsi with the same morphology in the same column can do so easily.
(Of course a regexp lets you do the same thing, but this would make it
easier.)
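
As a rough illustration of how little code such a master format needs,
here is a minimal Python sketch that splits one line of the form shown
above into fields and files each rafsi into a CVC, CCV, or CV'V slot by
its shape.  The names parse_entry and rafsi_slot are made up for this
example, and putting a CVV rafsi in the same slot as a CV'V one is an
assumption, not something settled above.

```python
import re

# Assumed master format: keyword = gismu (rafsi, rafsi) = definition
ENTRY_RE = re.compile(
    r"^(?P<keyword>.+?)\s*=\s*(?P<gismu>\S+)\s*"
    r"(?:\((?P<rafsi>[^)]*)\))?\s*=\s*(?P<definition>.+)$"
)

# Slot index for each rafsi shape; grouping CVV with CV'V is an assumption.
SLOTS = {"CVC": 0, "CCV": 1, "CV'V": 2, "CVV": 2}


def rafsi_slot(rafsi):
    """Return the slot (0, 1, or 2) for a rafsi, or None if its shape is unknown."""
    shape = "".join(
        "'" if ch == "'" else "V" if ch in "aeiou" else "C" for ch in rafsi
    )
    return SLOTS.get(shape)


def parse_entry(line):
    """Split one master line into keyword, gismu, three rafsi slots, definition."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized entry: {line!r}")
    slots = ["", "", ""]
    for rafsi in (match.group("rafsi") or "").split(","):
        rafsi = rafsi.strip()
        if not rafsi:
            continue  # an empty slot marked by a bare comma
        index = rafsi_slot(rafsi)
        if index is None:
            raise ValueError(f"unrecognized rafsi shape: {rafsi!r}")
        slots[index] = rafsi
    return {
        "keyword": match.group("keyword"),
        "gismu": match.group("gismu"),
        "rafsi": slots,
        "definition": match.group("definition"),
    }
```

Because the slot is worked out from the shape of each rafsi, this sketch
accepts both "(bef, be'u)" and a comma-marked "(bef,,be'u)".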

Then you could produce output like any of these:


abdomen = betfu (bef, be'u) = x1 is a/the abdomen/belly/lower trunk of x2

abdomen         betfu   bef     be'u x1 is a/the abdomen/belly/
                                lower trunk of x2

betfu   bef     be'u    x1 is a/the   abdomen  /belly/lower trunk of x2


and so on, with or without embedded typesetting commands.
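
To make the line-breaking idea concrete, here is a small Python sketch
of a column formatter in the spirit of the second sample above.  It is
only one possible approach: the function names, the column widths, and
the rule of allowing a break after a "/" as well as at a space are all
assumptions chosen for illustration.

```python
import re


def break_lines(text, width):
    """Greedy line breaking that may break at spaces and just after a '/'."""
    lines, current = [], ""
    for word in text.split():
        # Cut each word into chunks that end right after a slash.
        for i, chunk in enumerate(re.findall(r"[^/]*/|[^/]+", word)):
            glue = " " if (i == 0 and current) else ""
            if current and len(current) + len(glue) + len(chunk) > width:
                lines.append(current)
                current = chunk
            else:
                current += glue + chunk
    if current:
        lines.append(current)
    return lines  # a chunk longer than width still gets a line of its own


def format_entry(keyword, gismu, rafsi, definition, width=72):
    """Lay out one entry in columns; rafsi is a (CVC, CCV, CV'V) triple."""
    head = f"{keyword:<16}{gismu:<8}" + "".join(f"{r:<5}" for r in rafsi)
    body = break_lines(definition, width - len(head))
    indent = " " * len(head)
    return "\n".join((head if i == 0 else indent) + line
                     for i, line in enumerate(body))
```

Calling format_entry("abdomen", "betfu", ["bef", "", "be'u"], ...) with a
narrower width gives a narrower definition column; the head columns stay
put, so rafsi of the same shape line up from entry to entry.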



As for my preferred layout (ignoring fonts, etc.), here it is:


accessing    klaji  laj           x1 is a street/avenue/lane/drive/
                                   cul-de-sac/way/alley/ at x2 accessing x3

accident     snuti  nut     nu'i  x1 is an accident/unintentional
                                   on the part of x2; x1 is an accident

             snuti  nut     nu'i  x1 is an accident/unintentional
                                   on the part of x2; x1 is an accident

accommodates vasru  vas     vau   x1 contains/holds/encloses/accommodates/
                                   includes contents x2 within;
                                   x1 is a vessel containing x2

accomplishes snada                x1 succeeds in/achieves/completes/
                                   accomplishes x2

according    cimde                x1 is a dimension of space/object x2
                                   according to rules/model x3

             lanzu  laz           x1 is a family/clan/tribe
                                   with members x2 bonded/tied/joined
                                   according to standard x3


Keywords that are repeated are left out of the beginning of the
second and subsequent entries.  This makes the format easier to read,
like a two-level index.  Rafsi are lined up by morphology.

Also, in this format, the second entry for `accident' is unnecessary
and should be removed.  It would not be hard to go through a final
list and remove such entries manually.  Nor would it take much time to
go through an automatically formatted list to manually edit line
breaks, etc.
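
Much of this cleanup could probably be done mechanically before any
manual pass.  The following Python sketch is one hedged way to do it,
assuming the entries arrive sorted by keyword as (keyword, gismu,
definition) tuples and that a repeated (keyword, gismu) pair is always
an unwanted duplicate; the function name two_level is invented here.

```python
def two_level(entries):
    """Drop repeated (keyword, gismu) pairs and blank out repeated keywords.

    entries: iterable of (keyword, gismu, definition) tuples, sorted by keyword.
    Returns a list of (shown_keyword, gismu, definition) tuples.
    """
    seen = set()
    previous = None
    result = []
    for keyword, gismu, definition in entries:
        if (keyword, gismu) in seen:
            continue  # same word indexed twice under one keyword
        seen.add((keyword, gismu))
        # Show the keyword only on the first entry of each group.
        shown = "" if keyword == previous else keyword
        previous = keyword
        result.append((shown, gismu, definition))
    return result
```

Applied to the sample above, this would keep one `accident' entry for
snuti and print `according' only once, ahead of cimde and lanzu.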

With suitable fonts, you could make printed entries that look like this:

accommodates vasru  vas vau
    x1 contains/holds/encloses/accommodates/
    includes contents x2 within;
    x1 is a vessel containing x2

according    cimde
    x1 is a dimension of space/object x2
    according to rules/model x3


This sort of entry might even fit in two columns, as in most
dictionaries.

    Robert J. Chassell               bob@gnu.ai.mit.edu
    Rattlesnake Mountain Road        (413) 298-4725 or (617) 253-8568 or
    Stockbridge, MA 01262-0693 USA   (617) 876-3296 (for messages)