ANNOUNCEMENT: English-Lojban 15000+ entry dictionary draft avail on ftp



Following is the document header for the first published draft of LLG's
official English to Lojban dictionary.  It is very incomplete and very
drafty, but contains 15604 entries and takes up almost 2.4 Meg unzipped
and 600K zipped.  Both zipped and unzipped versions may be found on the
ftp server ftp.cs.yale.edu, in directory pub/lojban/draft/dictionary.
The file names are ENGDICT.GIS and ENGDICT.ZIP.

I am also uploading an incremental change to the gismu list.  There have
been a few incidental changes since the upload of 2 weeks ago, but I
wanted the version to be the same as that used in the dictionary.
This is in filenames GISMU.LIS and LOGDATA.RAW (identical files) in
the pub/lojban directory.

I am also publishing the place structure keyword list developed mostly
by Colin Fine and Nora.  This is in file OBLIQUE.KEY in the pub/lojban
directory.

Nick's current draft lujvo list of 3800 words, which is not yet
incorporated or official, is in the pub/lojban/incoming directory, file
JVOSTE3.  His list does not include the automatically generated lujvo
based on SE conversions of the gismu list, another 2200 words.

Comments are of course very welcome, though I would appreciate it if they
were clearly sorted into typos, formatting issues, and technical
issues.

I have promised a dictionary for 7 years now.  In July at Logfest, I
promised a draft dictionary in September (this year), and for once I
have made such a deadline, though the draft is rather less than I had
hoped to have done before publishing.  This dictionary is not complete,
but it is already far better than what I thought could be accomplished
when I started.  It is in good enough shape that I think it becomes the
most usable and complete English-order Lojban word list yet produced and
so I can commend it into the covetous hands (well, computers at least)
of the community.

Thanks go out to all of you who have supported us over the years.

Acknowledgements are due to too many people to list.  A few, whose impact
was most felt on the current drive, are named in the header below.

There is a lot more to come in the near future, but this will hopefully
serve to get people going on using the language.

It will also hopefully inspire some of you into donations to the cause.
Our finances could use a boost to break even, and publishing costs have
yet to be budgeted.

.o'acaise'i.uoru'e

Enjoy!

----
lojbab                                                lojbab@access.digex.net
Bob LeChevalier, President, The Logical Language Group, Inc.
2904 Beau Lane, Fairfax VA 22031-1303 USA                        703-385-0273
 For the artificial language Loglan/Lojban, see ftp.cs.yale.edu  /pub/lojban
    or see Lojban WWW Server: http://xiron.pc.helsinki.fi/lojban/

==============================================================================
English/Lojban Dictionary - First draft official publication 26 September 1994

Copyright 1994, The Logical Language Group, Inc.
Bob LeChevalier, President                            lojbab@access.digex.net   
2904 Beau Lane, Fairfax VA 22031-1303 USA                        703-385-0273

Posted for comment and reference only.  Please do not redistribute
without verifying that this is the latest version available.  This
version is far from final form and may contain duplicates, typos, and
more serious errors.

What this document IS:

This is a first draft of what will eventually be a much larger
English/Lojban dictionary file.

It contains 15604 English entries, one entry per line.

It contains all English entries so far derived from the official version
of the gismu list being published simultaneously (8391).  This includes
some number of references to lujvo embedded in the published gismu list.
These entries are marked with a "*" in column 1.

It contains all English entries derived from a place structure keyword
list (6828) prepared by Colin Fine and by Nora and Bob LeChevalier and
being published simultaneously with this document.  These are in a
simpler, column-based, more abbreviated form than the gismu list
entries, and stand out strongly in the list due to white space.  These
entries have NOT been weeded at all, and may contain useless
information; their purpose is to ensure the most complete possible
coverage of English semantic space, and they thus cannot be properly
weeded until other English-order words have been completely added.
These entries are marked with a "&" in column 1.

It contains a number of entries (140) in the same form as gismu list
entries, but for lujvo derived from abstractions and other simple
transformations of the corresponding gismu, such that the place
structure closely resembles the original gismu.  These entries are
marked with a "%" in column 1.

It contains a small number (246) of miscellaneous and cross-reference
entries proposing metaphors for lujvo, and indicating other
English entries where relevant meanings may be found.  These were added
ad hoc where it was recognized that the semantic coverage of the entries
omitted important meanings for the English keyword.  Because they are ad
hoc, they may be erroneous or misleading, but they may fill a user's
need for a meaning where the keyword does not appear in the gismu list
text with semantics corresponding to the desired meaning.  These
entries are marked with a "@" in column 1.

The above lists are merged together and sorted alphabetically.  Use
column 1 sorting to separate them if desired.
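
As a rough illustration of the column 1 separation described above, the
following Python sketch groups lines by their first character.  It assumes
only what this header states (one entry per line, with "*", "&", "%", or "@"
in column 1).  The function name split_by_marker, the "other" bucket for
anything else (such as header text), and the sample lines are invented for
the example; the sample lines are not real dictionary entries.

```python
from collections import defaultdict

# Column 1 markers and the source list each one denotes, per the header text.
MARKERS = {
    "*": "gismu list",
    "&": "place structure keyword list",
    "%": "lujvo from simple transformations of gismu",
    "@": "miscellaneous and cross-reference entries",
}


def split_by_marker(lines):
    """Group lines by their column 1 marker.

    Lines that are empty or start with any other character go into
    an "other" bucket (an assumption made for this sketch).
    """
    groups = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\r\n")
        key = line[0] if line and line[0] in MARKERS else "other"
        groups[key].append(line)
    return groups


# Placeholder lines, not real dictionary entries.
sample = [
    "* placeholder entry one",
    "& placeholder entry two",
    "* placeholder entry three",
    "@ placeholder entry four",
    "header text",
]
groups = split_by_marker(sample)
for marker, label in MARKERS.items():
    print(marker, label, len(groups[marker]))
```

To try this on the real file, an open file object can be passed in place of
the sample list, for example split_by_marker(open("ENGDICT.GIS",
encoding="latin-1")).  The encoding there is a guess, since this header does
not state one.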

What this document is NOT:

It is unformatted, with the intent of making it usable on a computer
without an appropriate word processor/database program, but it contains
marks and guides to aid LLG in later formatting.  Layout was based on
final formatting needs, and as a result there are some clumsinesses
associated with the file when limited to computer use.  Most significant
is that the gismu list-based entries use columns extending out as far as
around 900 with no wrapping, and the Lojban gismu from which the line
was derived appears at the RIGHTMOST end of that line.  However, the
English word may not be best translated by the gismu, as when it is
derived from a lujvo proposal incorporated in the gismu list.  To find
these, you need to examine all occurrences of the English keyword in the
entry.  These occurrences are marked with the | symbol (which is easier
to spot on a computer screen than the ~ character that will be used in
the formatted document).
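
The lookup procedure described above might be sketched in Python as follows.
This is a minimal sketch under stated assumptions: that the gismu is the last
whitespace-separated token on a gismu list-based line (the text above says
only that it is at the rightmost end), and that each keyword occurrence can
be located by searching for the | character.  The helper names and the sample
line are hypothetical; the sample is not an actual entry, and "xxxxx" stands
in for a gismu.

```python
def rightmost_gismu(line):
    """Return the last whitespace-separated token, or None for a blank line."""
    tokens = line.split()
    return tokens[-1] if tokens else None


def keyword_contexts(line, width=30):
    """Yield a short window of text around each '|' mark in the line."""
    pos = line.find("|")
    while pos != -1:
        start = max(0, pos - width)
        yield line[start:pos + width + 1]
        pos = line.find("|", pos + 1)


# Invented stand-in for one long, unwrapped "*" line.
sample = "* placeholder  text with |keyword here; more text |keyword again" + " " * 40 + "xxxxx"

print("gismu field:", rightmost_gismu(sample))
for context in keyword_contexts(sample, width=12):
    print("occurrence:", repr(context))
```

Printing a window around each mark, rather than the whole line, keeps the
output readable when a line runs to several hundred columns.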

This document is not complete!

- There are several thousand entries to be generated from approximately
6000 lujvo in the files generated by Nick Nicholas, John Cowan, and Bob
LeChevalier.  This list may be up to 5 times (or more) the length of the
current file.  There were 20000 and 40000 entries in the raw list
generated in two different ways by John Cowan's key-word-in-context
(KWIC) program from the 3000 gismu in the list back in July, and we
aren't sure which will be used.  Nick Nicholas has several hundred lujvo
that have been used in Lojban text and discussion since the start of
1994, that have not been added to his list.  Whether they are included
will depend on his available time in the next couple of months.

- There are a large but unknown number of entries to be derived from the
cmavo list.  There are 6500 such entries automatically generated by John
Cowan's key-word-in-context (KWIC) program.  A random survey indicated
that more than half of these will be deleted before being incorporated
in this list.

- There are an unknown number of entries from the gismu list yet to be
generated.  These are associated with a small number of words that occur
with frequency 20 or greater in the text of the gismu list as of last
July.  The list of such words is given below, with the number of
occurrences in the gismu list.  Some (like "is" and "of") will obviously
generate very few or no English entries, while others may result in
several entries.  An ! indicates some words that seem especially likely
to generate a lot of entries.  You will find few if any entries in the
gismu-list-derived format (* lines) for these words, but you may find
entries based on the @ or & line formats.

1491 is
 919 of
 865 a
 438 in
 338 to
 305 also
 291 the
 267 by
 224 for
 173 from
 166 with
 126 an
 121 at
 111 not
 110!property
 109!quantity
 107!material
 102 be
  99 or
  95!aspect
  95!standard
  91!made
  91 on
  91!species
  86 and
  81!body
  81!object
  79 about
  73!part
  66!contains
  66 may
  65!culture
  59 as
  59!form
  56!under
  52!reflects
  51!conditions
  50 that
  49!metaphor
  48!nationality
  46!event
  46!source
  45!dimension
  44!shape
  42 are
  40!composition
  39 including
  39!subject
  38!language
  37!breed
  35!location
  33!commodity
  33!state
  32 non
  31 this
  30 into
  30!type
  30 which
  29!strain
  28!function
  28!person
  27!set
  26 but
  26!need
  26!specific
  25!audience
  25!frame
  25 has
  25!necessarily
  25!purpose
  25!reference
  25!surface
  24 agentive
  24!action
  24!locus
  23!time
  22!among
  22!place
  22!over
  22!system
  21!direction
  21!activity
  21!process
  21!force
  20!properties
  20!objects
  20!means
  20!point
  20!used
  20!tool
  20!sumti
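
For anyone who wants to work with the list above programmatically, here is a
small Python sketch that parses each line into a count, a flag, and a word.
It assumes the layout seen above: a right-aligned count, then either "!" or a
space, then the word.  The names parse_line and flagged_words are made up for
this example.

```python
import re

# A count, then "!" (flagged) or a space, then the word.
LINE_RE = re.compile(r"^\s*(\d+)([! ])(\S+)\s*$")


def parse_line(line):
    """Return (count, flagged, word), or None if the line does not fit the layout."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    count, flag, word = match.groups()
    return int(count), flag == "!", word


def flagged_words(lines):
    """Return (word, count) pairs for the lines carrying the '!' flag."""
    result = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and parsed[1]:
            result.append((parsed[2], parsed[0]))
    return result


# A few lines copied from the list above.
sample = ["1491 is", " 919 of", " 110!property", "  99 or", "  95!aspect"]
print(flagged_words(sample))
```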

This document is very "drafty".  It has not been spell-checked, nor have
duplicates been weeded (and there are expected to be a lot of duplicates
in the file, especially due to overlaps between & lines and * lines).

This document is subject to change/replacement/removal at any time
without notice.  Specifically, ANY change in a gismu place structure at
this point will require some amount of work, and there are open
technical issues that could affect a number of place structures (but we
hope they will not).