
Re: The gismu creation algorithm



I wrote:
> I frequently feel as if letter order was not really considered
> in that process.

lojbab wrote:
> It is certainly a coincidence.  If you think you have a better match
> for a word, I still have all 20 meg of gismu data runs around [...]

Well, a few bytes are enough :-) I'm not saying that different gismu
would have better scores, only that the current gismu seem to have
higher scores when letter order is not taken into account.

Let me illustrate my point with one example for each source language;
this is interesting even if it is only a coincidence. It seems that
whenever there is an ordered match, a longer unordered match is likely,
and you did not have to code a more complex algorithm to get it.

    gismu   etymology         score w/ order   score w/o order
    -----   ---------------   --------------   ---------------
    jdari   Chinese 'jian'    2                3
    fagri   English 'fair'    3                4
    palta   Hindi   'tal'     2                3
    canre   Spanish 'aren'    3                4
    kabri   Russian 'kubak'   2                3
    sumji   Arabic  'juml'    2                3
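
The two columns can be reproduced with a short sketch. This assumes that
"score w/ order" means the length of the longest common subsequence of
the gismu and the source word, and that "score w/o order" means the
number of letters the two share when position is ignored. Both readings
are assumptions made for illustration, and the function names are
hypothetical; the actual gismu creation runs used a more elaborate
weighted scoring that is not reproduced here.

```python
from collections import Counter

def ordered_score(gismu, source):
    """Longest common subsequence length: letters matched in the same order."""
    prev = [0] * (len(source) + 1)
    for g in gismu:
        curr = [0]
        for j, s in enumerate(source, start=1):
            if g == s:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def unordered_score(gismu, source):
    """Letters shared when order is ignored (multiset intersection size)."""
    return sum((Counter(gismu) & Counter(source)).values())

# Word pairs copied from the table above.
PAIRS = [
    ("jdari", "jian"),
    ("fagri", "fair"),
    ("palta", "tal"),
    ("canre", "aren"),
    ("kabri", "kubak"),
    ("sumji", "juml"),
]

def score_table(pairs=PAIRS):
    """Return (gismu, source, ordered score, unordered score) for each pair."""
    return [(g, s, ordered_score(g, s), unordered_score(g, s)) for g, s in pairs]

if __name__ == "__main__":
    for gismu, source, with_order, without_order in score_table():
        print(f"{gismu:7} {source:7} {with_order:3} {without_order:3}")
```

Under these assumptions the unordered score can never be lower than the
ordered one, since every ordered match is also an unordered match.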

co'o mi'e paulos.

    Paulo S. L. M. Barreto  --  Software Analyst  --  Unisys Brazil
    ***  Alternative e-mail address:  <pbarreto@unisys.com.br>  ***
    Standard disclaimer applies ("I do not speak for Unisys", etc.)
                       e'osai ko sarji la lojban.