On 16-May-07, at 11:00 PM, Doron Cohen wrote:

If you enter a.b.c.d.e.f.g.h to that demo you'll see that
the demo simply breaks the input text on '.' - that has
nothing to do with filenames.

That is not what I am seeing from my testing:

a.b.c.d.e.f.g.h is not broken apart like how the snowball demo indicates it should do.

At http://snowball.tartarus.org/demo.php

"a.b.c.d.e.f.g.h" shows:

a -> a
b -> b
c -> c
d -> d
e -> e
f -> f
g -> g
h -> h

For my lucene testing, I indexed one text file with one "a.b.c.d.e.f.g.h" string in it and opened the index up using Luke. It only indexed the string a.b.c.d.e.f.g.h (and didn't parse the string based on the periods).


As a real world example, Logon.dll is being converted to "Logon.dl" rather than "Logon" and "dll" as indicated by the snowball demo.

Also:

Demo:
some-msp.msp

somemsp -> somemsp
msp -> msp

Lucene:
some-msp.msp

some
msp.msp


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to