> search for a string "run", I do not need to find "ran" but I
> do want to find it in all of these strings below:
>
> Fox is running fast
> !%#^&$run!$!%@&$#
> run,run
With NGramFilter you can do that. But it creates a lot of tokens. For example
"Fox is running fast" becomes
F
o
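To give a rough sense of how quickly the token count grows, here is a minimal plain-Java sketch that enumerates character n-grams of a string. It is only an illustration and not Lucene's NGramFilter itself, which works on the tokens produced by a tokenizer and may emit them in a different order; the class name, the method name, and the gram sizes 1 to 3 are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Lucene.
class CharNGramSketch {

    // Returns every substring of `text` whose length is between
    // minGram and maxGram (inclusive), shortest grams first.
    static List<String> charNGrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int start = 0; start + n <= text.length(); start++) {
                grams.add(text.substring(start, start + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        String text = "Fox is running fast";
        // Gram sizes 1..3 are an assumption chosen for the example.
        List<String> grams = charNGrams(text, 1, 3);
        System.out.println(grams.size() + " n-grams for a "
                + text.length() + "-character string");
        System.out.println("contains \"run\": " + grams.contains("run"));
        System.out.println("contains \"ran\": " + grams.contains("ran"));
    }
}
```

With these settings a 19-character string already yields 54 grams, and the count grows with both the text length and the range of gram sizes.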
I can't speak for any non-Latin languages, but how about simply using the
StandardAnalyzer plus the EdgeNGramFilter for indexing (but not at query time). The
latter would allow a query of "run" to match "running".
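The idea can be sketched in plain Java without Lucene: each indexed token is expanded into its prefixes (edge n-grams), while the query term is left as a single token. The splitting on non-letter/non-digit characters is only a crude stand-in for StandardAnalyzer, and the class name, method names, and the gram range 1 to 10 are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of Lucene.
class EdgeNGramSketch {

    // Prefixes of `token` with lengths minGram..maxGram (capped at token length).
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, token.length());
        for (int n = minGram; n <= longest; n++) {
            grams.add(token.substring(0, n));
        }
        return grams;
    }

    // True if the lower-cased query equals an edge n-gram of any token in `text`.
    // Splitting on non-letters/non-digits is a rough substitute for a real tokenizer.
    static boolean matches(String text, String query) {
        String q = query.toLowerCase(Locale.ROOT);
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first element on leading delimiters
            }
            if (edgeNGrams(token, 1, 10).contains(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("Fox is running fast", "run"));
        System.out.println(matches("!%#^&$run!$!%@&$#", "run"));
        System.out.println(matches("run,run", "run"));
        System.out.println(matches("Fox ran fast", "run"));
    }
}
```

Note that edge n-grams only cover token prefixes, so in this sketch a query of "run" would not match a token such as "rerun".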
-- Jack Krupansky
-Original Message-
From: Ilya Zavorin
Sent: Friday, August 2
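What you need is a suffix tree or a suffix array. Both data structures
will allow you to search for the existence/occurrence of any input
pattern in time roughly proportional to the pattern length (a suffix
array adds a logarithmic factor in the text size), rather than scanning
the whole text. Depending on how much text you have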
on the input it may either be a simple task -- see here:
http://labs.carrotsearch.com/jsuffixa
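As a minimal sketch of the suffix array idea, the plain-Java class below sorts all suffix start positions and then binary-searches for a pattern. It does not use the library linked above, whose API is not shown here; the class and method names are assumptions, and the naive sort-based construction is only suitable for small inputs.

```java
import java.util.Arrays;

// Hypothetical illustration class; a naive suffix array for small texts.
class SuffixArraySketch {
    private final String text;
    private final Integer[] suffixes; // suffix start offsets in lexicographic order

    SuffixArraySketch(String text) {
        this.text = text;
        suffixes = new Integer[text.length()];
        for (int i = 0; i < suffixes.length; i++) {
            suffixes[i] = i;
        }
        // Naive construction: compare whole suffixes. Real implementations
        // use dedicated algorithms that avoid creating substrings.
        Arrays.sort(suffixes, (a, b) -> text.substring(a).compareTo(text.substring(b)));
    }

    // Binary search over the sorted suffixes: true if some suffix starts with `pattern`.
    boolean contains(String pattern) {
        int lo = 0;
        int hi = suffixes.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = comparePrefix(suffixes[mid], pattern);
            if (cmp == 0) {
                return true;
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return false;
    }

    // Compares the suffix at `start`, truncated to the pattern length, with the pattern.
    private int comparePrefix(int start, String pattern) {
        int end = Math.min(text.length(), start + pattern.length());
        return text.substring(start, end).compareTo(pattern);
    }

    public static void main(String[] args) {
        SuffixArraySketch index = new SuffixArraySketch("Fox is running fast");
        System.out.println(index.contains("run"));
        System.out.println(index.contains("ran"));
    }
}
```

Because matching is done on raw characters, the lookup is an exact, case-sensitive substring test and needs no knowledge of the language of the text.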
Hi Everyone,
I have the following task. I have a set of documents in multiple languages. I
don't know what these languages are. Any given doc may contain text in several
languages mixed up. So to me these are just a bunch of Unicode text files.
What I need is to implement an efficient EXACT str
So for Lucene 3.6, is the right way to do this to create a new Document and add
new Fields based on the old Fields (with the settings you want them to have for
term vector offsets and positions, etc.) and then call updateDocument on that
new Document?
Thanks,
Mike
-Original Message-
Fro
Calling IR.document does not restore your 'original Document'
completely. This is really an age-old trap.
So don't update documents this way: it's fine to fetch their contents
but nothing goes through the effort to ensure that things like term
vector parameters are the same as what you originally prov