> The WhitespaceAnalyzer breaks up text on spaces, tabs, and newlines.
> After that, you can use wildcards. This will use very little space. I
> believe leading and trailing wildcards are supported now, right?
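A rough sketch of the idea quoted above, in plain Java rather than Lucene: split on whitespace only, then treat a query such as *run* as an infix match against each token. The class and method names here are made up for illustration and are not part of any Lucene API. Scanning every token like this is linear in the number of tokens, which is one reason leading wildcards can be slow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Lucene's WhitespaceAnalyzer or WildcardQuery.
class WhitespaceWildcardSketch {

    // Split on runs of whitespace (spaces, tabs, newlines); punctuation stays inside tokens.
    static String[] tokenize(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Emulates a *infix* wildcard by checking every token for the substring.
    static List<String> matchInfix(String text, String infix) {
        List<String> hits = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (token.contains(infix)) {
                hits.add(token);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String doc = "Fox is running fast\n!%#^&$run!$!%@&$#\trun,run";
        System.out.println(matchInfix(doc, "run"));
    }
}
```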
If leading wildcards take too much time (don't know, really) then one
could also try to index
ge-dependent/-specific.
>
> Thanks,
>
> Ilya
>
> -Original Message-
> From: Dawid Weiss [mailto:dawid.we...@gmail.com]
> Sent: Sunday, August 26, 2012 3:55 AM
> To: java-user@lucene.apache.org
> Subject: Re: Efficient string lookup using Lucene
>
>> Does Luce
/-specific.
Thanks,
Ilya
-Original Message-
From: Dawid Weiss [mailto:dawid.we...@gmail.com]
Sent: Sunday, August 26, 2012 3:55 AM
To: java-user@lucene.apache.org
Subject: Re: Efficient string lookup using Lucene
> Does Lucene support this type of structure, or do I need to somehow implement
> it outside Lucene?
You'd have to implement it separately but it'd be much, much smaller
than Lucene itself (even obfuscated).
> By the way, I need this to run on an Android phone so size of memory might be
> an is
> implement it outside Lucene?
> > >
> > > By the way, I need this to run on an Android phone so size of memory
> > might
> > > be an issue...
> > >
> > > Thanks,
> > >
> > >
> > > Ilya Zavorin
> > >
>
wid.we...@gmail.com]
> > Sent: Friday, August 24, 2012 4:50 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Efficient string lookup using Lucene
> >
> > What you need is a suffix tree or a suffix array. Both data structures
> > will allow you to perform cons
012 4:50 PM
> To: java-user@lucene.apache.org
> Subject: Re: Efficient string lookup using Lucene
>
> What you need is a suffix tree or a suffix array. Both data structures
> will allow you to perform constant-time searches for existence/ occurrence
> of any input pattern. Depending on
: Friday, August 24, 2012 4:50 PM
To: java-user@lucene.apache.org
Subject: Re: Efficient string lookup using Lucene
What you need is a suffix tree or a suffix array. Both data structures will
allow you to perform constant-time searches for existence/ occurrence of any
input pattern. Depending on
Does it mean that the resulting index will be very large?
Thanks,
Ilya
-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Friday, August 24, 2012 4:59 PM
To: java-user@lucene.apache.org
Subject: Re: Efficient string lookup using Lucene
> search for a string "
Hi,
I have a similar issue.
I need Lucene search to work with kanji characters (Japanese).
The hits object (or topDocs) returns length = 0 for results but works well
for English.
I know my index contains matches, as Luke (Lucene search tool) renders them.
I tried lace analyser - did not work.
Re
> search for a string "run", I do not need to find "ran" but I
> do want to find it in all of these strings below:
>
> Fox is running fast
> !%#^&$run!$!%@&$#
> run,run
With NGramFilter you can do that. But it creates a lot of tokens. For example
"Fox is running fast" becomes
F
o
rin
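To give a feel for why n-grams produce so many tokens, here is a small sketch in plain Java. It is not Lucene's NGramFilter: the class name, the method, and the gram sizes 1 to 3 are assumptions chosen for illustration, and the order in which Lucene emits tokens may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Lucene's n-gram filter.
class NGramSketch {

    // Returns every character n-gram of text whose length is between minGram and maxGram.
    static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                out.add(text.substring(start, start + len));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> grams = ngrams("Fox is running fast", 1, 3);
        // The token count grows roughly as (text length) x (number of gram sizes).
        System.out.println(grams.size() + " tokens; contains \"run\": " + grams.contains("run"));
    }
}
```

An exact lookup of "run" then becomes a plain term match on the 3-gram "run", at the cost of a much larger set of indexed tokens.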
Sent: Friday, August 24, 2012 3:48 PM
To: java-user@lucene.apache.org
Subject: Efficient string lookup using Lucene
Hi Everyone,
I have the following task. I have a set of documents in multiple languages.
I don't know what these languages are. Any given doc may contain text in
severa
What you need is a suffix tree or a suffix array. Both data structures
will allow you to perform constant-time searches for existence/
occurrence of any input pattern. Depending on how much text you have
on the input it may either be a simple task -- see here:
http://labs.carrotsearch.com/jsuffixa
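For readers unfamiliar with suffix arrays, a minimal sketch in plain Java follows. It is not the library linked above and does not use its API; the class name is made up, and the construction is the naive sort-all-suffixes approach, which is only reasonable for small inputs. In this sketch a lookup costs about O(m log n) character comparisons for a pattern of length m and a text of length n.

```java
import java.util.Arrays;

// Hypothetical illustration only; naive construction, not tuned for memory or speed.
class SuffixArraySketch {
    private final String text;
    private final Integer[] suffixes; // start offsets, ordered by the suffix each one begins

    SuffixArraySketch(String text) {
        this.text = text;
        suffixes = new Integer[text.length()];
        for (int i = 0; i < suffixes.length; i++) {
            suffixes[i] = i;
        }
        // Sort offsets by comparing the suffixes themselves (UTF-16 code unit order).
        Arrays.sort(suffixes, (a, b) -> text.substring(a).compareTo(text.substring(b)));
    }

    // Binary search for any suffix that starts with the pattern.
    boolean contains(String pattern) {
        int lo = 0;
        int hi = suffixes.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int start = suffixes[mid];
            int end = Math.min(text.length(), start + pattern.length());
            int cmp = text.substring(start, end).compareTo(pattern);
            if (cmp == 0) {
                return true;
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SuffixArraySketch index =
            new SuffixArraySketch("Fox is running fast\n!%#^&$run!$!%@&$#\nrun,run");
        System.out.println("run: " + index.contains("run") + ", ran: " + index.contains("ran"));
    }
}
```

Because the comparison works on raw characters, this is language-independent and needs no tokenization. An Integer[] is wasteful on a memory-constrained device; a real implementation would use an int[] and a faster construction algorithm.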
Hi Everyone,
I have the following task. I have a set of documents in multiple languages. I
don't know what these languages are. Any given doc may contain text in several
languages mixed up. So to me these are just a bunch of Unicode text files.
What I need is to implement an efficient EXACT str