> The WhitespaceAnalyzer breaks up text by spaces, tabs, and newlines.
> After that, you can use wildcards. This will use very little space. I
> believe leading & trailing wildcards are supported now, right?
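To make the whitespace-plus-wildcard idea concrete, here is a minimal sketch in plain Java. It does not call Lucene; the class and method names (WhitespaceWildcardSketch, tokenize, matchesInfix) are made up for illustration, and a substring check on each token stands in for a "*run*" style wildcard.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for whitespace tokenization plus a "*needle*" wildcard.
// This is NOT the Lucene API; it only illustrates the matching behaviour.
class WhitespaceWildcardSketch {

    // Split on runs of spaces, tabs and newlines; punctuation stays inside tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // True if any whitespace-delimited token contains the needle anywhere,
    // which is what a leading-and-trailing wildcard would match.
    static boolean matchesInfix(String text, String needle) {
        for (String token : tokenize(text)) {
            if (token.contains(needle)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] docs = {
            "Fox is running fast", "!%#^&$run!$!%@&$#", "run,run", "Fox ran fast"
        };
        for (String doc : docs) {
            System.out.println(doc + " -> " + matchesInfix(doc, "run"));
        }
    }
}
```

Only whole tokens are stored, so the index stays small; the cost moves to query time, where every candidate token has to be checked against the pattern.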
If leading wildcards take too much time (don't know, really) then one
could also try to index
ge-dependent/-specific.
>
> Thanks,
>
> Ilya
>
> -Original Message-
> From: Dawid Weiss [mailto:dawid.we...@gmail.com]
> Sent: Sunday, August 26, 2012 3:55 AM
> To: java-user@lucene.apache.org
> Subject: Re: Efficient string lookup using Lucene
>
>> Does Luce
/-specific.
Thanks,
Ilya
-Original Message-
From: Dawid Weiss [mailto:dawid.we...@gmail.com]
Sent: Sunday, August 26, 2012 3:55 AM
To: java-user@lucene.apache.org
Subject: Re: Efficient string lookup using Lucene
> Does Lucene support this type of structure, or do I need to somehow implement
> it outside Lucene?
You'd have to implement it separately but it'd be much, much smaller
than Lucene itself (even obfuscated).
> By the way, I need this to run on an Android phone so size of memory might be
> an issue...
> implement it outside Lucene?
> > >
> > > By the way, I need this to run on an Android phone so size of memory
> > > might be an issue...
> > >
> > > Thanks,
> > >
> > >
> > > Ilya Zavorin
> > >
> > -Original Message-
> > From: Dawid Weiss [mailto:da
012 4:50 PM
> To: java-user@lucene.apache.org
> Subject: Re: Efficient string lookup using Lucene
>
> What you need is a suffix tree or a suffix array. Both data structures
> will allow you to perform constant-time searches for existence/occurrence
> of any input pattern. Depending on
Sent: Friday, August 24, 2012 4:50 PM
To: java-user@lucene.apache.org
Subject: Re: Efficient string lookup using Lucene
What you need is a suffix tree or a suffix array. Both data structures will
allow you to perform constant-time searches for existence/occurrence of any
input pattern. Depending on
Does it mean that the resulting index will be very large?
Thanks,
Ilya
-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Friday, August 24, 2012 4:59 PM
To: java-user@lucene.apache.org
Subject: Re: Efficient string lookup using Lucene
Hi,
I have a similar issue.
I need Lucene search to work with kanji characters (Japanese).
The hits object (or topDocs) returns length = 0 for results but works well
for English.
I know my index contains matches, as Luke (Lucene search tool) renders them.
I tried lace analyser - did not work.
Re
> search for a string "run", I do not need to find "ran" but I
> do want to find it in all of these strings below:
>
> Fox is running fast
> !%#^&$run!$!%@&$#
> run,run
With NGramFilter you can do that. But it creates a lot of tokens. For example
"Fox is running fast" becomes
F
o
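To give a feel for how quickly the token count grows, here is a rough sketch of character n-gram generation in plain Java. It is not Lucene's NGramFilter: the class name NGramSketch, the method ngrams, and the output order are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of character n-grams; not the Lucene NGramFilter.
class NGramSketch {

    // Every substring of `token` whose length is between minGram and maxGram,
    // grouped by length (this ordering is an assumption of the sketch).
    static List<String> ngrams(String token, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int i = 0; i + n <= token.length(); i++) {
                out.add(token.substring(i, i + n));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> grams = ngrams("running", 1, 7);
        System.out.println(grams.size() + " n-grams: " + grams);
    }
}
```

A token of length n yields n(n+1)/2 grams when every length is kept, so "running" alone produces 28 tokens. That growth is the index-size cost discussed in this thread; capping maxGram trades recall on long queries for a smaller index.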
I can't speak for any non-Latin languages, but how about simply using the
StandardAnalyzer plus the EdgeNGramFilter for indexing (but not at query time)?
The latter would allow a query of "run" to match "running".
-- Jack Krupansky
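A minimal sketch of the edge n-gram idea, in plain Java rather than Lucene: each indexed word is expanded into its prefixes, while the query is left as a single term. The names EdgeNGramSketch, edgeNGrams and indexedMatch are hypothetical, and the lowercase-and-split step is only a rough stand-in for what StandardAnalyzer does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of edge n-grams at index time; not the Lucene API.
class EdgeNGramSketch {

    // Prefixes of `token` with lengths minGram..maxGram (capped at token length).
    static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> out = new ArrayList<>();
        int limit = Math.min(maxGram, token.length());
        for (int n = minGram; n <= limit; n++) {
            out.add(token.substring(0, n));
        }
        return out;
    }

    // Rough stand-in for analysis: lowercase, split on anything that is not a
    // letter or digit, expand each word into prefixes, then look for the query
    // as an exact term among those prefixes.
    static boolean indexedMatch(String text, String query) {
        String q = query.toLowerCase(Locale.ROOT);
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!word.isEmpty() && edgeNGrams(word, 1, word.length()).contains(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("running", 1, 7));
        System.out.println(indexedMatch("Fox is running fast", "run"));
    }
}
```

Edge n-grams only cover prefixes, so in this sketch a query of "run" matches "running" but not a word such as "outrun", where the pattern sits in the middle of the word. Full n-grams would catch that case, at the cost of many more tokens.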
-Original Message-
From: Ilya Zavorin
Sent: Friday, August 2
What you need is a suffix tree or a suffix array. Both data structures
will allow you to perform constant-time searches for existence/
occurrence of any input pattern. Depending on how much text you have
on the input it may either be a simple task -- see here:
http://labs.carrotsearch.com/jsuffixa
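For illustration, here is a naive suffix array sketch in plain Java. It is not the API of the library linked above; the class SuffixArraySketch and its contains method are hypothetical. Construction sorts all suffixes directly, which is fine for small inputs but far slower than a dedicated construction algorithm, and the lookup is a binary search whose cost grows with the pattern length and the logarithm of the text length.

```java
import java.util.Arrays;

// Hypothetical, naive suffix array: sort suffix start positions, then binary
// search for a pattern. A sketch for small inputs, not a production structure.
class SuffixArraySketch {
    private final String text;
    private final Integer[] sa; // suffix start offsets in lexicographic order

    SuffixArraySketch(String text) {
        this.text = text;
        this.sa = new Integer[text.length()];
        for (int i = 0; i < sa.length; i++) {
            sa[i] = i;
        }
        // Naive construction: compares whole suffixes, so it is slow on large texts.
        Arrays.sort(sa, (a, b) -> text.substring(a).compareTo(text.substring(b)));
    }

    // Compare the pattern with the first pattern.length() chars of a suffix.
    private int compareToSuffix(String pattern, int start) {
        int end = Math.min(text.length(), start + pattern.length());
        return pattern.compareTo(text.substring(start, end));
    }

    // True if the pattern occurs anywhere in the text.
    boolean contains(String pattern) {
        int lo = 0;
        int hi = sa.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int c = compareToSuffix(pattern, sa[mid]);
            if (c == 0) {
                return true; // the pattern is a prefix of this suffix
            }
            if (c < 0) {
                hi = mid - 1;
            } else {
                lo = mid + 1;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SuffixArraySketch index =
            new SuffixArraySketch("Fox is running fast\n!%#^&$run!$!%@&$#\nrun,run");
        System.out.println(index.contains("run"));
        System.out.println(index.contains("ran"));
    }
}
```

The array holds one integer per character of the text, which is worth weighing against the memory available on a phone.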