I haven't yet found an answer to my original question, which was how to get search working for Japanese characters.

Regards,
Noopur Julka

On Sun, Aug 26, 2012 at 9:17 AM, Devon H. O'Dell <devon.od...@gmail.com> wrote:

> Seems worth mentioning in partial response to this thread's topics that
> (almost) regardless of index strategy, Lucene performance hinges on the
> number of matched documents per query, not total docs in the index. There
> are other mitigating factors (disk type, RAM size, etc.), but worst-case
> performance analysis can generally be modeled in terms of matched documents
> as opposed to index size.
>
> Apologies for any spelling / grammatical errors; this is sent from my
> phone.
>
> --dho
>
> On Aug 25, 2012 11:02 PM, "Noopur Julka" <noopur.ju...@gmail.com> wrote:
>
> > The index being very large can be ruled out, as Luke returned few results
> > and the app is capable of returning approx. 200 results.
> >
> > Regards,
> > Noopur Julka
> >
> > On Sun, Aug 26, 2012 at 6:40 AM, Ilya Zavorin <izavo...@caci.com> wrote:
> >
> > > Does Lucene support this type of structure, or do I need to somehow
> > > implement it outside Lucene?
> > >
> > > By the way, I need this to run on an Android phone, so size of memory
> > > might be an issue...
> > >
> > > Thanks,
> > >
> > > Ilya Zavorin
> > >
> > > -----Original Message-----
> > > From: Dawid Weiss [mailto:dawid.we...@gmail.com]
> > > Sent: Friday, August 24, 2012 4:50 PM
> > > To: java-user@lucene.apache.org
> > > Subject: Re: Efficient string lookup using Lucene
> > >
> > > What you need is a suffix tree or a suffix array. Both data structures
> > > will allow you to perform constant-time searches for existence/occurrence
> > > of any input pattern. Depending on how much text you have on the input it
> > > may either be a simple task -- see here:
> > >
> > > http://labs.carrotsearch.com/jsuffixarrays.html
> > >
> > > or a complicated task if your input size is larger (larger than memory).
> > > Google search for suffix trees / suffix arrays though, it's the data
> > > structure to use here.
> > >
> > > Dawid
> > >
> > > On Fri, Aug 24, 2012 at 9:48 PM, Ilya Zavorin <izavo...@caci.com> wrote:
> > > > Hi Everyone,
> > > >
> > > > I have the following task. I have a set of documents in multiple
> > > > languages. I don't know what these languages are. Any given doc may
> > > > contain text in several languages mixed up. So to me these are just a
> > > > bunch of Unicode text files.
> > > >
> > > > What I need is to implement an efficient EXACT string lookup. That is,
> > > > I need to be able to find ANY Unicode string exactly as it appears. I
> > > > do not care about language-specific modifications of the string. That
> > > > is, if I search for a string "run", I do not need to find "ran" but I
> > > > do want to find it in all of these strings below:
> > > >
> > > > Fox is running fast
> > > > !%#^&$run!$!%@&$#
> > > > run,run
> > > >
> > > > Is there a way of using StandardAnalyzer or any other analyzer and the
> > > > corresponding query parser to find these? Again, my queries might be
> > > > more or less random Unicode sequences and I need to find all their
> > > > occurrences in the text.
> > > >
> > > > Essentially, what I am trying to do is implement substring matching
> > > > more efficiently than using Java's standard substring matching methods.
> > > >
> > > > Thanks!
> > > >
> > > > Ilya Zavorin
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > > For additional commands, e-mail: java-user-h...@lucene.apache.org
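
For anyone reading this thread later, here is a minimal sketch of the suffix array idea Dawid mentions. It is plain Java and does not use Lucene or the jsuffixarrays library; the class name SuffixArraySketch and the method findAll are made up for illustration. It assumes the whole text fits in memory, builds the array with a naive comparison sort (slow for large inputs), and compares UTF-16 code units, so it treats Japanese or any other script as an opaque character sequence. With this naive version a lookup costs roughly O(m log n) character comparisons for a pattern of length m, plus the time to collect the hits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: a naive in-memory suffix array for exact substring lookup.
final class SuffixArraySketch {
    private final String text;
    private final Integer[] suffixes; // suffix start offsets, sorted by suffix content

    SuffixArraySketch(String text) {
        this.text = text;
        suffixes = new Integer[text.length()];
        for (int i = 0; i < suffixes.length; i++) suffixes[i] = i;
        // Naive construction: comparison sort over all suffixes.
        Arrays.sort(suffixes, (a, b) -> compareSuffixes(a, b));
    }

    // Orders two suffixes by UTF-16 code unit; a suffix that runs out first sorts first.
    private int compareSuffixes(int a, int b) {
        int n = text.length();
        while (a < n && b < n) {
            int d = text.charAt(a) - text.charAt(b);
            if (d != 0) return d;
            a++;
            b++;
        }
        return b - a;
    }

    // Compares the suffix at 'start' with the pattern, looking at pattern.length() chars at most.
    private int comparePrefix(int start, String pattern) {
        for (int i = 0; i < pattern.length(); i++) {
            if (start + i >= text.length()) return -1; // suffix ended, so it sorts before the pattern
            int d = text.charAt(start + i) - pattern.charAt(i);
            if (d != 0) return d;
        }
        return 0; // the suffix starts with the pattern
    }

    // Returns the start offsets of every exact occurrence of the pattern, in ascending order.
    List<Integer> findAll(String pattern) {
        List<Integer> hits = new ArrayList<>();
        if (pattern.isEmpty()) return hits;
        // Binary search for the first suffix that is not smaller than the pattern.
        int lo = 0, hi = suffixes.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (comparePrefix(suffixes[mid], pattern) < 0) lo = mid + 1; else hi = mid;
        }
        // All suffixes that start with the pattern are adjacent in the sorted array.
        for (int i = lo; i < suffixes.length && comparePrefix(suffixes[i], pattern) == 0; i++) {
            hits.add(suffixes[i]);
        }
        Collections.sort(hits);
        return hits;
    }

    public static void main(String[] args) {
        String docs = "Fox is running fast\n!%#^&$run!$!%@&$#\nrun,run";
        SuffixArraySketch index = new SuffixArraySketch(docs);
        System.out.println("run -> " + index.findAll("run"));
        System.out.println("ran -> " + index.findAll("ran"));
    }
}
```

The offsets returned are positions in the single concatenated string; mapping them back to individual documents, and keeping memory use acceptable on a phone, are left out of this sketch.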