Re: Hindi, diacritics and search results

2009-07-10 Thread Robert Muir
There is really no default in Lucene. A good start for Hindi would be to try WhitespaceAnalyzer. On Fri, Jul 10, 2009 at 9:13 PM, OBender Hotmail wrote: > I'm using default analyzer. Actually one that is set by default by Compass > framework but I assume it is the same that would be used in Lucen

RE: Hindi, diacritics and search results

2009-07-10 Thread OBender Hotmail
I'm using default analyzer. Actually one that is set by default by Compass framework but I assume it is the same that would be used in Lucene by default. Which one should I use? -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Friday, July 10, 2009 6:13 PM To: java-us

Re: Hindi, diacritics and search results

2009-07-10 Thread Robert Muir
Which analyzer in particular are you using? It's probably not doing what you want for Hindi. These "diacritics" are important (vowels, etc). On Fri, Jul 10, 2009 at 3:10 PM, OBender wrote: > Hi All, > > > > I'm using the default setup of lucene (no custom analyzers configured) and > came across t

Hindi, diacritics and search results

2009-07-10 Thread OBender
Hi All, I'm using the default setup of Lucene (no custom analyzers configured) and came across the following issue: In Hindi, if there is a letter with a diacritic in a phrase, Lucene will find the phrase with this letter even if the search string is for the letter without a diacritic. Is this
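The effect described in this thread can be reproduced outside Lucene with a small sketch. It is not Lucene code, and it is only an assumption that the analyzer in question behaves like the hypothetical `letters_only_tokens` below. It shows how a tokenizer that keeps only characters in the Unicode letter categories drops Devanagari vowel signs (categories Mc and Mn), so a letter with and without its vowel sign ends up as the same token, while plain whitespace splitting keeps them distinct. All function names are illustrative.

```python
import unicodedata

KA = "\u0915"        # DEVANAGARI LETTER KA, general category Lo
KI = "\u0915\u093F"  # KA followed by DEVANAGARI VOWEL SIGN I, category Mc


def whitespace_tokens(text):
    """Split on whitespace only, leaving vowel signs attached to their letter."""
    return text.split()


def letters_only_tokens(text):
    """Hypothetical tokenizer: keep runs of category L* characters only.

    Vowel signs are combining marks (Mc/Mn), so they act as token
    boundaries and are discarded.
    """
    tokens, current = [], []
    for ch in text:
        if unicodedata.category(ch).startswith("L"):
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def matches(query, document, tokenize):
    """True if any query token equals any document token."""
    doc_tokens = set(tokenize(document))
    return any(tok in doc_tokens for tok in tokenize(query))
```

With these definitions, `matches(KA, KI, letters_only_tokens)` is true because the vowel sign is thrown away, whereas `matches(KA, KI, whitespace_tokens)` is false. That is consistent with the advice in the replies to try a whitespace-based analyzer first.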

Re: Is my app a good fit for Lucene?

2009-07-10 Thread Erick Erickson
It would be helpful if you told us what analyzers you're using and what your search code looks like. Even better would be a small, self-contained demonstration app showing the issue. You could well be right that the text format is tripping up tokenizing, but there are other issues. You may have to

Is my app a good fit for Lucene?

2009-07-10 Thread Andy Faibishenko
I have a GUI application which needs to open large files (hundreds of MB) and be able to search through them quickly for user specified strings. These files are frequently updated while the user is viewing them and the updates are captured by the application. Also, the files contain records which

Re: Lucene boosting only on matching field values

2009-07-10 Thread Grant Ingersoll
Yes, see the Payload functionality and the BoostingTermQuery: http://www.lucidimagination.com/search/?q=Payload On Jul 9, 2009, at 6:42 PM, Eric Chu wrote: Hi all, I was wondering if there is any way to do a boost on the document based on which value is in a field matched by a query. ie,

Re: CompareBottom and setBottom in TopFieldCollector and FieldComparator

2009-07-10 Thread Mark Miller
There are a lot of calls to compare that only compare to the bottom (think of the common case when the queue fills quickly). setBottom and compareBottom cache that value, so you can pre-cache the bottom ord and save dereferencing into the array. It could just as easily be a call to compare, but it would be s
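The caching idea in this reply can be sketched with a toy comparator. This is a simplified illustration, not Lucene's FieldComparator or TopFieldCollector: the class, its method names, and the linear scan for the worst slot are all assumptions made for the example. The point it shows is that once the queue is full, each new document is checked against one cached bottom value instead of a slot lookup.

```python
class ToyIntComparator:
    """Toy comparator over per-document integer values (ascending sort)."""

    def __init__(self, num_slots, doc_values):
        self.values = [0] * num_slots  # value held in each queue slot
        self.doc_values = doc_values   # value of each document, indexed by doc id
        self.bottom = None             # cached value of the weakest queue entry

    def copy(self, slot, doc):
        self.values[slot] = self.doc_values[doc]

    def compare(self, slot1, slot2):
        a, b = self.values[slot1], self.values[slot2]
        return (a > b) - (a < b)

    def set_bottom(self, slot):
        # Read the slot once, so compare_bottom needs no array lookup for it.
        self.bottom = self.values[slot]

    def compare_bottom(self, doc):
        v = self.doc_values[doc]
        return (self.bottom > v) - (self.bottom < v)


def worst_slot(comp, n):
    """Slot holding the largest value, i.e. the weakest entry."""
    worst = 0
    for slot in range(1, n):
        if comp.compare(slot, worst) > 0:
            worst = slot
    return worst


def top_n(doc_values, n):
    """Return the doc ids of the n smallest values, best first."""
    if n <= 0:
        return []
    comp = ToyIntComparator(n, doc_values)
    slot_docs = []
    bottom_slot = None
    for doc in range(len(doc_values)):
        if len(slot_docs) < n:
            # Queue not full yet: take the next free slot.
            comp.copy(len(slot_docs), doc)
            slot_docs.append(doc)
            if len(slot_docs) == n:
                bottom_slot = worst_slot(comp, n)
                comp.set_bottom(bottom_slot)
        elif comp.compare_bottom(doc) > 0:
            # Competitive hit: replace the weakest entry and re-cache the bottom.
            comp.copy(bottom_slot, doc)
            slot_docs[bottom_slot] = doc
            bottom_slot = worst_slot(comp, n)
            comp.set_bottom(bottom_slot)
    return sorted(slot_docs, key=lambda d: (doc_values[d], d))
```

For example, `top_n([5, 3, 9, 1, 7, 2], 3)` returns `[3, 5, 1]`. The document with value 7 is rejected by a single `compare_bottom` call against the cached bottom value 5, without touching the slot array for the bottom entry.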

CompareBottom and setBottom in TopFieldCollector and FieldComparator

2009-07-10 Thread Raimon Bosch
Hi, I don't understand the uses of compareBottom and setBottom. I've seen that setBottom always sets the minimal score during the comparisons, but I don't see why we need this value. In the TopFieldCollector I saw this piece of code: final int cmp = reverseMul * comparator.c

[ANN] Luke + Hadoop, alpha version

2009-07-10 Thread Andrzej Bialecki
Hi all, I prepared a special edition of Luke, the Lucene Index Toolbox, that works with Lucene indexes located on any filesystem supported by Hadoop 0.19.1. At the moment I'm looking for feedback on how best to integrate this functionality with various bits and pieces of Luke. You can download the