AW: AW: Analyzing suggester for many fields

2014-06-12 Thread Clemens Wyss DEV
Thanks! -----Original Message----- From: Neil Bacon [mailto:neil.ba...@nicta.com.au] Sent: Friday, 13 June 2014 01:48 To: java-user@lucene.apache.org Subject: Re: AW: Analyzing suggester for many fields Hi Clemens, Goutham's code is at: https://github.com/gtholpadi/MyLucene I'm doing so

Re: AW: Analyzing suggester for many fields

2014-06-12 Thread Neil Bacon
Hi Clemens, Goutham's code is at: https://github.com/gtholpadi/MyLucene I'm doing something similar, adding weighting as some function of doc freq (and using Scala). Cheers, Neil On 13/06/14 00:19, Clemens Wyss DEV wrote: enter InputIteratorWrapper ;) i.e. new InputIteratorWrapper(tfit)

ANNOUNCE: ApacheCon deadlines: CFP June 25 / Travel Assistance Jul 25

2014-06-12 Thread Chris Hostetter
(NOTE: cross-posted announcement, please confine any replies to general@lucene) As you may be aware, ApacheCon will be held this year in Budapest, on November 17-23. (See http://apachecon.eu for more info.) ### ### 1 - Call For Papers - June 25 The CFP for the conference is still open, but w

Re: Relevancy tests

2014-06-12 Thread Doug Turnbull
Relevancy judgement lists ARE very context sensitive. For example, in a medical search application you'll have very different relevancy requirements between a point-of-care application vs. an application being used to perform general "sit at your desk" research ***even if the content being served i

Re: Relevancy tests

2014-06-12 Thread Ahmet Arslan
Hi, Relevance judgments are labor intensive and expensive. Some Information Retrieval forums (TREC, CLEF, etc.) provide these golden sets. But they are not public. http://rosenfeldmedia.com/books/search-analytics/ talks about how to create a "golden set" for your top n queries. Also there ar

Relevancy tests

2014-06-12 Thread Ivan Brusic
Perhaps more of an NLP question, but are there any tests regarding relevance for Lucene? Given an example corpus of documents, what are the golden sets for specific queries? The Wikipedia dump is used as a benchmarking tool for both indexing and querying in Lucene, but there are no metrics in terms

Re: timing merges

2014-06-12 Thread Erick Erickson
Ah, OK. Ignore me then and listen to Mike. On Thu, Jun 12, 2014 at 7:54 AM, Jamie wrote: > Erick > > We are not using Solr. We are using the latest version of Lucene directly. > When I run it in a profiler, I can see all indexing threads blocked on merge > for long stretches at a time. > > Re

Re: timing merges

2014-06-12 Thread Jamie
Erick We are not using Solr. We are using the latest version of Lucene directly. When I run it in a profiler, I can see all indexing threads blocked on merge for long stretches at a time. Regards Jamie On 2014/06/12, 4:39 PM, Erick Erickson wrote: What version of Solr/Lucene? Merging is sup

Re: timing merges

2014-06-12 Thread Michael McCandless
1000 is way too high because it will mean your index has 1000s of segments and when a merge does run it will take a very long time. It's better to do smaller more frequent merges. Try setting segmentsPerTier to 5. It's possible you are hitting too big a merge backlog, in which case the default Me
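The advice above (segmentsPerTier of 5 rather than 1000) can be sketched as follows, assuming the Lucene 4.x API; the class and method names here (TieredMergePolicy, setSegmentsPerTier, setMaxMergeAtOnce, IndexWriterConfig) exist in that line, but the Version constant and constructor signatures vary between releases:

```java
// Sketch: configure TieredMergePolicy for smaller, more frequent merges,
// per the advice in this thread. Lucene 4.x API assumed.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.util.Version;

public class MergePolicyConfig {
    public static IndexWriterConfig smallerTiers() {
        IndexWriterConfig conf = new IndexWriterConfig(
            Version.LUCENE_48, new StandardAnalyzer(Version.LUCENE_48));
        TieredMergePolicy tmp = new TieredMergePolicy();
        tmp.setSegmentsPerTier(5.0);  // suggested value; 1000 leaves thousands of segments
        tmp.setMaxMergeAtOnce(5);     // keep merge width in line with the tier size
        conf.setMergePolicy(tmp);
        return conf;
    }
}
```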

Re: timing merges

2014-06-12 Thread Erick Erickson
What version of Solr/Lucene? Merging is supposed to be happening in the background for quite a while, so I'd be surprised if this was really the culprit unless you're on an older version of Lucene. See: http://blog.trifork.com/2011/04/01/gimme-all-resources-you-have-i-can-use-them/ But this is ex

AW: Analyzing suggester for many fields

2014-06-12 Thread Clemens Wyss DEV
enter InputIteratorWrapper ;) i.e. new InputIteratorWrapper(tfit) -----Original Message----- From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] Sent: Thursday, 12 June 2014 16:01 To: java-user@lucene.apache.org Subject: AW: Analyzing suggester for many fields trying to re-build

Re: timing merges

2014-06-12 Thread Jamie
Erick Well, I have users complaining about it. They say indexing stops for a long time. Currently, the following settings are applied. TieredMergePolicy logMergePolicy = new TieredMergePolicy(); logMergePolicy.setSegmentsPerTier(1000); conf.setMergePolicy(logMergePolicy); What's a good way

Re: timing merges

2014-06-12 Thread Erick Erickson
Michael is, of course, the Master of Merges... I have to ask, though, have you demonstrated to your satisfaction that you're actually seeing a problem? And that fewer merges would actually address that problem? 'cause this might be an "XY" problem Best, Erick On Thu, Jun 12, 2014 at 4:11 AM

AW: Analyzing suggester for many fields

2014-06-12 Thread Clemens Wyss DEV
trying to re-build the multi-field TermFreqIterator based on Goutham's initial code: TermFreqIteratorWrapper tfit = null; for (AtomicReaderContext readerc : readercs) { Fields fields = readerc.reader().fields();
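The truncated loop above can be fleshed out into a sketch that walks every field of every index segment and collects (term, docFreq) pairs, which could then be exposed through an iterator for a suggester, as the thread discusses. This assumes the Lucene 4.x API (AtomicReaderContext, Fields, TermsEnum); `collectTerms` and the `Entry` holder are hypothetical names, not from the thread:

```java
// Hedged sketch (Lucene 4.x API): gather terms from all fields across
// all segments, weighting each term by its document frequency, as
// Neil's reply suggests. Names collectTerms/Entry are illustrative.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

public class MultiFieldTerms {
    public static class Entry {
        public final BytesRef term;
        public final long weight;
        Entry(BytesRef term, long weight) { this.term = term; this.weight = weight; }
    }

    public static List<Entry> collectTerms(IndexReader reader) throws IOException {
        List<Entry> entries = new ArrayList<>();
        for (AtomicReaderContext readerc : reader.leaves()) {
            Fields fields = readerc.reader().fields();
            for (String field : fields) {
                Terms terms = fields.terms(field);
                if (terms == null) continue;
                TermsEnum te = terms.iterator(null);
                BytesRef term;
                while ((term = te.next()) != null) {
                    // TermsEnum reuses its BytesRef, so copy before storing
                    entries.add(new Entry(BytesRef.deepCopyOf(term), te.docFreq()));
                }
            }
        }
        return entries;
    }
}
```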

Re: timing merges

2014-06-12 Thread Michael McCandless
Likely you should implement a custom MergeScheduler (MergePolicy picks which merges to do, and MergeScheduler schedules them). Or you could e.g. make a MergePolicy that picks only "easy-ish" merges during busy times and leaves hard merges for later. Just be very careful: if merges fall behind and
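A minimal sketch of the custom MergeScheduler idea, assuming the Lucene 4.x abstract class (its `merge(IndexWriter)` and `close()` methods, plus `IndexWriter.getNextMerge()`/`merge(OneMerge)`, which is how SerialMergeScheduler works); the `busy` flag is a hypothetical application hook, not a Lucene API:

```java
// Hedged sketch: a MergeScheduler that runs pending merges serially
// but stops as soon as the application reports itself busy. Per the
// warning in this thread, if merges fall behind, segment counts grow
// and searches slow down, so quiet periods must be long enough to
// let the backlog catch up. Lucene 4.x signatures assumed.
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.MergePolicy;
import org.apache.lucene.index.MergeScheduler;

public class QuietTimeMergeScheduler extends MergeScheduler {
    private final AtomicBoolean busy; // set by the application during indexing peaks

    public QuietTimeMergeScheduler(AtomicBoolean busy) {
        this.busy = busy;
    }

    @Override
    public void merge(IndexWriter writer) throws IOException {
        MergePolicy.OneMerge merge;
        // Drain the merge backlog, deferring as soon as the app is busy.
        while (!busy.get() && (merge = writer.getNextMerge()) != null) {
            writer.merge(merge);
        }
    }

    @Override
    public void close() throws IOException {
        // nothing to release; merges run on the calling thread
    }
}
```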