THX!
-----Original Message-----
From: Neil Bacon [mailto:neil.ba...@nicta.com.au]
Sent: Friday, 13 June 2014 01:48
To: java-user@lucene.apache.org
Subject: Re: AW: Analyzing suggester for many fields
Hi Clemens,
Goutham's code is at: https://github.com/gtholpadi/MyLucene
I'm doing something similar, adding weighting as some function of doc
freq (and using Scala).
Cheers,
Neil
On 13/06/14 00:19, Clemens Wyss DEV wrote:
enter InputIteratorWrapper ;) i.e. new InputIteratorWrapper(tfit )
(NOTE: cross-posted announcement, please confine any replies to
general@lucene)
As you may be aware, ApacheCon will be held this year in Budapest, on
November 17-23. (See http://apachecon.eu for more info.)
### 1 - Call For Papers - June 25
The CFP for the conference is still open, but w
Relevancy judgment lists ARE very context sensitive. For example, in a
medical search application you'll have very different relevancy
requirements between a point-of-care application and an application being
used to perform general "sit at your desk" research ***even if the content
being served i
Hi,
Relevance judgments are labor intensive and expensive. Some Information
Retrieval forums (TREC, CLEF, etc.) provide these golden sets, but they are not
public.
http://rosenfeldmedia.com/books/search-analytics/ talks about how to create a
"golden set" for your top n queries.
Also there ar
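To make the idea of a "golden set" concrete, here is a minimal, self-contained sketch of how one could be used to score results. The class and method names (`PrecisionAtK`, `precisionAtK`) are hypothetical, not from any of the tools or books mentioned; the metric itself is standard precision-at-k.

```java
import java.util.List;
import java.util.Set;

// Illustrative only: given a "golden set" of relevant doc ids for one query,
// score a ranked result list with precision@k.
public class PrecisionAtK {
    public static double precisionAtK(List<String> ranked, Set<String> golden, int k) {
        int cutoff = Math.min(k, ranked.size());
        if (cutoff == 0) {
            return 0.0;
        }
        int hits = 0;
        for (int i = 0; i < cutoff; i++) {
            if (golden.contains(ranked.get(i))) {
                hits++;
            }
        }
        // Divide by k (not cutoff): missing results count against the system.
        return (double) hits / k;
    }
}
```

Repeating this over the top-n queries from search analytics gives a cheap regression signal for relevance changes.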
Perhaps more of an NLP question, but are there any tests regarding
relevance for Lucene? Given an example corpus of documents, what are the
golden sets for specific queries? The Wikipedia dump is used as a
benchmarking tool for both indexing and querying in Lucene, but there are
no metrics in terms
Ah, OK. Ignore me then and listen to Mike.
On Thu, Jun 12, 2014 at 7:54 AM, Jamie wrote:
> Erick
>
> We are not using Solr. We are using the latest version of Lucene directly.
> When I run it in a profiler, I can see all indexing threads blocked on merge
> for long stretches at a time.
>
> Re
Erick
We are not using Solr. We are using the latest version of Lucene
directly. When I run it in a profiler, I can see all indexing threads
blocked on merge for long stretches at a time.
Regards
Jamie
On 2014/06/12, 4:39 PM, Erick Erickson wrote:
What version of Solr/Lucene? Merging is sup
1000 is way too high because it will mean your index has 1000s of
segments and when a merge does run it will take a very long time.
It's better to do smaller more frequent merges. Try setting
segmentsPerTier to 5.
It's possible you are hitting too big a merge backlog, in which case
the default Me
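A minimal sketch of that suggestion, assuming the Lucene 4.x API used elsewhere in this thread; `conf` stands for the `IndexWriterConfig` from the earlier snippet, and this is a configuration fragment, not a complete program:

```java
import org.apache.lucene.index.TieredMergePolicy;

// Smaller, more frequent merges: 5 segments per tier instead of 1000.
TieredMergePolicy mergePolicy = new TieredMergePolicy();
mergePolicy.setSegmentsPerTier(5.0);   // default is 10
mergePolicy.setMaxMergeAtOnce(5);      // keep individual merges small too
conf.setMergePolicy(mergePolicy);      // conf is the IndexWriterConfig
```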
What version of Solr/Lucene? Merging is supposed to be
happening in the background for quite a while, so I'd be surprised if this
was really the culprit unless you're on an older version of Lucene.
See:
http://blog.trifork.com/2011/04/01/gimme-all-resources-you-have-i-can-use-them/
But this is ex
enter InputIteratorWrapper ;) i.e. new InputIteratorWrapper(tfit )
-----Original Message-----
From: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
Sent: Thursday, 12 June 2014 16:01
To: java-user@lucene.apache.org
Subject: AW: Analyzing suggester for many fields
trying to re-build
Erick
Well, I have users complaining about it. They say indexing stops for a
long time.
Currently, the following settings are applied.
TieredMergePolicy mergePolicy = new TieredMergePolicy();
mergePolicy.setSegmentsPerTier(1000);
conf.setMergePolicy(mergePolicy);
What's a good way
Michael is, of course, the Master of Merges...
I have to ask, though, have you demonstrated to your satisfaction that
you're actually seeing a problem? And that fewer merges would actually
address that problem?
'cause this might be an "XY" problem
Best,
Erick
On Thu, Jun 12, 2014 at 4:11 AM
trying to re-build the multi-field TermFreqIterator based on Goutham's
initial code

TermFreqIteratorWrapper tfit = null;
for (AtomicReaderContext readerc : readercs) {
    Fields fields = readerc.reader().fields();
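For context, a hedged sketch of how such a loop might continue under the Lucene 4.x Fields/TermsEnum API, with doc-freq weighting as Neil describes; this is an illustration, not Goutham's or Clemens's actual code:

```java
// Illustrative sketch (Lucene 4.x): walk every term of every field and
// weight it by document frequency, as input for a suggester.
for (AtomicReaderContext readerc : readercs) {
    Fields fields = readerc.reader().fields();
    for (String field : fields) {
        Terms terms = fields.terms(field);
        if (terms == null) {
            continue;
        }
        TermsEnum termsEnum = terms.iterator(null);
        BytesRef term;
        while ((term = termsEnum.next()) != null) {
            int weight = termsEnum.docFreq(); // weight = doc freq
            // feed (term, weight) into the suggester's input iterator here
        }
    }
}
```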
Likely you should implement a custom MergeScheduler (MergePolicy picks
which merges to do, and MergeScheduler schedules them).
Or you could e.g. make a MergePolicy that picks only "easy-ish" merges
during busy times and leaves hard merges for later.
Just be very careful: if merges fall behind and
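A hedged sketch of the MergeScheduler route, assuming Lucene 4.x; the `busy` flag is a hypothetical hook your application would set, and `DeferringMergeScheduler` is an invented name:

```java
import java.io.IOException;

import org.apache.lucene.index.ConcurrentMergeScheduler;
import org.apache.lucene.index.IndexWriter;

// Illustrative only: skip merge work while the application reports itself
// busy. Pending merges stay queued in the IndexWriter and are picked up on
// the next merge trigger. Heed the warning above: if merges fall behind for
// too long, segment counts grow and search/flush performance degrades.
public class DeferringMergeScheduler extends ConcurrentMergeScheduler {
    private volatile boolean busy;

    public void setBusy(boolean busy) { this.busy = busy; }

    @Override
    public void merge(IndexWriter writer) throws IOException {
        if (busy) {
            return; // defer; do not block the indexing thread on merges
        }
        super.merge(writer);
    }
}
```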