Re: Duplicate filtering

2016-09-20, Đạt Cao Mạnh
Solr already supports de-duplication when adding new documents. You can refer to the documentation at https://cwiki.apache.org/confluence/display/solr/De-Duplication
On Tue, Sep 20, 2016 at 12:18 PM Vjeran Marcinko <vjeran.marci...@email.t-com.hr> wrote:
> Hello,
>
> I'm pretty much Lucene newb, so wonderi...
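Solr's de-duplication works by computing a signature from selected fields of each incoming document. For anyone on plain Lucene, the idea can be sketched with the JDK alone: hash the chosen fields in a fixed order and skip documents whose signature has already been seen. This is a toy under stated assumptions, not Solr's implementation. The class name `SignatureDedup`, the use of SHA-256 and the field names in the usage are our own choices.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy signature-based de-duplication (hypothetical class, not Solr code):
// hash the selected fields of each document and reject repeats.
class SignatureDedup {
    private final List<String> signatureFields;
    private final Set<String> seen = new HashSet<>();

    SignatureDedup(List<String> signatureFields) {
        this.signatureFields = signatureFields;
    }

    /** Hex-encoded SHA-256 over the selected field values, in a fixed order. */
    String signature(Map<String, String> doc) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String field : signatureFields) {
                String value = doc.getOrDefault(field, "");
                md.update(value.getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0); // separator, so ("ab","c") and ("a","bc") differ
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b & 0xFF));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /** Returns true (and remembers the signature) if the document is new, false if it is a duplicate. */
    boolean accept(Map<String, String> doc) {
        return seen.add(signature(doc));
    }
}
```

The in-memory set does not survive a restart. In an actual Lucene index, one option is to store the signature in its own field (hypothetically named `signature`) and add documents with `IndexWriter.updateDocument(new Term("signature", sig), doc)`, so that a duplicate replaces the earlier copy instead of being added alongside it.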

Re: Optimising segments merges

2016-09-20, lukes
Thanks Mike... Regards

Re: Optimising segments merges

2016-09-20, Michael McCandless
Yes, you can... see the TieredMergePolicy setters. But again, I would strongly recommend using Lucene's defaults here.
Mike McCandless
http://blog.mikemccandless.com
On Tue, Sep 20, 2016 at 5:44 PM, lukes wrote:
> Thanks a lot Mike. Can we control of how often natural merge should happen,
> or...

Re: Optimising segments merges

2016-09-20, lukes
Thanks a lot Mike. Can we control how often natural merging happens, or what are the factors that determine when "natural merging" kicks off? Regards.

Re: Optimising segments merges

2016-09-20, Michael McCandless
Lucene takes care of merging ("natural merging") as you add documents, commit, etc. If your index is still going to be changing, it's best to never forceMerge and to let natural merging run with its defaults. And, yes, deleted documents are reclaimed by merging.
Mike McCandless
http://blog.mikemccand
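The point that deleted documents are reclaimed by merging can be pictured with a toy model in plain Java (not Lucene's code): a segment is written once, a delete only marks a document in a per-segment bitset, and a merge copies just the live documents into the new segment. `ToySegment` is a hypothetical name; `maxDoc` and `numDocs` are chosen to echo IndexReader's `maxDoc()` (which still counts deleted documents) and `numDocs()` (live documents only).

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model of an immutable segment with a "deleted docs" bitset.
class ToySegment {
    final List<String> docs;
    final BitSet deleted = new BitSet();

    ToySegment(List<String> docs) {
        this.docs = new ArrayList<>(docs);
    }

    /** Deleting only marks the document; the segment's contents are untouched. */
    void delete(int docId) {
        deleted.set(docId);
    }

    int maxDoc() {
        return docs.size(); // includes deleted documents
    }

    int numDocs() {
        return docs.size() - deleted.cardinality(); // live documents only
    }

    /** Merging copies live documents only, so the space used by deletes is reclaimed here. */
    static ToySegment merge(List<ToySegment> segments) {
        List<String> live = new ArrayList<>();
        for (ToySegment s : segments) {
            for (int i = 0; i < s.docs.size(); i++) {
                if (!s.deleted.get(i)) {
                    live.add(s.docs.get(i));
                }
            }
        }
        return new ToySegment(live);
    }
}
```

Until a merge touches a segment, its deleted documents still occupy space, which is why `maxDoc()` and `numDocs()` can differ on a live index.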

[ANNOUNCE] Apache Lucene 6.2.1 released

2016-09-20, Shalin Shekhar Mangar
20 September 2016, Apache Lucene™ 6.2.1 available
The Lucene PMC is pleased to announce the release of Apache Lucene 6.2.1.
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires f...

Optimising segments merges

2016-09-20, lukes
Hi, in my application I am committing (indexWriter.commit()) after every single document or batch of documents, but as a result lots of segments are being generated. One option would be to not commit and just add documents to the indexWriter. But then, if the system crashes, uncommitted documents wouldn't...
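A common middle ground between committing after every document and never committing is to commit in batches, bounded both by document count and by time, so that each commit flushes a larger chunk of buffered documents. The sketch assumes a hypothetical `IndexSink` interface standing in for IndexWriter's `addDocument()` and `commit()`; `BatchingCommitter`, its thresholds and the injected clock are illustrative, not Lucene API.

```java
import java.io.IOException;
import java.util.function.LongSupplier;

// Hypothetical stand-in for the two IndexWriter operations used here (not a Lucene interface).
interface IndexSink<D> {
    void add(D doc) throws IOException;
    void commit() throws IOException;
}

// Commits after `maxDocs` additions or once `maxAgeMillis` has passed since the
// last commit, whichever comes first.
class BatchingCommitter<D> {
    private final IndexSink<D> sink;
    private final int maxDocs;
    private final long maxAgeMillis;
    private final LongSupplier clock; // injected so the policy can be tested
    private int pending = 0;
    private long lastCommit;

    BatchingCommitter(IndexSink<D> sink, int maxDocs, long maxAgeMillis, LongSupplier clock) {
        this.sink = sink;
        this.maxDocs = maxDocs;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
        this.lastCommit = clock.getAsLong();
    }

    void add(D doc) throws IOException {
        sink.add(doc);
        pending++;
        if (pending >= maxDocs || clock.getAsLong() - lastCommit >= maxAgeMillis) {
            flush();
        }
    }

    /** Commits pending documents, if any; also call this on shutdown. */
    void flush() throws IOException {
        if (pending > 0) {
            sink.commit();
            pending = 0;
        }
        lastCommit = clock.getAsLong();
    }
}
```

Documents added since the last commit are still lost on a crash, so the thresholds trade durability against commit frequency. Because the age check only runs inside `add()`, an idle writer would also need a timer or shutdown hook that calls `flush()`.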

Re: Strange index corruption related to numeric fields when upgrading from 6.0.1

2016-09-20, Erick Erickson
A wild shot in the dark: are the square brackets really part of the field name? They have never officially been supported. From the Ref Guide: "Field names should consist of alphanumeric or underscore characters only and not start with a digit. This is not currently strictly enforced, but other fi...
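When field names come from user input or external data, the Ref Guide rule quoted above can be checked before indexing. This is a minimal sketch assuming "alphanumeric" means ASCII letters and digits. `FieldNames.isRecommended` is a hypothetical helper, and since the guide says the rule is not strictly enforced, it is an application-side sanity check rather than a Lucene or Solr validation.

```java
import java.util.regex.Pattern;

// Checks a name against "alphanumeric or underscore only, not starting with a digit"
// (ASCII interpretation assumed).
final class FieldNames {
    private static final Pattern RECOMMENDED = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private FieldNames() {}

    static boolean isRecommended(String name) {
        return name != null && RECOMMENDED.matcher(name).matches();
    }
}
```

With this check, a name such as `field[0]` is flagged, while `price_2016` and `_id` pass.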

Strange index corruption related to numeric fields when upgrading from 6.0.1

2016-09-20, Jan-Willem van den Broek
Hi all, I have an application that works fine with 6.0.1, but if I upgrade to 6.1.0 or 6.2.0, I occasionally get a corrupted index where the SegmentMerger keeps breaking on a numeric field. This is the exception I get:
... (stack of application code) ...
Caused by: java.lang.IllegalArgumentExce...

Migration Lucene 4.7.0 -->6.0.1 - NumericUtils

2016-09-20, Ludovic Bertin
Hi there, I'm migrating an application from Lucene 4.7.0 to Lucene 6.0.1. I'm facing a problem with this piece of code:
    public List getDistinctValues(IndexReader reader, EventField field) throws IOException {
        List values = new ArrayList();
        Fields fields = MultiFields.getFields(reader);
        ...
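Some background for this migration, worth confirming against the 6.0 migration notes: in Lucene 6.x, the prefix-coded term helpers that 4.x code used from NumericUtils live in the deprecated LegacyNumericUtils. NumericUtils itself now holds the byte encodings used by the new point fields, such as `intToSortableBytes`. The JDK-only sketch re-implements the idea behind that sortable encoding (flip the sign bit, write big-endian) so that unsigned byte order matches int order. It is an illustration, not Lucene source, and `SortableInts` is our own name.

```java
// Standalone illustration of a sort-preserving int encoding (not Lucene source).
final class SortableInts {
    private SortableInts() {}

    static byte[] encode(int value) {
        int flipped = value ^ 0x80000000; // negatives now sort below positives
        return new byte[] {
            (byte) (flipped >>> 24),
            (byte) (flipped >>> 16),
            (byte) (flipped >>> 8),
            (byte) flipped
        };
    }

    static int decode(byte[] bytes) {
        int flipped = ((bytes[0] & 0xFF) << 24)
                | ((bytes[1] & 0xFF) << 16)
                | ((bytes[2] & 0xFF) << 8)
                | (bytes[3] & 0xFF);
        return flipped ^ 0x80000000; // undo the sign-bit flip
    }

    /** Unsigned lexicographic byte comparison. */
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

The design point is that after flipping the sign bit, `Integer.MIN_VALUE` encodes as all zero bytes and `Integer.MAX_VALUE` as all 0xFF bytes, so plain byte comparison needs no knowledge of signs.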

Re: FacetResult getTopChildren

2016-09-20, Michael McCandless
Hmm, I don't think Lucene's facets make that easy today, but it might be a minor code change: in IntTaxonomyFacets.getTopChildren, where it checks if values[ord] > 0, you just need > yourMinCount instead. Maybe open an issue?
Mike McCandless
http://blog.mikemccandless.com
On Mon, Sep 19, 2016 a...
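Here is a plain-Java sketch of the change described above, using a toy array of per-ordinal counts rather than Lucene's IntTaxonomyFacets. It keeps the top N ordinals, but admits an ordinal only if its count exceeds a caller-supplied threshold instead of 0. `TopChildren`, `topOrdinals` and the tie-breaking rule (lower ordinal wins) are our own assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Toy top-N selection over per-ordinal counts with a configurable minimum count.
final class TopChildren {
    private TopChildren() {}

    /** Up to topN ordinals whose count is > minCountExclusive, highest count first. */
    static List<Integer> topOrdinals(int[] counts, int topN, int minCountExclusive) {
        // Min-heap: the head is the weakest kept entry (lowest count; on ties, the higher ordinal).
        PriorityQueue<Integer> heap = new PriorityQueue<>((a, b) ->
                counts[a] != counts[b] ? Integer.compare(counts[a], counts[b]) : Integer.compare(b, a));
        for (int ord = 0; ord < counts.length; ord++) {
            if (counts[ord] > minCountExclusive) { // the default check would be counts[ord] > 0
                heap.offer(ord);
                if (heap.size() > topN) {
                    heap.poll(); // evict the weakest entry
                }
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        // Final order: count descending, then ordinal ascending.
        result.sort((a, b) ->
                counts[a] != counts[b] ? Integer.compare(counts[b], counts[a]) : Integer.compare(a, b));
        return result;
    }
}
```

With a threshold of 0 this behaves like the existing `values[ord] > 0` check. For example, counts `{5, 0, 2, 7, 2, 1}` with `topN = 3` yield ordinals 3, 0, 2. Raising the threshold to 2 yields ordinals 3 and 0.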