Re: Dubious error message?

2016-08-04 Thread Trejkaz
On Fri, Aug 5, 2016 at 2:51 PM, Erick Erickson wrote: > Question 2: Not that I know of > > Question 2.1. It's actually pretty difficult to understand why a single _term_ > can be over 32K and still make sense. This is not to say that a > single _text_ field can't be over 32K, each term within that

Re: Dubious error message?

2016-08-04 Thread Erick Erickson
Question 2: Not that I know of Question 2.1. It's actually pretty difficult to understand why a single _term_ can be over 32K and still make sense. This is not to say that a single _text_ field can't be over 32K, each term within that field is (usually) much less than that. Do you have a real-wor

Dubious error message?

2016-08-04 Thread Trejkaz
Trying to add a document, someone saw: java.lang.IllegalArgumentException: Document contains at least one immense term in field="bcc-address" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefi
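
The limit in that message applies to the UTF-8 encoded length of a single term, not to its character count. A minimal sketch of a pre-check on a value before it is indexed as a single term, using only the JDK; the class and method names here are hypothetical, and 32766 is simply the figure quoted in the error message:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: measures a term the same way
// the error message describes it, by its UTF-8 encoded byte length.
class TermLengthCheck {
    // Limit taken from the error message quoted above.
    static final int MAX_TERM_BYTES = 32766;

    // Number of bytes the term occupies once encoded as UTF-8.
    static int utf8Length(String term) {
        return term.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the encoded term is longer than the quoted limit.
    static boolean isImmense(String term) {
        return utf8Length(term) > MAX_TERM_BYTES;
    }

    public static void main(String[] args) {
        char[] filler = new char[40000];
        Arrays.fill(filler, 'a');
        String longValue = new String(filler);

        System.out.println(isImmense("user@example.com"));
        System.out.println(isImmense(longValue));
    }
}
```

A check like this only detects the problem; whether to truncate, split, or skip such a value is a separate decision for the analyzer.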

Re: Why we need org.apache.lucene.codecs.Codec

2016-08-04 Thread Uwe Schindler
Hi, The Codec class is the abstract base class for all index codecs. The implementation is loaded via SPI from the classpath. To understand how this works, read the API docs of Java's ServiceLoader, which describe the process. Uwe On 4 August 2016 17:09:46 CEST, Greg Bowyer wrote: >Codecs are loaded

Re: Why we need org.apache.lucene.codecs.Codec

2016-08-04 Thread Greg Bowyer
Not quite sure what you mean. Lucene needs some way to load a codec, and given parts of an index written with different codecs, it would need to load and select the right codec at the right time. Consider, for example, the upgrade path. Let's say you have segments written with codec 5.x and we in place

Re: Why we need org.apache.lucene.codecs.Codec

2016-08-04 Thread Aravinth T
I understand that; my question is different: why are we loading it with SPI, and why are we explicitly controlling the loading of Codecs? On Thu, 04 Aug 2016 20:39:46 +0530 Greg Bowyer wrote: Codecs are loaded with the Java service loader interface. That file is

Re: Why we need org.apache.lucene.codecs.Codec

2016-08-04 Thread Greg Bowyer
Codecs are loaded with the Java service loader interface. That file is the hook used to tell the service loader that this jar implements Codec. Lucene internally calls the service loader and asks what codecs are there. On Wed, Aug 3, 2016, at 11:23 PM, aravinth thangasami wrote: > I don't understand
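
The discovery step described above can be sketched with the JDK's `java.util.ServiceLoader` alone. This is an illustration of the general mechanism, not Lucene's own code: the `Greeter` interface and the class name are hypothetical stand-ins for an abstract base class such as Codec. Under the standard ServiceLoader convention, a jar announces an implementation by shipping a file under `META-INF/services/` named after the service type's binary name, with one implementation class name per line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical example: asks the service loader which implementations
// of a service type are present on the classpath.
class ServiceLoaderSketch {

    // Hypothetical service type, standing in for an abstract base class.
    public interface Greeter {
        String name();
    }

    // Returns the names of all Greeter implementations that jars on the
    // classpath have registered under META-INF/services.
    static List<String> discover() {
        List<String> names = new ArrayList<>();
        for (Greeter g : ServiceLoader.load(Greeter.class)) {
            names.add(g.name());
        }
        return names;
    }

    public static void main(String[] args) {
        // With no provider file on the classpath, nothing is registered,
        // so this prints an empty list.
        System.out.println(discover());
    }
}
```

The caller never names a concrete implementation, which is what lets different jars contribute implementations without any change to the calling code.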

Re: no concurrent merging?

2016-08-04 Thread Bernd Fehling
Yes, exactly, that's it. But is it a Lucene or a Solr problem? Should Solr use a different reader for DBQ, or can Lucene do something to solve this, since it is reported as a Lucene issue? Regards Bernd On 04.08.2016 at 16:02, Mikhail Khludnev wrote: > Hello, > There is https://issues.apache

Re: no concurrent merging?

2016-08-04 Thread Mikhail Khludnev
Hello, There is https://issues.apache.org/jira/browse/LUCENE-7049 On Thu, Aug 4, 2016 at 4:35 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > Lucene's merging is concurrent, but Solr unfortunately uses > UninvertingReader on each DBQ ... I'm not sure why. I think you should ask > o

Re: no concurrent merging?

2016-08-04 Thread Michael McCandless
Lucene's merging is concurrent, but Solr unfortunately uses UninvertingReader on each DBQ ... I'm not sure why. I think you should ask on the solr-user list? Or maybe try to change your deletes to be by Term instead of Query? Mike McCandless http://blog.mikemccandless.com On Thu, Aug 4, 2016 a

Re: BufferedUpdateStreams breaks high performance indexing

2016-08-04 Thread Michael McCandless
Wonderful, thanks for bringing closure! Mike McCandless http://blog.mikemccandless.com On Thu, Aug 4, 2016 at 3:14 AM, Bernd Fehling < bernd.fehl...@uni-bielefeld.de> wrote: > After updating to version 5.5.3 it looks good now. > Thanks a lot for your help and advice. > > Best regards > Bernd >

no concurrent merging?

2016-08-04 Thread Bernd Fehling
While increasing the indexing load of version 5.5.3 I see threads where one merging thread is blocking other merging threads. But is this concurrent merging? Bernd "Lucene Merge Thread #6" - Thread t@40280 java.lang.Thread.State: BLOCKED at org.apache.lucene.index.IndexWriter.mergeMiddle(I

Re: BufferedUpdateStreams breaks high performance indexing

2016-08-04 Thread Bernd Fehling
After updating to version 5.5.3 it looks good now. Thanks a lot for your help and advice. Best regards Bernd On 29.07.2016 at 15:04, Michael McCandless wrote: > The deleted terms accumulate whenever you use updateDocument(Term, Doc), or > when you do deleteDocuments(Term). > > Deleted queries a