How to store and retrieve latest utf8mb4 emoji / smiley characters in lucene index

2016-07-29 Thread Kumaran Ramasubramanian
Hi All, I am using Lucene 4.10.4. Is there any way to store and retrieve the latest utf8 and utf8mb4 emoji / smiley characters in a Lucene index? Perhaps in a more recent Lucene version? Thanks in advance. -- Kumaran R
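
For context on the question above: utf8mb4 is MySQL's name for full UTF-8, including the four-byte sequences used by emoji outside the Basic Multilingual Plane. In Java such a character is a surrogate pair (two chars, one code point). The following is a minimal sketch using only the JDK, not Lucene itself; the class and method names are made up for illustration. It only checks that such a string survives a UTF-8 encode/decode round trip, which is a useful first check before looking at the analyzer or the surrounding database layer.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only (not part of Lucene).
class EmojiRoundTrip {
    // Encode to UTF-8 bytes and decode again.
    static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Number of bytes the string occupies in UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // U+1F600 (grinning face) lies outside the BMP.
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println("chars:       " + emoji.length());
        System.out.println("code points: " + emoji.codePointCount(0, emoji.length()));
        System.out.println("UTF-8 bytes: " + utf8Length(emoji));
        System.out.println("round trip:  " + roundTrip(emoji).equals(emoji));
    }
}
```

Code that iterates over such text should work on code points rather than individual chars, so that a surrogate pair is never split.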

Re: org.apache.lucene.index.CorruptIndexException: checksum failed

2016-07-29 Thread Ziming Dong
I upgraded Lucene to 6.1.0, installed the rocketstor 6318A driver, then used checkIndex to rescue the index, but my program still crashed. I find that if just one computer builds the index, everything is fine, but if I start a second computer to build the index, one of these two programs will crash several hours later

Re: get enumeration of all terms starting at a given term after lucene 4

2016-07-29 Thread Michael McCandless
Use seekCeil instead of seekExact. Mike McCandless http://blog.mikemccandless.com On Fri, Jul 29, 2016 at 9:44 AM, Mukul Ranjan wrote: > Thanks Parit!!! I will try the below solution. > > Thanks, > Mukul Ranjan > > -Original Message- > From: Parit Bansal [mailto:Parit.Bansal@sib.swiss
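
The difference behind this advice can be sketched without Lucene: an exact seek only succeeds when the sought text is itself a term, while a ceiling seek lands on the first term greater than or equal to it, which is what prefix enumeration needs. The following is a JDK-only model, not Lucene code. A TreeSet<String> stands in for the sorted term dictionary, tailSet(prefix, true) plays the role of the ceiling seek, and the class and method names are hypothetical. Note the assumption that Java's String ordering matches the index's term ordering, which holds for ASCII terms like the ones used here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of prefix enumeration over a sorted term dictionary.
class PrefixTerms {
    // Returns all terms that start with the given prefix, in sorted order.
    static List<String> termsWithPrefix(TreeSet<String> sortedTerms, String prefix) {
        List<String> matches = new ArrayList<>();
        // Ceiling seek: start at the first term >= prefix, even if the
        // prefix itself is not a term in the dictionary.
        for (String term : sortedTerms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // terms are sorted, so no later term can match
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeSet<String> terms =
            new TreeSet<>(List.of("apple", "lucene", "lucid", "luck", "solr"));
        // "luc" is not a term, so an exact seek would fail here,
        // while the ceiling seek still finds the matching terms.
        System.out.println(termsWithPrefix(terms, "luc"));
    }
}
```

With an exact seek, the enumeration for "luc" would never start because "luc" is not in the dictionary; the ceiling seek positions on "lucene" and the loop stops at the first term that no longer carries the prefix.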

RE: get enumeration of all terms starting at a given term after lucene 4

2016-07-29 Thread Mukul Ranjan
Thanks Parit!!! I will try the below solution. Thanks, Mukul Ranjan -Original Message- From: Parit Bansal [mailto:Parit.Bansal@sib.swiss] Sent: Friday, July 29, 2016 7:11 PM To: java-user@lucene.apache.org Subject: Re: get enumeration of all terms starting at a given term after lucene 4

Re: get enumeration of all terms starting at a given term after lucene 4

2016-07-29 Thread Parit Bansal
Hi Mukul, Provided the terms are sorted, how about doing: if (termsEnum.seekExact(prefix)) { BytesRef term; while ((term = termsEnum.next()) != null) { // keep looping while your value is a prefix of term } } - Parit On 07/29/2016 03:30 PM, Mukul Ranjan wrote: Hi Parit, PrefixT

RE: get enumeration of all terms starting at a given term after lucene 4

2016-07-29 Thread Mukul Ranjan
Hi Parit, PrefixTermsEnum was removed in Lucene 5.1, so we cannot use it now. Thanks, Mukul -Original Message- From: Parit Bansal [mailto:Parit.Bansal@sib.swiss] Sent: Friday, July 29, 2016 3:20 PM To: java-user@lucene.apache.org Subject: Re: get enumeration of all terms starting at a g

Re: BufferedUpdateStreams breaks high performance indexing

2016-07-29 Thread Michael McCandless
The deleted terms accumulate whenever you use updateDocument(Term, Doc), or when you do deleteDocuments(Term). Deleted queries are when you delete by query, but I don't think DIH would be doing that unless you asked it to ... maybe a Solr user/dev knows better? Mike McCandless http://blog.mikemc

Re: get enumeration of all terms starting at a given term after lucene 4

2016-07-29 Thread Parit Bansal
On 07/29/2016 08:27 AM, Mukul Ranjan wrote: lucene version from lucene 3.6 to lucene 5.5.2. After 3.6, >indexReader terms api is removed which used to give list of terms. >I have used below code to get the termEnum, but it has no option to >pass the value of the field which is used to get the ma

Re: Lucene Optimization

2016-07-29 Thread Parit Bansal
On 07/13/2016 12:43 AM, Siraj Haider wrote: We currently use Lucene 2.9, and to keep the indexes running fast we optimize them during the night. In our application the volume of new documents coming in is very high, so most of our indexes have to merge segments during the day too, when the

Re: BufferedUpdateStreams breaks high performance indexing

2016-07-29 Thread Bernd Fehling
Yes, with the default of 10 it performs very much better. I didn't take into account that DIH uses updateDocument for adding new documents, but after thinking about the "why" I assume that this might be because you don't know if a document already exists in the index. Conclusion, using DIH and setting seg