RE: CompressingStoredFieldReader object taking lot of heap space

2016-12-21 Thread Mukul Ranjan
Hi Adrien, could you please explain the 2nd point in detail? Thanks, Mukul Ranjan
-----Original Message-----
From: Adrien Grand [mailto:jpou...@gmail.com]
Sent: Wednesday, December 21, 2016 6:43 PM
To: java-user@lucene.apache.org
Subject: Re: CompressingStoredFieldReader object taking lot of he

Re: Email id tokenizer (actual email id & multiple terms)

2016-12-21 Thread Trejkaz
On Wed, Dec 21, 2016 at 11:23 PM, suriya prakash wrote:
> Hi,
>
> Thanks for your reply.
>
> I might have one or more emailds in a single record.
Just so you know, you can add the same field more than once with the field analysed by KeywordAnalyzer, and it will still become multiple tokens. This

RE: Warming Indexes

2016-12-21 Thread Siraj Haider
Hi Uwe, below is the code that shows how we are opening the writer and the SearcherManager:
    LimitTokenCountAnalyzer limit_analyzer = new LimitTokenCountAnalyzer(analyzer, 10, true);
    IndexWriterConfig writer_config = new IndexWriterConfig(limit_analyzer);
    writer_config.setRAMBufferSizeMB(3

RE: All Fields Search

2016-12-21 Thread Uwe Schindler
Hi, This is the standard approach; there is no better way. It also keeps "scoring" working as expected, because the whole content is seen as "one entity" during scoring. Uwe
-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
> -----Original Message

RE: Warming Indexes

2016-12-21 Thread Uwe Schindler
Hi, The warmup only happens if you reopen the searcher using the NRT APIs or SearcherManager. If you just index with a single IndexWriter that has no open NRT readers, nothing will happen. To warm up when the application starts, I'd suggest using the first IndexSearcher. The warmer in IndexWr

RE: Warming Indexes

2016-12-21 Thread Siraj Haider
Thanks Uwe, I implemented the interface and am printing some lines to see if the warmup is happening, but I never see those prints in the log after some documents are indexed. Another question is how to warm up the index when I open it for the first time, i.e. when the application starts. Should I

Re: All Fields Search

2016-12-21 Thread Adrien Grand
This sounds like a good approach!
On Wed, Dec 21, 2016 at 13:31, suriya prakash wrote:
> Hi,
>
> I have 500 fields in a document to index.
>
> I append all the values and index it as separate field to support all
> fields search. I will also have 500 separate fields for field level search.

Re: CompressingStoredFieldReader object taking lot of heap space

2016-12-21 Thread Adrien Grand
This issue has been reported a couple of times and is usually due to one of these causes:
- readers are not always closed
- the index is accessed from a non-fixed thread pool
- there are too many indices open at the same time
On Wed, Dec 21, 2016 at 13:37, Mukul Ranjan wrote:
> Hi,
>
> We are
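The second point (the one Mukul asks about) can be illustrated without Lucene. This sketch assumes, as Lucene's segment readers do, that each thread loading stored fields gets its own clone of the stored fields reader held in a thread-local; here a plain `ThreadLocal` wraps a hypothetical `FakeFieldsReader` stand-in. A fixed pool caps the clones at the number of threads, while a fresh thread per task creates a fresh clone every time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Models a per-thread clone cache. FakeFieldsReader is a hypothetical
// stand-in, not a Lucene class.
class PerThreadReaders {
    static final AtomicInteger created = new AtomicInteger();

    static final class FakeFieldsReader {
        final byte[] buffer = new byte[16 * 1024]; // stands in for per-clone buffers
        FakeFieldsReader() { created.incrementAndGet(); }
    }

    // The first get() on each thread creates that thread's own instance.
    static final ThreadLocal<FakeFieldsReader> perThread =
            ThreadLocal.withInitial(FakeFieldsReader::new);

    static void loadStoredFields() {
        perThread.get();
    }

    // Runs `tasks` lookups on `threads` reused pool threads; returns instances created.
    static int runFixedPool(int threads, int tasks) {
        created.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            pool.execute(PerThreadReaders::loadStoredFields);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("pool did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return created.get();
    }

    // Runs `tasks` lookups, each on a brand-new thread; returns instances created.
    static int runThreadPerTask(int tasks) {
        created.set(0);
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            Thread t = new Thread(PerThreadReaders::loadStoredFields);
            t.start();
            threads.add(t);
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return created.get();
    }

    public static void main(String[] args) {
        System.out.println("fixed pool, 4 threads, 200 tasks: " + runFixedPool(4, 200) + " instances");
        System.out.println("thread per task, 200 tasks: " + runThreadPerTask(200) + " instances");
    }
}
```

In this toy the values of finished threads become collectable. The heap concern in the thread is a pool that grows under load and keeps many threads alive, since each live thread can then pin one clone per open segment.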

CompressingStoredFieldReader object taking lot of heap space

2016-12-21 Thread Mukul Ranjan
Hi, We are using Lucene 5.5.2 for search. We ran a load test for 8 hours on our setup and observed that "org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader" objects are taking 210 MB out of 1.2 GB of heap space. There are 8890 instances of this class present in the heap. Could you p
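As a rough sanity check on these figures (assuming 1 MB = 2^20 bytes and that the 210 MB is spread evenly over the instances), each instance averages about 24 KB. That points at the number of open instances, not the size of any single one, as the driver of the total:

```latex
\frac{210 \times 2^{20}\,\text{B}}{8890} = \frac{220\,200\,960\,\text{B}}{8890} \approx 24\,770\,\text{B} \approx 24.2\,\text{KiB per instance}
```

With decimal megabytes (10^6 B) the average is about 23,600 B, so the conclusion does not change.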

All Fields Search

2016-12-21 Thread suriya prakash
Hi, I have 500 fields in a document to index. I append all the values and index them as a separate field to support all-fields search. I will also have 500 separate fields for field-level search. Is there any better way to do all-fields search? Regards, Suriya
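The approach described above (one catch-all field next to the per-field ones) can be modelled with a toy in-memory inverted index. This is plain Java, not Lucene. The field name `_all` and the class are hypothetical, and lower-casing plus whitespace splitting stands in for a real analyzer. An all-fields query becomes one term lookup in a single field instead of 500, and because every value lives in that one field, it is scored as one entity, as Uwe notes in his reply.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: field -> term -> sorted doc ids. A model only, not Lucene.
class CatchAllIndex {
    static final String ALL = "_all"; // hypothetical name for the catch-all field

    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    void addDocument(int docId, Map<String, String> fields) {
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // Lower-casing plus whitespace splitting stands in for a real analyzer.
            for (String term : field.getValue().toLowerCase(Locale.ROOT).split("\\s+")) {
                if (term.isEmpty()) continue;
                index(field.getKey(), term, docId); // field-level search
                index(ALL, term, docId);            // all-fields search
            }
        }
    }

    private void index(String field, String term, int docId) {
        postings.computeIfAbsent(field, f -> new HashMap<>())
                .computeIfAbsent(term, t -> new TreeSet<>())
                .add(docId);
    }

    // One lookup in one field, however many source fields the documents had.
    Set<Integer> search(String field, String term) {
        Set<Integer> docs = postings
                .getOrDefault(field, Collections.emptyMap())
                .getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
        return Collections.unmodifiableSet(docs);
    }

    public static void main(String[] args) {
        CatchAllIndex idx = new CatchAllIndex();
        Map<String, String> doc1 = new HashMap<>();
        doc1.put("title", "Lucene in Action");
        doc1.put("author", "Erik Hatcher");
        Map<String, String> doc2 = new HashMap<>();
        doc2.put("title", "Notes by Erik");
        doc2.put("author", "Jane Doe");
        idx.addDocument(1, doc1);
        idx.addDocument(2, doc2);
        System.out.println(idx.search(ALL, "erik"));      // [1, 2]
        System.out.println(idx.search("author", "erik")); // [1]
    }
}
```

In real Lucene code, instead of concatenating, one could also add the catch-all field once per source value. `Analyzer#getPositionIncrementGap` then controls whether a phrase query can match across two adjacent values.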

TimeLimitingCollector accuracy

2016-12-21 Thread David Causse
Hi, This subject has been discussed in the past, but I don't think any real solution has been implemented yet. Here is a small test case to illustrate the problem: https://github.com/nomoa/lucene-solr/commit/2f025b18899038c8606da64c2cf9f4e1f643607f#diff-65ae49ceb38e45a3fc05115be5e61a2dR387 T

Re: Email id tokenizer (actual email id & multiple terms)

2016-12-21 Thread suriya prakash
Hi, Thanks for your reply. I might have one or more email ids in a single record, so I have to index them with a whitespace analyzer after extracting only the email ids (maybe using an email id tokenizer). Tokenization will happen twice (once for normal indexing and once for the special email id field indexing), which is co
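Trejkaz's suggestion and the two-pass plan above can be modelled in plain Java. Extract each address once and keep it whole for a keyword-style email field (one field value per address), then split it into parts for ordinary full-text matching. The regular expression is a deliberately simplified assumption, not an RFC 5322 parser. The class and method names are hypothetical, and lower-casing whole addresses is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: which tokens each field would receive for one record.
class EmailTokens {
    // Simplified address pattern (assumption): local part, '@', dotted domain.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+");

    // Whole addresses: each would be one value of a keyword-analyzed email field,
    // added to the document once per address.
    static List<String> wholeAddresses(String record) {
        List<String> out = new ArrayList<>();
        Matcher m = EMAIL.matcher(record);
        while (m.find()) {
            out.add(m.group().toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Part tokens for ordinary search: split an address on '@', '.', '_', '%', '+', '-'.
    static List<String> addressParts(String address) {
        List<String> out = new ArrayList<>();
        for (String part : address.split("[@._%+-]+")) {
            if (!part.isEmpty()) out.add(part);
        }
        return out;
    }

    public static void main(String[] args) {
        String record = "Contact john.smith@example.com or sales@acme.co.uk today";
        for (String address : wholeAddresses(record)) {
            System.out.println(address + " -> " + addressParts(address));
        }
    }
}
```

In Lucene itself, `UAX29URLEmailTokenizer` (analyzers-common) already emits a whole email address as a single token, which may make a custom email tokenizer unnecessary. The sketch only shows what each field would end up holding.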