Re: 500 million documents for loop.

2015-11-12 Thread Valentin Popov
Uwe, I’m very sorry, I mistyped your name ;)

> On 12 Nov 2015, at 21:15, Uwe Schindler wrote:
>
> Hi,
>
>>> The big question is: Do you need the results paged at all?
>>
>> Yup, because if we return all results, we get an OME (OutOfMemoryError).
>
> You get the OME because the paging collector cannot handle that, so this is
> an XY problem.

Re: 500 million documents for loop.

2015-11-12 Thread Valentin Popov
Hi,

> On 12 Nov 2015, at 21:15, Uwe Schindler wrote:
>
> Hi,
>
>>> The big question is: Do you need the results paged at all?
>>
>> Yup, because if we return all results, we get OME.
>
> You get the OME because the paging collector cannot handle that, so this is
> an XY problem. Would it not be better if your application just gets the
> results as a stream and p

RE: 500 million documents for loop.

2015-11-12 Thread Uwe Schindler
Hi,

>> The big question is: Do you need the results paged at all?
>
> Yup, because if we return all results, we get OME.

You get the OME because the paging collector cannot handle that, so this is an XY problem. Would it not be better if your application just gets the results as a stream and p
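A minimal sketch of the kind of streaming Collector Uwe describes, against the Lucene 4.10 line the thread is on. The «to» field name comes from the thread; everything else is an assumption, including that «to» was indexed with doc values, and the doc-values accessors shifted across 4.x releases, so treat the exact signatures as approximate:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.lucene.index.AtomicReaderContext;
    import org.apache.lucene.index.SortedDocValues;
    import org.apache.lucene.search.Collector;
    import org.apache.lucene.search.Scorer;

    // Streams every hit through collect() without scoring, sorting, or
    // buffering hits, so memory stays flat however many documents match.
    public final class ToAddressCollector extends Collector {
        private final Map<String, Integer> counts = new HashMap<>();
        private SortedDocValues toValues;

        @Override
        public void setScorer(Scorer scorer) {
            // Scores are never read, so nothing to keep here.
        }

        @Override
        public void setNextReader(AtomicReaderContext context) throws IOException {
            toValues = context.reader().getSortedDocValues("to"); // null if the segment has none
        }

        @Override
        public void collect(int doc) {
            if (toValues == null) {
                return;
            }
            int ord = toValues.getOrd(doc); // -1 when the document has no «to» value
            if (ord >= 0) {
                String to = toValues.lookupOrd(ord).utf8ToString();
                counts.merge(to, 1, Integer::sum);
            }
        }

        @Override
        public boolean acceptsDocsOutOfOrder() {
            return true; // counting does not care about doc order
        }

        public Map<String, Integer> counts() {
            return counts;
        }
    }

The whole pass then becomes a single searcher.search(query, collector) call: no TopDocs, no paging state, no OME.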

Re: 500 million documents for loop.

2015-11-12 Thread Valentin Popov
Toke, I just looked through the code; we are already using such a method:

    IndexSearcher indexSearcher = getIndexSearcher(searchResult);
    TopDocs topDocs;
    ScoreDoc currectScoreDoc = p.startScoreDoc;
    for (int page = 1;
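That fragment looks like a hand-rolled deep-paging loop. For reference, a hypothetical reconstruction around IndexSearcher.searchAfter, the usual way to resume from a ScoreDoc on 4.10; getIndexSearcher, searchResult, p.startScoreDoc, query and process(...) are placeholders following the fragment, not known code:

    IndexSearcher indexSearcher = getIndexSearcher(searchResult);
    ScoreDoc after = p.startScoreDoc; // resume point, as in the fragment above
    while (true) {
        // searchAfter skips re-collecting earlier pages, but each page is
        // still collected and sorted; a streaming Collector avoids even that.
        TopDocs topDocs = indexSearcher.searchAfter(after, query, 50000);
        if (topDocs.scoreDocs.length == 0) {
            break; // no more results
        }
        for (ScoreDoc sd : topDocs.scoreDocs) {
            process(indexSearcher.doc(sd.doc)); // hypothetical per-document handler
        }
        after = topDocs.scoreDocs[topDocs.scoreDocs.length - 1];
    }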

Re: 500 million documents for loop.

2015-11-12 Thread Valentin Popov
Hi,

> Hi,
>
> The big question is: Do you need the results paged at all?

Yup, because if we return all results, we get OME.

> Do you need them sorted?

Nope.

> If not, the easiest approach is to use a custom Collector that does no
> sorting and just consumes the results.

Main bottleneck

RE: 500 million documents for loop.

2015-11-12 Thread Uwe Schindler
Hi,

The big question is: Do you need the results paged at all? Do you need them sorted? If not, the easiest approach is to use a custom Collector that does no sorting and just consumes the results.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@

Re: 500 million documents for loop.

2015-11-12 Thread Valentin Popov
Toke, thanks! We will look at this solution; it looks like exactly what we need.

> On 12 Nov 2015, at 20:42, Toke Eskildsen wrote:
>
> Valentin Popov wrote:
>
>> We have ~10 indexes for 500M documents; each document
>> has an «archive date» and a «to» address. One of our tasks is to
>> calculate statistics of «to» for the last year. Right now we are
>> using the search archive_date:(current_date - 1 year) and paginate
>> the results at 50k records per page.

Re: 500 million documents for loop.

2015-11-12 Thread Toke Eskildsen
Valentin Popov wrote:

> We have ~10 indexes for 500M documents; each document
> has an «archive date» and a «to» address. One of our tasks is to
> calculate statistics of «to» for the last year. Right now we are
> using the search archive_date:(current_date - 1 year) and paginate
> the results at 50k records per page.

Re: 500 million documents for loop.

2015-11-12 Thread Valentin Popov
We are using 4.10.4, and moving to a 5.x version is not possible right now. Thanks!

> On 12 Nov 2015, at 19:47, Anton Zenkov wrote:
>
> Which version of Lucene are you using?
>
> On Thu, Nov 12, 2015 at 11:39 AM, Valentin Popov wrote:
>
>> Hello everyone.
>>
>> We have ~10 indexes

Re: 500 million documents for loop.

2015-11-12 Thread Anton Zenkov
Which version of Lucene are you using?

On Thu, Nov 12, 2015 at 11:39 AM, Valentin Popov wrote:

> Hello everyone.
>
> We have ~10 indexes for 500M documents; each document has an «archive date»
> and a «to» address. One of our tasks is to calculate statistics of «to» for
> the last year. Right now we are using

500 million documents for loop.

2015-11-12 Thread Valentin Popov
Hello everyone.

We have ~10 indexes for 500M documents; each document has an «archive date» and a «to» address. One of our tasks is to calculate statistics of «to» for the last year. Right now we are using the search archive_date:(current_date - 1 year) and paginate the results at 50k records per page. Bottleneck
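The archive_date:(current_date - 1 year) search described here would, on 4.10, look roughly like the sketch below, assuming archive_date was indexed as epoch milliseconds in a LongField (the field name is from the thread; the encoding is an assumption):

    import org.apache.lucene.search.NumericRangeQuery;
    import org.apache.lucene.search.Query;

    // Match documents whose archive_date falls within the last year.
    long now = System.currentTimeMillis();
    long oneYearAgo = now - 365L * 24 * 60 * 60 * 1000; // ignores leap years for brevity
    Query lastYear = NumericRangeQuery.newLongRange(
            "archive_date", oneYearAgo, now, true, true); // both bounds inclusive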

Re: debugging growing index size

2015-11-12 Thread Michael McCandless
Hi Rob,

A couple more things: Can you print the value of MMapDirectory.UNMAP_SUPPORTED? Also, can you try your test using NIOFSDirectory instead? Curious if that changes things...

Mike McCandless
http://blog.mikemccandless.com

On Thu, Nov 12, 2015 at 7:28 AM, Rob Audenaerde wrote:

> Curious indeed!
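Both checks are one-liners. A sketch, with the caveat that NIOFSDirectory takes a File in 4.x and a Path in 5.x (the 5.x form is shown; the index path is a placeholder):

    import java.nio.file.Paths;

    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.MMapDirectory;
    import org.apache.lucene.store.NIOFSDirectory;

    // UNMAP_SUPPORTED reports whether this JVM lets Lucene unmap memory-mapped
    // index files on close; if false, deleted-but-still-mapped files can keep
    // disk space pinned until the mapped buffers are garbage collected.
    System.out.println("UNMAP_SUPPORTED = " + MMapDirectory.UNMAP_SUPPORTED);

    // Swapping in NIOFSDirectory takes memory mapping out of the equation:
    Directory dir = new NIOFSDirectory(Paths.get("/path/to/index"));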

Re: debugging growing index size

2015-11-12 Thread Rob Audenaerde
Curious indeed! I will turn on IndexFileDeleter.VERBOSE_REF_COUNTS and recreate the logs. I will get back with them in a day, hopefully. Thanks for the extra logging!

-Rob

On Thu, Nov 12, 2015 at 11:34 AM, Michael McCandless <luc...@mikemccandless.com> wrote:

> Hmm, curious.
>
> I looked at the [large] infoStream output
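For anyone following along: VERBOSE_REF_COUNTS is a static flag on Lucene's internal IndexFileDeleter, and it only prints if the writer also has an infoStream. A sketch, assuming code placed where it can reach the flag (the class is internal, so its visibility can vary by version):

    package org.apache.lucene.index; // IndexFileDeleter lives here

    import java.io.IOException;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.util.PrintStreamInfoStream;

    public final class DeleterDebug {
        // Opens a writer with per-file reference-count logging sent to stdout.
        public static IndexWriter openDebugWriter(Directory dir, Analyzer analyzer) throws IOException {
            IndexFileDeleter.VERBOSE_REF_COUNTS = true; // off by default
            IndexWriterConfig iwc = new IndexWriterConfig(analyzer); // 4.x also takes a Version first
            iwc.setInfoStream(new PrintStreamInfoStream(System.out));
            return new IndexWriter(dir, iwc);
        }
    }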

Re: debugging growing index size

2015-11-12 Thread Michael McCandless
Hmm, curious.

I looked at the [large] infoStream output and I see segment _3ou7 present on init of IW, a few getReader calls referencing it, then a forceMerge that indeed merges it away, yet I do NOT see IW attempting deletion of its files. And indeed I see plenty (too many: many times per second