Re: 500 million documents for loop.

2015-11-12 Thread Valentin Popov
Uwe, I’m very sorry, I mistyped your name ;)

> On 12 Nov 2015, at 21:15, Uwe Schindler wrote:
>
> Hi,
>
>>> The big question is: Do you need the results paged at all?
>>
>> Yup, because if we return all results, we get an OME (OutOfMemoryError).
>
> You get the OME because the paging collector cannot handle that, so this is
> an XY problem.

Re: 500 million documents for loop.

2015-11-12 Thread Valentin Popov
Hi,

> On 12 Nov 2015, at 21:15, Uwe Schindler wrote:
>
> Hi,
>
>>> The big question is: Do you need the results paged at all?
>>
>> Yup, because if we return all results, we get OME.
>
> You get the OME because the paging collector cannot handle that, so this is
> an XY problem. Would it not be better if your application just gets the
> results as a stream and p

RE: 500 million documents for loop.

2015-11-12 Thread Uwe Schindler
Hi,

>> The big question is: Do you need the results paged at all?
>
> Yup, because if we return all results, we get OME.

You get the OME because the paging collector cannot handle that, so this is an XY problem. Would it not be better if your application just gets the results as a stream and p
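A minimal sketch of the kind of streaming Collector Uwe describes, against the Lucene 4.10 line the thread is on. The «to» field name comes from the thread; everything else is an assumption, including that «to» was indexed with doc values, and the doc-values accessors shifted across 4.x releases, so treat the exact signatures as approximate:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.lucene.index.AtomicReaderContext;
    import org.apache.lucene.index.SortedDocValues;
    import org.apache.lucene.search.Collector;
    import org.apache.lucene.search.Scorer;

    // Streams every hit through collect() without scoring, sorting, or
    // buffering hits, so memory stays flat however many documents match.
    public final class ToAddressCollector extends Collector {
        private final Map<String, Integer> counts = new HashMap<>();
        private SortedDocValues toValues;

        @Override
        public void setScorer(Scorer scorer) {
            // Scores are never read, so nothing to keep here.
        }

        @Override
        public void setNextReader(AtomicReaderContext context) throws IOException {
            toValues = context.reader().getSortedDocValues("to"); // null if the segment has none
        }

        @Override
        public void collect(int doc) {
            if (toValues == null) {
                return;
            }
            int ord = toValues.getOrd(doc); // -1 when the document has no «to» value
            if (ord >= 0) {
                String to = toValues.lookupOrd(ord).utf8ToString();
                counts.merge(to, 1, Integer::sum);
            }
        }

        @Override
        public boolean acceptsDocsOutOfOrder() {
            return true; // counting does not care about doc order
        }

        public Map<String, Integer> counts() {
            return counts;
        }
    }

The whole pass then becomes a single searcher.search(query, collector) call: no TopDocs, no paging state, no OME.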

Re: 500 million documents for loop.

2015-11-12 Thread Valentin Popov
Toke, I just looked through the code; we are already using such a method:

    IndexSearcher indexSearcher = getIndexSearcher(searchResult);
    TopDocs topDocs;
    ScoreDoc currectScoreDoc = p.startScoreDoc;
    for (int page = 1;
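That fragment looks like a hand-rolled deep-paging loop. For reference, a hypothetical reconstruction around IndexSearcher.searchAfter, the usual way to resume from a ScoreDoc on 4.10; getIndexSearcher, searchResult, p.startScoreDoc, query and process(...) are placeholders following the fragment, not known code:

    IndexSearcher indexSearcher = getIndexSearcher(searchResult);
    ScoreDoc after = p.startScoreDoc; // resume point, as in the fragment above
    while (true) {
        // searchAfter skips re-collecting earlier pages, but each page is
        // still collected and sorted; a streaming Collector avoids even that.
        TopDocs topDocs = indexSearcher.searchAfter(after, query, 50000);
        if (topDocs.scoreDocs.length == 0) {
            break; // no more results
        }
        for (ScoreDoc sd : topDocs.scoreDocs) {
            process(indexSearcher.doc(sd.doc)); // hypothetical per-document handler
        }
        after = topDocs.scoreDocs[topDocs.scoreDocs.length - 1];
    }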

Re: 500 million documents for loop.

2015-11-12 Thread Valentin Popov
Hi,

> Hi,
>
> The big question is: Do you need the results paged at all?

Yup, because if we return all results, we get OME.

> Do you need them sorted?

Nope.

> If not, the easiest approach is to use a custom Collector that does no
> sorting and just consumes the results.

Main bottleneck

RE: 500 million documents for loop.

2015-11-12 Thread Uwe Schindler
Hi,

The big question is: Do you need the results paged at all? Do you need them sorted? If not, the easiest approach is to use a custom Collector that does no sorting and just consumes the results.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@

Re: 500 million documents for loop.

2015-11-12 Thread Valentin Popov
Toke, thanks! We will look at this solution; it looks like exactly what we need.

> On 12 Nov 2015, at 20:42, Toke Eskildsen wrote:
>
> Valentin Popov wrote:
>
>> We have ~10 indexes for 500M documents; each document
>> has an «archive date» and a «to» address. One of our tasks is to
>> calculate statistics of «to» for the last year. Right now we are
>> using the search archive_date:(current_date - 1 year) and paginate
>> the results at 50k records per page.

Re: 500 million documents for loop.

2015-11-12 Thread Toke Eskildsen
Valentin Popov wrote:

> We have ~10 indexes for 500M documents; each document
> has an «archive date» and a «to» address. One of our tasks is to
> calculate statistics of «to» for the last year. Right now we are
> using the search archive_date:(current_date - 1 year) and paginate
> the results at 50k records per page.

Re: 500 million documents for loop.

2015-11-12 Thread Valentin Popov
We are using 4.10.4, and moving to a 5.x version is not possible right now. Thanks!

> On 12 Nov 2015, at 19:47, Anton Zenkov wrote:
>
> Which version of Lucene are you using?
>
> On Thu, Nov 12, 2015 at 11:39 AM, Valentin Popov wrote:
>
>> Hello everyone.
>>
>> We have ~10 indexes

Re: 500 million documents for loop.

2015-11-12 Thread Anton Zenkov
Which version of Lucene are you using?

On Thu, Nov 12, 2015 at 11:39 AM, Valentin Popov wrote:

> Hello everyone.
>
> We have ~10 indexes for 500M documents; each document has an «archive date»
> and a «to» address. One of our tasks is to calculate statistics of «to» for
> the last year. Right now we are using

500 million documents for loop.

2015-11-12 Thread Valentin Popov
Hello everyone.

We have ~10 indexes for 500M documents; each document has an «archive date» and a «to» address. One of our tasks is to calculate statistics of «to» for the last year. Right now we are using the search archive_date:(current_date - 1 year) and paginate the results at 50k records per page. Bottleneck
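The archive_date:(current_date - 1 year) search described here would, on 4.10, look roughly like the sketch below, assuming archive_date was indexed as epoch milliseconds in a LongField (the field name is from the thread; the encoding is an assumption):

    import org.apache.lucene.search.NumericRangeQuery;
    import org.apache.lucene.search.Query;

    // Match documents whose archive_date falls within the last year.
    long now = System.currentTimeMillis();
    long oneYearAgo = now - 365L * 24 * 60 * 60 * 1000; // ignores leap years for brevity
    Query lastYear = NumericRangeQuery.newLongRange(
            "archive_date", oneYearAgo, now, true, true); // both bounds inclusive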

Re: debugging growing index size

2015-11-12 Thread Michael McCandless
Hi Rob,

A couple more things: Can you print the value of MMapDirectory.UNMAP_SUPPORTED? Also, can you try your test using NIOFSDirectory instead? Curious if that changes things...

Mike McCandless
http://blog.mikemccandless.com

On Thu, Nov 12, 2015 at 7:28 AM, Rob Audenaerde wrote:

> Curious indeed!
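Both checks are one-liners. A sketch, with the caveat that NIOFSDirectory takes a File in 4.x and a Path in 5.x (the 5.x form is shown; the index path is a placeholder):

    import java.nio.file.Paths;

    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.MMapDirectory;
    import org.apache.lucene.store.NIOFSDirectory;

    // UNMAP_SUPPORTED reports whether this JVM lets Lucene unmap memory-mapped
    // index files on close; if false, deleted-but-still-mapped files can keep
    // disk space pinned until the mapped buffers are garbage collected.
    System.out.println("UNMAP_SUPPORTED = " + MMapDirectory.UNMAP_SUPPORTED);

    // Swapping in NIOFSDirectory takes memory mapping out of the equation:
    Directory dir = new NIOFSDirectory(Paths.get("/path/to/index"));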

Re: debugging growing index size

2015-11-12 Thread Rob Audenaerde
Curious indeed! I will turn on IndexFileDeleter.VERBOSE_REF_COUNTS and recreate the logs. I will get back with them in a day, hopefully. Thanks for the extra logging!

-Rob

On Thu, Nov 12, 2015 at 11:34 AM, Michael McCandless <luc...@mikemccandless.com> wrote:

> Hmm, curious.
>
> I looked at the [large] infoStream output
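For anyone following along: VERBOSE_REF_COUNTS is a static flag on Lucene's internal IndexFileDeleter, and it only prints if the writer also has an infoStream. A sketch, assuming code placed where it can reach the flag (the class is internal, so its visibility can vary by version):

    package org.apache.lucene.index; // IndexFileDeleter lives here

    import java.io.IOException;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.util.PrintStreamInfoStream;

    public final class DeleterDebug {
        // Opens a writer with per-file reference-count logging sent to stdout.
        public static IndexWriter openDebugWriter(Directory dir, Analyzer analyzer) throws IOException {
            IndexFileDeleter.VERBOSE_REF_COUNTS = true; // off by default
            IndexWriterConfig iwc = new IndexWriterConfig(analyzer); // 4.x also takes a Version first
            iwc.setInfoStream(new PrintStreamInfoStream(System.out));
            return new IndexWriter(dir, iwc);
        }
    }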

Re: debugging growing index size

2015-11-12 Thread Michael McCandless
Hmm, curious.

I looked at the [large] infoStream output and I see segment _3ou7 present on init of IW, a few getReader calls referencing it, then a forceMerge that indeed merges it away, yet I do NOT see IW attempting deletion of its files. And indeed I see plenty (too many: many times per second