Re: 500 millions document for loop.

2016-04-26 Thread Valentin Popov
H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > >> -Original Message- >> From: Valentin Popov [mailto:valentin...@gmail.com] >> Sent: Saturday, November 14, 2015 1:51 PM >> To: java-user@lucene.apache.org >> Su

Re: 500 millions document for loop.

2016-04-21 Thread Sheng
If you don't care about search, why not just use the reader to traverse? Set up a for loop from 0 to reader.maxDoc() - 1 and filter the documents using MultiFields. You can even bucket this procedure and run your statistics calculation in parallel. On Thursday, November 12, 2015, Valentin Popov wrote:
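
A rough sketch of the bucketing idea above, in plain Java 8 with no Lucene calls. The names `BucketedTraversal`, `countByValue` and the `valueForDoc` callback are hypothetical and only for illustration; in a real index the callback would be whatever reads the «to» value for a given doc ID (and would presumably return null for deleted or out-of-range documents so they are skipped).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

public class BucketedTraversal {

    /**
     * Counts values over doc IDs [0, maxDoc), split into fixed-size buckets
     * that are processed in parallel. valueForDoc is a stand-in (assumption)
     * for the per-document field lookup; a null result means "skip this doc".
     */
    public static Map<String, Long> countByValue(int maxDoc, int bucketSize, int threads,
                                                 IntFunction<String> valueForDoc) throws Exception {
        if (bucketSize <= 0 || threads <= 0) {
            throw new IllegalArgumentException("bucketSize and threads must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, Long>>> parts = new ArrayList<>();
            // long loop variable avoids int overflow when maxDoc is near Integer.MAX_VALUE
            for (long start = 0; start < maxDoc; start += bucketSize) {
                final int from = (int) start;
                final int to = (int) Math.min(start + bucketSize, maxDoc);
                parts.add(pool.submit(() -> {
                    // each bucket counts into its own map, so no locking is needed
                    Map<String, Long> local = new HashMap<>();
                    for (int docId = from; docId < to; docId++) {
                        String value = valueForDoc.apply(docId);
                        if (value != null) {
                            local.merge(value, 1L, Long::sum);
                        }
                    }
                    return local;
                }));
            }
            // merge the per-bucket counts on the calling thread
            Map<String, Long> total = new HashMap<>();
            for (Future<Map<String, Long>> part : parts) {
                part.get().forEach((k, v) -> total.merge(k, v, Long::sum));
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Each bucket keeps a private map and the results are merged at the end, which keeps the worker threads free of shared mutable state.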

Re: 500 millions document for loop.

2016-04-21 Thread Erick Erickson
Actually, this looks like a fine place to use Streaming Aggregation/Streaming Expressions. Those operate off of docValues fields anyway, so you kind of get all this "for free". I don't see the Solr version, though; much of this is in later 5.x versions. Pull down the Solr Reference Guide for the ver

Re: 500 millions document for loop.

2016-04-21 Thread Valentin Popov
Chris, hello. Thanks for the tip, but could you explain how I can use it? Regards, Valentin. > On 16 Nov 2015, at 0:42, Chris Hostetter > wrote: > > > : public void collect(int docID) throws IOException { > : Document doc = indexSearcher

Re: 500 millions document for loop.

2015-11-15 Thread Chris Hostetter
: public void collect(int docID) throws IOException { : Document doc = indexSearcher.doc(docID, loadFields); : found.found(doc); : } Based on your description of the calculation you are doing

Re: 500 millions document for loop.

2015-11-14 Thread Valentin Popov
> > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > >> -Original Message- >> From: Valentin Popov [mailto:valentin...@gmail.com] >> Sent: Saturday, November 14, 2015 1:51 PM >> To: java-u

Re: 500 millions document for loop.

2015-11-14 Thread Valentin Popov
thetaphi.de > eMail: u...@thetaphi.de > >> -Original Message- >> From: Valentin Popov [mailto:valentin...@gmail.com] >> Sent: Saturday, November 14, 2015 1:51 PM >> To: java-user@lucene.apache.org >> Subject: Re: 500 millions document for loop. >> >>

RE: 500 millions document for loop.

2015-11-14 Thread Uwe Schindler
To: java-user@lucene.apache.org > Subject: Re: 500 millions document for loop. > > Thank you very much! > > > > On 14 Nov 2015, at 15:49, Uwe Schindler wrote: > > > > Hi, > > > > This code is buggy! The collect() call of the collector does not get a
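
The snippet above is cut off, so the following is only a sketch under an assumption: that the bug referred to is the usual Lucene 4.x one, where collect() is called with a doc ID relative to the current segment, while IndexSearcher.doc() expects an index-wide one. The class below is plain Java with hypothetical names (`DocBaseCollector`, `setNextSegment`); it only mirrors the roles of Collector.setNextReader(AtomicReaderContext) and Collector.collect(int) and does not use the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

public class DocBaseCollector {
    // first index-wide doc ID of the segment currently being collected
    private int docBase;
    private final List<Integer> globalIds = new ArrayList<>();

    /** Stand-in for setNextReader(context): remember the segment's docBase. */
    public void setNextSegment(int segmentDocBase) {
        this.docBase = segmentDocBase;
    }

    /** Stand-in for collect(doc): the argument is relative to the current segment. */
    public void collect(int segmentDocId) {
        // rebase before handing the ID to anything that expects index-wide IDs
        globalIds.add(docBase + segmentDocId);
    }

    public List<Integer> globalIds() {
        return globalIds;
    }
}
```

In a real Lucene 4.x collector the rebased value (context.docBase + doc) would be what is passed on to the document lookup.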

Re: 500 millions document for loop.

2015-11-14 Thread Valentin Popov
---Original Message- >> From: Valentin Popov [mailto:valentin...@gmail.com] >> Sent: Saturday, November 14, 2015 1:04 PM >> To: java-user@lucene.apache.org >> Subject: Re: 500 millions document for loop. >> >> Hi, Uwe. >> >> Thanks for your advice.

RE: 500 millions document for loop.

2015-11-14 Thread Uwe Schindler
il: u...@thetaphi.de > -Original Message- > From: Valentin Popov [mailto:valentin...@gmail.com] > Sent: Saturday, November 14, 2015 1:04 PM > To: java-user@lucene.apache.org > Subject: Re: 500 millions document for loop. > > Hi, Uwe. > > Thanks for your advice. > > Af

Re: 500 millions document for loop.

2015-11-14 Thread Valentin Popov
>> eMail: u...@thetaphi.de >>> >>>> -Original Message- >>>> From: Valentin Popov [mailto:valentin...@gmail.com] >>>> Sent: Thursday, November 12, 2015 6:48 PM >>>> To: java-user@lucene.apache.org >>>> Subject: Re

Re: 500 millions document for loop.

2015-11-12 Thread Valentin Popov
8213 Bremen >>> http://www.thetaphi.de >>> eMail: u...@thetaphi.de >>> >>>> -Original Message- >>>> From: Valentin Popov [mailto:valentin...@gmail.com] >>>> Sent: Thursday, November 12, 2015 6:48 PM >>>> To: java-

Re: 500 millions document for loop.

2015-11-12 Thread Valentin Popov
>> Uwe >>> >>> - >>> Uwe Schindler >>> H.-H.-Meier-Allee 63, D-28213 Bremen >>> http://www.thetaphi.de >>> eMail: u...@thetaphi.de >>> >>>> -Original Message- >>>> From: Valentin Popov [mailto:va

RE: 500 millions document for loop.

2015-11-12 Thread Uwe Schindler
llee 63, D-28213 Bremen > > http://www.thetaphi.de > > eMail: u...@thetaphi.de > > > >> -Original Message- > >> From: Valentin Popov [mailto:valentin...@gmail.com] > >> Sent: Thursday, November 12, 2015 6:48 PM > >> To: java-user@lucene.apac

Re: 500 millions document for loop.

2015-11-12 Thread Valentin Popov
Toke, I just looked through the code; we are already using such a method: IndexSearcher indexSearcher = getIndexSearcher(searchResult); TopDocs topDocs; ScoreDoc currectScoreDoc = p.startScoreDoc; for (int page = 1;

Re: 500 millions document for loop.

2015-11-12 Thread Valentin Popov
entin Popov [mailto:valentin...@gmail.com] >> Sent: Thursday, November 12, 2015 6:48 PM >> To: java-user@lucene.apache.org >> Subject: Re: 500 millions document for loop. >> >> Toke, thanks! >> >> We will look at this solution; it looks like this is what

RE: 500 millions document for loop.

2015-11-12 Thread Uwe Schindler
...@thetaphi.de > -Original Message- > From: Valentin Popov [mailto:valentin...@gmail.com] > Sent: Thursday, November 12, 2015 6:48 PM > To: java-user@lucene.apache.org > Subject: Re: 500 millions document for loop. > > Toke, thanks! > > We will look at this soluti

Re: 500 millions document for loop.

2015-11-12 Thread Valentin Popov
Toke, thanks! We will look at this solution; it looks like this is what we need. > On 12 Nov 2015, at 20:42, Toke Eskildsen > wrote: > > Valentin Popov wrote: > >> We have ~10 indexes for 500M documents, each document >> has «archive date», and «to» address, one of our tasks is >> c

Re: 500 millions document for loop.

2015-11-12 Thread Toke Eskildsen
Valentin Popov wrote: > We have ~10 indexes for 500M documents, each document > has «archive date», and «to» address, one of our tasks is > to calculate statistics of «to» for the last year. Right now we are > using search archive_date:(current_date - 1 year) and paginate > results for 50k records for pa

Re: 500 millions document for loop.

2015-11-12 Thread Valentin Popov
We are using 4.10.4 and it is not possible to move to a 5.x version right now. Thanks! > On 12 Nov 2015, at 19:47, Anton Zenkov > wrote: > > Which version of Lucene are you using? > > > On Thu, Nov 12, 2015 at 11:39 AM, Valentin Popov > wrote: > >> Hello everyone. >> >> We have ~10 index

Re: 500 millions document for loop.

2015-11-12 Thread Anton Zenkov
Which version of Lucene are you using? On Thu, Nov 12, 2015 at 11:39 AM, Valentin Popov wrote: > Hello everyone. > > We have ~10 indexes for 500M documents, each document has «archive date», > and «to» address, one of our tasks is to calculate statistics of «to» for the last > year. Right now we are us