Re: Query to find documents which contain the same value for a field, i.e. duplicate fields

2011-12-22 Thread Paul Taylor
On 20/12/2011 19:38, Paul Taylor wrote: So I had this code that would return all documents where more than one document had the same value for fieldname. Trouble is, I didn't realise this could return documents that had been deleted, so I'm wondering what an equivalent using queri

Re: luke and chinese text

2011-12-22 Thread Andrzej Bialecki
On 22/12/2011 13:50, Peyman Faratin wrote: Hi, we are indexing some Chinese text (using the following OutputStreamWriter with UTF-8 encoding). OutputStreamWriter outputFileWriter = new OutputStreamWriter(new FileOutputStream(outputFile), "utf8"); using Lucene 3.2. The analyzer is new Limit

Re: Retrieving large numbers of documents from several disks in parallel

2011-12-22 Thread Erick Erickson
I call into question why you "retrieve and materialize as many as 3,000 Documents from each index in order to display a page of results to the user". You have to be doing some post-processing because displaying 12,000 documents to the user is completely useless. I wonder if this is an "XY" problem

luke and chinese text

2011-12-22 Thread Peyman Faratin
Hi, we are indexing some Chinese text (using the following OutputStreamWriter with UTF-8 encoding). OutputStreamWriter outputFileWriter = new OutputStreamWriter(new FileOutputStream(outputFile), "utf8"); using Lucene 3.2. The analyzer is new LimitTokenCountAnalyzer(new SmartChineseAnalyze

Re: Retrieving large numbers of documents from several disks in parallel

2011-12-22 Thread Lance Norskog
Is each index optimized? From my vague grasp of Lucene file formats, I think you want to sort the documents by segment document id, which is the order of documents on the disk. This lets you materialize documents in their order on the disk. Solr (and other apps) generally use a separate thread p
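
The sorting suggestion above can be sketched roughly as follows. This is only an illustration under assumptions: the class name DocIdOrder, the method inDiskOrder and the sample ids are hypothetical, and it assumes that ascending document ids approximate on-disk order within an index, as the message suggests. The actual stored-field read is left as a comment.

```java
import java.util.Arrays;

public class DocIdOrder {

    /**
     * Returns a sorted copy of the hit document ids, so that a later loop
     * can load stored fields in ascending id order. The input array is
     * left untouched because callers usually still need the ranked order.
     */
    static int[] inDiskOrder(int[] hitDocIds) {
        int[] sorted = Arrays.copyOf(hitDocIds, hitDocIds.length);
        Arrays.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical ids, in the relevance order a search returned them.
        int[] hits = {907, 12, 455, 13};

        for (int docId : inDiskOrder(hits)) {
            // Assumption: this is where the application would load the
            // document for docId from its index searcher.
            System.out.println(docId);
        }
    }
}
```

After loading, the documents would need to be mapped back to their ranked positions before display, for example by keeping a map from document id to rank.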