Faceted Search with Parallel Indexes

2016-10-03 Thread george treacy
Hello, I am trying to do a faceted search across two parallel indexes with a ParallelCompositeReader. My problem is that I only get facet results from the first reader in the array of composite readers. This problem only occurs after upgrading to Lucene version 4.7.0 or later. If I switch the

Re: How to read multiple indices in parallel.

2015-04-07 Thread Gimantha Bandara
That was really helpful. Thanks a lot Terry! On Tue, Apr 7, 2015 at 8:17 PM, Terry Smith wrote: > Gimantha, > > Search will run in parallel even across indices. > > This happens because IndexSearcher searches by LeafReader and it doesn't > matter where thos

Re: How to read multiple indices in parallel.

2015-04-07 Thread Terry Smith
Gimantha, Search will run in parallel even across indices. This happens because IndexSearcher searches by LeafReader and it doesn't matter where those LeafReaders come from (DirectoryReader or MultiReader) they are all treated equally. Example: DirectoryReader(A): LeafReader(B), LeafR

Re: How to read multiple indices in parallel.

2015-04-07 Thread Gimantha Bandara
Hi Terry, I have multiple indices in separate locations. If I use a MultiReader and an ExecutorService with the IndexSearcher, it will go through the segments in parallel and search, right? But searching between different indices will still happen sequentially, won't it? On Tue, Apr 7, 2015 at 7

Re: How to read multiple indices in parallel.

2015-04-07 Thread Terry Smith
Gimantha, With Lucene 5.0 you can pass in an ExecutorService to the constructor of your IndexSearcher and it will search the segments in parallel if you use one of the IndexSearcher.search() methods that returns a TopDocs (and don't supply your own Collector). The not-yet-released Lucen
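
A minimal sketch of the setup Terry describes, against the Lucene 5.x API; the index paths, field name, and pool size are placeholders, not anything from the thread:

```java
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class ParallelSegmentSearch {
    public static void main(String[] args) throws Exception {
        // Open the separate indices (paths are placeholders).
        DirectoryReader r1 = DirectoryReader.open(FSDirectory.open(Paths.get("/path/to/index1")));
        DirectoryReader r2 = DirectoryReader.open(FSDirectory.open(Paths.get("/path/to/index2")));

        // A MultiReader presents the leaves of both indices as one logical index.
        MultiReader multi = new MultiReader(r1, r2);

        // With an ExecutorService, the TopDocs-returning search() overloads
        // fan out one task per leaf (segment), regardless of which index
        // each leaf came from.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        IndexSearcher searcher = new IndexSearcher(multi, pool);

        TopDocs hits = searcher.search(new TermQuery(new Term("body", "lucene")), 10);
        System.out.println("total hits: " + hits.totalHits);

        multi.close();   // also closes the sub-readers
        pool.shutdown();
    }
}
```

As noted above, this parallelism applies only to the search() methods that return a TopDocs; supplying your own Collector keeps the search single-threaded.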

How to read multiple indices in parallel.

2015-04-07 Thread Gimantha Bandara
Hi all, As I can see, the MultiReader is reading the multiple indices sequentially (correct me if I am wrong). So using an IndexSearcher on a MultiReader will also perform sequential searches, right? Is there a Lucene built-in class to search several indices in parallel? -- Gimantha Bandara Software

Exception while using a custom analyzer in a parallel indexing!

2014-09-15 Thread andi rexha
Hi, I have an index writer that is used from a pool of threads to index. The index writer is using a "PerFieldAnalyzerWrapper": this.analyzer = new PerFieldAnalyzerWrapper(DEFAULT_ANALYZER, fields); If I add the documents single-threaded I don't get any exception. In the case that I add th
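
For context, a sketch of the wrapper construction being discussed, assuming Lucene 5.x-style no-argument analyzer constructors (the thread itself is from the 4.x era, where a Version argument was required); the "id" field mapping is hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

public class AnalyzerSetup {
    public static Analyzer build() {
        // Per-field overrides; any field not listed falls back to the default.
        Map<String, Analyzer> fields = new HashMap<>();
        fields.put("id", new KeywordAnalyzer());  // hypothetical example mapping

        // One wrapper instance is intended to be shared by all indexing threads:
        // Lucene analyzers keep their per-thread state internally.
        return new PerFieldAnalyzerWrapper(new StandardAnalyzer(), fields);
    }
}
```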

Parallel searching use ExecutorService and Collectors

2012-05-08 Thread Christoph Kaser
Hi all, I want to speed up my searches by using multiple CPU cores for one search. I saw that there is a possibility to use multithreaded search by passing an ExecutorService to the IndexSearcher: idxSearcher = new IndexSearcher(reader, Executors.newCachedThreadPool()); I call my searc

Re: Counting all the hits with parallel searching

2012-02-19 Thread Robert Muir
On Sun, Feb 19, 2012 at 10:23 AM, Benson Margulies wrote: > thanks, that's what I needed. > Thanks for bringing this up, I think it's a common issue; I created https://issues.apache.org/jira/browse/LUCENE-3799 to hopefully improve the docs situation. -- lucidimagination.com

Re: Counting all the hits with parallel searching

2012-02-19 Thread Benson Margulies
iant heaps. Is >> there another way to express this? Should I file a JIRA that the >> parallel code should have some graceful behavior? >> >> int longestMentionFreq = searcher.search(longestMentionQuery, filter, >> Integer.MAX_VALUE).totalHits + 1; >> > > the

RE: Counting all the hits with parallel searching

2012-02-19 Thread Uwe Schindler
Original Message- > From: Benson Margulies [mailto:bimargul...@gmail.com] > Sent: Sunday, February 19, 2012 3:22 PM > To: java-user@lucene.apache.org > Subject: Counting all the hits with parallel searching > > If I have a lot of segments, and an executor service in my searcher, t

Re: Counting all the hits with parallel searching

2012-02-19 Thread Robert Muir
On Sun, Feb 19, 2012 at 9:21 AM, Benson Margulies wrote: > If I have a lot of segments, and an executor service in my searcher, > the following runs out of memory instantly, building giant heaps. Is > there another way to express this? Should I file a JIRA that the > parallel code

Counting all the hits with parallel searching

2012-02-19 Thread Benson Margulies
If I have a lot of segments, and an executor service in my searcher, the following runs out of memory instantly, building giant heaps. Is there another way to express this? Should I file a JIRA that the parallel code should have some graceful behavior? int longestMentionFreq = searcher.search
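
Asking for Integer.MAX_VALUE top hits allocates a priority queue of that size (per slice, when searching in parallel), which is what builds the giant heaps. A count-only collector avoids this; a sketch, assuming the TotalHitCountCollector that ships with Lucene 3.5 and later:

```java
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TotalHitCountCollector;

public class HitCounting {
    // Count every matching document without materializing a result queue.
    static int countHits(IndexSearcher searcher, Query query) throws Exception {
        TotalHitCountCollector collector = new TotalHitCountCollector();
        searcher.search(query, collector);
        return collector.getTotalHits();
    }
}
```

Note the trade-off mentioned elsewhere in these threads: the Collector-based search() overloads run single-threaded even when the searcher was built with an ExecutorService.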

Re: Retrieving large numbers of documents from several disks in parallel

2011-12-27 Thread Erick Erickson
I'll take your word for it, though it seems odd. I'm wondering if there's anything you can do to pre-process the documents at index time to make the post-processing less painful, but that's a wild shot in the dark... Another possibility would be to fetch only the fields you need to do the post-pro

Re: Retrieving large numbers of documents from several disks in parallel

2011-12-27 Thread Robert Bart
Erick, Thanks for your reply! You are probably right to question how many Documents we are retrieving. We know it isn't best, but significantly reducing that number will require us to completely rebuild our system. Before we do that, we were just wondering if there was anything in the Lucene API o

Re: Retrieving large numbers of documents from several disks in parallel

2011-12-22 Thread Erick Erickson
I call into question why you "retrieve and materialize as many as 3,000 Documents from each index in order to display a page of results to the user". You have to be doing some post-processing because displaying 12,000 documents to the user is completely useless. I wonder if this is an "XY" problem

Re: Retrieving large numbers of documents from several disks in parallel

2011-12-22 Thread Lance Norskog
Is each index optimized? From my vague grasp of Lucene file formats, I think you want to sort the documents by segment document id, which is the order of documents on the disk. This lets you materialize documents in their order on the disk. Solr (and other apps) generally use a separate thread p
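
Lance's suggestion — materializing documents in on-disk order — can be sketched by sorting the matched doc ids before fetching, against the Lucene 3.x API this thread concerns:

```java
import java.util.Arrays;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.ScoreDoc;

public class OrderedFetch {
    // Fetch stored documents in ascending doc-id (roughly on-disk) order
    // to turn scattered random reads into a mostly sequential scan.
    static Document[] fetchInDocIdOrder(IndexReader reader, ScoreDoc[] hits) throws Exception {
        int[] ids = new int[hits.length];
        for (int i = 0; i < hits.length; i++) {
            ids[i] = hits[i].doc;
        }
        Arrays.sort(ids);  // ascending doc id == storage order within a segment

        Document[] docs = new Document[ids.length];
        for (int i = 0; i < ids.length; i++) {
            docs[i] = reader.document(ids[i]);
        }
        return docs;
    }
}
```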

Re: Retrieving large numbers of documents from several disks in parallel

2011-12-21 Thread Paul Libbrecht
Michael, from a physical point of view, it would seem that the order in which the documents are read is very significant for reading speed (the random-access seeks being the issue). You could: - move to a RAM disk or SSD to make a difference? - use something different than a searcher w

Retrieving large numbers of documents from several disks in parallel

2011-12-21 Thread Robert Bart
Hi All, I am running Lucene 3.4 in an application that indexes about 1 billion factual assertions (Documents) from the web over four separate disks, so that each disk has a separate index of about 250 million documents. The Documents are relatively small, less than 1KB each. These indexes provide

Re: Parallel access to TermPositions API

2011-04-15 Thread Federico Fissore
Chris Bamford wrote on 14/04/2011 at 20:11: Hi, I need to load a huge amount of TermPositions in a short space of time (millions of Documents, sub-second). Does the IndexReader's API support multiple accesses to allow several parallel threads to consume a chunk each? AFAIK, you c

Parallel access to TermPositions API

2011-04-15 Thread Chris Bamford
Hi, I need to load a huge amount of TermPositions in a short space of time (millions of Documents, sub-second). Does the IndexReader's API support multiple accesses to allow several parallel threads to consume a chunk each? Thanks for any ideas / pointers. -

Re: merging Parallel indexes (can indexWriter.addIndexesNoOptimize be used?)

2009-11-04 Thread Britske
Yeah excellent! This should indeed work! Thanks, Geert-Jan Jérôme Thièvre wrote: > > Hello Geert-Jan, > > it's possible to merge several parallel physical indexes (viewed as one > logical index with a ParallelReader). > Just use the method IndexWriter.addIndexe

Re: merging Parallel indexes (can indexWriter.addIndexesNoOptimize be used?)

2009-11-04 Thread Britske
both at the same time. This is >> what my running index looks like. >> >> However at certain points I was considering to store a frozen index from >> the parallel index for backup/other purposes. I figured having it merged >> would shave off some complexity. >>

Re: merging Parallel indexes (can indexWriter.addIndexesNoOptimize be used?)

2009-11-04 Thread Jérôme Thièvre
Hello Geert-Jan, it's possible to merge several parallel physical indexes (viewed as one logical index with a ParallelReader). Just use the method IndexWriter.addIndexes(IndexReader[] readers): IndexReader[] physicalReaders = ...; // Your readers here IndexWriter iw = new IndexW
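
Jérôme's truncated snippet, completed as a sketch using the Lucene 2.9-era API current when this thread was written (constructor signatures vary by version, and the no-argument StandardAnalyzer was already deprecated then):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.ParallelReader;
import org.apache.lucene.store.Directory;

public class MergeParallel {
    // Merge two parallel indexes (same docs, different fields) into a single
    // frozen snapshot index, by feeding the combined ParallelReader view
    // to IndexWriter.addIndexes().
    static void merge(Directory slowIdx, Directory fastIdx, Directory target) throws Exception {
        ParallelReader parallel = new ParallelReader();
        parallel.add(IndexReader.open(slowIdx));
        parallel.add(IndexReader.open(fastIdx));

        IndexWriter iw = new IndexWriter(target, new StandardAnalyzer(), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
        iw.addIndexes(new IndexReader[] { parallel });  // merges the logical view
        iw.close();
        parallel.close();
    }
}
```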

Re: merging Parallel indexes (can indexWriter.addIndexesNoOptimize be used?)

2009-11-04 Thread Michael McCandless
the indexes are in sync. So I could > (and do) use parallelReader to search them both at the same time. This is > what my running index looks like. > > However at certain points I was considering to store a frozen index from > the parallel index for backup/other purposes. I figured

Re: merging Parallel indexes (can indexWriter.addIndexesNoOptimize be used?)

2009-11-04 Thread Britske
Thanks, but it's already guaranteed that the indexes are in sync. So I could (and do) use parallelReader to search them both at the same time. This is what my running index looks like. However at certain points I was considering to store a frozen index from the parallel index for backup/

Re: merging Parallel indexes (can indexWriter.addIndexesNoOptimize be used?)

2009-11-03 Thread Michael McCandless
addIndexesNoOptimize is only for shards. But this [pending patch/contribution] is similar to what you're seeking, I think: https://issues.apache.org/jira/browse/LUCENE-1879 It does not actually merge the indexes, but rather keeps 2 parallel indexes in sync so you can use ParallelReader to s

merging Parallel indexes (can indexWriter.addIndexesNoOptimize be used?)

2009-11-03 Thread Britske
Given two parallel indexes which contain the same products but different fields, one with slowly changing fields and one with fields which are updated regularly: Is it possible to periodically merge these to form a single index? (thereby representing a frozen snapshot in time) For example

merging Parallel indexes (can indexWriter.addIndexesNoOptimize be used?)

2009-11-03 Thread Britske
Given two parallel indexes one with slowly changing fields and one with fields which are updated regularly. Is it possible to periodically merge these to form a single index? (thereby representing a frozen snapshot in time) For example: Can indexWriter.addIndexesNoOptimize handle this, or was

Re: Question: Can lucene do parallel indexing?

2008-06-27 Thread Phil Myers
tp://wiki.apache.org/lucene-java/ImproveIndexingSpeed -Phil --- On Fri, 6/27/08, David Lee <[EMAIL PROTECTED]> wrote: > From: David Lee <[EMAIL PROTECTED]> > Subject: Question: Can lucene do parallel indexing? > To: java-user@lucene.apache.org > Date: Friday, June 27, 2008

Question: Can lucene do parallel indexing?

2008-06-27 Thread David Lee
If I'm using a computer that has multiple cores, or if I want to use several computers to speed up the indexing process, how should I do that? Is there some kind of support for that in the API? David Lee

Re: Rebuilding parallel indexes

2008-06-09 Thread Antony Bowesman
would need recreation (I'm assuming the optimization would muck up the Ids if only the parallel index was optimized). You'd also need to get the new doc Id for each doc that is added. Are docIds allocated during addDocument or during the c

Re: Rebuilding parallel indexes

2008-06-09 Thread Andrzej Bialecki
Antony Bowesman wrote: I have a design where I will be using multiple index shards to hold approx 7.5 million documents per index per month over many years. These will be large static R/O indexes but the corresponding smaller parallel index will get many frequent changes. I understand from

Rebuilding parallel indexes

2008-06-09 Thread Antony Bowesman
I have a design where I will be using multiple index shards to hold approx 7.5 million documents per index per month over many years. These will be large static R/O indexes but the corresponding smaller parallel index will get many frequent changes. I understand from previous replies by Hoss

FYI: parallel corpus in 22 languages

2008-01-24 Thread Andrzej Bialecki
Hi all, Just FYI, perhaps this is old news for you ... This large corpus is freely available and it is pairwise sentence-aligned for all language combinations. This looks like a good resource for linguistic information, such as frequent words and phrases, n-gram profiles, etc. http://wt.jrc.

Error with Remote Parallel MultiSearching

2007-12-17 Thread reeja devadas
Hi, We are working with a web server and 10 search servers; these 10 servers have index fragments on them. All available fragments of these search servers are bound at their startup time. A Remote Parallel MultiSearcher is used for searching on these indices. When a search request comes, first it

Re: contrib/benchmark Parallel tasks ?

2007-10-18 Thread Doron Cohen
Hi Grant, Grant Ingersoll wrote: > I think the answer is: > [{ "MAddDocs" AddDoc } : 5000] : 4 > > Is this the functional equivalent of doing: > { "MAddDocs" AddDoc } : 2 > > in parallel? Yes, this is correct, it reads as "create 4 threads, eac

Re: contrib/benchmark Parallel tasks ?

2007-10-17 Thread Grant Ingersoll
I think the answer is: [{ "MAddDocs" AddDoc } : 5000] : 4 Is this the functional equivalent of doing: { "MAddDocs" AddDoc } : 2 in parallel? Thanks, Grant On Oct 17, 2007, at 10:42 AM, Grant Ingersoll wrote: Hi, I am using the contrib/benchmarker to do some performa

contrib/benchmark Parallel tasks ?

2007-10-17 Thread Grant Ingersoll
factor documentation in the docs given by the URL above, for instance, it says: "Example - [ AddDoc ] : 400 : 3 - would do 400 addDoc in parallel, starting up to 3 threads per second. " but, I think I want instead: start up 4 threads, and then have each split up the indexing of

Re: Almost parallel indexes

2007-09-28 Thread Chris Hostetter
: I can't really use ParallelReader to keep the indexes the same; it : requires me to add documents to both indexes which means I have to : retokenize the large fields anyway. I would want to do a "join" on an : external id, and as far as I can tell, Lucene doesn't support that. correction: it

Re: Almost parallel indexes

2007-09-28 Thread Nixon
- Original Message - From: "Erick Erickson" <[EMAIL PROTECTED]> Sent: Friday, September 28, 2007 5:43 AM Subject: Re: Almost parallel indexes OK, this isn't well thought out, more the first thing that pops to mind... You're right, Lucene doesn't

Re: Almost parallel indexes

2007-09-27 Thread Erick Erickson
OK, this isn't well thought out, more the first thing that pops to mind... You're right, Lucene doesn't do joins. But would it serve to keep two indexes? One the slow-changing stuff and one the fast-changing stuff. They are related by some *external* (as in "not the Lucene doc id) field. You'd h

Almost parallel indexes

2007-09-27 Thread Tim Sturge
Hi, I have an index which contains two very distinct types of fields: - Some fields are large (many term documents) and change fairly slowly. - Some fields are small (mostly titles, names, anchor text) and change fairly rapidly. Right now I keep around the large fields in raw form and when the

LUCENE-423: thread pool implementation of parallel queries

2007-08-15 Thread Renaud Waldura
Could someone who understands Lucene internals help me port https://issues.apache.org/jira/browse/LUCENE-423 to Lucene 2.0? I have beefy hardware (32 cores) and want to try this out, but it won't compile. There are 2 issues: 1- maxScore On line 412 TopFieldDocs constructor now needs a maxScore.

Re: Parallel Index Search

2006-10-16 Thread Michael McCandless
? So in this case I should say index accessed one by one not parallel? The commit lock is only held while a reader is loading the index and while a writer is "committing" its changes to the index. These times should be brief. Whereas, the write lock is held for the entire time that a

Re: Parallel Index Search

2006-10-16 Thread Supriya Kumar Shyamal
release the lock, is it right? So in this case I should say index accessed one by one not parallel? The commit lock is only held while a reader is loading the index and while a writer is "committing" its changes to the index. These times should be brief. Whereas, the write lock is he

Parallel Index Search

2006-10-16 Thread Supriya Kumar Shyamal
other thread waits until the previous threads release the lock, is that right? So in this case I should say the index is accessed one by one, not in parallel? It's just my speculation, please don't get me wrong. Because I try to share the same index by 6 instances and since the lock for 5 instances are dis

Parallel

2006-05-09 Thread [EMAIL PROTECTED]
Hi, first sorry if this may be a stupid question... :-) I have 3 separate indexes and I use a ParallelMultiSearcher to search in... now I would like to limit the number of hits found... for example I would like to get the first 10 hits from each index. How can I do this? Any suggestions? Thanks

Remote Parallel MultiSearcher

2006-04-18 Thread Sunil Kumar PK
Hi All, What I have understood from Lucene Remote Parallel Multi Searcher Search Procedure is first compute the weight for the Query in each Index sequentially (one by one, eg: - calculate "query weight" of index1 first and then index2) and then perform searching of each index one

Parallel MultiSearcher

2006-03-29 Thread pksunilpk
What I have understood from Lucene Remote Parallel Multi Searcher Search Procedure is first compute the weight for the Query in each Index sequentially (one by one, eg: - calculate "query weight" of index1 first and then index2) and then perform searching of each index one by one and

Re: Open an IndexWriter in parallel with an IndexReader on the same index.

2006-02-22 Thread Nadav Har'El
Chris Hostetter <[EMAIL PROTECTED]> wrote on 22/02/2006 03:24:58 AM: > > : It would have been nice if someone wrote something like indexModifier, > : but with a cache, similar to what Yonik suggested above: deletions will > : not be done immediately, but rather cached and later done in batches. > :

Re: Open an IndexWriter in parallel with an IndexReader on the same index.

2006-02-21 Thread Chris Hostetter
: It would have been nice if someone wrote something like indexModifier, : but with a cache, similar to what Yonik suggested above: deletions will : not be done immediately, but rather cached and later done in batches. : Of course, batched deletions should not remember the term to delete, : but ra

Re: Open an IndexWriter in parallel with an IndexReader on the same index.

2006-02-21 Thread Paul . Illingworth

Re: Open an IndexWriter in parallel with an IndexReader on the same index.

2006-02-21 Thread Nadav Har'El
"Yonik Seeley" <[EMAIL PROTECTED]> wrote on 21/02/2006 05:13:52 PM: > On 2/21/06, Pierre Luc Dupont <[EMAIL PROTECTED]> wrote: > > is it possible to open an IndexWriter and an IndexReader on the same > > index, at the same time, > > to do deleteTerm and addDocument? > > No, it's not possible.

RE: Open an IndexWriter in parallel with an IndexReader on the same index.

2006-02-21 Thread Pierre Luc Dupont
Ok, thanks. That is what I was thinking. Pierre-Luc -Original Message- From: Yonik Seeley [mailto:[EMAIL PROTECTED] Sent: 2006-02-21 10:14 To: java-user@lucene.apache.org Subject: Re: Open an IndexWriter in parallel with an IndexReader on the same index. On 2/21/06, Pierre Luc

Re: Open an IndexWriter in parallel with an IndexReader on the same index.

2006-02-21 Thread Yonik Seeley
On 2/21/06, Pierre Luc Dupont <[EMAIL PROTECTED]> wrote: > is it possible to open an IndexWriter and an IndexReader on the same > index, at the same time, > to do deleteTerm and addDocument? No, it's not possible. You should batch things: do all your deletions, close the IndexReader, then ope
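
Yonik's batching advice as a sketch, using the Lucene 1.9/2.x API of the era (where deletions went through IndexReader; later versions added IndexWriter.deleteDocuments and made this dance unnecessary):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;

public class BatchedUpdate {
    // The pattern described above: do all deletions with a reader, close it,
    // then do all additions with a writer. Never hold both open for writes.
    static void update(Directory dir, Term[] toDelete, Document[] toAdd) throws Exception {
        IndexReader reader = IndexReader.open(dir);
        for (Term t : toDelete) {
            reader.deleteDocuments(t);   // batch every delete first
        }
        reader.close();                  // releases the write lock

        // false = append to the existing index rather than create a new one
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), false);
        for (Document d : toAdd) {
            writer.addDocument(d);       // then batch every add
        }
        writer.close();
    }
}
```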

Open an IndexWriter in parallel with an IndexReader on the same index.

2006-02-21 Thread Pierre Luc Dupont
Hi, is it possible to open an IndexWriter and an IndexReader on the same index, at the same time, to do deleteTerm and addDocument? Thanks! Pierre-Luc