Hello,
I am trying to do a faceted search across two parallel indexes with a
ParallelCompositeReader.
My problem is that I only get facet results from the first reader in the
array of composite readers. This problem only occurs after upgrading to
Lucene 4.7.0 or later.
If I switch the
That was really helpful. Thanks a lot Terry!
On Tue, Apr 7, 2015 at 8:17 PM, Terry Smith wrote:
> Gimantha,
>
> Search will run in parallel even across indices.
>
> This happens because IndexSearcher searches by LeafReader and it doesn't
> matter where thos
Gimantha,
Search will run in parallel even across indices.
This happens because IndexSearcher searches by LeafReader and it doesn't
matter where those LeafReaders come from (DirectoryReader or MultiReader);
they are all treated equally.
Example:
DirectoryReader(A):
LeafReader(B), LeafR
Hi Terry,
I have multiple indices in separate locations. If I use a MultiReader and
an ExecutorService with the IndexSearcher, it will go through the segments
in parallel and search, right? But searching between different indices
will still happen sequentially, won't it?
On Tue, Apr 7, 2015 at 7
Gimantha,
With Lucene 5.0 you can pass in an ExecutorService to the constructor of
your IndexSearcher and it will search the segments in parallel if you use
one of the IndexSearcher.search() methods that returns a TopDocs (and don't
supply your own Collector).
The not-yet-released Lucen
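Terry's description can be sketched without Lucene at all: the ExecutorService submits one task per segment, and the partial top hits are merged by score. Everything below (Segment lists, `Hit`, `search`) is an illustrative stand-in, not Lucene API:

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelSegmentSearch {
    // Hypothetical stand-in for a segment's matches: doc ids with scores.
    record Hit(int doc, float score) {}

    // Searches each "segment" on the executor, then merges the per-segment
    // top hits, mimicking IndexSearcher's per-leaf parallelism.
    static List<Hit> search(List<List<Hit>> segments, int topN,
                            ExecutorService pool) throws Exception {
        List<Future<List<Hit>>> futures = new ArrayList<>();
        for (List<Hit> seg : segments) {
            // One task per segment; each returns its own local top-N.
            futures.add(pool.submit(() -> seg.stream()
                    .sorted((a, b) -> Float.compare(b.score(), a.score()))
                    .limit(topN).toList()));
        }
        // Merge the partial results into a single global top-N by score.
        PriorityQueue<Hit> merged =
                new PriorityQueue<>((a, b) -> Float.compare(b.score(), a.score()));
        for (Future<List<Hit>> f : futures) merged.addAll(f.get());
        List<Hit> out = new ArrayList<>();
        for (int i = 0; i < topN && !merged.isEmpty(); i++) out.add(merged.poll());
        return out;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<List<Hit>> segments = List.of(
                List.of(new Hit(0, 1.5f), new Hit(1, 0.2f)),
                List.of(new Hit(0, 2.0f), new Hit(1, 0.9f)));
        List<Hit> top = search(segments, 2, pool);
        System.out.println(top.get(0).score() + " " + top.get(1).score());
        pool.shutdown();
    }
}
```

Because every segment's task is independent, it makes no difference whether the segments came from one DirectoryReader or from several readers wrapped in a MultiReader.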
Hi all,
As far as I can see, MultiReader reads the multiple indices sequentially
(correct me if I am wrong). So using an IndexSearcher on a MultiReader will
also perform sequential searches, right? Is there a Lucene built-in class to
search several indices in parallel?
--
Gimantha Bandara
Software
Hi,
I have an index writer that is used from a pool of threads to index. The
index writer is using a "PerFieldAnalyzerWrapper":
this.analyzer = new PerFieldAnalyzerWrapper(DEFAULT_ANALYZER, fields);
If I add the documents single-threaded I don't get any exception. In the case
that I add th
Hi all,
I want to speed up my searches by using multiple CPU cores for one
search. I saw that there is a possibility to use multithreaded search by
passing an ExecutorService to the IndexSearcher:
idxSearcher = new IndexSearcher(reader,
Executors.newCachedThreadPool());
I call my searc
On Sun, Feb 19, 2012 at 10:23 AM, Benson Margulies
wrote:
> thanks, that's what I needed.
>
Thanks for bringing this up, I think it's a common issue. I created
https://issues.apache.org/jira/browse/LUCENE-3799 to hopefully improve
the docs situation.
--
lucidimagination.com
giant heaps. Is
>> there another way to express this? Should I file a JIRA that the
>> parallel code should have some graceful behavior?
>>
>> int longestMentionFreq = searcher.search(longestMentionQuery, filter,
>> Integer.MAX_VALUE).totalHits + 1;
>>
>
> the
-----Original Message-----
> From: Benson Margulies [mailto:bimargul...@gmail.com]
> Sent: Sunday, February 19, 2012 3:22 PM
> To: java-user@lucene.apache.org
> Subject: Counting all the hits with parallel searching
>
> If I have a lot of segments, and an executor service in my searcher, t
On Sun, Feb 19, 2012 at 9:21 AM, Benson Margulies wrote:
> If I have a lot of segments, and an executor service in my searcher,
> the following runs out of memory instantly, building giant heaps. Is
> there another way to express this? Should I file a JIRA that the
> parallel code
If I have a lot of segments, and an executor service in my searcher,
the following runs out of memory instantly, building giant heaps. Is
there another way to express this? Should I file a JIRA that the
parallel code should have some graceful behavior?
int longestMentionFreq = searcher.search
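One way out, rather than collecting Integer.MAX_VALUE hits, is to count matches without keeping them: Lucene's TotalHitCountCollector (in the 3.x/4.x line) only increments a counter per hit instead of building a giant priority queue. A sketch against the snippet above, reusing its variables (verify the overloads against your Lucene version):

```java
// Sketch: count all hits without materializing them.
// Assumes the searcher, longestMentionQuery, and filter from the snippet above.
TotalHitCountCollector counter = new TotalHitCountCollector();
searcher.search(longestMentionQuery, filter, counter);
int longestMentionFreq = counter.getTotalHits() + 1;
```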
I'll take your word for it, though it seems odd. I'm wondering
if there's anything you can do to pre-process the documents
at index time to make the post-processing less painful, but
that's a wild shot in the dark...
Another possibility would be to fetch only the fields you need
to do the post-pro
Erick,
Thanks for your reply! You are probably right to question how many
Documents we are retrieving. We know it isn't best, but significantly
reducing that number will require us to completely rebuild our system.
Before we do that, we were just wondering if there was anything in the
Lucene API o
I call into question why you "retrieve and materialize as
many as 3,000 Documents from each index in order to
display a page of results to the user". You have to be
doing some post-processing because displaying
12,000 documents to the user is completely useless.
I wonder if this is an "XY" problem
Is each index optimized?
From my vague grasp of Lucene file formats, I think you want to sort
the documents by segment document id, which is the order of documents
on the disk. This lets you materialize documents in their order on the
disk.
Solr (and other apps) generally use a separate thread p
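The sort-by-doc-id idea can be sketched generically: order the fetches by ascending doc id (disk order), then put the results back in ranked order. `fetchInDiskOrder` and the in-memory `store` below are illustrative stand-ins for `IndexReader.document(int)`:

```java
import java.util.*;

public class DocIdOrderFetch {
    // Fetch stored documents in doc-id (on-disk) order to turn random
    // seeks into a mostly sequential scan, then restore the ranked order.
    static String[] fetchInDiskOrder(int[] hitDocs, Map<Integer, String> store) {
        Integer[] order = new Integer[hitDocs.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Sort result positions by ascending doc id (disk order).
        Arrays.sort(order, (a, b) -> Integer.compare(hitDocs[a], hitDocs[b]));
        String[] out = new String[hitDocs.length];
        for (int pos : order) {
            out[pos] = store.get(hitDocs[pos]); // stand-in for reader.document(doc)
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, String> store = Map.of(1, "a", 5, "b", 9, "c");
        // Hits ranked 9, 1, 5 are fetched as 1, 5, 9 but returned ranked.
        System.out.println(Arrays.toString(fetchInDiskOrder(new int[]{9, 1, 5}, store)));
        // prints [c, a, b]
    }
}
```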
Michael,
from a physical point of view, it would seem like the order in which the
documents are read is very significant for the reading speed (I suspect the
random-access jumps are the issue).
You could:
- move to ram-disk or ssd to make a difference?
- use something different than a searcher w
Hi All,
I am running Lucene 3.4 in an application that indexes about 1 billion
factual assertions (Documents) from the web over four separate disks, so
that each disk has a separate index of about 250 million documents. The
Documents are relatively small, less than 1KB each. These indexes provide
Chris Bamford wrote on 14/04/2011 20:11:
Hi,
I need to load a huge amount of TermPositions in a short space of time
(millions of Documents, sub-second).
Does the IndexReader's API support multiple accesses to allow several
parallel threads to consume a chunk each?
AFAIK, you c
Hi,
I need to load a huge amount of TermPositions in a short space of time
(millions of Documents, sub-second).
Does the IndexReader's API support multiple accesses to allow several
parallel threads to consume a chunk each?
Thanks for any ideas / pointers.
-
Yeah excellent! This should indeed work!
Thanks,
Geert-Jan
Jérôme Thièvre wrote:
>
> Hello Geert-Jan,
>
> it's possible to merge several parallel physical indexes (viewed as one
> logical index with a ParallelReader).
> Just use the method IndexWriter.addIndexe
both at the same time. This is
>> what my running index looks like.
>>
>> However at certain points I was considering to store a frozen index from
>> the parallel index for backup/ other purposes. I figured having it merged
>> would shave off some complexity.
>>
>&
Hello Geert-Jan,
it's possible to merge several parallel physical indexes (viewed as one
logical index with a ParallelReader).
Just use the method IndexWriter.addIndexes(IndexReader[] readers):
IndexReader[] physicalReaders = ...; // Your readers here
IndexWriter iw = new IndexWriter(mergedDirectory, analyzer, true); // destination index
iw.addIndexes(physicalReaders);
iw.close();
the indexes are in sync. So I could
> (and do) use ParallelReader to search them both at the same time. This is
> what my running index looks like.
>
> However at certain points I was considering to store a frozen index from
> the parallel index for backup/ other purposes. I figured
Thanks, but it's already guaranteed that the indexes are in sync. So I could
(and do) use ParallelReader to search them both at the same time. This is
what my running index looks like.
However at certain points I was considering to store a frozen index from
the parallel index for backup/
addIndexesNoOptimize is only for shards.
But this [pending patch/contribution] is similar to what you're seeking, I think:
https://issues.apache.org/jira/browse/LUCENE-1879
It does not actually merge the indexes, but rather keeps 2 parallel
indexes in sync so you can use ParallelReader to s
Given two parallel indexes which contain the same products but different
fields, one with slowly changing fields and one with fields which are
updated regularly:
Is it possible to periodically merge these to form a single index? (thereby
representing a frozen snapshot in time)
For example
Given two parallel indexes one with slowly changing fields and one with
fields which are updated regularly.
Is it possible to periodically merge these to form a single index? (thereby
representing a frozen snapshot in time)
For example: Can indexWriter.addIndexesNoOptimize handle this, or was
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
-Phil
--- On Fri, 6/27/08, David Lee <[EMAIL PROTECTED]> wrote:
> From: David Lee <[EMAIL PROTECTED]>
> Subject: Question: Can lucene do parallel indexing?
> To: java-user@lucene.apache.org
> Date: Friday, June 27, 2008
If I'm using a computer that has multiple cores, or if I want to use several
computers to speed up the indexing process, how should I do that? Is there
some kind of support for that in the API?
David Lee
would need recreation (I'm assuming the
optimization would muck up the Ids if only the parallel index was optimized).
You'd also need to get the new doc Id for each doc that is added. Are docIds
allocated during addDocument or during the c
Antony Bowesman wrote:
I have a design where I will be using multiple index shards to hold
approx 7.5 million documents per index per month over many years. These
will be large static R/O indexes but the corresponding smaller parallel
index will get many frequent changes.
I understand from
I have a design where I will be using multiple index shards to hold approx 7.5
million documents per index per month over many years. These will be large
static R/O indexes but the corresponding smaller parallel index will get many
frequent changes.
I understand from previous replies by Hoss
Hi all,
Just FYI, perhaps this is old news for you ... This large corpus is
freely available and it is pairwise sentence-aligned for all language
combinations. This looks like a good resource for linguistic
information, such as frequent words and phrases, n-gram profiles, etc.
http://wt.jrc.
Hi,
We are working with a web server and 10 search servers; these 10 servers
have index fragments on them. All available fragments of these search servers
are bound at their startup time. A remote ParallelMultiSearcher is used
for searching on these indices. When a search request comes, first it
Hi Grant,
Grant Ingersoll wrote:
> I think the answer is:
> [{ "MAddDocs" AddDoc } : 5000] : 4
>
> Is this the functional equivalent of doing:
> { "MAddDocs" AddDoc } : 2
>
> in parallel?
Yes, this is correct, it reads as "create 4 threads, eac
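For reference, the two contrib/benchmark algorithm lines being compared can be written side by side (the doc counts are illustrative):

```
{ "MAddDocs" AddDoc } : 5000          # one thread adds 5000 docs sequentially
[ { "MAddDocs" AddDoc } : 5000 ] : 4  # four parallel threads, 5000 docs each
```

The square brackets are what turn a sequential repetition into parallel tasks.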
I think the answer is:
[{ "MAddDocs" AddDoc } : 5000] : 4
Is this the functional equivalent of doing:
{ "MAddDocs" AddDoc } : 2
in parallel?
Thanks,
Grant
On Oct 17, 2007, at 10:42 AM, Grant Ingersoll wrote:
Hi,
I am using the contrib/benchmarker to do some performa
factor documentation in the
docs given by the URL above, for instance, it says:
"Example - [ AddDoc ] : 400 : 3 - would do 400 addDoc in parallel,
starting up to 3 threads per second. "
but, I think I want instead: start up 4 threads, and then have each
split up the indexing of
: I can't really use ParallelReader to keep the indexes the same; it
: requires me to add documents to both indexes which means I have to
: retokenize the large fields anyway. I would want to do a "join" on an
: external id, and as far as I can tell, Lucene doesn't support that.
correction: it
- Original Message -
From: "Erick Erickson" <[EMAIL PROTECTED]>
To:
Sent: Friday, September 28, 2007 5:43 AM
Subject: Re: Almost parallel indexes
OK, this isn't well thought out, more the first thing that
pops to mind...
You're right, Lucene doesn't
OK, this isn't well thought out, more the first thing that
pops to mind...
You're right, Lucene doesn't do joins. But would it serve
to keep two indexes? One the slow-changing stuff
and one the fast-changing stuff. They are related by
some *external* (as in "not the Lucene doc id")
field.
You'd h
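The two-index "join" Erick outlines can be sketched outside Lucene: query each index on its own, then intersect the hits on the shared external key. All names below (`join`, the maps standing in for per-index results) are invented for illustration:

```java
import java.util.*;

public class ExternalIdJoin {
    // Intersects two result sets on a shared external key, keeping the
    // stored values from both sides -- a hand-rolled join across indexes.
    static Map<String, String[]> join(Map<String, String> slowHits,
                                      Map<String, String> fastHits) {
        Map<String, String[]> joined = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : slowHits.entrySet()) {
            String fast = fastHits.get(e.getKey());
            if (fast != null) {
                joined.put(e.getKey(), new String[] {e.getValue(), fast});
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // "doc-1" exists in both indexes; the others fall out of the join.
        Map<String, String> slow = Map.of("doc-1", "large body", "doc-2", "other body");
        Map<String, String> fast = Map.of("doc-1", "fresh title", "doc-3", "unmatched");
        System.out.println(join(slow, fast).keySet()); // prints [doc-1]
    }
}
```

The external key must be stored in both indexes; only documents present on both sides survive the join.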
Hi,
I have an index which contains two very distinct types of fields:
- Some fields are large (many term documents) and change fairly slowly.
- Some fields are small (mostly titles, names, anchor text) and change fairly
rapidly.
Right now I keep around the large fields in raw form and when the
Could someone who understands Lucene internals help me port
https://issues.apache.org/jira/browse/LUCENE-423 to Lucene 2.0? I have beefy
hardware (32 cores) and want to try this out, but it won't compile.
There are 2 issues:
1- maxScore
On line 412 TopFieldDocs constructor now needs a maxScore.
?
So in this case I should say the index is accessed one by one, not in parallel?
The commit lock is only held while a reader is loading the index and
while a writer is "committing" its changes to the index. These times
should be brief. Whereas, the write lock is held for the entire time
that a
release the lock, is it right?
So in this case I should say the index is accessed one by one, not in parallel?
The commit lock is only held while a reader is loading the index and
while a writer is "committing" its changes to the index. These times
should be brief. Whereas, the write lock is he
other thread waits until the
previous threads release the lock, is that right?
So in this case I should say the index is accessed one by one, not in parallel?
It's just my speculation, please don't get me wrong.
Because I try to share the same index with 6 instances, and since the lock
for 5 instances are dis
Hi,
first, sorry if this may be a stupid question... :-)
I have 3 separate indexes and I use a ParallelMultiSearcher to search them... now I
would like to limit the number of hits found... for example, I would like to
get the first 10 hits from each index.
How can I do this? Any suggestions?
Thanks
Hi All,
What I have understood from the Lucene remote ParallelMultiSearcher search
procedure is: first compute the weight for the query in each index
sequentially (one by one, e.g. calculate the "query weight" of index1 first and
then index2), and then perform searching of each index one
Chris Hostetter <[EMAIL PROTECTED]> wrote on 22/02/2006 03:24:58 AM:
>
> : It would have been nice if someone wrote something like indexModifier,
> : but with a cache, similar to what Yonik suggested above: deletions will
> : not be done immediately, but rather cached and later done in batches.
> :
: It would have been nice if someone wrote something like indexModifier,
: but with a cache, similar to what Yonik suggested above: deletions will
: not be done immediately, but rather cached and later done in batches.
: Of course, batched deletions should not remember the term to delete,
: but ra
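The cached-deletions wrapper wished for here can be sketched in plain Java. `BatchingModifier`, its threshold, and the `applyBatch` hook (e.g. "open an IndexReader, delete each term, close it") are all invented for illustration:

```java
import java.util.*;
import java.util.function.Consumer;

public class BatchingModifier {
    private final List<String> pendingDeletes = new ArrayList<>();
    private final int batchSize;
    private final Consumer<List<String>> applyBatch; // e.g. open reader, delete, close

    BatchingModifier(int batchSize, Consumer<List<String>> applyBatch) {
        this.batchSize = batchSize;
        this.applyBatch = applyBatch;
    }

    // Deletions are cached, not applied immediately.
    void deleteTerm(String term) {
        pendingDeletes.add(term);
        if (pendingDeletes.size() >= batchSize) flush();
    }

    // Applies all cached deletions in one pass, amortizing the cost of
    // closing the writer and opening a reader for every single delete.
    void flush() {
        if (pendingDeletes.isEmpty()) return;
        applyBatch.accept(new ArrayList<>(pendingDeletes));
        pendingDeletes.clear();
    }

    int pending() { return pendingDeletes.size(); }
}
```

The point of the cache is exactly what the thread describes: the expensive reader/writer switch happens once per batch instead of once per deleted term.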
"Yonik Seeley" <[EMAIL PROTECTED]> wrote on 21/02/2006 05:13:52 PM:
> On 2/21/06, Pierre Luc Dupont <[EMAIL PROTECTED]> wrote:
> > is it possible to open an IndexWriter and an IndexReader on the same
> > index, at the same time,
> > to do deleteTerm and addDocument?
>
> No, it's not possible.
Ok, thanks.
That is what I was thinking.
Pierre-Luc
-----Original Message-----
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: 2006-02-21 10:14
To: java-user@lucene.apache.org
Subject: Re: Open an IndexWriter in parallel with an IndexReader on the
same index.
On 2/21/06, Pierre Luc
On 2/21/06, Pierre Luc Dupont <[EMAIL PROTECTED]> wrote:
> is it possible to open an IndexWriter and an IndexReader on the same
> index, at the same time,
> to do deleteTerm and addDocument?
No, it's not possible. You should batch things: do all your
deletions, close the IndexReader, then ope
Hi,
is it possible to open an IndexWriter and an IndexReader on the same
index, at the same time,
to do deleteTerm and addDocument?
Thanks!
Pierre-Luc