On Jun 10, 2009, at 5:02 PM, Yonik Seeley wrote:
On Wed, Jun 10, 2009 at 7:58 PM, Daniel Noll wrote:
It's a shame we don't have an inverted kind of HitCollector where we
can say "give me the next hit", so that we can get the best of both
worlds (like what StAX gives us in the XML world.)
You
On Fri, Jun 5, 2009 at 21:31, Abhi wrote:
> Say I have indexed the following strings:
>
> 1. "cool gaming laptop"
> 2. "cool gaming lappy"
> 3. "gaming laptop cool"
>
> Now when I search with a query say "cool gaming computer", I want string 1
> and 2 to appear on top (where search terms are closer
On Wed, Jun 10, 2009 at 7:58 PM, Daniel Noll wrote:
> It's a shame we don't have an inverted kind of HitCollector where we
> can say "give me the next hit", so that we can get the best of both
> worlds (like what StAX gives us in the XML world.)
You can get a scorer and call next() yourself.
-Yonik
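For illustration (not from the thread itself), a minimal sketch of that pull-style iteration against the 2.4-era API, assuming an open IndexSearcher "searcher", its IndexReader "reader", and a Query "query"; in later versions Scorer uses nextDoc()/docID() instead of next()/doc():

Weight weight = query.weight(searcher);  // rewrites the query and builds its Weight
Scorer scorer = weight.scorer(reader);   // iterator over matching docs, pulled on demand
while (scorer.next()) {                  // "give me the next hit"
    int doc = scorer.doc();
    float score = scorer.score();
    // stop whenever you have consumed enough hits
}

Note that this walks hits in doc-id order; you lose the best-first ordering a collector's priority queue would give you.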
On Wed, Jun 10, 2009 at 20:17, Uwe Schindler wrote:
> You are right, you can, but if you just want to retrieve all hits, this is
> inefficient. A HitCollector is the correct way to do this (especially
> because the order of hits is mostly not interesting when retrieving all
> hits). Hits and TopDoc
On Jun 10, 2009, at 10:49 AM, Uwe Schindler wrote:
To optimize, store the filename not as a stored field, but as a non-tokenized, indexed term.
How do you do that?
- Paul
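For illustration, a hedged sketch of what Uwe means (the field name "filename" and the doc/writer/reader variables are assumptions): index the filename as a single untokenized term, then look it up through the terms dictionary instead of loading stored fields:

doc.add(new Field("filename", file.getName(),
                  Field.Store.NO, Field.Index.NOT_ANALYZED));  // UN_TOKENIZED in older releases; Store.YES if you still need to display it
writer.addDocument(doc);

// later: direct lookup via the term, no stored-field access needed
TermDocs td = reader.termDocs(new Term("filename", "report.pdf"));
while (td.next()) {
    int docId = td.doc();  // the matching document(s)
}
td.close();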
Great! If I understand correctly, it looks like RAM savings? Will
there be an improvement in lookup speed? (We're using binary
search here?)
Is there a precedent in database systems for what was mentioned
about placing the term dict, delDocs, and filters onto disk and
reading them from there (wit
So... how about we try to wrap up 2.9/3.0 and ship with what we have,
now? It's been 8 months since 2.4.0 was released, and 2.9's got plenty
of new stuff, and we are all itching to remove these deprecated APIs,
switch to Java 1.5, etc.
We should try to finish the issues that are open and under
Roughly, the current approach for the default terms dict codec in
LUCENE-1458 is:
* Create a separate class per-field (the String field in each Term
is redundant). This is a big change over Lucene today
* That class has String[] indexText and long[] indexPointer, each
length = th
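Not the actual LUCENE-1458 code, just a sketch of the data structure described above: one per-field object holding parallel arrays of every indexed term's text and its pointer into the terms dictionary, binary-searched to find where to seek (the class and method names are hypothetical):

class FieldTermsIndex {
    String[] indexText;                 // text of every (indexInterval)-th term for this field
    long[] indexPointer;                // matching offsets into the terms dictionary file

    // slot whose term is <= the requested term, i.e. where the
    // sequential scan of the terms dictionary should start
    int seekSlot(String termText) {
        int lo = 0, hi = indexText.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = indexText[mid].compareTo(termText);
            if (cmp < 0) lo = mid + 1;
            else if (cmp > 0) hi = mid - 1;
            else return mid;
        }
        return Math.max(hi, 0);         // not an exact hit: start at the preceding indexed term
    }
}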
> LUCENE-1458 (flexible indexing) has these improvements,
Mike, can you explain how it's different? I looked through the code once
but yeah, it's in with a lot of other changes.
On Wed, Jun 10, 2009 at 5:40 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> This (very large number of
Asking for top 100K docs will certainly consume more RAM than asking
for top 2, but much less than 1 GB.
More like maybe an added ~2-3 MB or so.
Mike
On Wed, Jun 10, 2009 at 1:30 PM, Zhang, Lisheng wrote:
> Hi,
>
> Does this issue have anything to do with the line:
>
>> TopScoreDocCollector colle
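To make the memory point concrete, a small sketch using the same constructor as in the quoted code (on 2.4 the equivalent class is TopDocCollector; "searcher" and "query" are assumed): the collector pre-allocates a priority queue of n entries, so RAM grows with n, not with the index size:

TopScoreDocCollector top2 = new TopScoreDocCollector(2);         // a queue of 2 entries
TopScoreDocCollector top100k = new TopScoreDocCollector(100000); // roughly a few MB of queue entries
searcher.search(query, top100k);
ScoreDoc[] hits = top100k.topDocs().scoreDocs;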
That looks good, but it looks up the stored fields from within the main
search loop (the hit collector). For a few results this is OK, but if you are
collecting thousands of hits from a very large index that does not fit into
memory, the collect gets slow becau
Hi,
Does this issue have anything to do with the line:
> TopScoreDocCollector collector = new TopScoreDocCollector(10);
if we do:
> TopScoreDocCollector collector = new TopScoreDocCollector(2);
instead (only see top two documents), could memory usage be less?
Best regards, Lisheng
Another potential idea would be to break the index up into N indices
such that each index is small enough for two to fit in memory, and then
you can swap them.
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/index/MultiReader.html
This is just an idea, I haven't tri
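A small sketch of that idea (paths are hypothetical): keep N smaller sub-indexes and combine whatever subset you need behind one reader:

IndexReader part1 = IndexReader.open("/indexes/part1");
IndexReader part2 = IndexReader.open("/indexes/part2");
IndexReader combined = new MultiReader(new IndexReader[] { part1, part2 });
IndexSearcher searcher = new IndexSearcher(combined);
// search as usual, then close and swap in other parts as they are needed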
Thanks for the responses. I am testing it out using MMapDirectory.
Cheers!
-Original Message-
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Wednesday, June 10, 2009 6:36 AM
To: java-user@lucene.apache.org
Subject: RE: Reloading RAM Directory from updated FS Directory
There is cur
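For context, the two approaches being compared in this thread, sketched with hypothetical paths: copying the index into heap (which has to be redone after every update) versus opening it straight from disk and letting the OS cache or mmap it (how MMapDirectory is selected depends on the Lucene version):

Directory ram = new RAMDirectory(FSDirectory.getDirectory("/path/to/index")); // full copy in heap
Directory fs  = FSDirectory.getDirectory("/path/to/index");                   // OS page cache / mmap does the work
IndexSearcher searcher = new IndexSearcher(IndexReader.open(fs));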
On Jun 10, 2009, at 3:17 AM, Uwe Schindler wrote:
A HitCollector is the correct way to do this (especially because the
order of hits is mostly not interesting when retrieving all hits).
OK, here's what I came up with:
Term t = /* ... */
Collection files = new LinkedList();
FieldS
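The snippet breaks off at "FieldS", presumably a FieldSelector. A hedged sketch (not Paul's actual code) of how that approach usually looks with MapFieldSelector, reusing the "t" and "files" variables above so only the "FILE" field is decoded per document:

FieldSelector onlyFile = new MapFieldSelector(new String[] { "FILE" });
TermDocs td = reader.termDocs(t);
while (td.next()) {
    Document d = reader.document(td.doc(), onlyFile);  // loads just the FILE field
    files.add(new File(d.get("FILE")));
}
td.close();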
This (very large number of unique terms) is a problem for Lucene currently.
There are some simple improvements we could make to the terms dict
format to not require so much RAM per term in the terms index...
LUCENE-1458 (flexible indexing) has these improvements, but
unfortunately tied in w/ lots
Hi there,
I have a question regarding Lucene's memory usage
when launching a query. When I execute my query,
Lucene eats up over 1 GB of heap memory even
when my result set is only a single hit. I
found out that this is due to the "ensureIndexIsRead()"
method call in the "TermInfosReader" class, wh
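One mitigation worth noting here, sketched under the assumption of Lucene 2.4+ where IndexReader has setTermInfosIndexDivisor: it keeps only every N-th indexed term in RAM, and it must be called before the first query because ensureIndexIsRead() loads the terms index lazily:

IndexReader reader = IndexReader.open("/path/to/index");
reader.setTermInfosIndexDivisor(4);   // roughly 1/4 of the terms index in memory, slightly slower term lookups
IndexSearcher searcher = new IndexSearcher(reader);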
Thanks for bringing closure!
Mike
On Wed, Jun 10, 2009 at 4:42 AM, Mateusz Berezecki wrote:
> Hi list!
>
> I'm forwarding this because somehow I did not put the list in the CC, but
> I think the answer is noteworthy, so here it is. Please remember to use
> StringBuffer before blaming Lucene ;-)
>
> Actual t
There is currently a patch/idea from Earwin around that modifies
MMapDirectory to optionally call MappedByteBuffer.load() after mapping a
file from the directory. MappedByteBuffer.load() tells the operating system
kernel to try to swap as much as possible from the file into physical RAM.
- Uwe
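The patch itself isn't in any release; the plain-NIO call it builds on looks like this (file name hypothetical):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

RandomAccessFile raf = new RandomAccessFile("/path/to/index/_0.frq", "r");
MappedByteBuffer buf = raf.getChannel().map(FileChannel.MapMode.READ_ONLY, 0, raf.length());
buf.load();   // asks the kernel to page the mapped region into physical RAM (best effort)
raf.close();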
There is one case where MMAP does not beat RAM: initial warm-up after a process
restart. With MMAP it can take a while before you get up to speed. MMAP with
reopen is the best, if you run without restarts.
- Original Message
> From: Uwe Schindler
> To: java-user@lucene.apache.org
>
> You are wrong.
> As the java doc reads: 'Finds the top n hits for query'
> You can set n to whatever value you want, 'all' documents (not results!)
> indexed in your index if you want, or 10 if you want the top 10.
You are right, you can, but if you just want to retrieve all hits, this is
ineffe
You are wrong.
As the javadoc reads: 'Finds the top n hits for query.'
You can set n to whatever value you want: 'all' documents (not results!)
indexed in your index if you want, or 10 if you want just the top 10.
Anyway, it's just an example to point you in a direction.
Wouter
> This code snippet would on
This code snippet would only work if you want to iterate over e.g. the
first 20 documents (which is n in your code). If he wants to iterate over
all results, he should think about using a custom (Hit)Collector.
The code below will be very slow for large result sets (because retrieving
stored fie
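A hedged sketch of the custom (Hit)Collector Uwe means, assuming the 2.4 HitCollector API (on 2.9 trunk you would extend Collector instead) and an existing "searcher" and "query": collect only doc ids during the search, and fetch the stored fields afterwards, outside the hot loop:

class DocIdCollector extends HitCollector {
    final List docIds = new ArrayList();
    public void collect(int doc, float score) {
        docIds.add(new Integer(doc));            // no stored-field access in here
    }
}

DocIdCollector collector = new DocIdCollector();
searcher.search(query, collector);
IndexReader reader = searcher.getIndexReader();
for (Iterator it = collector.docIds.iterator(); it.hasNext(); ) {
    Document d = reader.document(((Integer) it.next()).intValue());
    // d.get("FILE") etc., after the search loop has finished
}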
Will this do?
IndexReader indexReader = searcher.getIndexReader();
TopDocs topDocs = searcher.search(query, n);
for (int i = 0; i < topDocs.scoreDocs.length; i++) {
  Document document = indexReader.document(topDocs.scoreDocs[i].doc);
  final File f = new File(document.get("FILE"));
}
Hi
The code below might do the job. Based on the example at
http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/search/Hits.html
Completely uncompiled and untested of course.
TopDocCollector collector = new TopDocCollector(hitsPerPage);
final Term t = /* ... */;
Query query = new TermQuery(t);
I'd recommend using your favourite queueing service to pass all
updates to a central process, the one and only process that updates
the index. If you don't already have a favourite queueing service,
http://en.wikipedia.org/wiki/Java_Message_Service#Provider_implementations
lists several JMS implem
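A rough sketch of the single-writer setup (ActiveMQ as the provider, and the broker URL, queue name, and message format are all just placeholders): every other process posts updates to the queue, and only this process ever opens an IndexWriter:

ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
Connection connection = factory.createConnection();
connection.start();
Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
MessageConsumer consumer = session.createConsumer(session.createQueue("index.updates"));

IndexWriter writer = new IndexWriter(FSDirectory.getDirectory("/path/to/index"),
        new StandardAnalyzer(), IndexWriter.MaxFieldLength.LIMITED);
while (true) {
    TextMessage msg = (TextMessage) consumer.receive();   // blocks until an update arrives
    Document doc = new Document();
    doc.add(new Field("body", msg.getText(), Field.Store.YES, Field.Index.ANALYZED));
    writer.addDocument(doc);                               // the one and only writer touches the index
    writer.commit();
}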
Hi list!
I'm forwarding this because somehow I did not put the list in the CC, but
I think the answer is noteworthy, so here it is. Please remember to use
StringBuffer before blaming Lucene ;-)
Actual time consumed by Lucene is now ~130 minutes as opposed to 20
hours, which is neat. I can do many more passes
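The point behind the smiley, as a tiny sketch (the "lines" array is hypothetical): building a big string with '+' in a loop copies everything accumulated so far on each concatenation (O(n^2) overall), while StringBuffer (or StringBuilder) appends in place:

StringBuffer sb = new StringBuffer();
for (int i = 0; i < lines.length; i++) {
    sb.append(lines[i]).append('\n');   // amortized O(1) per append
}
String all = sb.toString();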