I have been trying to use grep, but my file is way too big (~300gb). Could
Lucene search through it more efficiently than grep?
Thanks,
Michael
On Sun, Apr 12, 2009 at 7:53 PM, Shashi Kant wrote:
> Not sure what the business-case for this is and why you cannot use
> RegEx for this. But you cou
Hi,
Does anyone know where I can find descriptions of Lucene's searching
algorithm, besides the lecture at University of Pisa 2004? Has it been
published? I'm trying to find a reference to the algorithm.
Thanks,
Michael
-
To un
Hi,
On a 64-bit platform with 30gb RAM and 8 real CPUs, should
MMapDirectory or RAMDirectory provide better search performance on a
5gb index?
Cheers,
Michael
A few things might help:
- use getSpans() on the scorer of the query, iterate the resulting Spans
and count the number of different doc values.
This saves the scoring and the sorting on score value.
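A rough sketch of that counting loop, in plain Java so it runs without Lucene on the classpath. SpanCursor is a hypothetical stand-in for the two Spans methods the loop needs (next() and doc()); DistinctDocCounter and over() are made-up names for illustration. It assumes matches come back grouped by document, so a change in doc() marks a new document.

```java
/** Hypothetical stand-in for the two Spans methods the count needs. */
interface SpanCursor {
    boolean next();   // advance to the next match; false when exhausted
    int doc();        // document number of the current match
}

final class DistinctDocCounter {
    /** Counts distinct documents, assuming matches arrive grouped by document. */
    static int countDocs(SpanCursor spans) {
        int count = 0;
        int lastDoc = -1;
        while (spans.next()) {
            int doc = spans.doc();
            if (doc != lastDoc) {   // first match seen in a new document
                count++;
                lastDoc = doc;
            }
        }
        return count;
    }

    /** Illustration helper: a cursor over a fixed list of doc numbers. */
    static SpanCursor over(final int... docs) {
        return new SpanCursor() {
            private int pos = -1;
            public boolean next() { return ++pos < docs.length; }
            public int doc() { return docs[pos]; }
        };
    }
}
```

With a real index the cursor would be whatever getSpans() returns; no score is computed and nothing is sorted, which is where the saving comes from.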
Thanks for your advice. I was wondering, is each span given by
getSpans() a unique match acco
Hi,
I have a 5gb index containing 2mil documents and am trying to run
1mil+ queries against it. Most of the queries are SpanQueries, and
search performance seems quite slow when using 2 or 3
SpanOrQueries nested inside a SpanNearQuery, which in turn is nested
inside another Spa
After some more research, it seems that one of the bottlenecks is
Spans.next(). Can I drop anything in order to improve performance?
Most of the queries are SpanNearQuery with SpanOrQuery as its clauses.
Any help would be much appreciated.
Regards,
Michael
On 5/25/06, Michael Chan <[EM
I see.
Also, as I'm only interested in the number of results returned and not
in the ranking of documents returned, is there any component I can
simplify in order to improve search performance? Perhaps Scorer or
Similarity?
Thanks.
Michael
On 5/24/06, Chris Hostetter <[EMAIL PROTECTED]> wrote
I think I've fixed the problem by changing/fixing RAMOutputStream.java.
On 5/23/06, Muralidharan V <[EMAIL PROTECTED]> wrote:
On 5/23/06, Michael Chan <[EMAIL PROTECTED]> wrote:
>
> As I have quite a bit of RAM (~20gb)
And I once had a 486 with 2MB RAM, which was
ead of
SpanNearQuery?
Erik
On May 23, 2006, at 1:36 AM, Michael Chan wrote:
> Hi,
>
> As I use SpanQuery purely for the use of slop, I was wondering how to
> make SpanQuery more efficient. Since I don't need any span
> information, is there a way to disable the computation for
5/23/06, Daniel Naber <[EMAIL PROTECTED]> wrote:
On Tuesday 23 May 2006 08:26, Michael Chan wrote:
> As I have quite a
> bit of RAM (~20gb), is there a way I could store the index in RAM or
> any other way that makes use of it to improve performance?
RAMDirectory has just been fixed
Hi,
I'm trying to run 20mil+ queries against an index containing 2mil
documents, and it has been quite slow. I've been reading about
MemoryIndex, but it is only a single-document index. As I have quite a
bit of RAM (~20gb), is there a way I could store the index in RAM or
any other way that makes
Hi,
As I use SpanQuery purely for the use of slop, I was wondering how to
make SpanQuery more efficient. Since I don't need any span
information, is there a way to disable the computation for span and
other unneeded overhead?
Thanks.
Michael
---
e,
but this sounds like a case for JIRA.
Also, please try to write and attach (to your JIRA case) a unit test that
demonstrates the problem, something we can run to debug this. Without that we
may not be able to fix this.
Otis
- Original Message
From: Michael Chan <[EM
rQuery?
Thanks.
Michael
On 5/20/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
take a look at BooleanQuery.setMinimumNumberShouldMatch(int)
: Date: Sat, 20 May 2006 14:27:00 +0800
: From: Michael Chan <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apa
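For what the BooleanQuery.setMinimumNumberShouldMatch(int) suggestion means in practice, here is a small plain-Java model of the matching rule rather than Lucene code. MinShouldMatchModel, atLeast() and matches() are hypothetical names, and a document is modelled as a plain set of terms. The idea is that each OR group counts how many of its optional terms are present and matches only when the count reaches the minimum.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

final class MinShouldMatchModel {
    /** True if at least `min` of the (distinct) optional terms occur in the document. */
    static boolean atLeast(Set<String> docTerms, List<String> optional, int min) {
        int hits = 0;
        for (String term : optional) {
            if (docTerms.contains(term)) {
                hits++;
            }
        }
        return hits >= min;
    }

    /** Models "OR(t1,t2,t3) AND OR(t4,t5,t6)" with a minimum of 2 per group. */
    static boolean matches(Set<String> docTerms) {
        return atLeast(docTerms, Arrays.asList("t1", "t2", "t3"), 2)
            && atLeast(docTerms, Arrays.asList("t4", "t5", "t6"), 2);
    }
}
```

In Lucene itself the equivalent would presumably be one BooleanQuery of optional clauses per group, each with the minimum set to 2, and both groups added as required clauses of an outer BooleanQuery.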
Hi,
Somehow, after running many searches using instances of SpanQuery
(mostly SpanNearQuery), I get an ArrayIndexOutOfBoundsException:
"bash-2.03$ java.lang.ArrayIndexOutOfBoundsException: 2147483647
at org.apache.lucene.search.spans.SpanScorer.score(SpanScorer.java:72)
at
org.a
Hi,
Is there any way to make sure that at least a given number of terms of
a subquery, e.g. 2, are contained in the results? For example, with the
query "OR(t1,t2,t3) AND OR(t4,t5,t6)", the docs returned must contain
2 or more of (t1,t2,t3) and 2 or more of (t4,t5,t6). I've read
about Similarity, but it s
since order doesn't matter here, the two queries should be equal,
right?
Cheers,
Michael
On 5/6/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
What version of Lucene are you using? It should work fine with
1.9. If not, could you supply a test case demonstrating this issue?
Thanks,
Hi,
It seems to me SpanNearQuery.equals()/.hashCode() are not overridden,
because I've tried testing two logically equivalent queries but
.equals() returns false. Could anyone provide an implementation?
Cheers,
Michael
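One possible shape for such an implementation, as a self-contained sketch and not Lucene's own code. NearSpec is a hypothetical class whose clauses are plain strings instead of SpanQuery objects. Treating unordered queries with the same clauses in a different order as equal is an assumption about what "logically equivalent" should mean here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class NearSpec {
    private final List<String> clauses;
    private final int slop;
    private final boolean inOrder;

    NearSpec(List<String> clauses, int slop, boolean inOrder) {
        this.clauses = new ArrayList<String>(clauses);   // defensive copy
        this.slop = slop;
        this.inOrder = inOrder;
    }

    /** Clause order matters only when inOrder is set; otherwise compare sorted copies. */
    private List<String> canonicalClauses() {
        if (inOrder) {
            return clauses;
        }
        List<String> sorted = new ArrayList<String>(clauses);
        Collections.sort(sorted);
        return sorted;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NearSpec)) return false;
        NearSpec other = (NearSpec) o;
        return slop == other.slop
            && inOrder == other.inOrder
            && canonicalClauses().equals(other.canonicalClauses());
    }

    @Override
    public int hashCode() {
        // Built from the same canonical form as equals(), so equal objects hash alike.
        int h = canonicalClauses().hashCode();
        h = 31 * h + slop;
        return 31 * h + (inOrder ? 1 : 0);
    }
}
```

Both methods derive from the same canonical clause list, which keeps the equals/hashCode contract intact when such objects are used as keys in a HashMap or cache.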
Hi,
I'm trying to build a SpanQuery using word stems. Is parsing each term
with a QueryParser, constructed with an Analyzer that produces a stemmed
TokenStream, the right approach? It just seems to me that QueryParser is
designed to parse queries, and so my hunch is that there might be a
better way.