Re: Searching a single file

2009-04-12 Thread Michael Chan
I have been trying to use grep, but my file is way too big (~300gb). Could Lucene search through it more efficiently than grep? Thanks, Michael On Sun, Apr 12, 2009 at 7:53 PM, Shashi Kant wrote: > Not sure what the business-case for this is and why you cannot use > RegEx for this. But you cou

Lucene searching algorithm

2006-10-08 Thread Michael Chan
Hi, Does anyone know where I can find descriptions of Lucene's searching algorithm, besides the lecture at University of Pisa 2004? Has it been published? I'm trying to find a reference to the algorithm. Thanks, Michael

MMapDirectory vs RAMDirectory

2006-05-28 Thread Michael Chan
Hi, On a 64-bit platform with 30gb RAM and 8 real CPUs, should MMapDirectory or RAMDirectory provide better search performance on a 5gb index? Cheers, Michael

Re: BufferedIndexInput.readByte performance

2006-05-27 Thread Michael Chan
> A few things might help: - use getSpans() on the scorer of the query, iterate the resulting Spans and count the number of different doc values. This saves the scoring and the sorting on score value.
Thanks for your advice. I was wondering, is each span given by getSpans() a unique match acco
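The advice quoted in this message (iterate the spans and count the distinct doc values rather than scoring and sorting) can be sketched in plain Java. This is only an illustration and not Lucene code: each match is modelled as a hypothetical `int[]{doc, start, end}` triple, and the sketch assumes that matches arrive grouped by document with non-negative doc ids, so a change in `doc` marks a new document.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class DistinctDocCounter {

    /**
     * Counts how many different documents appear in a stream of matches.
     * Each match is a hypothetical {doc, start, end} triple (not a Lucene type).
     * Assumes matches for the same document are adjacent and doc ids are >= 0.
     */
    public static int countDistinctDocs(Iterator<int[]> spans) {
        int count = 0;
        int lastDoc = -1; // sentinel: no document seen yet
        while (spans.hasNext()) {
            int doc = spans.next()[0];
            if (doc != lastDoc) { // first match in a new document
                count++;
                lastDoc = doc;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<int[]> spans = Arrays.asList(
                new int[]{0, 2, 5},   // two matches in doc 0
                new int[]{0, 7, 9},
                new int[]{3, 1, 4},   // one match in doc 3
                new int[]{8, 0, 3},   // two matches in doc 8
                new int[]{8, 10, 12});
        System.out.println(countDistinctDocs(spans.iterator())); // prints 3
    }
}
```

Because only the document id of each match is inspected, no score is computed and nothing is sorted, which is the saving the advice describes.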

BufferedIndexInput.readByte performance

2006-05-26 Thread Michael Chan
Hi, I have a 5gb index containing 2mil documents and am trying to run 1mil+ queries against it. Most of the queries are SpanQueries, and search performance seems quite slow when 2 or 3 SpanOrQueries are nested inside a SpanNearQuery, which in turn is nested inside another Spa

Re: Making SpanQuery more effiicent

2006-05-25 Thread Michael Chan
After some more research, it seems that one of the bottlenecks is Spans.next(). Can I drop anything out in order to improve performance? Most of the queries are SpanNearQuery with SpanOrQuery as its clauses. Any help would be much appreciated. Regards, Michael On 5/25/06, Michael Chan <[EM

Re: Making SpanQuery more effiicent

2006-05-24 Thread Michael Chan
I see. Also, as I'm only interested in the number of results returned and not in the ranking of documents returned, is there any component I can simplify in order to improve search performance? Perhaps Scorer or Similarity? Thanks. Michael On 5/24/06, Chris Hostetter <[EMAIL PROTECTED]> wrote

Re: Running 20mil queries against an index

2006-05-23 Thread Michael Chan
I think I've fixed the problem by changing/fixing RAMOutputStream.java. On 5/23/06, Muralidharan V <[EMAIL PROTECTED]> wrote: On 5/23/06, Michael Chan <[EMAIL PROTECTED]> wrote: > > As I have quite a bit of RAM (~20gb) And I once had a 486 with 2MB RAM, which was

Re: Making SpanQuery more effiicent

2006-05-23 Thread Michael Chan
ead of SpanNearQuery? Erik On May 23, 2006, at 1:36 AM, Michael Chan wrote: > Hi, > > As I use SpanQuery purely for the use of slop, I was wondering how to > make SpanQuery more efficient. Since I don't need any span > information, is there a way to disable the computation for

Re: Running 20mil queries against an index

2006-05-23 Thread Michael Chan
5/23/06, Daniel Naber <[EMAIL PROTECTED]> wrote: On Dienstag 23 Mai 2006 08:26, Michael Chan wrote: > As I have quite a > bit of RAM (~20gb), is there a way I could store the index in RAM or > any other way that makes use of it to improve performance? RAMDirectory has just been fixed

Running 20mil queries against an index

2006-05-22 Thread Michael Chan
Hi, I'm trying to run 20mil+ queries against an index containing 2mil documents, and it has been quite slow. I've been reading about MemoryIndex, but it is only a single-document index. As I have quite a bit of RAM (~20gb), is there a way I could store the index in RAM or any other way that makes

Making SpanQuery more effiicent

2006-05-22 Thread Michael Chan
Hi, As I use SpanQuery purely for the use of slop, I was wondering how to make SpanQuery more efficient. Since I don't need any span information, is there a way to disable the computation for span and other unneeded overhead? Thanks. Michael

Re: SpanScorer Out Of Bounds

2006-05-22 Thread Michael Chan
e, but this sounds like a case for JIRA. Also, please try to write and attach (to your JIRA case) a unit test that demonstrates a problem, something we can run and debug this. Without that we may not be able to fix this. Otis - Original Message From: Michael Chan <[EM

Re: Matching at least N terms of subqueries

2006-05-21 Thread Michael Chan
rQuery? Thanks. Michael On 5/20/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: take a look at BooleanQuery.setMinimumNumberShouldMatch(int) : Date: Sat, 20 May 2006 14:27:00 +0800 : From: Michael Chan <[EMAIL PROTECTED]> : Reply-To: java-user@lucene.apache.org : To: java-user@lucene.apa

SpanScorer Out Of Bounds

2006-05-21 Thread Michael Chan
Hi, Somehow, after running many searches using instances of SpanQuery (mostly SpanNearQuery), I get the ArrayIndexOutOfBounds exception: "bash-2.03$ java.lang.ArrayIndexOutOfBoundsException: 2147483647 at org.apache.lucene.search.spans.SpanScorer.score(SpanScorer.java:72) at org.a

Matching at least N terms of subqueries

2006-05-19 Thread Michael Chan
Hi, Is there any way to make sure that at least N (e.g. 2) terms of a subquery are contained in the results? For example, with the query "OR(t1,t2,t3) AND OR(t4,t5,t6)", the docs returned must contain 2 or more of (t1,t2,t3) and 2 or more of (t4,t5,t6). I've read about Similarity, but it s
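The requirement in this question can be written down as a small predicate in plain Java. It is a sketch of the matching rule only, not of how Lucene evaluates queries: the class name `MinMatchPerGroup` and the representation of a document as a set of terms are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MinMatchPerGroup {

    /**
     * Returns true when the document contains at least minPerGroup
     * distinct terms from every group (each group is one OR subquery).
     */
    public static boolean matches(Set<String> docTerms,
                                  List<List<String>> groups,
                                  int minPerGroup) {
        for (List<String> group : groups) {
            int hits = 0;
            // De-duplicate so a term listed twice in a group counts once.
            for (String term : new HashSet<>(group)) {
                if (docTerms.contains(term)) {
                    hits++;
                }
            }
            if (hits < minPerGroup) {
                return false; // this group falls short, so the AND fails
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<String>> groups = Arrays.asList(
                Arrays.asList("t1", "t2", "t3"),
                Arrays.asList("t4", "t5", "t6"));

        Set<String> docA = new HashSet<>(Arrays.asList("t1", "t2", "t5", "t6"));
        Set<String> docB = new HashSet<>(Arrays.asList("t1", "t4", "t5"));

        System.out.println(matches(docA, groups, 2)); // true: 2 hits in each group
        System.out.println(matches(docB, groups, 2)); // false: only t1 from the first group
    }
}
```

Each group plays the role of one OR subquery with its own minimum, and the outer loop plays the role of the AND between them.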

Re: SpanNearQuery .equals()/.hash()

2006-05-06 Thread Michael Chan
since order doesn't matter here, the two queries should be equal, right? Cheers, Michael On 5/6/06, Erik Hatcher <[EMAIL PROTECTED]> wrote: What version of Lucene are you using? It should work fine with 1.9. If not, could you supply a test case demonstrating this issue? Thanks,

SpanNearQuery .equals()/.hash()

2006-05-05 Thread Michael Chan
Hi, It seems to me SpanNearQuery.equals()/.hash() are not overridden because I've tried testing two logically equivalent queries but .equals() returns false. Could anyone provide an implementation? Cheers, Michael
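One way to get the kind of equality asked about here, where clause order is ignored for unordered queries, is to canonicalise the clauses before comparing. The sketch below is plain Java and is not Lucene's implementation: `NearKey` is a hypothetical wrapper, clauses are modelled as strings, and the `slop` and `inOrder` fields mirror the usual SpanNearQuery parameters only by assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

/** Hypothetical value object describing a "near" query for equality checks. */
public final class NearKey {
    private final List<String> clauses;
    private final int slop;
    private final boolean inOrder;

    public NearKey(List<String> clauses, int slop, boolean inOrder) {
        List<String> copy = new ArrayList<>(clauses);
        if (!inOrder) {
            // Order is irrelevant for unordered queries, so store a sorted copy.
            Collections.sort(copy);
        }
        this.clauses = Collections.unmodifiableList(copy);
        this.slop = slop;
        this.inOrder = inOrder;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NearKey)) return false;
        NearKey other = (NearKey) o;
        return slop == other.slop
                && inOrder == other.inOrder
                && clauses.equals(other.clauses);
    }

    @Override
    public int hashCode() {
        // Uses the same fields as equals, so equal keys share a hash code.
        return Objects.hash(clauses, slop, inOrder);
    }

    public static void main(String[] args) {
        NearKey a = new NearKey(Arrays.asList("x", "y"), 2, false);
        NearKey b = new NearKey(Arrays.asList("y", "x"), 2, false);
        NearKey c = new NearKey(Arrays.asList("x", "y"), 2, true);
        NearKey d = new NearKey(Arrays.asList("y", "x"), 2, true);

        System.out.println(a.equals(b)); // true: unordered, same clauses
        System.out.println(c.equals(d)); // false: ordered, different sequence
    }
}
```

Sorting in the constructor keeps `equals` and `hashCode` trivially consistent with each other, at the cost of one sort per key.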

Stemming terms in SpanQuery

2006-05-01 Thread Michael Chan
Hi, I'm trying to build a SpanQuery using word stems. Is parsing each term with a QueryParser, constructed with an Analyzer that produces a stemmed TokenStream, the right approach? It just seems to me that QueryParser is designed to parse queries, and so my hunch is that there might be a better way.