QueryParser bug?

2007-02-21 Thread Antony Bowesman
Using QueryParser to parse *tex* seems to create a PrefixQuery rather than a WildcardQuery: the trailing * triggers the prefix rule despite the leading * that should make it a wildcard. As a result, this does not match, for example, "context". I've swapped the order of WILDTERM and PREFIXTERM in QueryParser.jj but
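As a workaround, the double-wildcard query can be built programmatically instead of going through the parser. A minimal sketch against the Lucene 2.x API; the field name "body" is an assumption:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.WildcardQuery;

public class WildcardWorkaround {
    public static void main(String[] args) {
        // Build the query directly, sidestepping QueryParser's
        // PREFIXTERM/WILDTERM precedence: "*tex*" matches terms like "context".
        Query q = new WildcardQuery(new Term("body", "*tex*"));
        System.out.println(q);  // print the query for inspection
    }
}
```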

Searching eats lots of memory?

2007-02-21 Thread maureen tanuwidjaja
I also would like to know whether searching in the index file eats lots of memory... I always run out of memory when searching, i.e. it gives the exception "Java heap space" (although I have put -Xmx768 in the VM arguments)... Is there any way to solve it? - TV di

Re: Scoring while sorting

2007-02-21 Thread Yonik Seeley
On 2/21/07, dmitri <[EMAIL PROTECTED]> wrote: What is the point to calculate score if the result set is going to be sorted by some field? Is it ok to replace several terms query (a OR b OR c) with MatchAllQuery and RangeFilters (from a to a, from b to b, from c to c) if sorting is needed? Won't

Optimizing Index

2007-02-21 Thread maureen tanuwidjaja
Hi, I have an existing index with a size of 20.6 GB... I haven't done any optimization on this index yet. Now I have a 100 GB HDD, but apparently when I create a program to optimize it (which simply calls writer.optimize() on this index file), it gives an error that there is not enough space on

Scoring while sorting

2007-02-21 Thread dmitri
What is the point to calculate score if the result set is going to be sorted by some field? Is it ok to replace several terms query (a OR b OR c) with MatchAllQuery and RangeFilters (from a to a, from b to b, from c to c) if sorting is needed? Won't it be faster? - dmitri --
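A sketch of the proposed substitution, assuming the stock Lucene 2.x classes (the shipped class is MatchAllDocsQuery, not MatchAllQuery); the field names and the use of a term filter in place of the "from a to a" RangeFilter are my own choices:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;

public class SortOnlySearch {
    // Every hit gets the same score from MatchAllDocsQuery; ordering comes
    // entirely from the Sort, so per-document scoring work is irrelevant.
    static Hits sortedHits(Searcher searcher) throws java.io.IOException {
        Query all = new MatchAllDocsQuery();
        // A term filter stands in for the "from a to a" RangeFilter idea.
        Filter f = new QueryFilter(new TermQuery(new Term("tag", "a")));
        Sort byDate = new Sort(new SortField("date", SortField.STRING));
        return searcher.search(all, f, byDate);
    }
}
```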

Re: Positions in SpanFirst

2007-02-21 Thread Chris Hostetter
: So I don't see why using a SpanNear that respects order and a large : IncrementGap won't solve your problem.. Although it would return "odd" I think the use case he's worried about is that he needs to be able to find matches just on the "start" of a person's name, ie... Email#1 To:

Re: Positions in SpanFirst

2007-02-21 Thread Erick Erickson
I really think you need to stop obsessing over SpanFirst. I suspect that this is leading you down an unrewarding path. So I don't see why using a SpanNear that respects order and a large IncrementGap won't solve your problem... although it would return "odd" matches. Let's say you indexed "firs

Re: Positions in SpanFirst

2007-02-21 Thread Antony Bowesman
Ahh, now it falls into place. Thanks Antony Chris Hostetter wrote: it's not called Analyzer.getPositionAfterGap... it's Analyzer.getPositionIncrementGap... it's the Position Increment used when there is a Gap -- so returning 0 means that no extra increment is used, and multiple values are treated

Re: Positions in SpanFirst

2007-02-21 Thread Chris Hostetter
: So, if you can add 1000, shouldn't setting 0 each time cause it to start at 0 : each time? The default Analyzer.getPositionIncrementGap always returns 0. It's not called Analyzer.getPositionAfterGap... it's Analyzer.getPositionIncrementGap... it's the Position Increment used when there is a Gap

Re: Positions in SpanFirst

2007-02-21 Thread Antony Bowesman
Hi Erick, What this does is allow you to put gaps between successive sets of terms indexed in the same field. For instance... doc.add("field", "some stuff"); doc.add("field", "bunch hooey"); doc.add("field", "what is this"); writer.add(doc); In this case, there would be the following positions,
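The behaviour being described hinges on Analyzer.getPositionIncrementGap. A minimal sketch of overriding it in Lucene 2.x; the choice of WhitespaceAnalyzer as the delegate and the gap of 1000 are arbitrary:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import java.io.Reader;

// Wraps an existing analyzer and pushes successive values of the same field
// 1000 positions apart, so phrase/span queries cannot match across values.
public class GapAnalyzer extends Analyzer {
    private final Analyzer delegate = new WhitespaceAnalyzer();

    public TokenStream tokenStream(String fieldName, Reader reader) {
        return delegate.tokenStream(fieldName, reader);
    }

    public int getPositionIncrementGap(String fieldName) {
        return 1000;  // the base Analyzer implementation returns 0
    }
}
```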

Re: updating index

2007-02-21 Thread Erick Erickson
I think you can get MUCH better efficiency by using TermEnum/TermDocs. But I think you need to index (UN_TOKENIZED) your primary key (although now I'm not sure). I'd be surprised if TermEnum worked with un-indexed data. Still, it'd be worth trying, but I've always assumed that TermEnums only wor
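A hedged sketch of the TermDocs approach against the Lucene 2.x API; the field name "pk", the key value, and the index path are hypothetical:

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

public class DeleteByKey {
    // Locate a record by its (indexed, UN_TOKENIZED) primary key and delete
    // the stale copy so the updated version can be re-added.
    static void deleteStale(String indexPath, String key) throws java.io.IOException {
        IndexReader reader = IndexReader.open(indexPath);
        TermDocs td = reader.termDocs(new Term("pk", key));
        while (td.next()) {
            reader.deleteDocument(td.doc());  // remove the old document
        }
        td.close();
        reader.close();
        // ...then re-add the fresh document with an IndexWriter.
    }
}
```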

Re: Positions in SpanFirst

2007-02-21 Thread Chris Hostetter
: so I thought that sounded good, but there does not seem to be a way to set it : and most of the Analyzers just seem to use the base Analyzer method which : returns 0, so I'm now confused as to what this actually does in practice. by default all the analyzers return 0, but you can subclass any a

Re: MultiSearcher vs IndexSearcher(new MultiReader

2007-02-21 Thread Chris Hostetter
: Could someone enlighten me a bit about the subject? When do I want to : use a MultiSearcher rather than a searcher running off a MultiReader? : There seems to be a bunch of limitations in the MultiSearcher, and it : is these that made me curious. As I understand it, the limitations of the MultiSe

Re: Returning only a small set of results

2007-02-21 Thread Chris Hostetter
: A question about efficiency and the internal workings of the Hits class. : When we make a call to IndexSearcher's search method thus: : : Hits hits = searcher.Search(query); : : Do we actually, physically get back all the results of the query even if : there are 20 million results or for efficien

updating index

2007-02-21 Thread no spam
I have an index where I'm storing the primary key of my database record as an unindexed field. Nightly I want to update my search index with any database changes / additions. I don't really see an efficient way to update these records besides doing something like this which I'm worried with thr

ClassCastException/DocumentWriter and NullPointerException/RAMInputStream

2007-02-21 Thread Antony Bowesman
When adding documents to an index has anyone seen either java.lang.ClassCastException: org.apache.lucene.analysis.Token cannot be cast to org.apache.lucene.index.Posting at org.apache.lucene.index.DocumentWriter.sortPostingTable(DocumentWriter.java:238) at org.apache.lucene.index.DocumentW

Re: ANN: Luke 0.7 released

2007-02-21 Thread Erick Erickson
Excellent! I'll be getting this first thing in the morning... For a guy who's "really busy at his day job" you sure turned this around quickly! Erick On 2/21/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote: Hi all, I'm happy to announce that a new version of Luke - the Lucene Index Toolbox -

Re: Positions in SpanFirst

2007-02-21 Thread Erick Erickson
See below.. On 2/21/07, Antony Bowesman <[EMAIL PROTECTED]> wrote: Hi Erick, > I'm not sure you can, since all the interfaces I use alter the increment > between successive terms, but I'll be the first to admit that there are > many > nooks and crannies that I don't know about... But I suspect

Re: Stop long running queries

2007-02-21 Thread Chris Hostetter
Optimizing away the expensive cases is your best bet if you can do it... another option is to use a custom HitCollector which keeps track of how long it's been running and throws a subclass of RuntimeException which you explicitly catch and deal with as appropriate if the query has been taking t
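A sketch of such a collector against the Lucene 2.x HitCollector API; TimeExceededException and the millisecond budget are names of my own choosing, not Lucene API:

```java
import org.apache.lucene.search.HitCollector;

// A HitCollector that aborts collection once a wall-clock budget is spent.
public class TimeLimitedCollector extends HitCollector {
    public static class TimeExceededException extends RuntimeException {}

    private final long deadline;

    public TimeLimitedCollector(long budgetMillis) {
        this.deadline = System.currentTimeMillis() + budgetMillis;
    }

    public void collect(int doc, float score) {
        if (System.currentTimeMillis() > deadline) {
            throw new TimeExceededException();  // caught and handled by the caller
        }
        // ...record doc/score here...
    }
}
```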

RE: Search for a term in all fields

2007-02-21 Thread Chris Hostetter
: Well, here's my current thoughts on achieving this. Instead of putting : a 1000 space gap between elements of the "all" field could I not use a : character that isn't used in the data such as ~ and then somehow (don't : know how) use that to search all fields? You could certainly introduce an a

ANN: Luke 0.7 released

2007-02-21 Thread Andrzej Bialecki
Hi all, I'm happy to announce that a new version of Luke - the Lucene Index Toolbox - is now available. As usual, you can get it from: http://www.getopt.org/luke Highlights of this release: * support for Lucene 2.1.0 release and earlier * pagination of search results * support for many

Re: Positions in SpanFirst

2007-02-21 Thread Antony Bowesman
Hi Erick, I'm not sure you can, since all the interfaces I use alter the increment between successive terms, but I'll be the first to admit that there are many nooks and crannies that I don't know about... But I suspect that a negative increment is not supported intentionally I read your

Returning only a small set of results

2007-02-21 Thread Kainth, Sachin
Hi all, A question about efficiency and the internal workings of the Hits class. When we make a call to IndexSearcher's search method thus: Hits hits = searcher.Search(query); Do we actually, physically get back all the results of the query even if there are 20 million results or for efficiency

Re: Search for a term in all fields

2007-02-21 Thread Erick Erickson
Nothing jumps out at me Erick On 2/21/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote: Sorry I didn't make myself clear at all. Remember you said that it is possible to do this: > Sure. Convert your simple queries into span queries (which are also > relatively simple). Then, when you index

Re: pagination

2007-02-21 Thread Erik Hatcher
I'll add that in a web application, using Hits to page through results is perfectly acceptable. Going to these other APIs is a bit more complicated and often unnecessary. Don't prematurely optimize! :) Erik On Feb 21, 2007, at 8:07 AM, Erick Erickson wrote: See TopDocs, HitC

Re: pagination

2007-02-21 Thread Mohammad Norouzi
Hi, I've overcome this problem without a HitCollector. I built an interface just like java.sql.ResultSet; its implementation class accepts a Hits as a parameter and provides next(), previous(), etc. methods to navigate between records. In my opinion this is a good solution. Hope this helps. On 2/21/0
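A minimal version of the wrapper being described; the class and method names are hypothetical:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.search.Hits;

// A ResultSet-style cursor over a Hits object: the cursor starts before the
// first record, and next()/previous() move it back and forth.
public class HitsCursor {
    private final Hits hits;
    private int pos = -1;

    public HitsCursor(Hits hits) { this.hits = hits; }

    public boolean next()     { return ++pos < hits.length(); }
    public boolean previous() { return --pos >= 0; }

    // Fetch the stored fields of the current record.
    public Document get() throws java.io.IOException { return hits.doc(pos); }
}
```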

Re: Stop long running queries

2007-02-21 Thread eks dev
have a look at LuceneQueryOptimizer.java in nutch - Original Message From: Tim Johnson <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Wednesday, 21 February, 2007 3:34:36 PM Subject: Stop long running queries I'm having issues with some queries taking in excess of 500 secs t

Stop long running queries

2007-02-21 Thread Tim Johnson
I'm having issues with some queries taking in excess of 500 secs to run to completion. The system being used consists of ~100 million docs split up across ~600 indexes. The indexes are of various sizes from 15MB to 8GB, and all searches done in the system require an exact count of matching hits. Th

RE: pagination

2007-02-21 Thread Kainth, Sachin
I might be missing something, because TopDocs seems to only be about finding the relevancy of documents, and HitCollector doesn't seem to be relevant either. -Original Message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: 21 February 2007 13:08 To: java-user@lucene.apache.org Subje

Re: NO_NORMS and TOKENIZED?

2007-02-21 Thread Grant Ingersoll
On Feb 21, 2007, at 2:35 AM, Chris Hostetter wrote: the other situation i brought up in that thread from way back was something Solr doesn't currently have a good solution for: one field value for display, but other values (in the same field) for searching ... likewise indexing one field value

RE: Search for a term in all fields

2007-02-21 Thread Kainth, Sachin
Sorry I didn't make myself clear at all. Remember you said that it is possible to do this: > Sure. Convert your simple queries into span queries (which are also > relatively simple). Then, when you index everything in the "all" > field, subclass your analyzer to return a large PositionIncrement

Re: possible to disable internal caching?

2007-02-21 Thread jm
Thanks Karl and Daniel. I am already disposing of the Searchers I am using. And regarding IndexWriter.setTermIndexInterval(), I need the indexing to be as fast as possible; it's the searches where I don't need any speed and prefer to keep the memory low. javier On 2/14/07, Daniel Naber <[EMAIL PROT

Re: NO_NORMS and TOKENIZED?

2007-02-21 Thread Marvin Humphrey
On Feb 20, 2007, at 11:35 PM, Chris Hostetter wrote: the biggest difference is that the field infos aren't globals, so as segments merge and old segments get deleted, old data (and field info) vanishes into the ether... I take advantage of that a lot when planning upgrades... many types of fi

Re: Positions in SpanFirst

2007-02-21 Thread Erick Erickson
I'm not sure you can, since all the interfaces I use alter the increment between successive terms, but I'll be the first to admit that there are many nooks and crannies that I don't know about... But I suspect that a negative increment is not supported intentionally But I really doubt you wan

Re: pagination

2007-02-21 Thread Erick Erickson
See TopDocs, HitCollector, etc. Don't iterate through a Hits object to get docs beyond, say, 100, since it's designed to efficiently return the first 100 documents but re-executes the query each 100 or so times you advance to the next document. Erick On 2/21/07, Kainth, Sachin <[EMAIL PROTECTE
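For a fixed page, TopDocs avoids Hits' re-execution: ask for enough hits to cover the requested page, then slice. A sketch against the Lucene 2.x Searcher API; the paging scheme (zero-based pages) is my own choice:

```java
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

public class Pager {
    // Return the doc ids for one page of results. Asking for
    // (page + 1) * pageSize hits covers everything up to and including
    // the requested page; earlier pages are simply skipped over.
    static int[] pageOfIds(IndexSearcher searcher, Query query,
                           int page, int pageSize) throws java.io.IOException {
        TopDocs top = searcher.search(query, null, (page + 1) * pageSize);
        int from = page * pageSize;
        int to = Math.min(top.scoreDocs.length, from + pageSize);
        int[] ids = new int[Math.max(0, to - from)];
        for (int i = from; i < to; i++) {
            ids[i - from] = top.scoreDocs[i].doc;  // searcher.doc(id) fetches stored fields
        }
        return ids;
    }
}
```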

Re: Search for a term in all fields

2007-02-21 Thread Erick Erickson
I don't see what you're getting at. There are only two forms of a query term: field:value and value. And the second is really the first, with the default field you specified in the parser implied. So just think of all terms you specify in a query as field:term. Having some "special character" in th

Re: MultiSearcher vs IndexSearcher(new MultiReader

2007-02-21 Thread Nott
Hi. We have used MultiSearcher when we use more than one folder. So far we have not had many issues with MultiSearcher. The index at times becomes slow when you include more folders to search. We have the full index in one folder and the incremental index in another folder so that

MultiSearcher vs IndexSearcher(new MultiReader

2007-02-21 Thread karl wettin
Could someone enlighten me a bit about the subject? When do I want to use a MultiSearcher rather than a searcher running off a MultiReader? There seems to be a bunch of limitations in the MultiSearcher, and it is these that made me curious. -- karl --
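The two setups being compared can be sketched side by side against the Lucene 2.x API; the index paths are placeholders:

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Searchable;

public class TwoWaysToSearchTwoIndexes {
    public static void main(String[] args) throws Exception {
        IndexReader r1 = IndexReader.open("/idx/full");
        IndexReader r2 = IndexReader.open("/idx/incremental");

        // One searcher over a merged reader view: scoring statistics are
        // computed as if it were a single index.
        IndexSearcher merged =
            new IndexSearcher(new MultiReader(new IndexReader[] { r1, r2 }));

        // A MultiSearcher over independent searchers: results are searched
        // per sub-index and merged afterwards.
        MultiSearcher multi = new MultiSearcher(new Searchable[] {
            new IndexSearcher(r1), new IndexSearcher(r2)
        });
    }
}
```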

Positions in SpanFirst

2007-02-21 Thread Antony Bowesman
Hi, I have a field to which I add several bits of information, e.g. doc.add(new Field("x", "first bit")); doc.add(new Field("x", "second part")); doc.add(new Field("x", "third section")); I am using SpanFirstQuery to search them with something like: while... SpanTermQuery stquery = new SpanT
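A minimal sketch of the SpanFirstQuery in question, using the field name "x" from the message; the window size of 2 is arbitrary:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class SpanFirstExample {
    public static void main(String[] args) {
        // Match "first" only when it occurs within the first 2 positions of
        // field "x". Note: with the default position increment gap of 0,
        // successive values of "x" run together, so tokens from later values
        // can also fall inside the window.
        SpanTermQuery stq = new SpanTermQuery(new Term("x", "first"));
        SpanFirstQuery sfq = new SpanFirstQuery(stq, 2);
        System.out.println(sfq);  // print the query for inspection
    }
}
```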

pagination

2007-02-21 Thread Kainth, Sachin
Hello, I was wondering if Lucene provides any mechanism which helps with pagination. In other words, is there a way to return the first 10 of 500 results, then the next 10, and so on. Cheers

RE: Search for a term in all fields

2007-02-21 Thread Kainth, Sachin
Well, here are my current thoughts on achieving this. Instead of putting a 1000 space gap between elements of the "all" field, could I not use a character that isn't used in the data, such as ~, and then somehow (don't know how) use that to search all fields? -Original Message- From: Chris Hos