Re: Find version of Lucene library

2005-03-08 Thread Bill Janssen
> The JDK comes with some classes that will let you get to > that elegantly. You mean clumsily :-). Bill

Re: QueryParser refactoring

2005-03-08 Thread Erik Hatcher
On Mar 8, 2005, at 5:17 PM, Chris Hostetter wrote: Earlier in this thread... : >>> +a -> a : >> : >> Hmmm this is a debatable one. It's returning a TermQuery in this : >> case for "a". Is that appropriate? Or should it return a : >> BooleanQuery : >> with a single TermQuery as required? :

Re: Assorted questions

2005-03-08 Thread Otis Gospodnetic
Your memory is serving you well. http://www.lucenebook.com/search?query=%22range+query%22+performance Note the hit in section 6.5.1 - the fact that we used range queries in the performance section is an indicator that one can really mess things up if using range queries injudiciously. :) In parti

Re: Find version of Lucene library

2005-03-08 Thread Otis Gospodnetic
The version information should be included in the Manifest file inside the Jar. The JDK comes with some classes that will let you get to that elegantly. Otis --- Paul Mellor <[EMAIL PROTECTED]> wrote: > Hi guys, > > Just a quick query - is there any way that I can determine at runtime > the >
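A minimal sketch of the approach Otis describes, using only JDK classes. It assumes the jar's manifest actually carries an Implementation-Version attribute, which the post says it "should"; if it does not, both methods return null. The class name JarVersionProbe and its method names are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper, not part of Lucene or the JDK.
final class JarVersionProbe {

    /** Implementation-Version recorded for the jar that supplied cls, or null if none. */
    static String versionOf(Class<?> cls) {
        Package pkg = cls.getPackage();
        return pkg == null ? null : pkg.getImplementationVersion();
    }

    /** Reads Implementation-Version from the raw text of a MANIFEST.MF file. */
    static String versionFromManifest(String manifestText) {
        try {
            Manifest manifest = new Manifest(
                    new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
            return manifest.getMainAttributes().getValue(Attributes.Name.IMPLEMENTATION_VERSION);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass any class name from the jar to inspect, for example
        // org.apache.lucene.index.IndexWriter (an illustrative choice).
        Class<?> cls = args.length > 0 ? Class.forName(args[0]) : JarVersionProbe.class;
        System.out.println(cls.getName() + " -> " + versionOf(cls));
    }
}
```

Printing the result of versionOf at startup is one way to confirm which jar the classpath actually resolved to.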

Assorted questions

2005-03-08 Thread Scott Smith
I needed to return my hits list in date/time order (instead of relevancy). So, I implemented a class that converted dates to an int and stored the integer as a field in my index. I passed a Sort object to the IndexSearcher (indicating that the sort field was convertible to int) to get things back
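The post does not say how the dates were converted to an int. One possible encoding, sketched here purely as an assumption, is whole minutes since the Unix epoch: it keeps chronological order and stays within int range for dates anywhere near the present. DateSortKey and its methods are hypothetical names, and the Lucene side (storing the field and passing a Sort) is left out.

```java
import java.util.Date;

// Hypothetical helper: maps a Date to an int that sorts chronologically.
final class DateSortKey {
    private static final long MILLIS_PER_MINUTE = 60_000L;

    /** Whole minutes since 1970-01-01T00:00Z; throws ArithmeticException if out of int range. */
    static int toMinutes(Date date) {
        return Math.toIntExact(Math.floorDiv(date.getTime(), MILLIS_PER_MINUTE));
    }

    /** Inverse of toMinutes, accurate to the minute. */
    static Date fromMinutes(int minutes) {
        return new Date(minutes * MILLIS_PER_MINUTE);
    }

    public static void main(String[] args) {
        int key = toMinutes(new Date());
        System.out.println(key + " -> " + fromMinutes(key));
    }
}
```

The trade-off is resolution: two documents stamped within the same minute get the same key, so their relative order is undefined.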

large indexes

2005-03-08 Thread Scott Smith
I need to create an index which will potentially have a million+ documents. I know Lucene can accomplish this. However, the other requirement is that I need to be continually updating it during the day (adding 1-30 documents/minute). I guess I had thought that I might try to have an ac

Re: QueryParser refactoring

2005-03-08 Thread Chris Hostetter
Earlier in this thread... : >>> +a -> a : >> : >> Hmmm this is a debatable one. It's returning a TermQuery in this : >> case for "a". Is that appropriate? Or should it return a : >> BooleanQuery : >> with a single TermQuery as required? : > Ok. : > The question how to handle BooleanQuerie

Re: Document lazy-loading WAS [Re: Fast access to a random page of the search results.]

2005-03-08 Thread markharw00d
So this is just the old problem of avoiding reading large, less frequently accessed fields when you are trying to read just the smaller, more frequently accessed fields, e.g. titles. You can achieve this by: a) Modifying Lucene using something like the code I originally posted which stops reading

Re: fresh indexing bug?

2005-03-08 Thread eks dev
works like a charm, thanks! as a side note, the latest patch with properly disabled coord helped me a lot as well, made coord usable. --- Doug Cutting <[EMAIL PROTECTED]> wrote: > eks dev wrote: > > When I reindex with the lucene from the latest svn > > snapshot, a lot of .tii files that are dele

Re: Document lazy-loading WAS [Re: Fast access to a random page of the search results.]

2005-03-08 Thread Kelvin Tan
On Tue, 8 Mar 2005 18:10:26 + (GMT), mark harwood wrote:  "to be able" != "able to be" > OK, I thought you wanted to count terms within the > title field. If you want to group counts on the whole > field value change the loop in my last post to this: > > for(int i=0;i { > String fiel

Re: lucene question, examples

2005-03-08 Thread Brian Cuttler
Chris, Thank you - will take a look at nutch and let you/the list know if it was a good fit for us. On Fri, Mar 04, 2005 at 03:02:03PM -0800, Chris Hostetter wrote: > > If your goal is to setup a web based search interface that queries a > lucene index containing all of the documents from your

Webapp Demo throws ArrayIndexOutOfBoundsException on Large index

2005-03-08 Thread Chris D
I've been playing with the webapp and attempting to search over two indexes that I've created. The first was 700M; the second is 2.3G. When the webapp attempts to search the second I get an "ArrayIndexOutOfBoundsException": java.lang.ArrayIndexOutOfBoundsException: -1 at java.util.ArrayList.get(A

Re: QueryParser refactoring

2005-03-08 Thread Erik Hatcher
On Mar 8, 2005, at 12:38 PM, Morus Walter wrote: That reminds me of a remark Doug made in the discussion of bug 25820 (http://issues.apache.org/bugzilla/show_bug.cgi?id=25820#c7), that it would be useful if an empty query string parses to an empty query. So probably a check for that should be added

Re: Document lazy-loading WAS [Re: Fast access to a random page of the search results.]

2005-03-08 Thread mark harwood
>>> "to be able" != "able to be" OK, I thought you wanted to count terms within the title field. If you want to group counts on the whole field value change the loop in my last post to this: for(int i=0;i

Re: QueryParser refactoring

2005-03-08 Thread Morus Walter
Erik Hatcher writes: > >> I think you must have tried this in a transient state when I forgot > >> to > >> check in some JavaCC generated files. Try again. This one now > >> returns > >> an empty BooleanQuery. > >> > > ok. > > I'm a bit puzzled, since I called javacc myself, so generated files

Re: Document lazy-loading WAS [Re: Fast access to a random page of the search results.]

2005-03-08 Thread Kelvin Tan
Hey Mark, thanks for the code sample. I did look into this, but for a book's title field, for example, "to be able" != "able to be" and "java programmer" != "programmer (java)" - tokenizer will remove the parentheses so in my use case at least, a field value isn't simply an array of its terms.

Re: fresh indexing bug?

2005-03-08 Thread Doug Cutting
eks dev wrote: When I reindex with the lucene from the latest svn snapshot, a lot of .tii files that are deletable appear (checked with luke). This is a bug I introduced yesterday. Thanks for catching it! The term index (.tii) was not closed, and on Windows this makes it undeletable. I just com

Re: QueryParser refactoring

2005-03-08 Thread Doug Cutting
sergiu gordea wrote: So .. here is an example of how I parse a simple query string provided by a user ... the user checks a few flags and writes "test ko AND NOT bo" and the resulting query.toString() is saved in the database: +(+(subject:test description:test keywordsTerms:test koProperties:test

Re: Document lazy-loading WAS [Re: Fast access to a random page of the search results.]

2005-03-08 Thread mark harwood
Your requirement was clear but I guess my suggested solution wasn't. Here it is in detail: public class CountTest { public static void main(String[] args) throws Exception { RAMDirectory tempDir = new RAMDirectory(); Analyzer analyzer=new WhitespaceAnalyze

fresh indexing bug?

2005-03-08 Thread eks dev
When I reindex with the lucene from the latest svn snapshot, a lot of .tii files that are deletable appear (checked with luke). This was not happening with the previous version using exactly the same code for indexing. At the end of indexing, Optimize finished successfully. Is this a bug? WinXP,

Re: Document lazy-loading WAS [Re: Fast access to a random page of the search results.]

2005-03-08 Thread Kelvin Tan
Ah, I apologize. My use of the word "frequency" was misleading. By that, I meant, the number of hits/documents, whose fields have that value. Once again: doc a=title:1,keyword:a,contents:somelongmemoryhoggingstring doc b=title:1,keyword:a,contents:somelongmemoryhoggingstring doc c=title:1,keyword

Re: QueryParser refactoring

2005-03-08 Thread Morus Walter
Daniel Naber writes: > On Tuesday 08 March 2005 14:46, Erik Hatcher wrote: > > > > Right. `a AND (NOT b)'  parses to `a' > > > > Is this what we want to happen for a general purpose next generation > > Lucene QueryParser though?  I'm not sure.  Perhaps this should be a > > ParseException instead?

Re: QueryParser refactoring

2005-03-08 Thread Morus Walter
Erik Hatcher writes: > > On Mar 8, 2005, at 4:38 AM, Morus Walter wrote: > >> I created a modified Query->String converter for my current day time > >> project (as I use a String representation for the most recently used > >> drop-down that is stored as a client-side cookie) that explicitly puts >

Re: QueryParser refactoring

2005-03-08 Thread Daniel Naber
On Tuesday 08 March 2005 14:46, Erik Hatcher wrote: > > Right. `a AND (NOT b)'  parses to `a' > > Is this what we want to happen for a general purpose next generation > Lucene QueryParser though?  I'm not sure.  Perhaps this should be a > ParseException instead? As we have no concept of a "warnin

Find version of Lucene library

2005-03-08 Thread Paul Mellor
Hi guys, Just a quick query - is there any way that I can determine at runtime the version of Lucene that I am using? I'm upgrading a system from v1.3 to v1.4.3 and I would like to be able to print out the version at startup so that I can be sure that I have got my paths all correct and haven't

Re: QueryParser refactoring

2005-03-08 Thread sergiu gordea
Erik Hatcher wrote: On Mar 8, 2005, at 4:11 AM, sergiu gordea wrote: In our project I save search strings, generated with query.toString in the database and I reconstruct the Query at runtime. I would appreciate if the new QueryParser will pass the following assert: Query query = QueryParser.pa

Re: QueryParser refactoring

2005-03-08 Thread Erik Hatcher
On Mar 8, 2005, at 4:38 AM, Morus Walter wrote: I created a modified Query->String converter for my current day time project (as I use a String representation for the most recently used drop-down that is stored as a client-side cookie) that explicitly puts in "OR" between SHOULD BooleanClauses. You

Re: QueryParser refactoring

2005-03-08 Thread Erik Hatcher
On Mar 8, 2005, at 4:11 AM, sergiu gordea wrote: In our project I save search strings, generated with query.toString in the database and I reconstruct the Query at runtime. I would appreciate if the new QueryParser will pass the following assert: Query query = QueryParser.parse(queryString, ana

Re: Document lazy-loading WAS [Re: Fast access to a random page of the search results.]

2005-03-08 Thread mark harwood
The new TermFreqVector code sounds like what you need here. This gives you fast access to precomputed totals of term frequencies for each document. See IndexReader.getTermFreqVector

Re: Document lazy-loading WAS [Re: Fast access to a random page of the search results.]

2005-03-08 Thread Kelvin Tan
Neither. :-) 4) Top 10 fieldvalues (for some fields) returned in search results So, let's say the results of a search were: doc a=title:1,keyword:a,contents:somelongmemoryhoggingstring doc b=title:1,keyword:a,contents:somelongmemoryhoggingstring doc c=title:1,keyword:b,contents:somelongmemoryhog
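The requirement stated here (count how many hits carry each whole field value, then keep the most frequent values) can be sketched in plain Java. This is not the code mark harwood posted, and it does nothing about the cost of loading large fields; it assumes the value of the chosen field has already been read for each hit. FieldValueCounts and topValues are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: tallies whole field values, not individual terms.
final class FieldValueCounts {

    /** Returns up to limit (value, hit count) pairs, most frequent first. */
    static List<Map.Entry<String, Integer>> topValues(List<String> valuePerHit, int limit) {
        // LinkedHashMap keeps first-seen order, so ties stay in result order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String value : valuePerHit) {
            counts.merge(value, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        // List.sort is stable, so equal counts keep their first-seen order.
        sorted.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    public static void main(String[] args) {
        // keyword values of docs a, b and c from the example in this message
        List<String> keywords = Arrays.asList("a", "a", "b");
        System.out.println(topValues(keywords, 10));
    }
}
```

Because the map key is the entire field value, "to be able" and "able to be" are counted separately, which is the distinction raised elsewhere in this thread.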

Re: Searching multiple fields with same name

2005-03-08 Thread Claude Libois
Why not use the MultiFieldQueryParser (look at http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/queryParser/MultiFieldQueryParser.html)? This one allows you to specify on which fields the search will be done. I think that for your example 'lucene AND jakarta' will be transformed by the parse

Re: Document lazy-loading WAS [Re: Fast access to a random page of the search results.]

2005-03-08 Thread mark harwood
Not sure I get what the requirement is yet: >>Here's my requirement, ..I need to perform a simple >>"Top 10 most frequent occurring " from a search. Does this mean: 1)Top 10 fieldnames present in each of your matching documents? 2)Top 10 most frequent terms found in a choice of field? 3)Top 10

Document lazy-loading WAS [Re: Fast access to a random page of the search results.]

2005-03-08 Thread Kelvin Tan
Mark, On Tue, 8 Mar 2005 09:56:37 + (GMT), mark harwood wrote: >> But I suppose for Document >> has to be further subclassed so that the other >> non-initialized fields can be obtained as well, or >> > I don't think Document would be the right place for > this - as a design pattern it is cast

Re: Searching multiple fields with same name

2005-03-08 Thread Jose Miguel Diez
Hello, Just store it in two separate fields, and prepare a query: (title1: myquery ) OR (title2: myquery ) Substituting your expressions for myquery will work fine. Regards, Jose Miguel Romain Laboisse wrote: >Hello, > >I am indexing documents which may have more than one title and I wo
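A small sketch of the substitution Jose Miguel describes, done as plain string building. One assumption goes beyond the post: the expression is wrapped in Lucene's field-grouping syntax, field:(...), so that every term of a multi-word expression is tied to the field rather than only the first. TwoFieldQuery and acrossFields are hypothetical names, and the expression is not escaped or validated.

```java
// Hypothetical helper: repeats one query expression across several fields.
final class TwoFieldQuery {

    /** Builds "(f1:(expr)) OR (f2:(expr)) ..." for the given field names. */
    static String acrossFields(String expression, String... fields) {
        StringBuilder sb = new StringBuilder();
        for (String field : fields) {
            if (sb.length() > 0) {
                sb.append(" OR ");
            }
            // Assumption: field:(...) groups the whole expression under the field.
            sb.append('(').append(field).append(":(").append(expression).append("))");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(acrossFields("lucene AND jakarta", "title1", "title2"));
    }
}
```

For the example in this thread the method yields (title1:(lucene AND jakarta)) OR (title2:(lucene AND jakarta)), which would still need to go through a query parser with a suitable analyzer.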

Searching multiple fields with same name

2005-03-08 Thread Romain Laboisse
Hello, I am indexing documents which may have more than one title and I would like to be able to search these titles separately. For example, a document may have two titles, "Jakarta Lucene" and "Powerful search engine". A search on 'lucene AND jakarta' should return this document but a search on

Re: Fast access to a random page of the search results.

2005-03-08 Thread mark harwood
> But I suppose for Document > has to be further subclassed so that the other > non-initialized fields can be obtained as well, or I don't think Document would be the right place for this - as a design pattern it is cast as a "value object" or "transfer object" which is passed to (potentially remo

Re: QueryParser refactoring

2005-03-08 Thread Morus Walter
Erik Hatcher writes: > > ok. > > I'm a bit puzzled, since I called javacc myself, so generated files > > should > > not matter, but if it's fixed, I don't care about what went wrong. > > Let me know if there is still an issue, though I added this exact case > to TestPrecedenceQueryParser and its

Re: QueryParser refactoring

2005-03-08 Thread sergiu gordea
2) Single term queries using +/- flags are parsed to a query without the flag +a -> a Hmmm this is a debatable one. It's returning a TermQuery in this case for "a". Is that appropriate? Or should it return a BooleanQuery with a single TermQuery as required? I'd prefer, if query parser parses qu

Re: QueryParser refactoring

2005-03-08 Thread Erik Hatcher
On Mar 8, 2005, at 2:29 AM, Morus Walter wrote: Erik Hatcher writes: Your changes look great in general, though I find some issues: 1) 'stop OR stop AND stop' where stop is a stopword gives a parse error: Encountered "" at line 1, column 0. Was expecting one of: ... ... I think you must have