Digester and simple XML files

2005-04-22 Thread Andy Roberts
Hi all, Just been playing with Digester after reading chapter 7 in LIA. Seems to fit my needs as I have a relatively simple XML structure. some sente

lucene score

2005-04-22 Thread Ravi
Hi, Does the Lucene relevancy score depend on the total number of documents in the index? Will I get different scores for the same document and the same query in indexes of different sizes? If it does, when does it return a higher score? Thanks in advance, Ravi.
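A rough sketch of why index size can matter for the question above. It assumes a TF-IDF style score whose inverse document frequency term has the form 1 + ln(numDocs / (docFreq + 1)). That formula, the class name IdfSketch, and the sample numbers are assumptions for illustration and are not taken from this thread; the sketch does not call Lucene itself.

```java
// Illustrative only: models an assumed idf term, not Lucene's actual scoring code.
final class IdfSketch {

    // Assumed form: idf = 1 + ln(numDocs / (docFreq + 1)).
    // docFreq = number of documents containing the term,
    // numDocs = total number of documents in the index.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        int docFreq = 10; // hypothetical: the term occurs in 10 documents
        // Same term, same docFreq, growing index size.
        for (int numDocs : new int[] {1_000, 100_000, 10_000_000}) {
            System.out.printf("numDocs=%d idf=%.3f%n", numDocs, idf(docFreq, numDocs));
        }
    }
}
```

Under this assumption, a term that matches the same number of documents gets a larger idf weight in a larger index, so raw scores from indexes of different sizes are not directly comparable.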

Re: sorting on "dates" a little fuzzy...

2005-04-22 Thread Rasik Pandey
Hi James, Have a look in Bugzilla at issue #34563. I contributed some code last night that may be helpful to you. Have a look at the patchTestSort.txt, which is a diff of my changes to test the classes I created. This may help you understand how to use the classes, but I assume based upon the valu

Re: Lucene bulk indexing

2005-04-22 Thread Aalap Parikh
Hi Peter, As I said in my earlier email, changing the mergeFactor and minMergeDocs properties in IndexWriter did help, but performance is still not what I would like it to be. I then tried what you suggested, RAMDirectory-based disk indexing, and it has worked SUPERBLY for me. I was able to reduce the processing t

Re: WildCard search replacement

2005-04-22 Thread Aalap Parikh
Hi, The idea about a begin marker sounds good. And the prefix could be anything and could be made really small by using maybe 2 or 3 characters, or even fewer. In terms of the PrefixQuery for a 123* wildcard search, wouldn't such a query be rewritten to a BooleanQuery? I tried using PrefixQuery a

Re: sorting on "dates" a little fuzzy... - resolved

2005-04-22 Thread James
This was resolved by specifying Locale.US in the SortField constructor. I guess our default locale setting is messed up somewhere. Thanks to everyone who responded! James --- James Levine <[EMAIL PROTECTED]> wrote: > I have an index of around 3 million records, and typical queries > can res

Re: Lucene bulk indexing

2005-04-22 Thread Aalap Parikh
Hi, > : the app using JProfiler and found out that 90% of > time > : is spent in the IndexWriter.addDocument call. As > > what analyzer are you using? I am using the StandardAnalyzer (tried using SimpleAnalyzer too, but not much effect on performance). > : My machine: Pentium 4 CPU 2.40 GHz > :

Fwd: New Article: Groovy and Lucene

2005-04-22 Thread Erik Hatcher
Begin forwarded message: From: Jeremy Rayner <[EMAIL PROTECTED]> Date: April 22, 2005 7:48:05 AM EDT To: [EMAIL PROTECTED] Subject: New Article: Groovy and Lucene Reply-To: Jeremy Rayner <[EMAIL PROTECTED]> Hi groovy users, Just a quick note to let you know that I have just posted a short articl

Re: token type question

2005-04-22 Thread Paul Libbrecht
On 22 Apr. 05, at 09:36, Pierrick Brihaye wrote: Are you saying that I should construct the Token in the analyzer like new Token ("chem_H2O", 100, 103, "chem"); note that chem_ is a prefix added to H2O, and 100 to 103 is the length of H2O rather than chem_H2O? Well... 100 to 103 are offsets provided by the reade

Querying things "close to" a gang of terms ?

2005-04-22 Thread Paul Libbrecht
Hi, Many people speak about the cosine distance or variations thereof (probably even n-gram things)... nice... All this geometry has some power but a typical requirement of embedding such power is to be able to query documents that are close to this query's terms or to this document in the sense

Re: WildCard search replacement

2005-04-22 Thread Volodymyr Bychkoviak
Hi, first of all, you'll never get TooManyClauseException because you're searching with a phrase query (i.e. this query will not be rewritten into a boolean query). About your question: if you need a search like 123*, you can use some term as a beginning marker and include this term at the beginning of phr

Re: token type question

2005-04-22 Thread Pierrick Brihaye
Hi, [EMAIL PROTECTED] wrote: Thanks Pierrick. Are you saying that I should construct the Token in the analyzer like new Token ("chem_H2O", 100, 103, "chem"); note that chem_ is a prefix added to H2O, and 100 to 103 is the length of H2O rather than chem_H2O? Well... 100 to 103 are offsets provided by the reader (