Re: JavaCC version for lucene 1.4

2005-09-27 Thread Erik Hatcher
On Sep 27, 2005, at 4:52 PM, Zhang, Lisheng wrote: Hi, I would like to know the JavaCC version used to build lucene 1.4? I could not get this information from downloaded files (only mentioned JavaCC site). I can't say for sure what version was used for the official 1.4.x builds, but my hu

JavaCC version for lucene 1.4

2005-09-27 Thread Zhang, Lisheng
Hi, I would like to know the JavaCC version used to build lucene 1.4? I could not get this information from downloaded files (only mentioned JavaCC site). Thanks very much for your help, Lisheng

Re: IndexWriter exception

2005-09-27 Thread Otis Gospodnetic
Alex, Can you write your code as a unit test that we can run and see the error? Are you running this on Windows? Otis --- Alex Kiselevski <[EMAIL PROTECTED]> wrote: > > Hi, > > I have a strange exception when I'm trying to recreate an > IndexWriter, > that was previously defined. > > > >

Re: Splitting of words

2005-09-27 Thread Erik Hatcher
On Sep 27, 2005, at 6:29 AM, Endre Stølsvik wrote: On Thu, 22 Sep 2005, Erik Hatcher wrote: | | On Sep 22, 2005, at 4:36 AM, Endre Stølsvik wrote: | | > | > | The StandardTokenizer is the most sophisticated one built into Lucene. | > You | > | can see the types of tokens it emits by looking

RE: query behavior

2005-09-27 Thread Alberto Squassabia
Mr. Elschot: > In what way is your problem similar? same: (1) large document spaces (millions); (2) searches at intersection of AND with one side of the AND sporting only a few matches, and the other side sporting large match sets. different: (3) greater variety of data types Alberto S alb

Re: Is analyzing same as tokenizing???

2005-09-27 Thread Erik Hatcher
On Sep 27, 2005, at 9:01 AM, Anand Kishore wrote: That is correct. A Keyword field is taken exact case as-is as a single term. For example: If I have a keyword field named "sender" which has the value "The Motely Fool", doing a search for either of these query terms "Fool" or "fool" or
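
The distinction discussed in this thread, a keyword field kept as one exact term versus a field that is split into tokens, can be sketched in plain Java. This is only an illustration and does not use Lucene: the class `KeywordVsTokenized` and its helpers `keywordTerms`, `tokenizedTerms`, and `matches` are hypothetical names, and the whitespace-plus-lowercase splitting is an assumed stand-in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class KeywordVsTokenized {

    // Hypothetical helper: a keyword-style field keeps the whole value,
    // with its original case, as a single term.
    static List<String> keywordTerms(String value) {
        List<String> terms = new ArrayList<>();
        terms.add(value);
        return terms;
    }

    // Hypothetical helper: an assumed, simplified analyzer that splits on
    // whitespace and lowercases each piece.
    static List<String> tokenizedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String part : value.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                terms.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // A term query matches only when the query term equals an indexed term exactly.
    static boolean matches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        String sender = "The Motely Fool";
        System.out.println(matches(keywordTerms(sender), "Fool"));            // prints false
        System.out.println(matches(keywordTerms(sender), "The Motely Fool")); // prints true
        System.out.println(matches(tokenizedTerms(sender), "fool"));          // prints true
    }
}
```

Under these assumptions, a single-word query finds the value only when the field was tokenized; the keyword-style field matches only the complete, exact-case value.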

Re: StandardTokenizer

2005-09-27 Thread Yonik Seeley
I'd write a TokenFilter for that... much easier. -Yonik Now hiring -- http://tinyurl.com/7m67g On 9/27/05, Lorenzo Viscanti <[EMAIL PROTECTED]> wrote: > > Hi, I'm trying to modify the StandardTokenizer, in order to get a > good tokenization for my needs. > Basically I would like to separat
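
The idea of post-processing tokens instead of changing the grammar can be sketched in plain Java. This is not Lucene's TokenFilter API: the class `ApostropheSplit` and the method `splitOnApostrophes` are hypothetical, and the sketch assumes tokens arrive as plain strings and that only the ASCII apostrophe needs handling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ApostropheSplit {

    // Hypothetical filter step: takes tokens already produced by a tokenizer
    // and emits one token per apostrophe-separated piece, dropping empty pieces.
    static List<String> splitOnApostrophes(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            for (String piece : token.split("'")) {
                if (!piece.isEmpty()) {
                    out.add(piece);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("l'albero", "verde");
        System.out.println(splitOnApostrophes(tokens)); // prints [l, albero, verde]
    }
}
```

A real filter would also have to decide what offsets and token types the new pieces carry; this sketch leaves that out.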

RE: Issue with sounds-like queries

2005-09-27 Thread Jayakumar.V
Hi, I got around the RefinedSoundex issue by indexing each field separately. But by using RefinedSoundex, I'm losing the flexibility provided by the Metaphone / DoubleMetaphone algorithms. If the user misspells the search word (which can happen) as KOILON instead of QUILON, I wouldn't get back

StandardTokenizer

2005-09-27 Thread Lorenzo Viscanti
Hi, I'm trying to modify the StandardTokenizer in order to get a good tokenization for my needs. Basically I would like to separate two tokens when I find an apostrophe. I think I should modify the StandardTokenizer.jj file to do that, but I'm having trouble changing the grammar. Can some

Issue with sounds-like queries

2005-09-27 Thread Jayakumar.V
Hi, I'm facing an issue with sounds-like queries. I've experimented with both Apache Codec & the Phonetix library from Tangentum Technologies (http://www.tangentum.biz/en/products/phonetix/faqs/index.html) to see if I could sort out the issue somehow using either of the libraries. I've an

IndexWriter exception

2005-09-27 Thread Alex Kiselevski
Hi, I have a strange exception when I'm trying to recreate an IndexWriter, that was previously defined. I did the following steps: 1. mWriter = new IndexWriter(indexPath, analyzer, true); 2. mWriter.addDocument(document); 3. mWriter.optimize(); 4. mWriter.c

Re: Is analyzing same as tokenizing???

2005-09-27 Thread Anand Kishore
> That is correct. A Keyword field is taken exact case as-is as a > single term. For example: If I have a keyword field named "sender" which has the value "The Motely Fool", doing a search for either of these query terms "Fool" or "fool" or "Motely" on the "sender" field should match the documents

Re: Is analyzing same as tokenizing???

2005-09-27 Thread Erik Hatcher
On Sep 27, 2005, at 1:58 AM, Anand Kishore wrote: Is 'Analyzing' same as 'Tokenizing'? Yes, in Lucene terminology these two are the same. When we say the Keyword field is not analyzed, but indexed and stored, does it indicate it is not tokenized as well? That means in order to find a query

RE: Single Analyzer for multiple European languages

2005-09-27 Thread Madhu Satyanarayana Panitini
Hi all, One more idea would be using cryptograms to differentiate between languages, and then you can delete the stopwords and apply stemming for the particular language. Regards madhu -Original Message- From: Endre Stølsvik [mailto:[EMAIL PROTECTED] Sent: Tuesday, September 27, 2005 4:08

Re: Single Analyzer for multiple European languages

2005-09-27 Thread Endre Stølsvik
On Mon, 26 Sep 2005, Andrzej Bialecki wrote: | Shashikant Kore wrote: | | > Search: | > - Get the superset of stopwords by merging the stopwords from all the | > languages. | | This step doesn't make sense. Stopwords ARE language specific. A stopword in | one language may be a valid content word

Re: Splitting of words

2005-09-27 Thread Endre Stølsvik
On Thu, 22 Sep 2005, Erik Hatcher wrote: | | On Sep 22, 2005, at 4:36 AM, Endre Stølsvik wrote: | | > | > | The StandardTokenizer is the most sophisticated one built into Lucene. | > You | > | can see the types of tokens it emits by looking at the javadoc here: | > | | >

Re: query behavior

2005-09-27 Thread Paul Elschot
On Tuesday 27 September 2005 01:13, Chris Hostetter wrote: > > I *believe* that because of the ConjunctionScorer in 1.9, BooleanQueries > consisting of all required terms are now optimized for situations like > this, the Scorer for the common clause won't be asked to score things that > the un-com