RE: Tokenizer question: how can I force ? and ! to be separate tokens?

2009-07-17 Thread OBender
Thanks, I think I got it. -Original Message- From: John Byrne [mailto:john.by...@propylon.com] Sent: Friday, July 17, 2009 2:43 PM To: java-user@lucene.apache.org Subject: Re: Tokenizer question: how can I force ? and ! to be separate tokens? Yes, you could even use the WhitespaceTokenize

RE: Why next(Token) in CharTokenizer is final?

2009-07-17 Thread Uwe Schindler
In general, all TokenStream classes from Lucene core are/should be final, because you should simply put an additional TokenFilter into the chain to modify the tokenization. A few of them are not final (CharTokenizer is abstract), but here the next() methods are final because of the above explained reas

Why next(Token) in CharTokenizer is final?

2009-07-17 Thread OBender
Hi All, I think this is a question for the Lucene dev team. Why was the next(Token) method of CharTokenizer made final? It is quite inconvenient and I don't see the reason for it. Thanks.

Re: Unknown format version: -9

2009-07-17 Thread Michael McCandless
Yes, the index has changed, from this issue: https://issues.apache.org/jira/browse/LUCENE-1654 Mike On Fri, Jul 17, 2009 at 12:28 PM, Avishek Anand wrote: > Hi, >    I get the error "Unknown format version: -9" when I try to create an > index from the latest lucene build (built from the rece

Unknown format version: -9

2009-07-17 Thread Avishek Anand
Hi, I get the error "Unknown format version: -9" when I try to create an index from the latest Lucene build (built from the recent code in the repository - lucene_2_4:748824). I use Luke from the website. Assuming that Luke is the most recent version, did anything change in lucene-core because

Re: Tokenizer question: how can I force ? and ! to be separate tokens?

2009-07-17 Thread John Byrne
Yes, you could even use the WhitespaceTokenizer and then look for the symbols in a token filter. You would get [you?] as a single token; your job in the token filter is then to store the [?] and return the [you]. The next time the token filter is called for the next token, you return the [?] th
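
The buffering idea described in this reply can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use Lucene's TokenFilter API, and the names PunctuationSplitSketch, SplittingIterator and tokenize are hypothetical. An Iterator<String> stands in for the token stream, and whitespace splitting stands in for WhitespaceTokenizer.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Plain-Java stand-in for the buffering idea; NOT Lucene's TokenFilter API.
class PunctuationSplitSketch {

    /** Wraps a token source and emits one trailing '?' or '!' as its own token. */
    static final class SplittingIterator implements Iterator<String> {
        private final Iterator<String> input;
        private String pending; // punctuation held back until the next call

        SplittingIterator(Iterator<String> input) {
            this.input = input;
        }

        @Override
        public boolean hasNext() {
            return pending != null || input.hasNext();
        }

        @Override
        public String next() {
            // A symbol stored on the previous call is returned first.
            if (pending != null) {
                String symbol = pending;
                pending = null;
                return symbol;
            }
            if (!input.hasNext()) {
                throw new NoSuchElementException();
            }
            String token = input.next();
            int last = token.length() - 1;
            // Only split when there is a word in front of the symbol.
            if (last > 0 && (token.charAt(last) == '?' || token.charAt(last) == '!')) {
                pending = token.substring(last);  // store the [?]
                return token.substring(0, last);  // return the [you]
            }
            return token;
        }
    }

    /** Whitespace-splits the text, then runs the tokens through the splitter. */
    static List<String> tokenize(String text) {
        List<String> whitespaceTokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                whitespaceTokens.add(t);
            }
        }
        List<String> out = new ArrayList<>();
        Iterator<String> it = new SplittingIterator(whitespaceTokens.iterator());
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("how are you?"));
    }
}
```

The sketch peels off only a single trailing symbol per token, so an input such as [what?!] would yield [what?] and [!]; handling runs of symbols would need a small queue instead of the single pending slot.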

Re: Tokenizer question: how can I force ? and ! to be separate tokens?

2009-07-17 Thread Matthew Hall
I'd think extending WhitespaceTokenizer would be a good place to start. Then create a new Analyzer that exactly mirrors your current Analyzer, with the exception that it uses your new tokenizer instead of WhitespaceTokenizer (Well.. there is of course my assumption that you are using an Analyz

Tokenizer question: how can I force ? and ! to be separate tokens?

2009-07-17 Thread OBender
Hi All, I need the ? and ! characters to be separate tokens, e.g. to split [how are you?] into 4 tokens: [how], [are], [you] and [?]. What would be the best way to do this? Thanks

Re: Unable to do exact search with Lucene.

2009-07-17 Thread prashant ullegaddi
Actually, the format of the query for which it worked was: "\"Apache jakarta\""~10. Thanks for the help. Prashant. On Fri, Jul 17, 2009 at 7:13 PM, Siraj Haider wrote: > Try doing a single word search, instead of a phrase. I once had a similar > problem when I indexed using Field.setOmitTf(tr

Re: Unable to do exact search with Lucene.

2009-07-17 Thread Siraj Haider
Try doing a single-word search instead of a phrase. I once had a similar problem when I indexed using Field.setOmitTf(true), which removed all the positional information from the index; that information is required to do phrase searches. -siraj Erick Erickson wrote: The first thing I'd do is get a copy of

Re: SpanScorer problem?

2009-07-17 Thread Koji Sekiguchi
Mark, I just opened: https://issues.apache.org/jira/browse/LUCENE-1752 Thank you very much for looking into this! Koji Mark Miller wrote: > Thanks Koji - I just made a patch for the fix if you want to pop open a > JIRA issue. > > Two query types were making their own terms map and passing them

Re: SpanScorer problem?

2009-07-17 Thread Mark Miller
Thanks Koji - I just made a patch for the fix if you want to pop open a JIRA issue. Two query types were making their own terms map and passing them to extract, rather than using the top level term map - but extract would use the term map to see if it saw the term before. The result was, for the t

SpanScorer problem?

2009-07-17 Thread Koji Sekiguchi
Hello, This problem was reported by my customer. They are using Solr 1.3 and uni-gram, but it can be reproduced with Lucene 2.9 and WhitespaceAnalyzer. The program for reproducing is at the end of this mail. Query: (f1:"a b c d" OR f2:"a b c d") AND (f1:"b c g" OR f2:"b c g") The snippet we expe

Sorting field containing NULL values consumes field cache memory

2009-07-17 Thread Ganesh
I am sorting on a DateTime field with minute resolution. I have 90 million records, and sorting consumes nearly 500 MB. 30% of the records are not part of the primary result set and do not have the sort field. But field cache memory (4 * IndexReader.maxDoc() * (# of different fields actually used
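
As a rough worked example of the arithmetic, the sketch below evaluates the formula exactly as far as it is quoted in the truncated message above (4 bytes * maxDoc * number of sort fields). The class and method names are hypothetical, and the choice of a single sort field is an assumption; the original formula may contain further terms that were cut off, so the result need not match the roughly 500 MB reported.

```java
// Hypothetical helper; evaluates only the part of the formula visible in the message.
class FieldCacheEstimateSketch {

    /** 4 bytes per document per sort field, sized by maxDoc rather than by documents that have a value. */
    static long estimateBytes(long maxDoc, int sortFields) {
        return 4L * maxDoc * sortFields;
    }

    public static void main(String[] args) {
        // Assumption: one sort field and the 90 million records mentioned in the message.
        long bytes = estimateBytes(90_000_000L, 1);
        System.out.println(bytes + " bytes, about " + bytes / (1024 * 1024) + " MiB");
    }
}
```

Because the estimate is driven by maxDoc, the 30% of records without the sort field would still contribute to the total under this formula, which appears to be the concern the subject line raises.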

Re: Unable to find: org.apache.lucene.index.memory.AnalyzerUtil

2009-07-17 Thread prashant ullegaddi
Got it! Thanks. On Fri, Jul 17, 2009 at 10:21 AM, Adriano Crestani < adrianocrest...@gmail.com> wrote: > Hi, > > The package org.apache.lucene.index.memory belongs to a contrib jar. Try to > add lucene-memory-.jar to your classpath. > > Regards, > Adriano Crestani > > On Thu, Jul 16, 2009 at 9:23