Thanks, I think I got it.
-----Original Message-----
From: John Byrne [mailto:john.by...@propylon.com]
Sent: Friday, July 17, 2009 2:43 PM
To: java-user@lucene.apache.org
Subject: Re: Tokenizer question: how can I force ? and ! to be separate
tokens?
Yes, you could even use the WhitespaceTokenizer.
In general, all TokenStream classes from Lucene core are/should be final,
because you should simply put an additional TokenFilter into the chain to
modify the tokenization.
A few of them are not final (CharTokenizer is abstract), but there the
next() methods are final for the reason explained above.
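As a tiny illustration of that chaining (a fragment only, not a complete
program; LowerCaseFilter just stands in for whatever extra filter you
actually need, and the pre-3.1 constructors are assumed):

import java.io.StringReader;

import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

// The tokenizer itself stays untouched; behaviour is changed by adding
// another filter to the chain.
TokenStream chain =
    new LowerCaseFilter(new WhitespaceTokenizer(new StringReader("How are you?")));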
Hi All,
I think this is a question for the Lucene dev team.
Why was the next(Token) method of CharTokenizer made final?
It is quite inconvenient, and I don't see the reason for it.
Thanks.
Yes, the index has changed, from this issue:
https://issues.apache.org/jira/browse/LUCENE-1654
Mike
On Fri, Jul 17, 2009 at 12:28 PM, Avishek Anand wrote:
> Hi,
> I get the error "Unknown format version: -9" when I try to create an
> index from the latest lucene build (built from the recent code in the
> repository) ...
Hi,
I get the error "Unknown format version: -9" when I try to create an
index from the latest lucene build (built from the recent code in the
repository - lucene_2_4:748824). I use Luke from the website. Assuming
that Luke is the most recent one, did anything change in lucene-core
because ...
Yes, you could even use the WhitespaceTokenizer and then look for the
symbols in a token filter. You would get [you?] as a single token; your
job in the token filter is then to store the [?] and return the [you].
The next time the token filter is called for the next token, you return
the [?] that you stored.
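A minimal sketch of such a filter, written against the newer
attribute-based TokenStream API (CharTermAttribute, Lucene 3.1 style)
rather than the next(Token) API discussed elsewhere in this thread. The
class name is made up, only a single trailing '?' or '!' is split off,
and the emitted punctuation token simply keeps the offsets of the
original token:

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.AttributeSource;

public final class PunctuationSplitFilter extends TokenFilter {

  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

  private AttributeSource.State pendingState; // token state for the held-back punctuation
  private char pendingChar;                   // the '?' or '!' waiting to be emitted

  public PunctuationSplitFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (pendingState != null) {
      // Emit the punctuation character we stored on the previous call.
      restoreState(pendingState);
      pendingState = null;
      termAtt.setEmpty().append(pendingChar);
      return true;
    }
    if (!input.incrementToken()) {
      return false;
    }
    int len = termAtt.length();
    if (len > 1) {
      char last = termAtt.charAt(len - 1);
      if (last == '?' || last == '!') {
        pendingChar = last;
        pendingState = captureState(); // remember the token for the next call
        termAtt.setLength(len - 1);    // return [you] now, [?] on the next call
      }
    }
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    pendingState = null;
  }
}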
I'd think extending WhitespaceTokenizer would be a good place to start.
Then create a new Analyzer that exactly mirrors your current Analyzer,
with the exception that it uses your new tokenizer instead of
WhitespaceTokenizer. (Well... there is of course my assumption that you
are using an Analyzer ...)
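Whether the punctuation handling ends up in a subclassed tokenizer or in
a token filter like the one sketched above, the analyzer wiring stays
small. A minimal sketch against the pre-3.1 Analyzer API
(tokenStream(String, Reader)); the non-Lucene class names are made up:

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

// Mirrors a whitespace-based analyzer, but runs every token through the
// extra punctuation-splitting filter.
public final class PunctuationSplittingAnalyzer extends Analyzer {
  @Override
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return new PunctuationSplitFilter(new WhitespaceTokenizer(reader));
  }
}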
Hi All,
I need to make the ? and ! characters separate tokens, e.g. to split [how
are you?] into 4 tokens: [how], [are], [you] and [?]. What would be the
best way to do this?
Thanks
Actually, the format of the query for which it worked was:
"\"Apache jakarta\""~10.
Thanks for the help.
Prashant.
On Fri, Jul 17, 2009 at 7:13 PM, Siraj Haider wrote:
> Try doing a single word search, instead of a phrase. I once had a similar
> problem when I indexed using Field.setOmitTf(true) ...
Try doing a single-word search instead of a phrase. I once had a
similar problem when I indexed using Field.setOmitTf(true), which removed
all the positional information from the index; that information is
required for phrase searches.
-siraj
Erick Erickson wrote:
The first thing I'd do is get a copy of ...
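To make the connection between the two messages concrete, here is a
small fragment (not a complete program) against the 2.4-era document and
query APIs; the field name, text and terms are invented, and the terms
are written as they would come out of the analyzer:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;

Document doc = new Document();
Field body = new Field("body", "the Apache Jakarta project hosts many subprojects",
                       Field.Store.NO, Field.Index.ANALYZED);
// body.setOmitTf(true); // dropping tf/positions makes the sloppy phrase below stop matching
doc.add(body);

// Programmatic equivalent of the parsed query "apache jakarta"~10
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("body", "apache"));
pq.add(new Term("body", "jakarta"));
pq.setSlop(10);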
Mark,
I just opened:
https://issues.apache.org/jira/browse/LUCENE-1752
Thank you very much for looking into this!
Koji
Mark Miller wrote:
> Thanks Koji - I just made a patch for the fix if you want to pop open a
> JIRA issue.
>
> Two query types were making their own terms map and passing them
Thanks Koji - I just made a patch for the fix if you want to pop open a
JIRA issue.
Two query types were making their own terms map and passing them to
extract, rather than using the top-level term map - but extract would
use the term map to see if it had seen the term before. The result was,
for the t ...
Hello,
This problem was reported by my customer. They are using Solr 1.3
and uni-gram, but it can be reproduced with Lucene 2.9 and
WhitespaceAnalyzer.
The program for reproducing it is at the end of this mail.
Query:
(f1:"a b c d" OR f2:"a b c d") AND (f1:"b c g" OR f2:"b c g")
The snippet we expect ...
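Since the reproduction program itself is not included above, the
following is only a rough sketch of what such a program might look like,
against the 2.9-era QueryParser and contrib Highlighter APIs; the sample
text is invented and no particular output is claimed:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;

public class HighlightRepro {
  public static void main(String[] args) throws Exception {
    Analyzer analyzer = new WhitespaceAnalyzer();
    QueryParser parser = new QueryParser("f1", analyzer); // 2.9-era constructor
    Query query = parser.parse(
        "(f1:\"a b c d\" OR f2:\"a b c d\") AND (f1:\"b c g\" OR f2:\"b c g\")");

    // Invented field content, analyzed by whitespace.
    String text = "x a b c d x b c g x";

    Highlighter highlighter = new Highlighter(new QueryScorer(query, "f1"));
    System.out.println(highlighter.getBestFragment(analyzer, "f1", text));
  }
}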
I am sorting on DateTime with minute resolution. I have 90 million
records, and sorting is consuming nearly 500 MB. 30% of the records are
not part of the primary result set and don't have the sort field. But the
field cache memory (4 * IndexReader.maxDoc() * (# of different fields
actually used for sorting)) ...
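As a rough check on that formula (a back-of-the-envelope estimate,
assuming one 4-byte cache entry per document for an int-valued sort
field): 4 bytes * 90,000,000 docs is roughly 360 MB (about 343 MiB) for a
single sort field, so the ~500 MB observed is in the expected ballpark,
and a long-valued field would cost twice that. The FieldCache array is
sized by IndexReader.maxDoc(), so the 30% of documents that lack the sort
field still occupy slots.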
Got it!
Thanks.
On Fri, Jul 17, 2009 at 10:21 AM, Adriano Crestani <
adrianocrest...@gmail.com> wrote:
> Hi,
>
> The package org.apache.lucene.index.memory belongs to a contrib jar. Try to
> add lucene-memory-.jar to your classpath.
>
> Regards,
> Adriano Crestani
>
> On Thu, Jul 16, 2009 at 9:23
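For what it's worth, a generic invocation along those lines (the version
numbers and class name are placeholders, not from the thread; on Windows
use ';' instead of ':' as the path separator):

java -cp lucene-core-<version>.jar:lucene-memory-<version>.jar:. YourSearchApp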