It seems that indexing and searching do not go through the same method:
the "tokenStream" method is called at search time, while at indexing time
"reusableTokenStream" is called.
Overriding reusableTokenStream (as I did for tokenStream) fixed the
problem.
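The failure mode can be made concrete with a toy model. The classes below are hypothetical stand-ins, not Lucene's Analyzer API: the base class has two entry points, and its reusableTokenStream has its own implementation instead of delegating to tokenStream, which is the situation the message above appears to describe. Accent stripping uses java.text.Normalizer (canonical decomposition, then removal of combining marks) as an assumed substitute for ISOLatin1AccentFilter.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy stand-in for an analyzer (hypothetical, NOT Lucene's Analyzer API).
// Two entry points, mirroring tokenStream and reusableTokenStream.
class ToyAnalyzer {
    // Search-side entry point in this model.
    List<String> tokenStream(String text) {
        return split(text);
    }

    // Indexing-side entry point in this model. It does not call tokenStream(),
    // so a subclass overriding only tokenStream() leaves indexing unchanged.
    List<String> reusableTokenStream(String text) {
        return split(text);
    }

    // Whitespace split plus lower-casing.
    static List<String> split(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }

    // NFD turns "á" into "a" + U+0301; \p{M} then removes the combining mark.
    static List<String> fold(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(Normalizer.normalize(t, Normalizer.Form.NFD).replaceAll("\\p{M}", ""));
        }
        return out;
    }
}

// Overrides only the search-side method: indexed terms keep their accents.
class QueryOnlyFolding extends ToyAnalyzer {
    @Override
    List<String> tokenStream(String text) {
        return fold(super.tokenStream(text));
    }
}

// Overrides both methods, so indexed terms and query terms agree.
class BothPathsFolding extends ToyAnalyzer {
    @Override
    List<String> tokenStream(String text) {
        return fold(super.tokenStream(text));
    }

    @Override
    List<String> reusableTokenStream(String text) {
        return fold(super.reusableTokenStream(text));
    }
}
```

With QueryOnlyFolding the search side produces "usuario" while the index side still produces "usuário", so the two never meet; BothPathsFolding folds on both paths.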
Diego Cassinera wrote:
>
> Are you sure you are creating the fields with Field.Index.ANALYZED?
>
>
Yes, my fields are all ANALYZED. (One was ANALYZED_NO_NORMS, but changing it
to ANALYZED did not solve the problem.)
I checked with the debugger, and the analyzer I use to update my indexer
doe
Are you sure you are creating the fields with Field.Index.ANALYZED?
-----Original Message-----
From: Dora [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2008 12:22 PM
To: java-user@lucene.apache.org
Subject: Re: Indexing accented characters, then searching by any form
Karl Wettin wrote:
>
> Try this (dry coded) snippet instead:
>
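> // Imports assume the Lucene 2.x package layout.
> import java.io.Reader;
> import org.apache.lucene.analysis.ISOLatin1AccentFilter;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
>
> StandardAnalyzer objAnalyzer = new StandardAnalyzer() {
>     // Only tokenStream() is overridden; indexing may go through reusableTokenStream().
>     public TokenStream tokenStream(String fieldName, Reader reader) {
>         return new ISOLatin1AccentFilter(super.tokenStream(fieldName, reader));
>     }
> };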
>
I tr
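The snippet above is a decorator: the accent filter wraps the stream produced by the parent analyzer and rewrites each token on its way out. A JDK-only sketch of the same shape follows. The Tokens interface and all three classes are hypothetical, not Lucene types, and the folding uses java.text.Normalizer rather than ISOLatin1AccentFilter's own character table, so results may differ for characters such as ß or æ that have no canonical decomposition.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal token-stream contract (NOT Lucene's TokenStream):
// next() returns the next token, or null when the input is exhausted.
interface Tokens {
    String next() throws IOException;
}

// Source stream: reads characters from a Reader and emits lower-cased runs of
// letters/digits (expects precomposed characters such as U+00E1).
class LetterTokens implements Tokens {
    private final Reader reader;

    LetterTokens(Reader reader) {
        this.reader = reader;
    }

    @Override
    public String next() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            if (Character.isLetterOrDigit(c)) {
                sb.append(Character.toLowerCase((char) c));
            } else if (sb.length() > 0) {
                break; // end of the current token
            }
        }
        return sb.length() > 0 ? sb.toString() : null;
    }
}

// Filter stream: wraps another Tokens and strips accents from each token,
// the same shape as new ISOLatin1AccentFilter(super.tokenStream(...)).
class AccentFoldingTokens implements Tokens {
    private final Tokens input;

    AccentFoldingTokens(Tokens input) {
        this.input = input;
    }

    @Override
    public String next() throws IOException {
        String token = input.next();
        if (token == null) {
            return null;
        }
        // NFD turns "á" into "a" + U+0301; \p{M} then removes the combining mark.
        return Normalizer.normalize(token, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }
}

class AnalyzeDemo {
    // Builds the chain source -> filter and collects every token.
    static List<String> analyze(String text) {
        Tokens stream = new AccentFoldingTokens(new LetterTokens(new StringReader(text)));
        List<String> out = new ArrayList<>();
        try {
            for (String t = stream.next(); t != null; t = stream.next()) {
                out.add(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

For example, AnalyzeDemo.analyze("O usuário, Usuãrio!") yields [o, usuario, usuario]: both spellings reduce to the same term.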
Well, it is done now.
In the end, I surrendered to "double-storing": I indexed the original text
with the COMPRESSED option to save some space.
And to highlight the results correctly, I mapped the unaccented words back to
the original words with regular expressions, an
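The message does not show the matching code, so what follows is only a hypothetical sketch of the idea: walk the stored, accented text with a regular expression, fold each word the same way the index is assumed to, and wrap the words whose folded form equals a folded query term. The class name, the <b> markup and the Normalizer-based folding are all assumptions.

```java
import java.text.Normalizer;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or its Highlighter contrib.
class FoldedHighlighter {
    // A word: letters, plus any combining marks that follow them.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{M}]+");

    // Folding assumed to match the index: lower-case, decompose, drop marks.
    static String fold(String s) {
        return Normalizer.normalize(s.toLowerCase(Locale.ROOT), Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "");
    }

    // Wraps each word of the stored (accented) text whose folded form equals a
    // folded query term in <b>...</b>, keeping the original spelling.
    static String highlight(String original, String... queryTerms) {
        Set<String> wanted = new HashSet<>();
        for (String q : queryTerms) {
            wanted.add(fold(q));
        }
        Matcher m = WORD.matcher(original);
        StringBuffer sb = new StringBuffer(); // StringBuffer keeps this working on Java 8
        while (m.find()) {
            String word = m.group();
            String out = wanted.contains(fold(word)) ? "<b>" + word + "</b>" : word;
            m.appendReplacement(sb, Matcher.quoteReplacement(out));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

Because the regex walks the stored text itself, the highlighted words keep their accents even though the match was made on the folded form.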
On Feb 11, 2008, at 4:00 PM, Cesar Ronchese wrote:
For example:
Indexed word: usuário
Terms the user might type to find the word above: usuário, usuario,
usuãrio, etc.
If you feel ambitious, you can try something along the lines of Sean
M. Burke's Unidecode!:
http://interglacial.com/~s
Oops!
I found a problem here, Karl:
If the content is stored without accents, everything is OK.
But my content is stored with accents, and I noticed that the
ISOLatin1AccentFilter only removes the accents from the search terms, so the
accented documents are not returned in my Hits collection.
Any idea how to fix it?
Woot, Karl.
It worked like a charm! It even worked with the Highlighter. THANKS!
karl wettin-3 wrote:
>
>
> On 11 Feb 2008, at 18:16, Cesar Ronchese wrote:
>
>> I don't know how to apply that filter to the Query object.
>
> It is a TokenStream you filter, not the Query. In your case the
> TokenStr
On 11 Feb 2008, at 18:16, Cesar Ronchese wrote:
I don't know how to apply that filter to the Query object.
It is a TokenStream you filter, not the Query. In your case the
TokenStream is produced by the QueryParser invoking
analyzer.tokenStream(field, new StringReader(input)). So what you have
to
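Karl's point can be checked with a tiny in-memory inverted index (hypothetical, not Lucene). Documents are indexed with a folding analysis; the query string is wrapped in a StringReader and passed through whichever analysis function is supplied, loosely mirroring the QueryParser call quoted above. The names TinyIndex, plain and folding are illustrative only.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Tiny inverted index (hypothetical, not Lucene): term -> sorted doc ids.
class TinyIndex {
    // Reads the whole Reader, lower-cases, splits on non-letters/digits.
    static List<String> plain(Reader reader) {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> out = new ArrayList<>();
        for (String t : sb.toString().toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Same as plain(), then strips accents (NFD + remove combining marks).
    static List<String> folding(Reader reader) {
        List<String> out = new ArrayList<>();
        for (String t : plain(reader)) {
            out.add(Normalizer.normalize(t, Normalizer.Form.NFD).replaceAll("\\p{M}", ""));
        }
        return out;
    }

    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index-time analysis is always the folding one in this sketch.
    void add(int docId, String text) {
        for (String term : folding(new StringReader(text))) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // The query text goes through the analyzer passed in, as a StringReader.
    Set<Integer> search(String input, Function<Reader, List<String>> analyzer) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : analyzer.apply(new StringReader(input))) {
            hits.addAll(postings.getOrDefault(term, Collections.emptySet()));
        }
        return hits;
    }
}
```

An accented query analyzed with plain looks up "usuário", which is not in the folded index; analyzed with folding it looks up "usuario" and finds the document. The index and the query have to share the same analysis for the terms to meet.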
See below...
On Feb 11, 2008 12:17 PM, Cesar Ronchese <[EMAIL PROTECTED]> wrote:
>
> Hey, Erick. You inferred right.
>
> I analyzed your code, and it looks like ordinary indexing and searching
> code.
> Are you sure you pasted the correct code? :P
>
Did you try to run it? It's just a self-contai
Hey, Erick. You inferred right.
I analyzed your code, and it looks like ordinary indexing and searching code.
Are you sure you pasted the correct code? :P
Anyway, is the idea to store the data twice, one copy with accents and the
other without? If so, I did that earlier, but once I search i
> One more thing,
> are you aware that you are supposed to apply that filter to the
> query too?
I don't know how to apply that filter to the Query object. I've searched to
see if it is possible, but I can't find any references. If it is possible,
do you have a quick example?
I'm searching this way:
Hey Karl. Thanks for the response. I have a few more questions:
1) About the ISOLatin1AccentFilter class:
> What is the problem you have with this? Are they not unique enough?
I need to store the words the way they were written. So, if the text to be
indexed contains the word "usuário", my user expe
I'm inferring that you need the original text for display purposes or some
such, but want to search a "canonical" form. So the following may be totally
irrelevant if my inference is wrong.
Indexed and stored are two very distinct things in Lucene. If you create
a field that is both stored and
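A toy model of a field that is both stored and indexed may help here. The names are hypothetical and this is not Lucene's Document or Field API: the stored value is kept verbatim for display, while only the lower-cased, accent-folded terms (folded with java.text.Normalizer, an assumed choice) go into the index. A search for "usuario" or "USUÃRIO" then returns the original, accented text.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model (hypothetical, not Lucene): one field, stored AND indexed.
class StoredAndIndexed {
    // Lower-case, decompose, drop combining marks.
    static String fold(String s) {
        return Normalizer.normalize(s.toLowerCase(Locale.ROOT), Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "");
    }

    private final List<String> stored = new ArrayList<>();           // docId -> original text
    private final Map<String, Set<Integer>> index = new HashMap<>(); // folded term -> docIds

    int add(String text) {
        int docId = stored.size();
        stored.add(text); // stored: exactly as written, accents included
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                // indexed: folded form only
                index.computeIfAbsent(fold(token), k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Looks up the folded query word, but returns the stored originals.
    List<String> search(String word) {
        List<String> results = new ArrayList<>();
        for (int docId : index.getOrDefault(fold(word), Collections.emptySet())) {
            results.add(stored.get(docId));
        }
        return results;
    }
}
```

Keeping the two roles apart means the index never needs the accented spelling, and the display never needs the folded one.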
On 11 Feb 2008, at 16:08, Karl Wettin wrote:
All I could find is about the ISOLatin1AccentFilter class, which, as far as I
could understand, just removes the accented characters so I can store the
text in its unaccented form.
What is the problem you have with this? Are they not unique enough?
One more
On 11 Feb 2008, at 16:00, Cesar Ronchese wrote:
Hello, guys.
I've been searching Google for a way to make Lucene perform accent-insensitive
searches.
All I could find is about the ISOLatin1AccentFilter class, which, as far as I
could understand, just removes the accented characters so I can store
it