RE: Indexing accented characters, then searching by any form

2008-12-08 Thread Dora
It seems that indexing and searching do not work the same way: the "tokenStream" method is called at search time, while "reusableTokenStream" is called at indexing time. Overriding reusableTokenStream (as I did for tokenStream) fixed the problem.
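A toy model of why both entry points have to be overridden. This is a sketch only: ToyAnalyzer, ToyFoldingAnalyzer, analyzeForIndex and analyzeForSearch are hypothetical names made up for illustration, not Lucene classes or methods, and accent folding is done here with the JDK's java.text.Normalizer rather than ISOLatin1AccentFilter.

```java
import java.text.Normalizer;
import java.util.Locale;

// Toy stand-in for an analyzer with two entry points. NOT the Lucene API.
class ToyAnalyzer {
    // Path taken when a query is analyzed.
    String analyzeForSearch(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    // Path taken when a document is indexed.
    String analyzeForIndex(String text) {
        return text.toLowerCase(Locale.ROOT);
    }
}

// Adds accent folding on BOTH paths, so indexed and searched terms agree.
class ToyFoldingAnalyzer extends ToyAnalyzer {
    // Decompose to NFD, then drop the combining marks.
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    @Override
    String analyzeForSearch(String text) {
        return fold(super.analyzeForSearch(text));
    }

    @Override
    String analyzeForIndex(String text) {
        return fold(super.analyzeForIndex(text));
    }
}
```

If only one of the two methods were overridden, the term written to the index and the term produced from the query would differ, which is the kind of mismatch described in this message.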

RE: Indexing accented characters, then searching by any form

2008-11-26 Thread Dora
Diego Cassinera wrote: > > Are you sure you are creating the fields with Field.Index.ANALYZED? > > Yes, my fields are all ANALYZED. (One was ANALYZED_NO_NORMS, but changing it to ANALYZED did not solve the problem.) I checked with the debugger, and the analyzer I use to update my indexer doe

RE: Indexing accented characters, then searching by any form

2008-11-25 Thread Diego Cassinera
Are you sure you are creating the fields with Field.Index.ANALYZED? -----Original Message----- From: Dora [mailto:[EMAIL PROTECTED] Sent: Tuesday, November 25, 2008 12:22 p.m. To: java-user@lucene.apache.org Subject: Re: Indexing accented characters, then searching by any form

Re: Indexing accented characters, then searching by any form

2008-11-25 Thread Dora
Karl Wettin wrote: > > Try this (dry coded) snippet instead: > > StandardAnalyzer objAnalyzer = new StandardAnalyzer() { >public TokenStream tokenStream(String fieldName, Reader reader) { > return new ISOLatin1AccentFilter(super.tokenStream(fieldName, > reader)); >} > } > I tr

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Cesar Ronchese
Well, it is done now. As a final result, I surrendered to "double-storing". This way, I have indexed the original text with the COMPRESSED option to save some space. And to highlight the results correctly, I did some matching between unaccented words and original words using regular expressions, an

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Petite Abeille
On Feb 11, 2008, at 4:00 PM, Cesar Ronchese wrote: For example: Indexed word: usuário Terms typed by the user, to find the word above: usuário or usuario or usuãrio, etc. If you feel ambitious, you can try something along the lines of Sean M. Burke's Unidecode!: http://interglacial.com/~s
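For the "usuário / usuario / usuãrio" example above, a minimal sketch of accent folding using only the JDK. This is neither Unidecode (which transliterates far more than accents) nor Lucene's ISOLatin1AccentFilter; the class name AccentFolder is hypothetical, and the approach (NFD decomposition, then dropping combining marks) is an assumption about what "any form" should mean here.

```java
import java.text.Normalizer;

final class AccentFolder {
    // Decompose each character to base letter + combining marks (NFD),
    // then remove the marks (Unicode general category M).
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // The three spellings from the message: a-acute, plain a, a-tilde.
        String[] typed = {"usu\u00e1rio", "usuario", "usu\u00e3rio"};
        for (String t : typed) {
            System.out.println(t + " -> " + fold(t));
        }
    }
}
```

All three spellings fold to the same string, so they would match each other as long as the same folding is applied to both the indexed text and the search terms.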

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Cesar Ronchese
Oops! Found a situation here, Karl: If the content is stored without accents, everything is OK. But my content is stored with accents, and I noticed the ISOFilter just removes the accents from the search terms, so nothing is returned in my Hits collection. Any idea how to fix it?

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Cesar Ronchese
Woot, Karl. It worked like a charm! It even worked with the Highlighter. THANKS! karl wettin-3 wrote: > > > On 11 Feb 2008 at 18:16, Cesar Ronchese wrote: > >> I don't know how to set that filter to Query object. > > It is a TokenStream you filter, not the Query. In your case the > TokenStr

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Karl Wettin
On 11 Feb 2008 at 18:16, Cesar Ronchese wrote: I don't know how to set that filter to Query object. It is a TokenStream you filter, not the Query. In your case the TokenStream is produced by the QueryParser invoking analyzer.tokenStream(field, new StringReader(input)). So what you have to

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Erick Erickson
See below... On Feb 11, 2008 12:17 PM, Cesar Ronchese <[EMAIL PROTECTED]> wrote: > > Hey, Erick. You inferred right. > > I analyzed your code and it looks like a common Indexing and Searching > code. > Are you sure you pasted the correct code? :P > Did you try to run it? It's just a self-contai

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Cesar Ronchese
Hey, Erick. You inferred right. I analyzed your code and it looks like a common Indexing and Searching code. Are you sure you pasted the correct code? :P Anyway, is the idea to store the data twice, one copy with accents and the other without? If yes, I did that earlier, but once I search i

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Cesar Ronchese
> One more thing, > are you aware that you are supposed to apply that filter on the > query too? I don't know how to set that filter to Query object. I've searched to see whether it is possible, but I can't find any references. If it is possible, do you have a quick example? I'm searching this way:

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Cesar Ronchese
Hey Karl. Thanks for the response. I have a few more questions: 1) About the ISOLatin1AccentFilter class: > What is the problem you have with this? Are they not unique enough? I need to store the words the way they were written. So, if the text to be indexed contains the word "usuário", my user expe

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Erick Erickson
I'm inferring that you need the original text for display purposes or some such, but want to search a "canonical" form. So the following may be totally irrelevant if my inference is wrong. Indexed and stored are two very distinct things in Lucene. If you create a field that is both stored and
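To make the indexed-versus-stored distinction concrete, here is a toy in-memory sketch, not Lucene code: ToyIndex and its methods are hypothetical names. The original text is kept verbatim for display (the "stored" side), while lookups go through a canonical, accent-folded form of each token (the "indexed" side). The canonical form used here (lower-casing plus java.text.Normalizer) is an assumption for illustration.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class ToyIndex {
    // "Stored": original text, returned untouched for display.
    private final List<String> stored = new ArrayList<>();
    // "Indexed": canonical term -> ids of documents containing it.
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Canonical form: lower-cased, accents removed.
    static String canonical(String term) {
        String lower = term.toLowerCase(Locale.ROOT);
        return Normalizer.normalize(lower, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    void add(String text) {
        int docId = stored.size();
        stored.add(text);
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            List<Integer> docs =
                postings.computeIfAbsent(canonical(token), k -> new ArrayList<>());
            // Avoid listing the same document twice for a repeated term.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
    }

    // The query term goes through the same canonical form as the indexed tokens.
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        List<Integer> docs =
            postings.getOrDefault(canonical(term), Collections.<Integer>emptyList());
        for (int docId : docs) {
            hits.add(stored.get(docId));
        }
        return hits;
    }
}
```

Searching for any accent variant of a word returns the document with its accents intact, because only the lookup key is folded, never the stored text.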

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Karl Wettin
On 11 Feb 2008 at 16:08, Karl Wettin wrote: All I could find is about the ISOLatin1AccentFilter class, which, as far as I could understand, just removes the accented chars so I can store it in its unaccented form. What is the problem you have with this? Are they not unique enough? One more

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Karl Wettin
On 11 Feb 2008 at 16:00, Cesar Ronchese wrote: Hello, guys. I've been searching Google for a way to make Lucene perform accent-insensitive searches. All I could find is about the ISOLatin1AccentFilter class, which, as far as I could understand, just removes the accented chars so I can store it