There are several issues here:

1> How are you getting the entity reference? You must be encoding the stream (or it is being encoded for you), so the first thing I'd do is un-encode it.

2> After that, it's a question of which Filters/Analyzers you're using. Take a look at ISOLatin1AccentFilter. I'm not sure whether it "closes up" the case you're looking at, so be sure to check.

3> Since my peculiar situation can't use the Filter (the character set I'm using isn't standard), I pre-process the input (both at index and query time) to substitute the empty string for the apostrophe.
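A minimal sketch of the un-encode and pre-process steps from 1> and 3>, assuming the apostrophe arrives as a character reference such as &#39;, &#x27; or &apos;. The class ApostropheNormalizer and its method names are hypothetical; nothing here is Lucene API, and it is not the code actually used in the setup described in 3>. It uses only java.util.regex from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Lucene.
final class ApostropheNormalizer {

    // Matches hexadecimal (&#x27;) and decimal (&#39;) character references.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    private ApostropheNormalizer() {}

    // Step 1>: undo one level of entity encoding.
    // Only numeric references, &apos; and &amp; are handled (an assumption;
    // extend the list if your input uses other named entities).
    static String decodeEntities(String text) {
        Matcher m = NUMERIC_REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            // Leave the reference untouched if it is not a valid code point.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        // &amp; is decoded last so that "&amp;#39;" becomes "&#39;", not "'".
        return out.toString().replace("&apos;", "'").replace("&amp;", "&");
    }

    // Step 3>: substitute the empty string for the apostrophe.
    static String normalize(String text) {
        return decodeEntities(text).replace("'", "");
    }
}
```

The same normalize call would have to be applied to the field text before it is handed to the Analyzer and to the user's query string before it is parsed; otherwise the indexed terms and the query terms will not match.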
Hope this helps
Erick

On 11/5/07, Leire Urcelay <[EMAIL PROTECTED]> wrote:
>
> Sorry, I made a mistake in my previous email.
> The field "L'article" is indexed as "L 'article". The blank space is
> inserted between 'L' and ''article'.
>
> Thanks,
>
> Leire
>
> -----Original Message-----
> From: Leire Urcelay [mailto:[EMAIL PROTECTED]
> Sent: Monday, 5 November 2007 13:02
> To: java-user@lucene.apache.org
> Subject: blank space before special characters
>
> Hello,
>
> I have the following problem with my Lucene index.
>
> When indexing fields containing special characters (like &), a blank space
> is inserted before the special character. For example: the content
> "L'article" is indexed as "L '" (with a blank space between 'L' and
> '&').
>
> Is there any way to avoid that?
>
> The characteristics of my field are the following: Indexed, Tokenized,
> Stored and Term Vector.
>
> Thanks in advance for your help,
>
> Leire