Re: Writing a TokenConcatenateFilter - junk characters appearing on output.

2011-09-30 Thread Jithin
I meant to say: now my analyzer chain looks like this.

Re: Writing a TokenConcatenateFilter - junk characters appearing on output.

2011-09-30 Thread Jithin
I have added this custom filter at the end of my query. Now only my first document is getting indexed.

Re: Writing a TokenConcatenateFilter - junk characters appearing on output.

2011-09-30 Thread Jithin
Thanks a million Uwe. That fixes it. On Sat, Oct 1, 2011 at 4:16 AM, Uwe Schindler [via Lucene] < ml-node+s472066n3383905...@n3.nabble.com> wrote: > Hi, > > The junk is appended here: buffer.append(termAtt.buffer()); > > I assume you are on Lucene 3.1+, so use buffer.append(termAtt); termAtt > implements CharSequence …

RE: Writing a TokenConcatenateFilter - junk characters appearing on output.

2011-09-30 Thread Uwe Schindler
Hi, The junk is appended here: buffer.append(termAtt.buffer()); I assume you are on Lucene 3.1+, so use buffer.append(termAtt); termAtt implements CharSequence, so it can be appended to any StringBuilder. The code you are using appends the whole char array, which may contain characters after the end of the term.
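[Editor's note] Uwe's point can be demonstrated without Lucene at all: a term attribute's backing `char[]` is oversized, and only the first `length()` characters are valid. Appending the raw array with `StringBuilder.append(char[])` copies the stale tail too. This minimal JDK-only sketch (names are illustrative, not from the thread) shows the wrong and the right append:

```java
public class TermBufferDemo {
    public static void main(String[] args) {
        // Simulate termAtt.buffer(): an oversized backing array.
        char[] buffer = new char[8];
        "cat".getChars(0, 3, buffer, 0); // the valid term is buffer[0..2]
        buffer[3] = 'X';                 // stale leftovers from a longer
        buffer[4] = 'Y';                 // previous term

        StringBuilder wrong = new StringBuilder();
        wrong.append(buffer);            // appends ALL 8 chars, junk included

        StringBuilder right = new StringBuilder();
        right.append(buffer, 0, 3);      // appends only the valid region

        System.out.println(wrong.length()); // 8 (junk and NUL chars included)
        System.out.println(right);          // cat
    }
}
```

`buffer.append(termAtt)` works in Lucene 3.1+ for the same reason as the `append(buffer, 0, 3)` call above: the `CharSequence` view exposes only the valid region.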

Writing a TokenConcatenateFilter - junk characters appearing on output.

2011-09-30 Thread Jithin
Hi, I am trying to write a TokenFilter which just concatenates all the tokens in the input TokenStream. The issue I am facing is that my filter is outputting certain junk characters in addition to the concatenated string. I believe this is caused by StringBuilder. This is my incrementToken() function: …
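[Editor's note] The original incrementToken() body is truncated in the archive. For readers landing here, this is a hedged sketch of what such a concatenating filter can look like against the Lucene 3.1 API, folding in both fixes discussed in the thread: appending `termAtt` itself (not `termAtt.buffer()`), and overriding `reset()` so the filter works for more than one document. Class and field names are illustrative; this is not Jithin's actual code and is not runnable without lucene-core on the classpath.

```java
// Sketch only: requires lucene-core 3.1+; untested here.
public final class TokenConcatenateFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private boolean done = false;

    public TokenConcatenateFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (done) return false;
        StringBuilder buffer = new StringBuilder();
        while (input.incrementToken()) {
            // termAtt implements CharSequence, so only the valid
            // region of the term is appended -- no junk characters.
            buffer.append(termAtt);
        }
        done = true;
        clearAttributes();
        termAtt.setEmpty().append(buffer);
        return buffer.length() > 0;
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        // Without this, the filter emits a token only for the first
        // stream it sees -- matching the "only my first document is
        // getting indexed" symptom reported later in the thread.
        done = false;
    }
}
```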

Re: StandardTokenizer

2011-09-30 Thread Peyman Faratin
thank you Ian On Sep 30, 2011, at 4:19 AM, Ian Lea wrote: > This all changed with the 3.1 release. See > http://lucene.apache.org/java/3_1_0/changes/Changes.html#3.1.0.api_changes, > number 18. > > You can get the old behaviour with StandardAnalyzer by passing > VERSION_30, or you could look at …

Re: StandardTokenizer

2011-09-30 Thread Ian Lea
This all changed with the 3.1 release. See http://lucene.apache.org/java/3_1_0/changes/Changes.html#3.1.0.api_changes, number 18. You can get the old behaviour with StandardAnalyzer by passing VERSION_30, or you could look at UAX29URLEmailTokenizer which should pick up the email component, although …
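[Editor's note] Ian's two options can be sketched as follows. The version constant in the Lucene 3.x API is spelled `Version.LUCENE_30` (the "VERSION_30" in the preview is shorthand); `reader` stands in for whatever `Reader` supplies the field text. This is an untested fragment, not a complete program:

```java
// Option 1: pin StandardAnalyzer to its pre-3.1 grammar.
Analyzer legacy = new StandardAnalyzer(Version.LUCENE_30);

// Option 2: keep 3.1 behaviour but use the tokenizer that
// recognizes e-mail addresses and URLs as single tokens.
Tokenizer emailAware = new UAX29URLEmailTokenizer(reader);
```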