[ 
https://issues.apache.org/jira/browse/LUCENE-5237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774442#comment-13774442
 ] 

Shai Erera commented on LUCENE-5237:
------------------------------------

bq. This isn't a bug: if you delete the last character, it's all that must 
happen.

You're right, but that isn't quite what happens today. First, if pos=3 and 
len=4 (i.e. delete the last character), delete() still calls System.arraycopy 
even though there is nothing to shift (this is true even in the patch I 
posted). That could be improved. Second, and this is the actual problem, it 
drops the last character even if pos >= len, i.e. when you ask to delete a 
character beyond what is "valid" in that buffer. I can't believe there is a 
TokenFilter that relies on being able to delete characters beyond the valid 
length of the buffer.
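
To make the two points concrete, here is a minimal sketch (not the attached 
patch). The class name DeleteSketch is hypothetical. The sketch assumes that a 
request with pos outside [0, len) should be a no-op that returns len unchanged, 
and it skips the copy when the last valid character is deleted.

```java
// Hypothetical illustration only; this is not the StemmerUtil source.
final class DeleteSketch {
    private DeleteSketch() {}

    /**
     * Removes the character at pos from the first len characters of buf and
     * returns the new valid length. Assumption: when pos is outside [0, len)
     * nothing is deleted and len is returned unchanged.
     */
    static int delete(char[] buf, int pos, int len) {
        if (pos < 0 || pos >= len) {
            return len; // nothing valid to delete
        }
        if (pos < len - 1) {
            // Shift the tail left by one; skipped for the last character.
            System.arraycopy(buf, pos + 1, buf, pos, len - pos - 1);
        }
        return len - 1;
    }

    /** Deletes up to nChars characters starting at pos, one at a time. */
    static int deleteN(char[] buf, int pos, int len, int nChars) {
        for (int i = 0; i < nChars; i++) {
            len = delete(buf, pos, len);
        }
        return len;
    }
}
```

With this guard, deleteN(buf, buf.length, buf.length, 3) on "abcd" leaves the 
length at 4, so no character is dropped.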

bq. Shouldn't it throw an exception instead when pos + nChars > buf.length?

Maybe we should ...
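
For illustration only, a strict variant might look like the sketch below. It 
assumes the bounds check is made against len (the valid length) rather than 
buf.length, and that IllegalArgumentException is an acceptable exception type. 
StrictDeleteSketch is a hypothetical name, not Lucene code.

```java
// Hypothetical illustration only; this is not the StemmerUtil source.
final class StrictDeleteSketch {
    private StrictDeleteSketch() {}

    /**
     * Deletes nChars characters starting at pos from the first len characters
     * of buf and returns the new valid length. Throws instead of silently
     * truncating when the requested range is not fully inside [0, len).
     */
    static int deleteN(char[] buf, int pos, int len, int nChars) {
        if (len < 0 || len > buf.length || pos < 0 || nChars < 0
                || nChars > len - pos) {
            throw new IllegalArgumentException(
                "cannot delete " + nChars + " chars at pos " + pos
                    + " with len " + len);
        }
        // Move the remaining tail over the deleted range in one copy.
        System.arraycopy(buf, pos + nChars, buf, pos, len - pos - nChars);
        return len - nChars;
    }
}
```

Whether throwing is preferable to a silent no-op depends on how existing 
callers behave, which is the open question here.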

bq. We can mark the whole class lucene.internal or copy the code of the methods 
to each class actually using them

You mean inline these methods?
                
> StemmerUtil.deleteN may delete too many characters
> --------------------------------------------------
>
>                 Key: LUCENE-5237
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5237
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-5237.patch
>
>
> StemmerUtil.deleteN calls delete(), but in some cases it may delete too 
> many characters. E.g. if you execute this code:
> {code}
> char[] buf = "abcd".toCharArray();
> int len = StemmerUtil.deleteN(buf, buf.length, buf.length, 3);
> System.out.println(new String(buf, 0, len));
> {code}
> You get "a", even though no character should have been deleted (neither 
> according to the javadocs nor by common logic).
> The problem is in delete(), which always returns {{len-1}}, even if no 
> character is actually deleted.
> I'll post a patch that fixes it shortly.
