> On 3 Mar 2019, at 02:00, Bruno P. Kinoshita <ki...@apache.org> wrote:
> 
> Hi Alex,
> Also found the two implementations similar, but confusing.
> - The parameter name inconsistency
> +1
> 
> 
> - The edge case logic inconsistency
> 
> +1
> 
> - Change to a stepwise charAt comparison
> +1 if the behaviour is kept. Right now some crazy characters like those old 
> latin letters are supported (e.g. LATIN CAPITAL LIGATURE IJ), and the result 
> for equal and equal ignore case are as expected.
> And right now those methods do not work with surrogate characters (e.g. 
> GEORGIAN LETTER AN and GEORGIAN CAPITAL LETTER AN are false for equals and 
> equals ignore case). But this is an edge case that we can ignore for now I 
> believe.

I’ve tested it locally and all units tests are fine. So I’ll PR when I have 
time.

> CheersBruno
> 
>    On Sunday, 3 March 2019, 1:29:12 pm NZDT, Alex Herbert 
> <alex.d.herb...@gmail.com> wrote:  
> 
> Having looked a bit more at StringUtils it appears that:
> 
> public static boolean equals(final CharSequence cs1, final CharSequence cs2);
> public static boolean equalsIgnoreCase(final CharSequence str1, final 
> CharSequence str2);
> 
> share edge case logic checking but it is implemented differently (although 
> with the same effect). They also have inconsistent parameter names (cs1 vs 
> str1).
> 
> The equals method then ends up calling CharSequenceUtils.regionMatches but 
> without the ignore case option. This method does a lot more work than 
> necessary as it performs a lot of edge case checks then loops with regard to 
> checking for case. It would be cleaner to mimic String.equals by using a 
> stepwise charAt comparison.
> 
> Any objections to a PR to fix:
> 
> - The parameter name inconsistency
> - The edge case logic inconsistency
> - Change to a stepwise charAt comparison
> 
> Alex
> 
>> On 2 Mar 2019, at 20:49, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>> 
>> 
>>> On 2 Mar 2019, at 16:59, Mark Dacek <m...@syberion.com 
>>> <mailto:m...@syberion.com>> wrote:
>>> 
>>> Is your proposed method a stepwise charAt comparison across both, assuming
>>> non-null and equal length?
>> 
>> Yes. Although the StringUtils.equals(CharSequence, CharSequence) from [lang] 
>> will do the job correctly (thanks Gary). It currently does all the edge case 
>> checks then calls a region matching method using the entire length but the 
>> effect is the same as:
>> 
>> for (int i = 0; i < cs1.length(); i++) {
>>     if (cs1.charAt(i) != cs2.charAt(i)) {
>>         return false;
>>     }
>> }
>> return true;
>> 
>> Switching in the above code instead of the call to regionMatches(…) at the 
>> end of StringUtils.equals(CharSequence, CharSequence) would avoid repeating 
>> all the edge case checks of length at the start of that method and the case 
>> insensitivity functionality. 
>> 
>> The StringUtils.equals method already detects if String is input as both 
>> arguments and defaults to that if possible. So this is basically for any 
>> other combination of CharSequence types where a simple stepwise charAt 
>> comparison is wanted.
>> 
>> 
>>> Doesn't seem like a bad idea, though I'm curious whether there's a use-case
>>> where toString() on both and comparing isn't more expedient.
>> 
>> Just the memory overhead of duplicating to create a String. If a match is 
>> unlikely, especially near the start, then this is a cost to consider for 
>> longer strings.
>> 
>> I was just after something to put in place of the incorrect usage of:
>> 
>> CharSequence cs1, cs2;
>> cs1.equals(cs2);
>> 
>> Which is not part of the CharSequence interface and works only if inputting 
>> 2 objects that support equals correctly, like String or StringBuilder.
>> 
>> I’ve just has a look for .equals() in all of [text] and this is actually a 
>> bug that is in the newly submitted JaroWinklerSimilarity too. 
>> 
>> I’ll do a PR to fix that one.
>> 
>>> 
>>> On Sat, Mar 2, 2019 at 11:53 AM Alex Herbert <alex.d.herb...@gmail.com 
>>> <mailto:alex.d.herb...@gmail.com>>
>>> wrote:
>>> 
>>>> I am helping with the PR for TEXT-126 to add to the similarity package.
>>>> 
>>>> Part of the new algorithm requires identifying if two CharSequences are
>>>> identical. Is there a utility in Text to do something like this:
>>>> 
>>>> public static boolean CharSequenceUtils.equals(CharSequence, CharSequence);
>>>> 
>>>> I cannot find one with a quick regex search of the library. I am not
>>>> familiar with Lang either but this is a dependency so a method from there
>>>> could be used.
>>>> 
>>>> The current PR is using left.equals(right) on the input CharSequence to
>>>> compare to one to another which is wrong if the two input CharSequences do
>>>> not support matching, e.g. if the input was a String and StringBuilder then
>>>> String.equals(StringBuilder) would not match, even if the characters were
>>>> the same.
>>>> 
>>>> Regards,
>>>> 
>>>> Alex
>>>> 
>>>> 
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org 
>>>> <mailto:dev-unsubscr...@commons.apache.org>
>>>> For additional commands, e-mail: dev-h...@commons.apache.org 
>>>> <mailto:dev-h...@commons.apache.org>
>>>> 
>>>> 
>> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to