> On 3 Mar 2019, at 02:00, Bruno P. Kinoshita <ki...@apache.org> wrote:
>
> Hi Alex,
> Also found the two implementations similar, but confusing.
> - The parameter name inconsistency
> +1
>
>
> - The edge case logic inconsistency
>
> +1
>
> - Change to a stepwise charAt comparison
> +1 if the behaviour is kept. Right now some crazy characters like those old
> latin letters are supported (e.g. LATIN CAPITAL LIGATURE IJ), and the result
> for equal and equal ignore case are as expected.
> And right now those methods do not work with surrogate characters (e.g.
> GEORGIAN LETTER AN and GEORGIAN CAPITAL LETTER AN are false for equals and
> equals ignore case). But this is an edge case that we can ignore for now I
> believe.
I’ve tested it locally and all units tests are fine. So I’ll PR when I have
time.
> CheersBruno
>
> On Sunday, 3 March 2019, 1:29:12 pm NZDT, Alex Herbert
> <alex.d.herb...@gmail.com> wrote:
>
> Having looked a bit more at StringUtils it appears that:
>
> public static boolean equals(final CharSequence cs1, final CharSequence cs2);
> public static boolean equalsIgnoreCase(final CharSequence str1, final
> CharSequence str2);
>
> share edge case logic checking but it is implemented differently (although
> with the same effect). They also have inconsistent parameter names (cs1 vs
> str1).
>
> The equals method then ends up calling CharSequenceUtils.regionMatches but
> without the ignore case option. This method does a lot more work than
> necessary as it performs a lot of edge case checks then loops with regard to
> checking for case. It would be cleaner to mimic String.equals by using a
> stepwise charAt comparison.
>
> Any objections to a PR to fix:
>
> - The parameter name inconsistency
> - The edge case logic inconsistency
> - Change to a stepwise charAt comparison
>
> Alex
>
>> On 2 Mar 2019, at 20:49, Alex Herbert <alex.d.herb...@gmail.com> wrote:
>>
>>
>>> On 2 Mar 2019, at 16:59, Mark Dacek <m...@syberion.com
>>> <mailto:m...@syberion.com>> wrote:
>>>
>>> Is your proposed method a stepwise charAt comparison across both, assuming
>>> non-null and equal length?
>>
>> Yes. Although the StringUtils.equals(CharSequence, CharSequence) from [lang]
>> will do the job correctly (thanks Gary). It currently does all the edge case
>> checks then calls a region matching method using the entire length but the
>> effect is the same as:
>>
>> for (int i = 0; i < cs1.length(); i++) {
>> if (cs1.charAt(i) != cs2.charAt(i)) {
>> return false;
>> }
>> }
>> return true;
>>
>> Switching in the above code instead of the call to regionMatches(…) at the
>> end of StringUtils.equals(CharSequence, CharSequence) would avoid repeating
>> all the edge case checks of length at the start of that method and the case
>> insensitivity functionality.
>>
>> The StringUtils.equals method already detects if String is input as both
>> arguments and defaults to that if possible. So this is basically for any
>> other combination of CharSequence types where a simple stepwise charAt
>> comparison is wanted.
>>
>>
>>> Doesn't seem like a bad idea, though I'm curious whether there's a use-case
>>> where toString() on both and comparing isn't more expedient.
>>
>> Just the memory overhead of duplicating to create a String. If a match is
>> unlikely, especially near the start, then this is a cost to consider for
>> longer strings.
>>
>> I was just after something to put in place of the incorrect usage of:
>>
>> CharSequence cs1, cs2;
>> cs1.equals(cs2);
>>
>> Which is not part of the CharSequence interface and works only if inputting
>> 2 objects that support equals correctly, like String or StringBuilder.
>>
>> I’ve just has a look for .equals() in all of [text] and this is actually a
>> bug that is in the newly submitted JaroWinklerSimilarity too.
>>
>> I’ll do a PR to fix that one.
>>
>>>
>>> On Sat, Mar 2, 2019 at 11:53 AM Alex Herbert <alex.d.herb...@gmail.com
>>> <mailto:alex.d.herb...@gmail.com>>
>>> wrote:
>>>
>>>> I am helping with the PR for TEXT-126 to add to the similarity package.
>>>>
>>>> Part of the new algorithm requires identifying if two CharSequences are
>>>> identical. Is there a utility in Text to do something like this:
>>>>
>>>> public static boolean CharSequenceUtils.equals(CharSequence, CharSequence);
>>>>
>>>> I cannot find one with a quick regex search of the library. I am not
>>>> familiar with Lang either but this is a dependency so a method from there
>>>> could be used.
>>>>
>>>> The current PR is using left.equals(right) on the input CharSequence to
>>>> compare to one to another which is wrong if the two input CharSequences do
>>>> not support matching, e.g. if the input was a String and StringBuilder then
>>>> String.equals(StringBuilder) would not match, even if the characters were
>>>> the same.
>>>>
>>>> Regards,
>>>>
>>>> Alex
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>>>> <mailto:dev-unsubscr...@commons.apache.org>
>>>> For additional commands, e-mail: dev-h...@commons.apache.org
>>>> <mailto:dev-h...@commons.apache.org>
>>>>
>>>>
>>
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org