Re: Code Review Request: 7171415: java.net.URI.equals/hashCode not consistent for some URIs

Chris Hegarty Tue, 15 Jan 2013 08:54:26 -0800

Looks good to me too. Thanks Kurchi.

-Chris.


On 01/15/2013 01:25 PM, Vitaly Davidovich wrote:

Looks good.  You can move ch, i, and index declarations into the for
loop scope, but it's fine as-is.

Thanks

Sent from my phone

On Jan 14, 2013 3:05 PM, "Kurchi Hazra" <kurchi.subhra.ha...@oracle.com
<mailto:kurchi.subhra.ha...@oracle.com>> wrote:

    Thank you all for your comments. Here is an updated webrev where
    URI.hashCode() calculates the hashCode itself
    instead of relying on String.hashCode():

    http://cr.openjdk.java.net/~khazra/7171415/webrev.01/

    Thanks,
    Kurchi

    On 08.01.2013 20:29, Vitaly Davidovich wrote:


    Yes, I also thought about range checks not being eliminated if
    using charAt() but I guess that depends on how smart the JIT is -
    if charAt is inlined there's technically enough info there for the
    compiler to see that range checks aren't needed.  Whether that
    happens or not I haven't checked. toCharArray brings us back to
    having allocations unless, again, EA helps out.  I think a
    microbenchmark would help here (along with verbose GC logging) to
    see which is better if this is a concern.

    Why do you say you need to duplicate String.hashCode to be
    consistent with what people are using already? As long as the hash
    quality is at least as good as today (or not significantly worse)
    shouldn't you be able to change the impl? If someone's relying on
    specific value for some input then their code is broken.  Besides,
    doing toUpper will change the hash for URIs with % anyway.
    Perhaps I misunderstood your point though ...

    Vitaly

    Sent from my phone

    On Jan 8, 2013 11:04 PM, "Kurchi Subhra Hazra"
    <kurchi.subhra.ha...@oracle.com
    <mailto:kurchi.subhra.ha...@oracle.com>> wrote:

        On 1/8/13 6:55 PM, Vitaly Davidovich wrote:


        Also, I'm not sure how hot this method is in practice but
        allocating StringBuilder seems a bit heavy (maybe it gets
        partially escape analyzed out though). Can you instead loop
        over all the chars in the string and build up the hash code
        as you go along? If you see a % then you handle next 2 chars
        specially, like you do now.  Or are you trying to take
        advantage of String intrinsic support in the JIT? I guess if
        perf is a concern you can write a micro benchmark comparing
        the approaches ...

        That did occur to me, but I guess we have to be consistent
        with the value that people have already been using, and that
        means I have
        to duplicate the code in String.hashCode() (that is what the
        original implementation was calling) - I was trying to avoid
        that. Also,
        String.hashCode() uses its internal char[] - whereas charAt()
        will involve several additional bound checks - but
        using toCharArray() may be better. Let me take another look at
        this, and get back with another webrev.

        Sent from my phone

        On Jan 8, 2013 9:45 PM, "Vitaly Davidovich"
        <vita...@gmail.com <mailto:vita...@gmail.com>> wrote:

            Hi Kurchi,

            In the hash method, I suggest you move handling of
            strings with % into a separate method to keep the hash
            method small for common case (no %). Otherwise, there's a
            chance this method won't get inlined anymore due to its
            (newly increased) size.

        - Yep, will do.

            Also, I realize toLower does the same thing, but why does
            toUpper return an int whereas it's really a char? Might
            be cleaner to declare return type as char and do the cast
            inside the method as needed.

        - I followed the format of toLower(). But I agree this way it
        will be cleaner.

        Thanks a lot,
        Kurchi

            Thanks

            Sent from my phone

            On Jan 8, 2013 8:20 PM, "Kurchi Hazra"
            <kurchi.subhra.ha...@oracle.com
            <mailto:kurchi.subhra.ha...@oracle.com>> wrote:

                Hi,

                    According to RFC 3986[1], hexadecimal digits
                encoded by a '%' should be case-insensitive, for
                example,%A2 and %a2 should be
                considered equal. Although, URI.equals() does take
                this into consideration, the implementation of
                URI.hashCode() does not and
                returns different hashcodes for two URIs that are
                similar in all respects except for the case of the
                percent-encoded hexadecimal digits.
                This fix attempts to construct a normalized string
                from the string representing a component before
                calculating its hashCode.
                I converted to upper case for the normalization(and
                not lower case) as required by [1].

                    For testing the fix, I added an additional test
                scenario to an existing test
                (jdk/test/java/net/URI/Test.java). While I was there,
                I also made
                minor changes to the test so that it does not produce
                rawtype and other lint warnings.

                Bug:
                http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7171415
                Webrev:
                http://cr.openjdk.java.net/~khazra/7171415/webrev.00/
                <http://cr.openjdk.java.net/%7Ekhazra/7171415/webrev.00/>

                URI.compareTo() still suffers from the same problem -
                I am not sure if it should be dealt with as a
                separate bug.

                Thanks,
                Kurchi

                [1] http://tools.ietf.org/html/rfc3986#section-6.2.2.1


    --
    -Kurchi

Re: Code Review Request: 7171415: java.net.URI.equals/hashCode not consistent for some URIs

Reply via email to