Re: Code Review Request: 7171415: java.net.URI.equals/hashCode not consistent for some URIs

Kurchi Hazra Mon, 14 Jan 2013 12:05:43 -0800

Thank you all for your comments. Here is an updated webrev whereURI.hashCode() calculates the hashCode itself

instead of relying on String.hashCode():

http://cr.openjdk.java.net/~khazra/7171415/webrev.01/


Thanks,
Kurchi

On 08.01.2013 20:29, Vitaly Davidovich wrote:

Yes, I also thought about range checks not being eliminated if usingcharAt() but I guess that depends on how smart the JIT is - if charAtis inlined there's technically enough info there for the compiler tosee that range checks aren't needed. Whether that happens or not Ihaven't checked. toCharArray brings us back to having allocationsunless, again, EA helps out. I think a microbenchmark would help here(along with verbose GC logging) to see which is better if this is aconcern.

Why do you say you need to duplicate String.hashCode to be consistentwith what people are using already? As long as the hash quality is atleast as good as today (or not significantly worse) shouldn't you beable to change the impl? If someone's relying on specific value forsome input then their code is broken. Besides, doing toUpper willchange the hash for URIs with % anyway. Perhaps I misunderstood yourpoint though ...


Vitaly

Sent from my phone

On Jan 8, 2013 11:04 PM, "Kurchi Subhra Hazra"<kurchi.subhra.ha...@oracle.com<mailto:kurchi.subhra.ha...@oracle.com>> wrote:


    On 1/8/13 6:55 PM, Vitaly Davidovich wrote:


    Also, I'm not sure how hot this method is in practice but
    allocating StringBuilder seems a bit heavy (maybe it gets
    partially escape analyzed out though).  Can you instead loop over
    all the chars in the string and build up the hash code as you go
    along? If you see a % then you handle next 2 chars specially,
    like you do now.  Or are you trying to take advantage of String
    intrinsic support in the JIT? I guess if perf is a concern you
    can write a micro benchmark comparing the approaches ...

    That did occur to me, but I guess we have to be consistent with
    the value that people have already been using, and that means I have
    to duplicate the code in String.hashCode() (that is what the
    original implementation was calling) - I was trying to avoid that.
    Also,
    String.hashCode() uses its internal char[] - whereas charAt() will
    involve several additional bound checks - but
    using toCharArray() may be better. Let me take another look at
    this, and get back with another webrev.

    Sent from my phone

    On Jan 8, 2013 9:45 PM, "Vitaly Davidovich" <vita...@gmail.com
    <mailto:vita...@gmail.com>> wrote:

        Hi Kurchi,

        In the hash method, I suggest you move handling of strings
        with % into a separate method to keep the hash method small
        for common case (no %).  Otherwise, there's a chance this
        method won't get inlined anymore due to its (newly increased)
        size.

    - Yep, will do.

        Also, I realize toLower does the same thing, but why does
        toUpper return an int whereas it's really a char? Might be
        cleaner to declare return type as char and do the cast inside
        the method as needed.

    - I followed the format of toLower(). But I agree this way it will
    be cleaner.

    Thanks a lot,
    Kurchi

        Thanks

        Sent from my phone

        On Jan 8, 2013 8:20 PM, "Kurchi Hazra"
        <kurchi.subhra.ha...@oracle.com
        <mailto:kurchi.subhra.ha...@oracle.com>> wrote:

            Hi,

                According to RFC 3986[1], hexadecimal digits encoded
            by a '%' should be case-insensitive, for example,%A2 and
            %a2 should be
            considered equal. Although, URI.equals() does take this
            into consideration, the implementation of URI.hashCode()
            does not and
            returns different hashcodes for two URIs that are similar
            in all respects except for the case of the
            percent-encoded hexadecimal digits.
            This fix attempts to construct a normalized string from
            the string representing a component before calculating
            its hashCode.
            I converted to upper case for the normalization(and not
            lower case) as required by [1].

                For testing the fix, I added an additional test
            scenario to an existing test
            (jdk/test/java/net/URI/Test.java). While I was there, I
            also made
            minor changes to the test so that it does not produce
            rawtype and other lint warnings.

            Bug:
            http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7171415
            Webrev:
            http://cr.openjdk.java.net/~khazra/7171415/webrev.00/
            <http://cr.openjdk.java.net/%7Ekhazra/7171415/webrev.00/>

            URI.compareTo() still suffers from the same problem - I
            am not sure if it should be dealt with as a separate bug.

            Thanks,
            Kurchi

            [1] http://tools.ietf.org/html/rfc3986#section-6.2.2.1


--
-Kurchi

Re: Code Review Request: 7171415: java.net.URI.equals/hashCode not consistent for some URIs

Reply via email to