Yes, I also thought about range checks not being eliminated if using
charAt() but I guess that depends on how smart the JIT is - if charAt is
inlined there's technically enough info there for the compiler to see
that range checks aren't needed. Whether that happens or not I haven't
checked. toCharArray brings us back to having allocations unless,
again, EA helps out. I think a microbenchmark would help here (along
with verbose GC logging) to see which is better if this is a concern.
Why do you say you need to duplicate String.hashCode to be consistent
with what people are using already? As long as the hash quality is at
least as good as today (or not significantly worse) shouldn't you be
able to change the impl? If someone's relying on specific value for some
input then their code is broken. Besides, doing toUpper will change the
hash for URIs with % anyway. Perhaps I misunderstood your point though ...
Vitaly
Sent from my phone
On Jan 8, 2013 11:04 PM, "Kurchi Subhra Hazra"
<kurchi.subhra.ha...@oracle.com <mailto:kurchi.subhra.ha...@oracle.com>>
wrote:
On 1/8/13 6:55 PM, Vitaly Davidovich wrote:
Also, I'm not sure how hot this method is in practice but
allocating StringBuilder seems a bit heavy (maybe it gets
partially escape analyzed out though). Can you instead loop over
all the chars in the string and build up the hash code as you go
along? If you see a % then you handle next 2 chars specially, like
you do now. Or are you trying to take advantage of String
intrinsic support in the JIT? I guess if perf is a concern you can
write a micro benchmark comparing the approaches ...
That did occur to me, but I guess we have to be consistent with the
value that people have already been using, and that means I have
to duplicate the code in String.hashCode() (that is what the
original implementation was calling) - I was trying to avoid that. Also,
String.hashCode() uses its internal char[] - whereas charAt() will
involve several additional bound checks - but
using toCharArray() may be better. Let me take another look at this,
and get back with another webrev.
Sent from my phone
On Jan 8, 2013 9:45 PM, "Vitaly Davidovich" <vita...@gmail.com
<mailto:vita...@gmail.com>> wrote:
Hi Kurchi,
In the hash method, I suggest you move handling of strings
with % into a separate method to keep the hash method small
for common case (no %). Otherwise, there's a chance this
method won't get inlined anymore due to its (newly increased)
size.
- Yep, will do.
Also, I realize toLower does the same thing, but why does
toUpper return an int whereas it's really a char? Might be
cleaner to declare return type as char and do the cast inside
the method as needed.
- I followed the format of toLower(). But I agree this way it will
be cleaner.
Thanks a lot,
Kurchi
Thanks
Sent from my phone
On Jan 8, 2013 8:20 PM, "Kurchi Hazra"
<kurchi.subhra.ha...@oracle.com
<mailto:kurchi.subhra.ha...@oracle.com>> wrote:
Hi,
According to RFC 3986[1], hexadecimal digits encoded
by a '%' should be case-insensitive, for example,%A2 and
%a2 should be
considered equal. Although, URI.equals() does take this
into consideration, the implementation of URI.hashCode()
does not and
returns different hashcodes for two URIs that are similar
in all respects except for the case of the percent-encoded
hexadecimal digits.
This fix attempts to construct a normalized string from
the string representing a component before calculating its
hashCode.
I converted to upper case for the normalization(and not
lower case) as required by [1].
For testing the fix, I added an additional test
scenario to an existing test
(jdk/test/java/net/URI/Test.java). While I was there, I
also made
minor changes to the test so that it does not produce
rawtype and other lint warnings.
Bug:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7171415
Webrev:
http://cr.openjdk.java.net/~khazra/7171415/webrev.00/
<http://cr.openjdk.java.net/%7Ekhazra/7171415/webrev.00/>
URI.compareTo() still suffers from the same problem - I am
not sure if it should be dealt with as a separate bug.
Thanks,
Kurchi
[1] http://tools.ietf.org/html/rfc3986#section-6.2.2.1