On Tue, 21 Feb 2023 20:33:41 GMT, Eirik Bjorsnos <d...@openjdk.org> wrote:
>> Hi Alan,
>>
>> I thought I was clever by encoding the 'uppercaseness' in the variable name,
>> but yeah I'll find a better name :)
>>
>> There is some precedent for using the 'ASCII trick' comment in the JDK. I
>> found it in ZipFile.isMetaName, which is also where I first learned about
>> this interesting relationship between ASCII (and also latin1) letters.
>>
>> The comment was first added by Martin Buchholz back in 2016 as part of
>> JDK-8157069, 'Assorted ZipFile improvements'. In 2020, Claes was updating
>> this code and Lance had some input about clarifying the comment. Martin then
>> [chimed in](https://mail.openjdk.org/pipermail/core-libs-dev/2020-May/066363.html)
>> to defend his comment:
>>
>>> I still like my ancient "ASCII trick" comment.
>>
>> I think this 'trick', whatever we call it, is sufficiently intricate that it
>> deserves to be called out somehow and that we should not just casually
>> bitmask with these magic constants without any discussion at all.
>>
>> An earlier iteration of this PR included a small essay in the javadoc of
>> this method describing the layout and relationship of letters in latin1 and
>> how we can apply that knowledge of the layout to implement the method.
>>
>> How would you feel about adding that description back to the Javadocs? This
>> would then live close to the similarly implemented toUpperCase and
>> toLowerCase methods currently under review in #12623.
>>
>> Here's the updated discussion included in the Javadoc:
>>
>>
>> /**
>>  * Compares two latin1 code points, ignoring case considerations.
>>  *
>>  * Implementation note: In ISO/IEC 8859-1, the uppercase and lowercase
>>  * letters are found in the following code point ranges:
>>  *
>>  * 0x41-0x5A: Uppercase ASCII letters: A-Z
>>  * 0x61-0x7A: Lowercase ASCII letters: a-z
>>  * 0xC0-0xD6: Uppercase latin1 letters: A-GRAVE - O with Diaeresis
>>  * 0xD8-0xDE: Uppercase latin1 letters: O with slash - Thorn
>>  * 0xE0-0xF6: Lowercase latin1 letters: a-grave - o with Diaeresis
>>  * 0xF8-0xFE: Lowercase latin1 letters: o with slash - thorn
>>  *
>>  * While both ASCII letter ranges are contiguous, the latin1 ranges are not:
>>  *
>>  * The 'multiplication sign' 0xD7 splits the uppercase range in two.
>>  * The 'division sign' 0xF7 splits the lowercase range in two.
>>  *
>>  * Lowercase letters are found 32 positions (0x20) after their corresponding uppercase letter.
>>  * The 'division sign' and 'multiplication sign' have the same relative distance.
>>  *
>>  * Since 0x20 is a single bit, we can apply the 'oldest ASCII trick in the book' to
>>  * lowercase any letter by setting the bit:
>>  *
>>  * ('C' | 0x20) == 'c'
>>  *
>>  * By removing the bit, we can perform the uppercase operation:
>>  *
>>  * ('c' & 0xDF) == 'C'
>>  *
>>  * Applying this knowledge of the latin1 layout, we can test for equality ignoring case by
>>  * checking that the code points are either equal, or that one of the code points is a letter
>>  * whose uppercase is the same as the uppercase of the other code point.
>>  *
>>  * @param b1 byte representing a latin1 code point
>>  * @param b2 another byte representing a latin1 code point
>>  * @return true if the two bytes are considered equal ignoring case in latin1
>>  */
>> static boolean equalsIgnoreCase(byte b1, byte b2) {
>>     if (b1 == b2) {
>>         return true;
>>     }
>>     int upper = b1 & 0xDF;
>>     if (upper < 'A') {
>>         return false; // Low ASCII
>>     }
>>     return (upper <= 'Z' // In range A-Z
>>             || (upper >= 0xC0 && upper <= 0xDE && upper != 0xD7)) // ..or A-grave-Thorn, excl. multiplication
>>             && upper == (b2 & 0xDF); // b2 has same uppercase
>> }
>
> Perhaps @Martin-Buchholz could chime in and also tell us which book he found
> his ASCII trick in :)

"oldest trick in the book" is a phrase that does not necessarily imply the existence of an actual book!
Let this evoke an image of a **personal** book of tricks that programmers in the 1960s might have recorded such techniques in.
And the tricks were passed down across generations of programmers!

-------------

PR: https://git.openjdk.org/jdk/pull/12632
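
For anyone who wants to convince themselves that the bitmask version quoted above behaves as the Javadoc describes, here is a minimal sketch of an exhaustive check over all 256 x 256 byte pairs. The class name `Latin1EqualsIgnoreCaseSketch` and the helpers `referenceEqualsIgnoreCase` and `countMismatches` are hypothetical names made up for this illustration. The reference comparison via `Character.toUpperCase` / `Character.toLowerCase` is an assumption about what "equal ignoring case" should mean for latin1 code points; it is not taken from the PR.

```java
public class Latin1EqualsIgnoreCaseSketch {

    // The method as quoted in this thread (only whitespace differs).
    static boolean equalsIgnoreCase(byte b1, byte b2) {
        if (b1 == b2) {
            return true;
        }
        int upper = b1 & 0xDF;          // clear the 0x20 "case bit"
        if (upper < 'A') {
            return false;               // Low ASCII
        }
        return (upper <= 'Z'            // In range A-Z
                || (upper >= 0xC0 && upper <= 0xDE && upper != 0xD7)) // latin1 letters, excl. multiplication sign
                && upper == (b2 & 0xDF); // b2 has same uppercase
    }

    // Hypothetical reference (an assumption, not from the PR): two code points
    // match if they are equal or if their Character case mappings agree.
    static boolean referenceEqualsIgnoreCase(int c1, int c2) {
        return c1 == c2
                || Character.toUpperCase(c1) == Character.toUpperCase(c2)
                || Character.toLowerCase(c1) == Character.toLowerCase(c2);
    }

    // Counts the byte pairs on which the bitmask version and the reference disagree.
    static int countMismatches() {
        int mismatches = 0;
        for (int c1 = 0; c1 < 256; c1++) {
            for (int c2 = 0; c2 < 256; c2++) {
                boolean fast = equalsIgnoreCase((byte) c1, (byte) c2);
                boolean slow = referenceEqualsIgnoreCase(c1, c2);
                if (fast != slow) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

The interesting cases are the ones the Javadoc calls out: 0xD7 and 0xF7 differ only in the 0x20 bit but must not compare equal, and 0xDF (sharp s) and 0xFF (y with diaeresis) have no uppercase counterpart inside latin1, so they only match themselves.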