On Mon, 21 Nov 2022 16:46:43 GMT, Strahinja Stanojevic <d...@openjdk.org> wrote:

>> This PR introduces an option to output stable names for the lambda classes 
>> in the JDK. A stable name consists of two parts: The first part is the 
>> predefined value `$$Lambda$` appended to the lambda capturing class, and the 
>> second is a 64-bit hash part of the name. Thus, it looks like 
>> `lambdaCapturingClass$$Lambda$hashValue`.
>> Parameters used to create a stable hash are a superset of the parameters 
>> used for lambda class archiving when the CDS dumping option is enabled. 
>> During this process, all the mutual parameters are in the same form as they 
>> are in the low-level implementation 
>> (`SystemDictionaryShared::add_lambda_proxy_class`) of the archiving process.
>> We decided to use a well-specified `CRC32` algorithm from the standard Java 
>> library. We created two 32-bit hashes from the parameters used to create 
>> stable names. Then, we combined those two 32-bit hashes into one 64-bit hash 
>> value.
>> We chose `CRC32` because it is a well-specified hash function, and we don't 
>> need to write additional code in the JDK. `SHA-256`, `MD5`, and all other hash 
>> functions that rely on `MessageDigest` use lambdas in the implementation, so 
>> they are unsuitable for our purpose. We also considered a few different hash 
>> functions with a low collision rate. All these functions would require at 
>> least 100 lines of additional code in the JDK. The best alternative we found 
>> is 64-bit `MurmurHash2`: 
>> https://commons.apache.org/proper/commons-codec/jacoco/org.apache.commons.codec.digest/MurmurHash2.java.html.
>> In case adding a new hash implementation (e.g., Murmur2) to the JDK is 
>> preferred, this PR can be easily modified.
>> We found the post 
>> (https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed/145633#145633)
>> that compares different hash functions.
>> We also tested the `CRC32` hash function against half a billion generated 
>> strings, and there were no collisions. Note that the capturing-class name is 
>> also part of the lambda class name, so the potential collisions can only 
>> appear in a single class. Thus, we do not expect to have name collisions due 
>> to a relatively low number of lambdas per class. Every tool that uses this 
>> feature should handle potential collisions on its own.  
>> We found an overall approximation of the collision rate too. You can find it 
>> here: https://preshing.com/20110504/hash-collision-probabilities/.
>> 
>> JDK currently adds an atomic integer after `$$Lambda$`, and the names of the 
>> lambdas depend on the creation order. In the `TestStableLambdaNames`, we 
>> generate all the lambdas two times. In the first run, the method 
>> `createPlainLambdas` generates the following lambdas:
>> 
>> - TestStableLambdaNames$$Lambda$1/0x0000000800c00400
>> - TestStableLambdaNames$$Lambda$2/0x0000000800c01800
>> - TestStableLambdaNames$$Lambda$3/0x0000000800c01a38
>> 
>> The same method in the second run generates lambdas with different names:
>> - TestStableLambdaNames$$Lambda$1471/0x0000000800d10000
>> - TestStableLambdaNames$$Lambda$1472/0x0000000800d10238
>> - TestStableLambdaNames$$Lambda$1473/0x0000000800d10470
>> 
>> If we use the introduced flag, generated lambdas are:
>> - TestStableLambdaNames$$Lambda$65ba26bbc6c7500d/0x0000000800c00400
>> - TestStableLambdaNames$$Lambda$1569c8c4abe3ab18/0x0000000800c01800
>> - TestStableLambdaNames$$Lambda$493c0ecaaf682428/0x0000000800c01a38
>> 
>> In the second run of the method, generated lambdas are:
>> - TestStableLambdaNames$$Lambda$65ba26bbc6c7500d/0x0000000800d10000
>> - TestStableLambdaNames$$Lambda$1569c8c4abe3ab18/0x0000000800d10238
>> - TestStableLambdaNames$$Lambda$493c0ecaaf682428/0x0000000800d10470
>> 
>> We can see that the introduced hash value does not change between two calls 
>> of the method `createPlainLambdas`. That was not the case in the JDK run 
>> without this change. Those lambdas are extracted directly from the test.
>
> Strahinja Stanojevic has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Remove address from lambda class names in test on the 32-bit architecture 
> too

What about other cases like the JLI `MH`, `DMH`, `InjectedInvoker`, lambda form 
etc. classes? Or the FFI specialized upcall proxies, or the many cases where 
users define hidden classes?

Wouldn't it be better to make the suffix of a hidden class be repeatable, thus 
solving all of these cases at once, rather than chasing down every place where 
a hidden class is defined?
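
For reference, the naming scheme in the quoted description (two `CRC32` values combined into one 64-bit suffix after `$$Lambda$`) can be sketched roughly as below. This is only an illustration and not the PR's code: the class and method names are hypothetical, and both the single descriptor string used as input and the way the second 32-bit hash is derived (hashing the reversed input) are assumptions, since the description does not say how the parameters are split between the two hashes.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch; names and input layout are assumptions, not the PR's code.
final class StableLambdaNameSketch {

    // CRC32 of the UTF-8 bytes of s, returned as an unsigned 32-bit value in a long.
    static long crc32(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Builds capturingClass + "$$Lambda$" + 64-bit hash in hex.
    // Assumption: the second hash is taken over the reversed input string.
    static String stableName(String capturingClass, String lambdaParameters) {
        long high = crc32(lambdaParameters);
        long low = crc32(new StringBuilder(lambdaParameters).reverse().toString());
        long hash = (high << 32) | low;
        // Long.toHexString drops leading zeros, so the suffix may be shorter than 16 digits.
        return capturingClass + "$$Lambda$" + Long.toHexString(hash);
    }

    public static void main(String[] args) {
        // The same inputs give the same name on every call and in every JVM run.
        System.out.println(stableName("TestStableLambdaNames", "run()V"));
        System.out.println(stableName("TestStableLambdaNames", "run()V"));
    }
}
```

Because the suffix depends only on the inputs and not on creation order, the same idea would apply to any hidden class for which a deterministic set of defining parameters is available.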

-------------

PR: https://git.openjdk.org/jdk/pull/10024
