On Wed, 24 May 2023 19:36:44 GMT, Aleksey Shipilev <sh...@openjdk.org> wrote:
> UUID is the very important class that is used to track identities of objects > in large scale systems. On some of our systems, `UUID.randomUUID` takes >1% > of total CPU time, and is frequently a scalability bottleneck due to > `SecureRandom` synchronization. > > The major issue with UUID code itself is that it reads from the single > `SecureRandom` instance by 16 bytes. So the heavily contended `SecureRandom` > is bashed with very small requests. This also has a chilling effect on other > users of `SecureRandom`, when there is a heavy UUID generation traffic. > > We can improve this by doing the bulk reads from the backing SecureRandom and > possibly striping the reads across many instances of it. > > > Benchmark Mode Cnt Score Error Units > > ### AArch64 (m6g.4xlarge, Graviton, 16 cores) > > # Before > UUIDRandomBench.single thrpt 15 3.545 ± 0.058 ops/us > UUIDRandomBench.max thrpt 15 1.832 ± 0.059 ops/us ; negative scaling > > # After > UUIDRandomBench.single thrpt 15 4.421 ± 0.047 ops/us > UUIDRandomBench.max thrpt 15 6.658 ± 0.092 ops/us ; positive scaling, > ~1.5x > > ### x86_64 (c6.8xlarge, Xeon, 18 cores) > > # Before > UUIDRandomBench.single thrpt 15 2.710 ± 0.038 ops/us > UUIDRandomBench.max thrpt 15 1.880 ± 0.029 ops/us ; negative scaling > > # After > Benchmark Mode Cnt Score Error Units > UUIDRandomBench.single thrpt 15 3.099 ± 0.022 ops/us > UUIDRandomBench.max thrpt 15 3.555 ± 0.062 ops/us ; positive > scaling, ~1.2x > > > Note that there is still a scalability bottleneck in current default random > (`NativePRNG`), because it synchronizes over a singleton instance for SHA1 > mixer, then the engine itself, etc. -- it is quite a whack-a-mole to figure > out the synchronization story there. The scalability fix in current default > `SecureRandom` would be much more intrusive and risky, since it would change > a core crypto class with unknown bug fanout. > > Using the bulk reads even when the underlying PRNG is heavily synchronized is > still a win. A more scalable PRNG would benefit from this as well. This PR > adds a system property to select the PRNG implementation, and there we can > clearly see the benefit with more scalable PRNG sources: > > > Benchmark Mode Cnt Score Error Units > > ### x86_64 (c6.8xlarge, Xeon, 18 cores) > > # Before, hacked `new SecureRandom()` to > `SecureRandom.getInstance("SHA1PRNG")` > UUIDRandomBench.single thrpt 15 3.661 ± 0.008 ops/us > UUIDRandomBench... src/java.base/share/classes/java/util/UUID.java line 224: > 222: // Try to pull the UUID from the current buffer at > current position. > 223: if (stamp != 0) { > 224: int p = (int)VH_POS.getAndAdd(this, UUID_CHUNK); An atomic update together with an optimistic lock looks non-idiomatic use of StampedLock to me. Won't a simple CAS loop be simpler? E.g. in pseudocode: while ((p = this.pos) + 16) < buf.length) { long msb = getLong(buf, p); long lsb = getLong(buf, p + 8); if (cas(this.pos, p, p + 16)) { return new UUID(msb, lsb); } } // refill buffer under lock src/java.base/share/classes/java/util/UUID.java line 226: > 224: int p = (int)VH_POS.getAndAdd(this, UUID_CHUNK); > 225: if (p < BUF_SIZE) { > 226: uuid = new UUID(buf, p); We can read msb/lsb from the buffer here and move object allocation outside the lock to reduce the length of the critical section. src/java.base/share/classes/java/util/UUID.java line 260: > 258: buf[c + 8] &= 0x3f; /* clear variant */ > 259: buf[c + 8] |= (byte) 0x80; /* set to IETF > variant */ > 260: } I'm not sure I understand the point about initialization. Why not just setting the required version bits right in UUID constructor without updating the array? This will be faster and simpler IMO. src/java.base/share/classes/java/util/UUID.java line 286: > 284: long lsb = 0; > 285: for (int i = start; i < start + 8; i++) { > 286: msb = (msb << 8) | (data[i] & 0xff); Can we use VarHandle/ByteBuffer to read 64 bits at a time? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14135#discussion_r1206104940 PR Review Comment: https://git.openjdk.org/jdk/pull/14135#discussion_r1206105877 PR Review Comment: https://git.openjdk.org/jdk/pull/14135#discussion_r1206096052 PR Review Comment: https://git.openjdk.org/jdk/pull/14135#discussion_r1206097261