On Sun, 25 May 2025 07:18:59 GMT, John R Rose <jr...@openjdk.org> wrote:
>> src/hotspot/share/oops/fieldInfo.cpp line 252:
>>
>>> 250: int FieldInfoReader::search_table_lookup(const Array<u1> *search_table, const Symbol *name, const Symbol *signature, ConstantPool *cp, int java_fields) {
>>> 251:   UNSIGNED5::Reader<const u1*, int> r2(_r.array());
>>> 252:   int low = 0, high = java_fields - 1;
>>
>> This is probably correct, but I recommend a few changes:
>> A. add an assert `low <= high` at the beginning (defend against a surprise `java_fields` number).
>> B. repeat the assert after each new calculation of high or low.
>> C. declare high, low, and mid as `uint32_t` to prove overflow (UB) impossible, without the need to reason about integral ranges.

> I looked over the code base for other occurrences of binary search, and I see your code is not an outlier.
>
> The tendency is to use int for high/mid/low, and there are cases where UB would be possible in the absence of extra knowledge about the possible ranges of the ints. `InstanceKlass::quick_search` is especially scary, until one realizes that method lists are probably on the order of 2^16. `vmSymbols::find_sid` is the same story; the number of symbols is much less than 2^30.
>
> `GrowableArray::find_sorted` also uses int indexes, but uses a `uint` cast at just the right place when computing mid, to avoid UB, so it can handle arrays sized near 2^31.
>
> With those precedents, I'm not going to pick on this code here. The recommended changes A/B/C are optional.
>
> But I am going to think about what it would take to reduce the number of occurrences of binary search, and build a template that can re-implement at least some of them (probably including yours) with bullet-proof, unit-tested code.
>
> Binary search over 3/4/5-byte packed records is easy, which is what you are doing here. The use of shortened integers is clever. I think that can be made a template also, and then used for more than one index, as I've observed before.
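[Editor's note: a standalone sketch, not the HotSpot code under review, of the overflow-safe index handling discussed above. Using `uint32_t` for low/mid/high (recommendation C) and computing mid as `low + (high - low) / 2` makes signed overflow impossible even for arrays near 2^31; the flip side, shown below, is that unsigned indices need a guard against wraparound when `high` would go below zero.]

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Binary search with uint32_t indices so low + high can never overflow
// a signed int (which would be UB). Returns the index of key, or -1.
static int search_u32(const std::vector<int>& a, int key) {
  if (a.empty()) return -1;
  uint32_t low = 0, high = (uint32_t)a.size() - 1;
  assert(low <= high);                       // recommendation A
  while (low <= high) {
    uint32_t mid = low + (high - low) / 2;   // no overflow, even near 2^31
    if (a[mid] == key) {
      return (int)mid;
    } else if (a[mid] < key) {
      low = mid + 1;                         // recommendation B: could re-assert here
    } else {
      if (mid == 0) break;                   // guard: high = (uint32_t)0 - 1 would wrap
      high = mid - 1;
    }
  }
  return -1;
}
```

The `mid == 0` guard is the price of unsigned indices: with signed ints the loop exits naturally when `high` goes to -1, but with `uint32_t` it would wrap to 2^32-1 and loop forever.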
> One good way to think about these variable-sized packed records is to treat them as physical `uint64_t` values, with high-order zero bytes removed. If you have a word or so of slop area at the bottom of the packed array, you can load 1..8 bytes in a single (misaligned) memory operation, loading garbage into unused bytes, and then use shift or mask to clear the garbage. That may be faster than asking C++ to do a bunch of branchy logic and byte assembly on every access. Loading an N-byte word (little-endian) is simply `w = *(uint64_t*)(p+N-8) >> ((8-N)*8)`. Your packed structs have two fields x,y, but they can be reimagined as bitfields `x = w & ((1<<xlen)-1)`, `y = w >> xlen`. This would save half a byte per entry in your tables, on average, since you can often share a byte between the two fields. And the decode logic would be simpler than the closure-based stuff you have now.
>
> These thoughts are just another observation about the design space you are working in, not a request for changes, although I do think they would simplify your code (and make it more efficient). This sort of careful design makes the most sense when packaged in a separate class or template, that can be carefully reviewed and unit-tested in isolation. If we were to refactor some old searchable...

I like the idea of mapping each element in the table as raw bits, though handling access near the end of the array would be a bit inconvenient (or we would have to allocate a few extra bytes). I've changed the algorithm to use unsigned integers; in fact I find it a bit annoying that most of the indices used throughout the related code are signed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24847#discussion_r2109021670
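[Editor's note: a minimal sketch of the single-load decode described above, not code from the PR. It uses the mask variant (load at `p`, mask off high garbage bytes) rather than the shift variant, and uses `memcpy` in place of the misaligned `*(uint64_t*)` cast, which is UB in portable C++; compilers turn the 8-byte `memcpy` into one load. It assumes a little-endian host and at least 8 readable bytes of slop after each record.]

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Decode an n-byte little-endian packed record at p with a single 8-byte
// load, then mask off the (8 - n) garbage bytes. Requires that at least
// 8 bytes starting at p are readable (slop area after the packed array).
static uint64_t load_n(const uint8_t* p, unsigned n) {
  assert(n >= 1 && n <= 8);
  uint64_t w;
  std::memcpy(&w, p, 8);                           // one (possibly misaligned) load
  if (n < 8) w &= (UINT64_C(1) << (n * 8)) - 1;    // clear the garbage bytes
  return w;
}

// The two record fields reimagined as bitfields: x in the low xlen bits,
// y in the bits above it — so the fields can share a byte.
static uint64_t field_x(uint64_t w, unsigned xlen) { return w & ((UINT64_C(1) << xlen) - 1); }
static uint64_t field_y(uint64_t w, unsigned xlen) { return w >> xlen; }
```

A 4-byte record `34 12 78 56` decodes to `w = 0x56781234`; with `xlen = 12`, `x = 0x234` and `y = 0x56781`, illustrating how one physical byte (`0x12`) contributes bits to both fields.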