Refactoring `ImageReader` to make it easy to add preview mode functionality for 
Valhalla.

This PR is a large change to `ImageReader` (effectively a rewrite) but reduces 
the surface area of the API significantly, reduces code complexity and 
increases performance/memory efficiency. The need for this sort of change comes 
from the need to support "preview mode" in Valhalla for classes loaded from 
core modules.

### Preview Mode

In the proposed preview mode support for Valhalla, a resource with the name 
`/modules/<module>/<path>` may come from one of two locations in the underlying 
jimage file; `/<module>/<path>` or `/<module>/META-INF/preview/<path>`. 
Furthermore, directories (represented as directory nodes via the API) will have 
a potentially different number of child nodes depending on whether preview mode 
is in effect, and some directories may only exist at all in preview mode.

Furthermore, the directories and symbolic link nodes in `/packages/...` will 
also be affected by the existence of new package directories. To retain 
consistency and avoid "surprises" later, all of this needs to be taken into 
account.

Below is a summary of the main changes for mainline JDK to better support 
preview mode later:

### 1: Improved encapsulation for preview mode

The current `ImageReader` code exposes the data from the jimage file in several 
ways; via the `Node` abstraction, but also with methods which return an 
`ImageLocation` instance directly. In preview mode it would not be acceptable 
to return the `ImageLocation`, since that would leak the existence of resources 
in the `META-INF/preview` namespace and lets users see resource nodes with 
names that don't match the underlying `ImageLocation` from which their contents 
come.

As such, the PR removes all methods which can leak `ImageLocation` or other 
"underlying" semantics about the jimage file. Luckily most of these are either 
used minimally and easily migrated to using nodes, or they were not being used 
at all. This PR also removes any methods which were already unused across the 
OpenJDK codebase (if I discover any issues with over-trimming of the API during 
full CI testing, it will be easy to address).

### 2. Simplification of directory child node handling

The current `ImageReader` code attempts to create parent directories "on 
demand" for any child nodes it creates. This results in parent directories 
having a non-empty but incomplete set of child nodes present. This is referred 
to as the directory being "incomplete" and requires users of the API to 
understand this and potentially "complete" these nodes by calling back into the 
`ImageReader` API.

This approach is an attempt to provide lazy node creation (which is a necessary 
trait since there are ~31,000 nodes in a typical jimage hierarchy) but isn't 
actually necessary since a child node has no "get parent" functionality and 
simply not creating parent nodes at the point is functionally equivalent.

The partially complete directory nodes also pose an issue for preview mode 
support, making it more complex to reason about where and when preview child 
nodes should be applied.

This PR simplifies directory node handling in two ways:
1. Directory nodes are either empty (incomplete) or have all child nodes 
present.
2. Directory nodes no longer have an API to return their (possibly incomplete) 
child nodes to users.

This results in simpler to reason about code for directory management and 
simpler changes later to support preview mode. Removing the on-demand parent 
directory creation and the partial child list management also has a noticeable 
performance improvement (based on the new `ImageReaderBenchmark` in pr/26044).

### 3. More comments and code assertions

Since `ImageReader` is a critical bit of internal tooling for OpenJDK, I made 
sure to add significant comments explaining the behaviour, as well as adding 
many new in-code assert statements. The version of `ImageReader` in this PR is 
70 lines shorter than the current version, but if you account for new comments, 
it's really a reduction of almost 40% (640 -> 400 lines of code) and has over 
200 more comment lines.

### 4. New tests

I added a new unit test for `ImageReader` since, until now, its API was not 
really being tested directly. These tests are useful now, but will really help 
when preview mode is added in Valhalla, since then there will need to be 
significant edge-case testing. I'm happy to improve or change tests in this PR, 
but they definitely cover the main cases.

### 5. Adding TODO comments for review discussion.

There are obviously some open questions about the exact design of the APIs, and 
some questions around behaviour. To make reviewing easier, I've added inline 
TODOs which are there to illicit feedback. I will account for an remove all of 
these before the PR is integrated, but I want reviewers to read and comment on 
them (even if there is no change expected).

### Performance Improvements

On my laptop (not objectively interesting, but good for comparison) I am seeing 
a significant performance improvement over all benchmarks and a reduction in 
timing variability. These benchmarks are awkward because of the need to test 
things in a "cold start" state, so timings will have larger variability than 
expected, but the performance improvements are consistent and non trivial:

Run with `make test TEST="micro:jdk.internal.jrtfs.ImageReaderBenchmark"` using 
the benchmark currently being added in `pr/26044`:

Current version:

Benchmark                                            Mode  Cnt   Score    Error 
 Units
ImageReaderBenchmark.warmCache_CountAllNodes         avgt    5   0.784 ±  0.023 
 ms/op
ImageReaderBenchmark.coldStart_CountOnly               ss    5  37.910 ± 28.554 
 ms/op
ImageReaderBenchmark.coldStart_InitAndCount            ss    5  37.471 ± 18.775 
 ms/op
ImageReaderBenchmark.coldStart_LoadJavacInitClasses    ss    5   4.279 ±  7.959 
 ms/op


This PR:

Benchmark                                            Mode  Cnt   Score   Error  
Units
ImageReaderBenchmark.warmCache_CountAllNodes         avgt   10   0.880 ± 0.086  
ms/op
ImageReaderBenchmark.coldStart_CountOnly               ss   10  14.303 ± 3.975  
ms/op
ImageReaderBenchmark.coldStart_InitAndCount            ss   10  14.032 ± 3.068  
ms/op
ImageReaderBenchmark.coldStart_LoadJavacInitClasses    ss   10   2.530 ± 0.348  
ms/op


Shows a >2.5x speedup for traversing all nodes in a "cold start" state, and 
~1.7x speedup for loading the core set of classes needed to start javac.

-------------

Depends on: https://git.openjdk.org/jdk/pull/26044

Commit messages:
 - Integrating with ImageReaderBenchmark.
 - Merge branch 'master' into jdk_8355953_reader/squashed
 - undo dependent changes
 - syncing to main branch
 - Newline at EOF.
 - Pretty much ready for review now.
 - WIP - more tests needed.
 - Reapply diffs.

Changes: https://git.openjdk.org/jdk/pull/26054/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26054&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8360037
  Stats: 2217 lines in 11 files changed: 1613 ins; 342 del; 262 mod
  Patch: https://git.openjdk.org/jdk/pull/26054.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26054/head:pull/26054

PR: https://git.openjdk.org/jdk/pull/26054

Reply via email to