yihua commented on code in PR #18408:
URL: https://github.com/apache/hudi/pull/18408#discussion_r3034719761
##########
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/versioning/v1/ArchivedTimelineV1.java:
##########
@@ -320,12 +321,14 @@ private InstantsLoader(boolean loadInstantDetails) {
public void accept(String instantTime, GenericRecord record) {
Option<HoodieInstant> instant = readCommit(instantTime, record,
loadInstantDetails, null);
if (instant.isPresent()) {
Review Comment:
🤖 The `HashSet` returned by `computeIfAbsent` isn't thread-safe, but
`accept()` is invoked from a parallel stream in `ArchivedTimelineLoaderV1`
(line 136). Two threads processing instants with the same `requestedTime` can
concurrently mutate the same `HashSet`, which risks corrupting its internal
hash table (potentially causing infinite loops on resize). The prior
`ArrayList` had the same issue, but the switch to `HashSet` makes it more acute
since this PR's scenario (duplicate instants from merged files) increases the
chance of same-key collisions. Could you use `Collections.newSetFromMap(new
ConcurrentHashMap<>())` or `ConcurrentHashMap.newKeySet()` as the per-key set?
##########
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/versioning/v1/ArchivedTimelineV1.java:
##########
@@ -320,12 +321,14 @@ private InstantsLoader(boolean loadInstantDetails) {
public void accept(String instantTime, GenericRecord record) {
Option<HoodieInstant> instant = readCommit(instantTime, record,
loadInstantDetails, null);
if (instant.isPresent()) {
- instantsInRange.computeIfAbsent(instant.get().requestedTime(), s ->
new ArrayList<>())
- .add(instant.get());
+ Set<HoodieInstant> instantsForTime =
instantsInRange.computeIfAbsent(instant.get().requestedTime(), s -> new
HashSet<>());
Review Comment:
🤖 nit: `Set.add()` is already a no-op when the element is present — the
`contains` guard is redundant. You can simplify to just
`instantsForTime.add(instant.get());`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]