hudi-agent commented on code in PR #18837:
URL: https://github.com/apache/hudi/pull/18837#discussion_r3346662065
##########
hudi-trino-plugin/src/main/java/io/trino/plugin/hudi/HudiUtil.java:
##########
@@ -78,10 +92,14 @@
import static io.trino.plugin.hudi.HudiErrorCode.HUDI_SCHEMA_ERROR;
import static io.trino.plugin.hudi.HudiErrorCode.HUDI_UNSUPPORTED_FILE_FORMAT;
import static java.lang.Math.toIntExact;
-import static org.apache.hudi.common.model.HoodieRecord.HOODIE_META_COLUMNS;
+import static
org.apache.hudi.common.model.HoodieRecord.PARTITION_PATH_METADATA_FIELD;
+import static
org.apache.hudi.common.model.HoodieRecord.RECORD_KEY_METADATA_FIELD;
public final class HudiUtil
Review Comment:
🤖 nit: this shadows `HoodieRecord.HOODIE_META_COLUMNS` (which contains 5
fields including commit_time, commit_seqno, file_name) with a 2-field subset
under the same name — a reader who has seen the upstream constant will assume
the wrong contents. Could you rename to something like
`HUDI_REQUIRED_META_COLUMNS` to make the distinction explicit?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-trino-plugin/src/main/java/io/trino/plugin/hudi/HudiPageSource.java:
##########
@@ -115,8 +132,14 @@ public long getMemoryUsage()
public void close()
throws IOException
{
- fileGroupReader.close();
- pageSource.close();
+ // Closer attempts every close and aggregates failures via
addSuppressed, rethrowing the
+ // first. Registration is LIFO, so registering in reverse gives the
original close order:
+ // recordIterator (wraps the readers) first, then fileGroupReader,
then pageSource.
+ try (Closer closer = Closer.create()) {
+ closer.register(pageSource::close);
+ closer.register(fileGroupReader::close);
+ closer.register(recordIterator::close);
Review Comment:
🤖 I think this `Closer` is invoking each underlying resource multiple times:
`recordIterator.close()` is `CloseableMappingIterator` →
`HoodieFileGroupReaderIterator.close()` → `fileGroupReader.close()` → closes
`baseFileIterator` (the iterator from
`HudiTrinoReaderContext.getFileRecordIterator`, which closes `pageSource`) and
`recordBuffer`. Then `fileGroupReader::close` is invoked again (its close
doesn't null out the fields) — closing the same
`baseFileIterator`/`recordBuffer` a second time, and `pageSource` again — and
finally `pageSource::close` runs a third time. Could you just register
`recordIterator::close` (or just `fileGroupReader::close`) since each one
already cascades to the others? @yihua does this need a closer look?
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
##########
hudi-trino-plugin/src/main/java/io/trino/plugin/hudi/HudiUtil.java:
##########
@@ -397,4 +410,53 @@ public static Schema
getLatestTableSchema(HoodieTableMetaClient metaClient, Stri
throw new TrinoException(HUDI_FILESYSTEM_ERROR, e);
}
}
+
+ public static List<HiveColumnHandle> getOrderingColumnHandles(Table table,
TypeManager typeManager, Lazy<HoodieTableMetaClient> lazyMetaClient,
HiveTimestampPrecision timestampPrecision)
+ {
+ RecordMergeMode recordMergeMode =
lazyMetaClient.get().getTableConfig().getRecordMergeMode();
+ if (recordMergeMode == null || recordMergeMode ==
RecordMergeMode.COMMIT_TIME_ORDERING) {
+ // if commit time ordering is enabled, return empty list
+ return Collections.emptyList();
+ }
+
+ ImmutableList.Builder<HiveColumnHandle> columns =
ImmutableList.builder();
+ List<String> orderingColumnNames =
lazyMetaClient.get().getTableConfig().getOrderingFields();
+
+ int hiveColumnIndex = 0;
+ for (Column field : table.getDataColumns()) {
+ // ignore unsupported types rather than failing
+ if (orderingColumnNames.contains(field.getName())) {
+ HiveType hiveType = field.getType();
+ if (typeSupported(hiveType.getTypeInfo(),
table.getStorage().getStorageFormat())) {
+ columns.add(createBaseColumn(field.getName(),
hiveColumnIndex, hiveType, getType(hiveType, typeManager, timestampPrecision),
REGULAR, field.getComment()));
+ }
+ }
+ hiveColumnIndex++;
+ }
+
+ return columns.build();
+ }
+
+ /**
+ * Converts the given {@link HoodiePairData} into a {@link Map}.
+ * <p>
+ * Special handling is applied for null keys:
+ * <ul>
+ * <li>If a key is null, it is stored in the map as a {@code null}
entry.</li>
+ * <li>If multiple entries share the same key (including null), the
latest value overwrites the previous one.</li>
+ * </ul>
+ *
+ * @param pairData the HoodiePairData containing key-value pairs
+ * @param <K> the type of keys maintained by the resulting map
+ * @param <V> the type of mapped values
+ * @return a {@link Map} containing all key-value pairs from the input data
+ */
+ public static <K, V> Map<K, V> collectAsMap(HoodiePairData<K, V> pairData)
+ {
+ // HashMap allows null keys, so collect directly; on duplicate keys
the later entry wins.
+ return pairData.collectAsList().stream()
Review Comment:
🤖 nit: the three-arg `Stream.collect(HashMap::new, accumulator, combiner)`
reads awkwardly for what is just a put-each-pair loop. A plain `HashMap<>` +
`forEach` would be more obvious:
```java
Map<K, V> result = new HashMap<>();
pairData.collectAsList().forEach(pair -> result.put(pair.getKey(),
pair.getValue()));
return result;
```
<sub><i>- AI-generated; verify before applying. React 👍/👎 to flag
quality.</i></sub>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]