Re: [PR] feat(trino): Migrate the Trino-Hudi connector into the Hudi repo (RFC-105) [hudi]

via GitHub Wed, 03 Jun 2026 00:25:50 -0700


hudi-agent commented on code in PR #18837:
URL: https://github.com/apache/hudi/pull/18837#discussion_r3346662065



##########
hudi-trino-plugin/src/main/java/io/trino/plugin/hudi/HudiUtil.java:
##########
@@ -78,10 +92,14 @@
 import static io.trino.plugin.hudi.HudiErrorCode.HUDI_SCHEMA_ERROR;
 import static io.trino.plugin.hudi.HudiErrorCode.HUDI_UNSUPPORTED_FILE_FORMAT;
 import static java.lang.Math.toIntExact;
-import static org.apache.hudi.common.model.HoodieRecord.HOODIE_META_COLUMNS;
+import static 
org.apache.hudi.common.model.HoodieRecord.PARTITION_PATH_METADATA_FIELD;
+import static 
org.apache.hudi.common.model.HoodieRecord.RECORD_KEY_METADATA_FIELD;
 
 public final class HudiUtil

Review Comment:
   🤖 nit: this shadows `HoodieRecord.HOODIE_META_COLUMNS` (which contains 5 
fields including commit_time, commit_seqno, file_name) with a 2-field subset 
under the same name — a reader who has seen the upstream constant will assume 
the wrong contents. Could you rename to something like 
`HUDI_REQUIRED_META_COLUMNS` to make the distinction explicit?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-trino-plugin/src/main/java/io/trino/plugin/hudi/HudiPageSource.java:
##########
@@ -115,8 +132,14 @@ public long getMemoryUsage()
     public void close()
             throws IOException
     {
-        fileGroupReader.close();
-        pageSource.close();
+        // Closer attempts every close and aggregates failures via 
addSuppressed, rethrowing the
+        // first. Registration is LIFO, so registering in reverse gives the 
original close order:
+        // recordIterator (wraps the readers) first, then fileGroupReader, 
then pageSource.
+        try (Closer closer = Closer.create()) {
+            closer.register(pageSource::close);
+            closer.register(fileGroupReader::close);
+            closer.register(recordIterator::close);

Review Comment:
   🤖 I think this `Closer` is invoking each underlying resource multiple times: 
`recordIterator.close()` is `CloseableMappingIterator` → 
`HoodieFileGroupReaderIterator.close()` → `fileGroupReader.close()` → closes 
`baseFileIterator` (the iterator from 
`HudiTrinoReaderContext.getFileRecordIterator`, which closes `pageSource`) and 
`recordBuffer`. Then `fileGroupReader::close` is invoked again (its close 
doesn't null out the fields) — closing the same 
`baseFileIterator`/`recordBuffer` a second time, and `pageSource` again — and 
finally `pageSource::close` runs a third time. Could you just register 
`recordIterator::close` (or just `fileGroupReader::close`) since each one 
already cascades to the others? @yihua does this need a closer look?
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



##########
hudi-trino-plugin/src/main/java/io/trino/plugin/hudi/HudiUtil.java:
##########
@@ -397,4 +410,53 @@ public static Schema 
getLatestTableSchema(HoodieTableMetaClient metaClient, Stri
             throw new TrinoException(HUDI_FILESYSTEM_ERROR, e);
         }
     }
+
+    public static List<HiveColumnHandle> getOrderingColumnHandles(Table table, 
TypeManager typeManager, Lazy<HoodieTableMetaClient> lazyMetaClient, 
HiveTimestampPrecision timestampPrecision)
+    {
+        RecordMergeMode recordMergeMode = 
lazyMetaClient.get().getTableConfig().getRecordMergeMode();
+        if (recordMergeMode == null || recordMergeMode == 
RecordMergeMode.COMMIT_TIME_ORDERING) {
+            // if commit time ordering is enabled, return empty list
+            return Collections.emptyList();
+        }
+
+        ImmutableList.Builder<HiveColumnHandle> columns = 
ImmutableList.builder();
+        List<String> orderingColumnNames = 
lazyMetaClient.get().getTableConfig().getOrderingFields();
+
+        int hiveColumnIndex = 0;
+        for (Column field : table.getDataColumns()) {
+            // ignore unsupported types rather than failing
+            if (orderingColumnNames.contains(field.getName())) {
+                HiveType hiveType = field.getType();
+                if (typeSupported(hiveType.getTypeInfo(), 
table.getStorage().getStorageFormat())) {
+                    columns.add(createBaseColumn(field.getName(), 
hiveColumnIndex, hiveType, getType(hiveType, typeManager, timestampPrecision), 
REGULAR, field.getComment()));
+                }
+            }
+            hiveColumnIndex++;
+        }
+
+        return columns.build();
+    }
+
+    /**
+     * Converts the given {@link HoodiePairData} into a {@link Map}.
+     * <p>
+     * Special handling is applied for null keys:
+     * <ul>
+     *   <li>If a key is null, it is stored in the map as a {@code null} 
entry.</li>
+     *   <li>If multiple entries share the same key (including null), the 
latest value overwrites the previous one.</li>
+     * </ul>
+     *
+     * @param pairData the HoodiePairData containing key-value pairs
+     * @param <K> the type of keys maintained by the resulting map
+     * @param <V> the type of mapped values
+     * @return a {@link Map} containing all key-value pairs from the input data
+     */
+    public static <K, V> Map<K, V> collectAsMap(HoodiePairData<K, V> pairData)
+    {
+        // HashMap allows null keys, so collect directly; on duplicate keys 
the later entry wins.
+        return pairData.collectAsList().stream()

Review Comment:
   🤖 nit: the three-arg `Stream.collect(HashMap::new, accumulator, combiner)` 
reads awkwardly for what is just a put-each-pair loop. A plain `HashMap<>` + 
`forEach` would be more obvious:
   ```java
   Map<K, V> result = new HashMap<>();
   pairData.collectAsList().forEach(pair -> result.put(pair.getKey(), 
pair.getValue()));
   return result;
   ```
   
   <sub><i>- AI-generated; verify before applying. React 👍/👎 to flag 
quality.</i></sub>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] feat(trino): Migrate the Trino-Hudi connector into the Hudi repo (RFC-105) [hudi]

Reply via email to