hudi.git: Error while running github feature from master:.asf.yaml
An error occurred while processing the github feature in .asf.yaml: GitHub discussions can only be enabled if a mailing list target exists for it. --- With regards, ASF Infra.
Re: [I] [SUPPORT] why HoodieDatasetBulkInsertHelper bulkInsert method no BucketBulkInsertDataInternalWriterHelper [hudi]
leeseven1211 commented on issue #12989: URL: https://github.com/apache/hudi/issues/12989#issuecomment-2735508732 The code only matches the ConsistentBucketBulkInsertDataInternalWriterHelper, and for all cases that do not match, it uses the BulkInsertDataInternalWriterHelper. When using BucketIndexEngineType.SIMPLE, why can't BucketBulkInsertDataInternalWriterHelper be used? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
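For reference, the dispatch described above can be reduced to the sketch below. This is an illustrative reconstruction, not the actual HoodieDatasetBulkInsertHelper code: only the two helper class names and the SIMPLE engine type come from the comment; the factory class, flag, and method signature are hypothetical.

```java
// Hypothetical sketch: only the consistent-hashing bucket index gets a dedicated
// writer helper; every other case (including BucketIndexEngineType.SIMPLE) falls
// back to the generic BulkInsertDataInternalWriterHelper, which is what the question is about.
enum BucketIndexEngineType { SIMPLE, CONSISTENT_HASHING }

class BulkInsertDataInternalWriterHelper { }

class ConsistentBucketBulkInsertDataInternalWriterHelper extends BulkInsertDataInternalWriterHelper { }

class WriterHelperFactory {
  static BulkInsertDataInternalWriterHelper create(boolean bucketIndexEnabled,
                                                   BucketIndexEngineType engineType) {
    if (bucketIndexEnabled && engineType == BucketIndexEngineType.CONSISTENT_HASHING) {
      return new ConsistentBucketBulkInsertDataInternalWriterHelper();
    }
    // SIMPLE bucket index (and anything else) lands here.
    return new BulkInsertDataInternalWriterHelper();
  }
}
```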
Re: [PR] [HUDI-7037] Fix colstats reading for Decimal field [hudi]
hudi-bot commented on PR #12993: URL: https://github.com/apache/hudi/pull/12993#issuecomment-2735251387 ## CI report: * ca52bb6677971593da5f246468ce260096c88d8a Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4256) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9120] Fix merge mode inference for table version 6 in file group reader [hudi]
hudi-bot commented on PR #12991: URL: https://github.com/apache/hudi/pull/12991#issuecomment-2735342081 ## CI report: * 411b6b0bd0238e770d9454c88d5a1daca0af41a6 Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4258) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] [RFC-92] Pluggable Table Format Support [hudi]
bvaradar opened a new pull request, #12998: URL: https://github.com/apache/hudi/pull/12998 ### Change Logs Pluggable Table Format Support in Hudi ### Impact Pluggable Table Format Support in Hudi ### Risk level (write none, low medium or high below) none ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9188] Fixing RLI record generation to account for deletes with lower ordering values in MOR log files [hudi]
yihua commented on code in PR #12984: URL: https://github.com/apache/hudi/pull/12984#discussion_r2002025587 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java: ## @@ -901,6 +919,53 @@ public static HoodieData convertMetadataToRecordIndexRecords(Hoodi } } + static Set getValidRecordKeysForFileSlice(HoodieTableMetaClient metaClient, Review Comment: Does the file group reading now add additional latency compared to before? Should we consider optimizations for `EVENT_TIME_ORDERING` that can avoid such merging? Also is the behavior consistent with global index, i.e., once the record is deleted through a log file in MOR table, the record no longer belongs to the file group even though the record exists in the base file? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9188] Fixing RLI record generation to account for deletes with lower ordering values in MOR log files [hudi]
yihua commented on code in PR #12984: URL: https://github.com/apache/hudi/pull/12984#discussion_r2002015927 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala: ## @@ -541,6 +541,66 @@ class TestMORDataSource extends HoodieSparkClientTestBase with SparkDatasetMixin assertEquals(0, hudiSnapshotDF3.count()) // 100 records were deleted, 0 record to load + + @Test + def testDeletesWithLowerOrderingValue() : Unit = { Review Comment: Should this test be added to `TestRecordLevelIndex` since it's record index specific? ## hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/testutils/DataSourceTestUtils.java: ## @@ -130,13 +133,24 @@ public static List<Row> generateRandomRowsEvolvedSchema(int count) { } public static List<Row> updateRowsWithHigherTs(Dataset<Row> inputDf) { +return updateRowsWithUpdatedTs(inputDf, false, false); + } + + public static List<Row> updateRowsWithUpdatedTs(Dataset<Row> inputDf, Boolean lowerTs, Boolean updatePartitionPath) { List<Row> input = inputDf.collectAsList(); List<Row> rows = new ArrayList<>(); for (Row row : input) { - Object[] values = new Object[3]; + Object[] values = new Object[4]; Review Comment: The changes should already be merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9083] Fixing flakiness with multi writer test [hudi]
hudi-bot commented on PR #12987: URL: https://github.com/apache/hudi/pull/12987#issuecomment-2734751442 ## CI report: * 8f98d0ff87fd8d21365696b22af77caac421cdd5 Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4250) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7037] Fix colstats reading for Decimal field [hudi]
yihua commented on code in PR #12993: URL: https://github.com/apache/hudi/pull/12993#discussion_r2001887764 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/ColumnStatsIndexSupport.scala: ## @@ -490,13 +491,19 @@ object ColumnStatsIndexSupport { case ShortType => value.asInstanceOf[Int].toShort case ByteType => value.asInstanceOf[Int].toByte - // TODO fix - case _: DecimalType => + case dt: DecimalType => value match { case buffer: ByteBuffer => -val logicalType = DecimalWrapper.SCHEMA$.getField("value").schema().getLogicalType -decConv.fromBytes(buffer, null, logicalType) - case _ => value +// Use the DecimalType's precision and scale (instead of using the schema from DecimalWrapper) Review Comment: My understanding is that this only affects reading the column stats from MDT, not writing, so there is no storage byte change. Correct? ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/ColumnStatsIndexSupport.scala: ## @@ -455,10 +456,10 @@ object ColumnStatsIndexSupport { case w: LongWrapper => w.getValue case w: FloatWrapper => w.getValue case w: DoubleWrapper => w.getValue + case w: DecimalWrapper => w.getValue // Moved above BytesWrapper to ensure proper matching Review Comment: Do we have functional tests covering the data skipping on a decimal column using column stats? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
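The fix under review boils down to decoding the stored bytes with the Decimal column's own precision and scale instead of a schema-derived logical type. Below is a minimal, self-contained sketch of that decoding approach using Avro's DecimalConversion; it is not the Hudi code path itself, and the class and method names are made up for illustration.

```java
import org.apache.avro.Conversions;
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;

import java.math.BigDecimal;
import java.nio.ByteBuffer;

public class DecimalStatDecodeSketch {
  // Decode a decimal min/max value stored as unscaled big-endian bytes, applying the
  // target column's precision and scale rather than a generic wrapper schema's logical type.
  static BigDecimal decode(ByteBuffer buffer, int precision, int scale) {
    Schema bytesSchema = Schema.create(Schema.Type.BYTES);
    LogicalTypes.Decimal decimalType = LogicalTypes.decimal(precision, scale);
    decimalType.addToSchema(bytesSchema);
    return new Conversions.DecimalConversion().fromBytes(buffer, bytesSchema, decimalType);
  }

  public static void main(String[] args) {
    // 123.45 with precision 10 and scale 2 is stored as the unscaled value 12345.
    ByteBuffer stored = ByteBuffer.wrap(BigDecimal.valueOf(12345).toBigInteger().toByteArray());
    System.out.println(decode(stored, 10, 2)); // prints 123.45
  }
}
```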
[PR] [HUDI-9140] Fix log block io type and other rollback strategy fixes for table version 6 [hudi]
lokeshj1703 opened a new pull request, #12992: URL: https://github.com/apache/hudi/pull/12992 ### Change Logs The PR fixes the iotype as APPEND for log blocks in table version 6. It also reverts some changes made to MarkerBasedRollbackStrategy for table version 6 in HUDI-9030. ### Impact NA ### Risk level (write none, low medium or high below) low ### Documentation Update NA ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-8178] Fix CI failures for partition-stats enablement [hudi]
nsivabalan commented on PR #12081: URL: https://github.com/apache/hudi/pull/12081#issuecomment-2734569017 We fixed both Date and LocalDate with col stats and partition stats. https://github.com/apache/hudi/blob/1f43b231763a978bef8d340a654e9f6287241ec9/hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java#L152C55-L152C86 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9188] Fixing RLI record generation to account for deletes with lower ordering values in MOR log files [hudi]
yihua commented on code in PR #12984: URL: https://github.com/apache/hudi/pull/12984#discussion_r2001950535 ## hudi-common/src/main/java/org/apache/hudi/metadata/RecordIndexRecordKeyParsingUtils.java: ## @@ -33,20 +33,17 @@ import org.apache.hadoop.fs.Path; import java.util.ArrayList; -import java.util.Collection; import java.util.Collections; import java.util.HashMap; import java.util.HashSet; import java.util.Iterator; import java.util.List; import java.util.Map; import java.util.Set; -import java.util.function.Function; -import java.util.stream.Stream; import static java.util.stream.Collectors.toList; -public class BaseFileRecordParsingUtils { +public class RecordIndexRecordKeyParsingUtils { Review Comment: nit: rename to `RecordIndexUtils` ## hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/client/functional/TestMetadataUtilRLIandSIRecordGeneration.java: ## @@ -281,9 +282,16 @@ public void testRecordGenerationAPIsForMOR() throws IOException { assertTrue(compactionInstantOpt.isPresent()); HoodieWriteMetadata compactionWriteMetadata = client.compact(compactionInstantOpt.get()); HoodieCommitMetadata compactionCommitMetadata = (HoodieCommitMetadata) compactionWriteMetadata.getCommitMetadata().get(); - // no RLI records should be generated for compaction operation. - assertTrue(convertMetadataToRecordIndexRecords(context, compactionCommitMetadata, writeConfig.getMetadataConfig(), - metaClient, writeConfig.getWritesFileIdEncoding(), compactionInstantOpt.get(), EngineType.SPARK).isEmpty()); + + HoodieBackedTableMetadata tableMetadata = new HoodieBackedTableMetadata(engineContext, metaClient.getStorage(), writeConfig.getMetadataConfig(), writeConfig.getBasePath(), true); + HoodieTableFileSystemView fsView = new HoodieTableFileSystemView(tableMetadata, metaClient, metaClient.getActiveTimeline()); + try { Review Comment: try with resources for both `tableMetadata` and `fsView`? ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -1087,20 +1088,27 @@ engineContext, dataWriteConfig, commitMetadata, instantTime, dataMetaClient, get getMetadataPartitionsToUpdate(), dataWriteConfig.getBloomFilterType(), dataWriteConfig.getBloomIndexParallelism(), dataWriteConfig.getWritesFileIdEncoding(), getEngineType(), Option.of(dataWriteConfig.getRecordMerger().getRecordType())); - - // Updates for record index are created by parsing the WriteStatus which is a hudi-client object. Hence, we cannot yet move this code - // to the HoodieTableMetadataUtil class in hudi-common. 
- if (getMetadataPartitionsToUpdate().contains(RECORD_INDEX.getPartitionPath())) { -HoodieData additionalUpdates = getRecordIndexAdditionalUpserts(partitionToRecordMap.get(RECORD_INDEX.getPartitionPath()), commitMetadata); -partitionToRecordMap.put(RECORD_INDEX.getPartitionPath(), partitionToRecordMap.get(RECORD_INDEX.getPartitionPath()).union(additionalUpdates)); - } + updateRecordIndexRecordsIfPresent(commitMetadata, instantTime, partitionToRecordMap); updateExpressionIndexIfPresent(commitMetadata, instantTime, partitionToRecordMap); updateSecondaryIndexIfPresent(commitMetadata, partitionToRecordMap, instantTime); return partitionToRecordMap; }); closeInternal(); } + private void updateRecordIndexRecordsIfPresent(HoodieCommitMetadata commitMetadata, String instantTime, Map> partitionToRecordMap) { +if (!RECORD_INDEX.isMetadataPartitionAvailable(dataMetaClient)) { Review Comment: Should this still follow the same check as before: `getMetadataPartitionsToUpdate().contains(RECORD_INDEX.getPartitionPath())`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (HUDI-9086) Master is broken Feb 27, 2025
[ https://issues.apache.org/jira/browse/HUDI-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-9086. - Resolution: Fixed > Master is broken Feb 27, 2025 > - > > Key: HUDI-9086 > URL: https://issues.apache.org/jira/browse/HUDI-9086 > Project: Apache Hudi > Issue Type: Sub-task > Components: dev-experience >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Labels: pull-request-available > Fix For: 1.0.2 > > Original Estimate: 4h > Time Spent: 4h > Remaining Estimate: 0h > > Master is broken as of now. > {code:java} > 2025-02-28T00:34:18.4012123Z [ERROR] Tests run: 1, Failures: 0, Errors: 1, > Skipped: 0, Time elapsed: 0.293 s <<< FAILURE! - in > org.apache.hudi.TestDataSourceUtils > 2025-02-28T00:34:18.4012821Z [ERROR] > testDeduplicationAgainstRecordsAlreadyInTable Time elapsed: 0.282 s <<< > ERROR! > 2025-02-28T00:34:18.4013202Z org.apache.spark.SparkException: > 2025-02-28T00:34:18.4041057Z Only one SparkContext should be running in this > JVM (see SPARK-2243).The currently running SparkContext was created at: > 2025-02-28T00:34:18.4077789Z > org.apache.spark.sql.hive.TestHiveClientUtils.setUp(TestHiveClientUtils.scala:43) > 2025-02-28T00:34:18.4078473Z > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2025-02-28T00:34:18.4081698Z > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > 2025-02-28T00:34:18.4082148Z > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2025-02-28T00:34:18.4082478Z java.lang.reflect.Method.invoke(Method.java:498) > 2025-02-28T00:34:18.4087413Z > org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725) > 2025-02-28T00:34:18.4088031Z > org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) > 2025-02-28T00:34:18.4089261Z > org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) > 2025-02-28T00:34:18.4089580Z > org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149) > 2025-02-28T00:34:18.4089917Z > org.junit.jupiter.engine.extension.TimeoutExtension.interceptLifecycleMethod(TimeoutExtension.java:126) > 2025-02-28T00:34:18.4090239Z > org.junit.jupiter.engine.extension.TimeoutExtension.interceptBeforeAllMethod(TimeoutExtension.java:68) > 2025-02-28T00:34:18.4090561Z > org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115) > 2025-02-28T00:34:18.4090892Z > org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105) > 2025-02-28T00:34:18.4091376Z > org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) > 2025-02-28T00:34:18.4091705Z > org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) > 2025-02-28T00:34:18.4092021Z > org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) > 2025-02-28T00:34:18.4092329Z > org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) > 2025-02-28T00:34:18.4092617Z > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104) > 2025-02-28T00:34:18.4092898Z > org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98) > 
2025-02-28T00:34:18.4093216Z > org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.lambda$invokeBeforeAllMethods$11(ClassBasedTestDescriptor.java:397) > 2025-02-28T00:34:18.4093542Z at > org.apache.spark.SparkContext$.$anonfun$assertNoOtherContextIsRunning$2(SparkContext.scala:2840) > 2025-02-28T00:34:18.4093794Z at scala.Option.foreach(Option.scala:407) > 2025-02-28T00:34:18.4094033Z at > org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2837) > 2025-02-28T00:34:18.4094305Z at > org.apache.spark.SparkContext$.markPartiallyConstructed(SparkContext.scala:2927) > 2025-02-28T00:34:18.4094559Z at > org.apache.spark.SparkContext.(SparkContext.scala:99) > 2025-02-28T00:34:18.4094836Z at > org.apache.hudi.testutils.HoodieSparkClientTestHarness.initSparkContexts(HoodieSparkClientTestHarness.java:203) > 2025-02-28T00:34:18.4095249Z at > org.apache.hudi.testutils.HoodieSparkClientTestHarness.initSparkContexts(HoodieSparkClientTestHarness.java:229) > 2025-02-28T00:34:18.4095557Z at > org.apache.hudi.testutils.HoodieSparkClientTestHarness.initResources(HoodieSparkClientTestHarness.java:159) > 2025-02-28T00:34:18.4095851Z at > org.apache.hudi.testutil
[jira] [Closed] (HUDI-9127) Fix completion time generation to honor the time zone set in table config
[ https://issues.apache.org/jira/browse/HUDI-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-9127. - Resolution: Fixed [https://github.com/apache/hudi/commit/9baaed9409a1a5b654e88d50ac6826a96d6169bb] > Fix completion time generation to honor the time zone set in table config > - > > Key: HUDI-9127 > URL: https://issues.apache.org/jira/browse/HUDI-9127 > Project: Apache Hudi > Issue Type: Sub-task > Components: writer-core >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Blocker > Labels: pull-request-available > Fix For: 1.0.2 > > Original Estimate: 3h > Remaining Estimate: 3h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-9119) Hudi 1.0.1 cannot write MOR tables
[ https://issues.apache.org/jira/browse/HUDI-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-9119: -- Parent: HUDI-8724 Issue Type: Sub-task (was: Bug) > Hudi 1.0.1 cannot write MOR tables > -- > > Key: HUDI-9119 > URL: https://issues.apache.org/jira/browse/HUDI-9119 > Project: Apache Hudi > Issue Type: Sub-task >Affects Versions: 1.0.1 >Reporter: Shawn Chang >Priority: Critical > Fix For: 1.0.2 > > > When testing Hudi 1.0.1 on EMR 7.8, I can see issues like below: > {code:java} > Exception in thread "main" org.apache.hudi.exception.HoodieException: Failed > to update metadata at > org.apache.hudi.client.BaseHoodieClient.writeTableMetadata(BaseHoodieClient.java:282) > at > org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:293) > at > org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:253) > at > org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:94) > at > org.apache.hudi.HoodieSparkSqlWriterInternal.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:999) > at > org.apache.hudi.HoodieSparkSqlWriterInternal.writeInternal(HoodieSparkSqlWriter.scala:538) > at > org.apache.hudi.HoodieSparkSqlWriterInternal.$anonfun$write$1(HoodieSparkSqlWriter.scala:193) > at > org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108) > at > org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:384) > at > org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:157) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$10(SQLExecution.scala:220) > at > org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108) > at > org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:384) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$9(SQLExecution.scala:220) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:405) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:219) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901) at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:83) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74) > at > org.apache.spark.sql.adapter.BaseSpark3Adapter.sqlExecutionWithNewExecutionId(BaseSpark3Adapter.scala:105) > at > org.apache.hudi.HoodieSparkSqlWriterInternal.write(HoodieSparkSqlWriter.scala:215) > at > org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:130) > at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:185) at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:126) > at > org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108) > at > 
org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:384) > at > org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:157) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$10(SQLExecution.scala:220) > at > org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108) > at > org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:384) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$9(SQLExecution.scala:220) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:405) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:219) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901) at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:83) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:123) > at > org.apache.spar
Re: [PR] [HUDI-9083] Fixing flakiness with multi writer test [hudi]
hudi-bot commented on PR #12987: URL: https://github.com/apache/hudi/pull/12987#issuecomment-2734875401 ## CI report: * 8f98d0ff87fd8d21365696b22af77caac421cdd5 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9083] Fixing flakiness with multi writer test [hudi]
hudi-bot commented on PR #12987: URL: https://github.com/apache/hudi/pull/12987#issuecomment-2734877234 ## CI report: * 8f98d0ff87fd8d21365696b22af77caac421cdd5 Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4250) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (HUDI-8655) Create Tests for Filegroup reader for Schema Cache and for Spillable Map
[ https://issues.apache.org/jira/browse/HUDI-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan closed HUDI-8655. - Resolution: Fixed > Create Tests for Filegroup reader for Schema Cache and for Spillable Map > > > Key: HUDI-8655 > URL: https://issues.apache.org/jira/browse/HUDI-8655 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Critical > Labels: pull-request-available > Fix For: 1.0.2 > > Original Estimate: 2h > Remaining Estimate: 2h > > We need unit tests for schema cache > For spillable map, we need to add test cases for how the fg reader will use > it and ensure we test spilling to disk in the test -- This message was sent by Atlassian Jira (v8.20.10#820010)
[I] [SUPPORT] When writing to a Hudi MOR table using Flink, data merging did not occur based on the expected value of "precombine.field". [hudi]
Toroidals opened a new issue, #12996: URL: https://github.com/apache/hudi/issues/12996 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? y - Join the mailing list to engage in conversations and get faster support at dev-subscr...@hudi.apache.org. - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly. **Describe the problem you faced** Primary Key: id precombine.field: ts_ms (ts_ms is a 13-digit timestamp in milliseconds) Scenario 1: The Hudi MOR table contains a record: id=1, version=1, ts_ms=1741022687053. A new record is submitted: id=1, version=2, ts_ms=1741022687053 (same ts_ms as the existing record). The merge behaves as expected, and the final result is: id=1, version=2, ts_ms=1741022687053 Scenario 2: The Hudi MOR table contains a record: id=1, version=1, ts_ms=1741022687053. Two new records are submitted: id=1, version=2, ts_ms=1741022687054 (ts_ms is 1 millisecond greater than the existing record). id=1, version=3, ts_ms=1741022687054 (same ts_ms as the first new record). Expected merge result: id=1, version=3, ts_ms=1741022687054 (latest version with the same ts_ms should be retained). However, sometimes the result is: id=1, version=2, ts_ms=1741022687054, which is not expected. Issue: When multiple records with the same primary key (id) and the same ts_ms are submitted in a batch, the merge process does not strictly follow the arrival order of the messages. Instead, it appears to randomly pick one of the records from the batch. flink conf: HoodiePipeline.Builder builder = HoodiePipeline.builder(infoMap.get("hudi_table_name")); Map options = new HashMap<>(); options.put(FlinkOptions.DATABASE_NAME.key(), infoMap.get("hudi_database_name")); options.put(FlinkOptions.TABLE_NAME.key(), infoMap.get("hudi_table_name")); options.put(FlinkOptions.PATH.key(), infoMap.get("hudi_hdfs_path")); options.put("catalog.path", "hdfs:///apps/hudi/catalog/"); String hudiFieldMap = infoMap.get("hudi_field_map").toLowerCase(Locale.ROOT); ArrayList> fieldList = JSON.parseObject(hudiFieldMap, new TypeReference>>() { }); log.info("fieldList: {}", fieldList.toString()); for (ArrayList columnList : fieldList) { builder.column("`" + columnList.get(0) + "` " + columnList.get(1)); } String[] hudiPrimaryKeys = infoMap.get("hudi_primary_key").split(","); builder.pk(hudiPrimaryKeys); options.put(FlinkOptions.PRECOMBINE_FIELD.key(), "ts_ms"); **options.put(FlinkOptions.PAYLOAD_CLASS_NAME.key(), EventTimeAvroPayload.class.getName()); options.put(FlinkOptions.RECORD_MERGER_IMPLS.key(), HoodieAvroRecordMerger.class.getName());** options.put(FlinkOptions.TABLE_TYPE.key(), HoodieTableType.MERGE_ON_READ.name()); options.put(FlinkOptions.INDEX_TYPE.key(), HoodieIndex.IndexType.BUCKET.name()); options.put(FlinkOptions.BUCKET_INDEX_NUM_BUCKETS.key(), infoMap.get("hudi_bucket_index_num_buckets")); options.put(FlinkOptions.BUCKET_INDEX_ENGINE_TYPE.key(), infoMap.get("hudi_bucket_index_engine_type")); options.put(FlinkOptions.COMPACTION_TRIGGER_STRATEGY.key(), infoMap.get("hudi_compaction_trigger_strategy")); options.put(FlinkOptions.COMPACTION_DELTA_COMMITS.key(), infoMap.get("hudi_compaction_delta_commits")); options.put(FlinkOptions.COMPACTION_DELTA_SECONDS.key(), infoMap.get("hudi_compaction_delta_seconds")); options.put(FlinkOptions.COMPACTION_MAX_MEMORY.key(), infoMap.get("hudi_compaction_max_memory")); options.put(HoodieWriteConfig.ALLOW_EMPTY_COMMIT.key(), "true"); 
options.put(FlinkOptions.CLEAN_RETAIN_COMMITS.key(), "150"); options.put(FlinkOptions.HIVE_SYNC_ENABLED.key(), "true"); options.put(FlinkOptions.HIVE_SYNC_MODE.key(), "hms"); options.put(FlinkOptions.HIVE_SYNC_DB.key(), "hudi"); options.put(FlinkOptions.HIVE_SYNC_TABLE.key(), "mor_test_01"); options.put(FlinkOptions.HIVE_SYNC_CONF_DIR.key(), "/etc/hive/conf"); options.put(FlinkOptions.HIVE_SYNC_METASTORE_URIS.key(), "thrift://xx01:9083,thrift://xx02:9083,thrift://xx03:9083"); options.put(FlinkOptions.HIVE_SYNC_JDBC_URL.key(), "jdbc:hive2://xx01:21181,xx02:21181,xx03:21181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"); options.put(FlinkOptions.HIVE_SYNC_SUPPORT_TIMESTAMP.key(), "true"); options.put(FlinkOptions.HIVE_SYNC_SKIP_RO_SUFFIX.key(), "true"); options.put(FlinkOptions.PARTITION_PATH_FIELD.key(), "part_dt"); options.put(FlinkOptions.HIVE_SYNC_PARTITION_FIELDS.key(), "part_dt"); options.put(FlinkOptions.WRITE_RATE_LIMIT.key(), "2"); options.put(FlinkOptions.WRITE_TASKS.key(), 8); options.put(FlinkOptions.OPERATION.key(), WriteOperationType.UPSERT.value()); builder.options(options); return builder; **To Reproduce**
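For illustration, the tie in Scenario 2 can be reduced to the standalone sketch below. This is a toy stand-in, not Hudi's EventTimeAvroPayload or HoodieAvroRecordMerger: with event-time ordering, two records that tie on the precombine field are resolved by whichever one the deduplication step visits last, and within a single Flink batch that iteration order is not guaranteed to match arrival order.

```java
import java.util.Arrays;
import java.util.List;

public class PrecombineTieSketch {
  // Simplified stand-in for a Hudi record keyed by id with a precombine (ordering) value.
  record Rec(int id, int version, long tsMs) {}

  // Event-time style precombine: keep the record with the larger ordering value.
  // On a tie, the incoming record wins, so the survivor depends on visit order.
  static Rec preCombine(Rec current, Rec incoming) {
    return incoming.tsMs() >= current.tsMs() ? incoming : current;
  }

  public static void main(String[] args) {
    List<Rec> batch = Arrays.asList(
        new Rec(1, 2, 1741022687054L),
        new Rec(1, 3, 1741022687054L));
    // Reducing left-to-right keeps version=3; feed the same records in the opposite
    // order and version=2 survives, matching the unexpected result described above.
    Rec winner = batch.stream().reduce(PrecombineTieSketch::preCombine).get();
    System.out.println(winner);
  }
}
```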
Re: [PR] [HUDI-9120] Enable File Group reader by default for table version 6 [hudi]
lokeshj1703 commented on PR #12935: URL: https://github.com/apache/hudi/pull/12935#issuecomment-2733818315 @linliu-code @yihua @nsivabalan The PR now cherry-picks Lin's fix and removes all the older fixes which were added earlier. It also reverts the changes made in HUDI-9030 for removing FGR. PR needs Lin's fix otherwise the tests would fail. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9120] Enable File Group reader by default for table version 6 [hudi]
hudi-bot commented on PR #12935: URL: https://github.com/apache/hudi/pull/12935#issuecomment-2733830603 ## CI report: * d5f94e7afb5865449c7796b03cc4b9c786061ec2 Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4152) * 5154a0d5f9b8adecd1c675f05405e732b2f1e9fe Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4246) * 3498922c1b7993f4919f9bb4400fc8a8565ccdac UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9120] Enable File Group reader by default for table version 6 [hudi]
hudi-bot commented on PR #12935: URL: https://github.com/apache/hudi/pull/12935#issuecomment-2733835632 ## CI report: * 5154a0d5f9b8adecd1c675f05405e732b2f1e9fe Azure: [CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4246) * 3498922c1b7993f4919f9bb4400fc8a8565ccdac UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-8581] Test schema handler in fg reader and some refactoring to prevent bugs in the future [hudi]
nsivabalan commented on code in PR #12340: URL: https://github.com/apache/hudi/pull/12340#discussion_r2001378215 ## hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala: ## @@ -165,7 +165,7 @@ class SparkFileFormatInternalRowReaderContext(parquetFileReader: SparkParquetRea HoodieAvroUtils.removeFields(skeletonRequiredSchema, rowIndexColumn)) //If we need to do position based merging with log files we will leave the row index column at the end - val dataProjection = if (getHasLogFiles && getShouldMergeUseRecordPosition) { + val dataProjection = if (getShouldMergeUseRecordPosition) { Review Comment: why removed the log files check? ## hudi-common/src/test/java/org/apache/hudi/common/table/read/TestSchemaHandler.java: ## @@ -0,0 +1,464 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.hudi.common.table.read; + +import org.apache.hudi.avro.HoodieAvroUtils; +import org.apache.hudi.common.config.RecordMergeMode; +import org.apache.hudi.common.config.TypedProperties; +import org.apache.hudi.common.engine.HoodieReaderContext; +import org.apache.hudi.common.model.HoodieRecord; +import org.apache.hudi.common.model.HoodieRecordMerger; +import org.apache.hudi.common.table.HoodieTableConfig; +import org.apache.hudi.common.testutils.HoodieTestDataGenerator; +import org.apache.hudi.common.util.Option; +import org.apache.hudi.common.util.collection.ClosableIterator; +import org.apache.hudi.common.util.collection.Pair; +import org.apache.hudi.storage.HoodieStorage; +import org.apache.hudi.storage.StoragePath; + +import org.apache.avro.Schema; +import org.apache.avro.generic.GenericRecord; +import org.apache.avro.generic.IndexedRecord; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.params.ParameterizedTest; +import org.junit.jupiter.params.provider.Arguments; +import org.junit.jupiter.params.provider.MethodSource; + +import java.io.IOException; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.Map; +import java.util.function.UnaryOperator; +import java.util.stream.Stream; + +import static org.apache.hudi.common.config.RecordMergeMode.COMMIT_TIME_ORDERING; +import static org.apache.hudi.common.config.RecordMergeMode.CUSTOM; +import static org.apache.hudi.common.config.RecordMergeMode.EVENT_TIME_ORDERING; +import static org.apache.hudi.common.table.read.HoodiePositionBasedSchemaHandler.addPositionalMergeCol; +import static org.apache.hudi.common.table.read.HoodiePositionBasedSchemaHandler.getPositionalMergeField; +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertTrue; +import static org.mockito.Mockito.mock; +import static org.mockito.Mockito.when; 
+ +public class TestSchemaHandler { + + protected static final Schema DATA_SCHEMA = HoodieAvroUtils.addMetadataFields(HoodieTestDataGenerator.AVRO_SCHEMA); + protected static final Schema DATA_COLS_ONLY_SCHEMA = generateProjectionSchema("begin_lat", "tip_history", "rider"); + protected static final Schema META_COLS_ONLY_SCHEMA = generateProjectionSchema("_hoodie_commit_seqno", "_hoodie_record_key"); + + @Test + public void testCow() { +HoodieReaderContext readerContext = new MockReaderContext(false); +readerContext.setHasLogFiles(false); +readerContext.setHasBootstrapBaseFile(false); +readerContext.setShouldMergeUseRecordPosition(false); +HoodieTableConfig hoodieTableConfig = mock(HoodieTableConfig.class); +Schema requestedSchema = DATA_SCHEMA; +HoodieFileGroupReaderSchemaHandler schemaHandler = new HoodieFileGroupReaderSchemaHandler(readerContext, DATA_SCHEMA, +requestedSchema, Option.empty(), hoodieTableConfig, new TypedProperties()); +assertEquals(requestedSchema, schemaHandler.getRequiredSchema()); + +//read subset of columns +requestedSchema = generateProjectionSchema("begin_lat", "tip_history", "rider"); +schemaHandler = +new HoodieFileGroupReaderSchemaHandler(readerContext, DATA_SCHEMA, requestedSchema, +Option.empty(), hoodieTableConfig, new TypedP
Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]
hudi-bot commented on PR #12991: URL: https://github.com/apache/hudi/pull/12991#issuecomment-2732226321 ## CI report: * d85952ac252b3a9cc7677188dac340ece0efcc1d Azure: [CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4242) * 87512b51170f102d612c11b44bea7534a684c51d Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4243) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9013] Add backwards compatible MDT writer support and reader support with tbl v6 [hudi]
lokeshj1703 commented on PR #12948: URL: https://github.com/apache/hudi/pull/12948#issuecomment-2732856286 Azure CI passed: https://github.com/user-attachments/assets/05ba161a-8999-42a8-b125-f9e2a5a9cef6 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (HUDI-8969) Analyze how to write `RowData` directly
[ https://issues.apache.org/jira/browse/HUDI-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geser Dugarov closed HUDI-8969. --- Resolution: Fixed This task will be done under RFC-87. > Analyze how to write `RowData` directly > --- > > Key: HUDI-8969 > URL: https://issues.apache.org/jira/browse/HUDI-8969 > Project: Apache Hudi > Issue Type: Task >Reporter: Geser Dugarov >Assignee: Geser Dugarov >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-9120] Enable File Group reader by default for table version 6 [hudi]
lokeshj1703 commented on code in PR #12935: URL: https://github.com/apache/hudi/pull/12935#discussion_r2001484794 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/SparkBroadcastManager.java: ## @@ -71,6 +73,7 @@ public class SparkBroadcastManager extends EngineBroadcastManager { public SparkBroadcastManager(HoodieEngineContext context, HoodieTableMetaClient metaClient) { this.context = context; this.metaClient = metaClient; +this.tableVersion = metaClient.getTableConfig().getTableVersion(); Review Comment: This has been removed now after Lin's fix. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9144] Flink writer for MOR table supports writing RowData … [hudi]
hudi-bot commented on PR #12967: URL: https://github.com/apache/hudi/pull/12967#issuecomment-2733608704 ## CI report: * 21ba47d8c7b86b1c92a311e216c4b85dc17ed046 Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4245) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9120] Enable File Group reader by default for table version 6 [hudi]
hudi-bot commented on PR #12935: URL: https://github.com/apache/hudi/pull/12935#issuecomment-2733817403 ## CI report: * d5f94e7afb5865449c7796b03cc4b9c786061ec2 Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4152) * 5154a0d5f9b8adecd1c675f05405e732b2f1e9fe Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4246) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9120] Enable File Group reader by default for table version 6 [hudi]
hudi-bot commented on PR #12935: URL: https://github.com/apache/hudi/pull/12935#issuecomment-2733813222 ## CI report: * d5f94e7afb5865449c7796b03cc4b9c786061ec2 Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4152) * 5154a0d5f9b8adecd1c675f05405e732b2f1e9fe UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-9140) Follow up from 9030
[ https://issues.apache.org/jira/browse/HUDI-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-9140: - Labels: pull-request-available (was: ) > Follow up from 9030 > --- > > Key: HUDI-9140 > URL: https://issues.apache.org/jira/browse/HUDI-9140 > Project: Apache Hudi > Issue Type: Improvement > Components: writer-core >Reporter: sivabalan narayanan >Assignee: Lokesh Jain >Priority: Major > Labels: pull-request-available > > Filing a tracking ticket for all follow ups from HUDI-9030 > > [https://github.com/apache/hudi/pull/12888/files#r1985816074] > [https://github.com/apache/hudi/pull/12888/files#r1985858219] > [https://github.com/apache/hudi/pull/12888/files#r1985859138] > Any changes to LogRecordScanner classes. not really required to be fixed > right away. > 2. Listing based rollback strategy. > 3. Add tests to restore to a commit w/ long history w/ a mix of DC, > compaction, clustering N no of times. all inline table service would do. But > restore should succeed and data validation should remain intact. > 4. Fix the IOType as Append for log files with table version 6 > 5. We will need to check if configs addded in version 1.0 are required for > tbl version 6 > 6. Ensure these are accounted for in FGR > [https://github.com/apache/hudi/pull/12888/files#r1976856670] > > > WIP patch: > [https://github.com/nsivabalan/hudi/tree/fixTableVersion6FixesAbstraction] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-9140] Fix log block io type and other rollback strategy fixes for table version 6 [hudi]
hudi-bot commented on PR #12992: URL: https://github.com/apache/hudi/pull/12992#issuecomment-2733101341 ## CI report: * 99916b679c4915e811845ae6261e9b3263c0feea UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9140] Fix log block io type and other rollback strategy fixes for table version 6 [hudi]
hudi-bot commented on PR #12992: URL: https://github.com/apache/hudi/pull/12992#issuecomment-2733104919 ## CI report: * 99916b679c4915e811845ae6261e9b3263c0feea Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4244) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] [SUPPORT] Hudi DELETE operation in Flink scans all partitions despite partition predicate [hudi]
maheshguptags opened a new issue, #12988: URL: https://github.com/apache/hudi/issues/12988 I am experiencing an issue when trying to delete records from a Hudi table where data is ingested using Flink streaming, and deletion is attempted using a Hudi batch processing job. Despite specifying a partition condition in the DELETE query, Hudi scans all partitions, which causes high resource usage and timeouts. **To Reproduce** Steps to reproduce the behavior: 1. Continuously ingest the data using hudi-flink streaming job. 2. Create hudi table with below config(another batch stream to delete the data from same table) ``` CREATE TABLE IF NOT EXISTS hudi_temp(x STRING,_date STRING,_count BIGINT,type STRING,update_date TIMESTAMP(3)) PARTITIONED BY (`x`) WITH ('connector' = 'hudi', 'hoodie.datasource.write.recordkey.field'='x,_date','path' = '${bucket_path_daily}','table.type' = 'COPY_ON_WRITE','hoodie.datasource.write.precombine.field'='updated_date','write.operation' = 'delete','hoodie.datasource.write.partitionpath.field'='x','hoodie.write.concurrency.mode'='optimistic_concurrency_control','hoodie.write.lock.provider'='org.apache.hudi.client.transaction.lock.InProcessLockProvider','hoodie.cleaner.policy.failed.writes'='LAZY')"); EnvironmentSettings settings = EnvironmentSettings.newInstance().inBatchMode().build(); TableEnvironment tEnv = TableEnvironment.create(settings); tEnv.executeSql(createDeleteTableDDL); tEnv.executeSql("DELETE FROM daily_activity_summary where x ='cl-278'").await(); tEnv.executeSql("SELECT * FROM Orders where x='cl-278'").print(); ``` 3. Deploy the Delete jobs 4. Followed below documents for doing same https://github.com/apache/flink/blob/release-1.20/docs/content/docs/dev/table/sql/delete.md **Expected behavior** Hudi should only scan the relevant partition (x = 'cl-278') when performing a DELETE operation, thereby reducing resource usage and preventing timeouts. it should delete the specific partition or specific conditions that are mentioned in step 1. **Environment Description** * Hudi version : 0.15.0 * Spark version : NO * Flink version : 1.18.1 * ENV : k8s * Hive version : * Hadoop version : * Storage (HDFS/S3/GCS..) : S3 * Running on Docker? (yes/no) : k8s **Additional context** I found there are multiple issue like. 1. why is it scanning all the partition even if I am giving partition details and condition in query it self. 2. 
I see there is pruner is getting called but still scanning all the data ``` 2025-03-03 10:47:33,191 INFO org.apache.hudi.util.StreamerUtil [] - Table option [hoodie.datasource.write.keygenerator.class] is reset to org.apache.hudi.keygen.ComplexAvroKeyGenerator because record key or partition path has two or more fields 2025-03-03 10:47:36,293 INFO org.apache.hudi.table.HoodieTableSource [] - Partition pruner for hoodie source, condition is: equals(client_id, 'cl-278') ``` **Stacktrace** For single record in cl-278 it is taking 10 min and still it is not deleting and getting below exception ``` Caused by: org.apache.hudi.exception.HoodieException: Timeout(601000ms) while waiting for instant initialize at org.apache.hudi.sink.utils.TimeWait.waitFor(TimeWait.java:57) at org.apache.hudi.sink.common.AbstractStreamWriteFunction.instantToWrite(AbstractStreamWriteFunction.java:269) at org.apache.hudi.sink.StreamWriteFunction.flushRemaining(StreamWriteFunction.java:452) at org.apache.hudi.sink.StreamWriteFunction.endInput(StreamWriteFunction.java:157) at org.apache.hudi.sink.common.AbstractWriteOperator.endInput(AbstractWriteOperator.java:48) at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.endOperatorInput(StreamOperatorWrapper.java:96) at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.endInput(RegularOperatorChain.java:97) at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:68) at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:562) at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231) at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:858) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:807) at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:953) at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:932) at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:746) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) at java.base/java.lang.Thread.run(Unknown Source) ``` Fail
Re: [PR] [HUDI-9120] Enable File Group reader by default for table version 6 [hudi]
lokeshj1703 commented on code in PR #12935: URL: https://github.com/apache/hudi/pull/12935#discussion_r2001501987 ## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java: ## @@ -112,6 +112,7 @@ public HoodieFileGroupReader(HoodieReaderContext readerContext, mergeStrategyId, null, tableConfig.getTableVersion()); recordMergeMode = triple.getLeft(); mergeStrategyId = triple.getRight(); + tableConfig.setValue(HoodieTableConfig.RECORD_MERGE_MODE.key(), recordMergeMode.name()); Review Comment: This change has been removed after Lin's fix. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-7037) Column Stats for Decimal Field From Metadata table is read as Bytes
[ https://issues.apache.org/jira/browse/HUDI-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-7037: -- Priority: Blocker (was: Critical) > Column Stats for Decimal Field From Metadata table is read as Bytes > --- > > Key: HUDI-7037 > URL: https://issues.apache.org/jira/browse/HUDI-7037 > Project: Apache Hudi > Issue Type: Sub-task > Components: metadata >Affects Versions: 0.14.1 >Reporter: Vamshi Gudavarthi >Assignee: Sagar Sumit >Priority: Blocker > Fix For: 1.0.2 > > > During Onetable project, found that for Decimal field column stats read from > metadata table is read as BytesWrapper instead of DecimalWrapper essentially > the actual type got lost. Verified write side is fine (i.e. writing as > DecimalWrapper) but read side is where the problem is. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7037) Column Stats for Decimal Field From Metadata table is read as Bytes
[ https://issues.apache.org/jira/browse/HUDI-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-7037: -- Status: In Progress (was: Open) > Column Stats for Decimal Field From Metadata table is read as Bytes > --- > > Key: HUDI-7037 > URL: https://issues.apache.org/jira/browse/HUDI-7037 > Project: Apache Hudi > Issue Type: Sub-task > Components: metadata >Affects Versions: 0.14.1 >Reporter: Vamshi Gudavarthi >Assignee: Sagar Sumit >Priority: Blocker > Fix For: 1.0.2 > > > During Onetable project, found that for Decimal field column stats read from > metadata table is read as BytesWrapper instead of DecimalWrapper essentially > the actual type got lost. Verified write side is fine (i.e. writing as > DecimalWrapper) but read side is where the problem is. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-7037] Fix colstats reading for Decimal field [hudi]
hudi-bot commented on PR #12993: URL: https://github.com/apache/hudi/pull/12993#issuecomment-2734076458 ## CI report: * c81a532851ea54fcb58262145d016323a1e42ac7 Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4248) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9144] Flink writer for MOR table supports writing RowData … [hudi]
cshuo commented on code in PR #12967: URL: https://github.com/apache/hudi/pull/12967#discussion_r2000890050 ## hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/v2/HandleRecords.java: ## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.io.v2; + +import org.apache.hudi.common.model.DeleteRecord; +import org.apache.hudi.common.model.HoodieRecord; +import org.apache.hudi.common.util.Option; + +import java.util.Collections; +import java.util.Iterator; + +/** + * {@code HandleRecords} is a holder containing records iterator for {@code HoodieDataBlock} + * and delete records iterator for {@code HoodieDeleteBlock}. + * + * Insert records and delete records are separated using two iterators for more efficient + * memory utilization, for example, the data bytes in the iterator are reused based Flink managed + * memory pool, and the RowData wrapper is also a singleton reusable object to minimize on-heap + * memory costs, thus being more GC friendly for massive data scenarios. + */ +public class HandleRecords { + private final Iterator recordItr; + private final Option> deleteRecordItr; + + public HandleRecords(Iterator recordItr, Iterator deleteItr) { +this.recordItr = recordItr; +this.deleteRecordItr = Option.ofNullable(deleteItr); + } + + public Iterator getRecordItr() { +return this.recordItr; + } + + public Iterator getDeleteRecordItr() { +return this.deleteRecordItr.orElse(Collections.emptyIterator()); + } + + public static Builder builder() { +return new Builder(); + } + + public static class Builder { +private Iterator recordItr; +private Iterator deleteRecordItr; + +public Builder() { +} + +public Builder withRecordItr(Iterator recordItr) { + this.recordItr = recordItr; + return this; +} + +public Builder withDeleteRecordItr(Iterator deleteRecordItr) { + this.deleteRecordItr = deleteRecordItr; + return this; +} + +public HandleRecords build() { + return new HandleRecords(recordItr, deleteRecordItr); +} + } Review Comment: `HandleRecords` will be removed after discussing with Danny, see detail [here](https://github.com/apache/hudi/pull/12967#discussion_r2000357105). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7930] Flink Support for Array of Row and Map of Row value [hudi]
David-N-Perkins commented on PR #11727: URL: https://github.com/apache/hudi/pull/11727#issuecomment-2732958024 @empcl If I remember correctly, it was needed to get consistent names and structure in the Parquet files. I was seeing differences depending on whether the operation was "insert", "upsert", or "bulk_insert". -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9120] Enable File Group reader by default for table version 6 [hudi]
lokeshj1703 commented on code in PR #12935: URL: https://github.com/apache/hudi/pull/12935#discussion_r2001485609 ## hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala: ## @@ -59,17 +60,16 @@ import scala.collection.mutable * @param filters spark filters that might be pushed down into the reader * @param requiredFilters filters that are required and should always be used, even in merging situations */ -class SparkFileFormatInternalRowReaderContext(parquetFileReader: SparkParquetReader, - filters: Seq[Filter], - requiredFilters: Seq[Filter]) extends BaseSparkInternalRowReaderContext { +class SparkFileFormatInternalRowReaderContext(parquetFileReader: SparkParquetReader, filters: Seq[Filter], + requiredFilters: Seq[Filter], tableVersion: HoodieTableVersion) extends BaseSparkInternalRowReaderContext { lazy val sparkAdapter: SparkAdapter = SparkAdapterSupport.sparkAdapter private lazy val bootstrapSafeFilters: Seq[Filter] = filters.filter(filterIsSafeForBootstrap) ++ requiredFilters private val deserializerMap: mutable.Map[Schema, HoodieAvroDeserializer] = mutable.Map() private val serializerMap: mutable.Map[Schema, HoodieAvroSerializer] = mutable.Map() private lazy val allFilters = filters ++ requiredFilters override def supportsParquetRowIndex: Boolean = { -HoodieSparkUtils.gteqSpark3_5 +HoodieSparkUtils.gteqSpark3_5 && tableVersion.greaterThanOrEquals(HoodieTableVersion.EIGHT) Review Comment: This change has been removed after Lin's fix -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-9043) Analyze possibility to optimize `FlinkWriteHelper::deduplicateRecords`
[ https://issues.apache.org/jira/browse/HUDI-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17936425#comment-17936425 ] Geser Dugarov commented on HUDI-9043: - `RowDataStreamWriteFunction::deduplicateRecordsIfNeeded` should be completed first now, and then we can check the costs. > Analyze possibility to optimize `FlinkWriteHelper::deduplicateRecords` > -- > > Key: HUDI-9043 > URL: https://issues.apache.org/jira/browse/HUDI-9043 > Project: Apache Hudi > Issue Type: Task >Reporter: Geser Dugarov >Assignee: Geser Dugarov >Priority: Major > > `FlinkWriteHelper::deduplicateRecords` looks too costly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-9043) Analyze possibility to optimize `FlinkWriteHelper::deduplicateRecords`
[ https://issues.apache.org/jira/browse/HUDI-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geser Dugarov updated HUDI-9043: Status: Open (was: In Progress) > Analyze possibility to optimize `FlinkWriteHelper::deduplicateRecords` > -- > > Key: HUDI-9043 > URL: https://issues.apache.org/jira/browse/HUDI-9043 > Project: Apache Hudi > Issue Type: Task >Reporter: Geser Dugarov >Assignee: Geser Dugarov >Priority: Major > > `FlinkWriteHelper::deduplicateRecords` looks too costly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]
hudi-bot commented on PR #12991: URL: https://github.com/apache/hudi/pull/12991#issuecomment-2732680960 ## CI report: * 87512b51170f102d612c11b44bea7534a684c51d Azure: [SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4243) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-8796) Silent ignoring of bucket index in Flink append mode
[ https://issues.apache.org/jira/browse/HUDI-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geser Dugarov updated HUDI-8796: Status: Open (was: In Progress) > Silent ignoring of bucket index in Flink append mode > > > Key: HUDI-8796 > URL: https://issues.apache.org/jira/browse/HUDI-8796 > Project: Apache Hudi > Issue Type: Bug >Reporter: Geser Dugarov >Assignee: Geser Dugarov >Priority: Minor > Labels: pull-request-available > Fix For: 1.1.0 > > > Currently, there is no exception when we try to write data in Flink append > mode using bucket index. Data will be written, but in parquet files without > bucket IDs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
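As an illustration of the missing guard described in HUDI-8796 above, here is a minimal, hedged sketch of a fail-fast validation; the class, method, and the way the operation mode and index type are obtained are assumptions for illustration, not Hudi's actual configuration API.

```java
// Hypothetical validation sketch for HUDI-8796: reject the silent fallback instead of
// writing unbucketed parquet files. Names and the call site are illustrative only.
final class AppendModeBucketIndexGuard {
  static void validate(boolean isAppendMode, String indexType) {
    if (isAppendMode && "BUCKET".equals(indexType)) {
      throw new IllegalArgumentException(
          "Bucket index has no effect in Flink append mode: data would be written to "
              + "parquet files without bucket IDs. Use upsert mode or a different index type.");
    }
  }
}
```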
Re: [PR] [HUDI-9144] Flink writer for MOR table supports writing RowData … [hudi]
Alowator commented on code in PR #12967: URL: https://github.com/apache/hudi/pull/12967#discussion_r2000181537 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/RowDataStreamWriteFunction.java: ## @@ -0,0 +1,563 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.sink; + +import org.apache.hudi.client.FlinkTaskContextSupplier; +import org.apache.hudi.client.WriteStatus; +import org.apache.hudi.client.model.HoodieFlinkInternalRow; +import org.apache.hudi.client.model.HoodieFlinkRecord; +import org.apache.hudi.common.model.DeleteRecord; +import org.apache.hudi.common.model.HoodieKey; +import org.apache.hudi.common.model.HoodieOperation; +import org.apache.hudi.common.model.HoodieRecord; +import org.apache.hudi.common.model.HoodieRecordMerger; +import org.apache.hudi.common.model.WriteOperationType; +import org.apache.hudi.common.util.HoodieRecordUtils; +import org.apache.hudi.common.util.StringUtils; +import org.apache.hudi.common.util.ValidationUtils; +import org.apache.hudi.common.util.VisibleForTesting; +import org.apache.hudi.common.util.collection.MappingIterator; +import org.apache.hudi.configuration.FlinkOptions; +import org.apache.hudi.configuration.OptionsResolver; +import org.apache.hudi.exception.HoodieException; +import org.apache.hudi.io.v2.HandleRecords; +import org.apache.hudi.metrics.FlinkStreamWriteMetrics; +import org.apache.hudi.sink.buffer.MemorySegmentPoolFactory; +import org.apache.hudi.sink.buffer.RowDataBucket; +import org.apache.hudi.sink.buffer.TotalSizeTracer; +import org.apache.hudi.sink.bulk.RowDataKeyGen; +import org.apache.hudi.sink.common.AbstractStreamWriteFunction; +import org.apache.hudi.sink.event.WriteMetadataEvent; +import org.apache.hudi.sink.exception.MemoryPagesExhaustedException; +import org.apache.hudi.sink.utils.BufferUtils; +import org.apache.hudi.table.action.commit.BucketInfo; +import org.apache.hudi.table.action.commit.BucketType; +import org.apache.hudi.util.MutableIteratorWrapperIterator; +import org.apache.hudi.util.PreCombineFieldExtractor; +import org.apache.hudi.util.StreamerUtil; + +import org.apache.flink.configuration.Configuration; +import org.apache.flink.metrics.MetricGroup; +import org.apache.flink.streaming.api.functions.ProcessFunction; +import org.apache.flink.table.data.GenericRowData; +import org.apache.flink.table.data.RowData; +import org.apache.flink.table.data.StringData; +import org.apache.flink.table.data.TimestampData; +import org.apache.flink.table.data.binary.BinaryRowData; +import org.apache.flink.table.data.utils.JoinedRowData; +import org.apache.flink.table.runtime.operators.sort.BinaryInMemorySortBuffer; +import org.apache.flink.table.runtime.util.MemorySegmentPool; +import 
org.apache.flink.table.types.logical.LogicalType; +import org.apache.flink.table.types.logical.RowType; +import org.apache.flink.types.RowKind; +import org.apache.flink.util.Collector; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.Collections; +import java.util.Comparator; +import java.util.HashMap; +import java.util.Iterator; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; +import java.util.NoSuchElementException; +import java.util.concurrent.atomic.AtomicLong; + +/** + * Sink function to write the data to the underneath filesystem. + * + * Work Flow + * + * The function firstly buffers the data (RowData) in a binary buffer based on {@code BinaryInMemorySortBuffer}. + * It flushes(write) the records batch when the batch size exceeds the configured size {@link FlinkOptions#WRITE_BATCH_SIZE} + * or the memory of the binary buffer is exhausted, and could not append any more data or a Flink checkpoint starts. + * After a batch has been written successfully, the function notifies its operator coordinator {@link StreamWriteOperatorCoordinator} + * to mark a successful write. + * + * The Semantics + * + * The task implements exactly-once semantics by buffering the data between checkpoints. The operator coordinator + * starts a new instant on the timeline when a checkpoint triggers, the coordinator checkpoints alwa
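The (truncated) javadoc above describes the operator's buffer-then-flush control flow. A minimal sketch of that flow follows, using a plain in-memory list in place of Flink's managed-memory binary sort buffer; the class and method names are illustrative, not the actual operator API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the buffer-then-flush flow described above: buffer incoming records and
// flush when the configured batch size is hit (a flush is also forced when a Flink
// checkpoint starts, so the snapshot only covers durable data).
final class BufferThenFlushSketch<T> {
  private final List<T> buffer = new ArrayList<>();
  private final int batchSize;

  BufferThenFlushSketch(int batchSize) {
    this.batchSize = batchSize;
  }

  void process(T record) {
    buffer.add(record);
    if (buffer.size() >= batchSize) {
      flush();
    }
  }

  void flush() {
    // the real operator writes the batch through a Hudi write handle and then
    // notifies the operator coordinator of the successful write
    System.out.println("flushing " + buffer.size() + " buffered records");
    buffer.clear();
  }
}
```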
Re: [PR] [HUDI-8796] Restrict insert operation with bucket index for Flink [hudi]
geserdugarov commented on PR #12545: URL: https://github.com/apache/hudi/pull/12545#issuecomment-2732158325 I will revisit this issue after major changes in Flink write into Hudi by RFC-87. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-8969) Analyze how to write `RowData` directly
[ https://issues.apache.org/jira/browse/HUDI-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geser Dugarov updated HUDI-8969: Status: Open (was: In Progress) > Analyze how to write `RowData` directly > --- > > Key: HUDI-8969 > URL: https://issues.apache.org/jira/browse/HUDI-8969 > Project: Apache Hudi > Issue Type: Task >Reporter: Geser Dugarov >Assignee: Geser Dugarov >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-9144] Flink writer for MOR table supports writing RowData … [hudi]
hudi-bot commented on PR #12967: URL: https://github.com/apache/hudi/pull/12967#issuecomment-2733240438 ## CI report: * b591ad3b0092eec900475590089dd05f58570d5d Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4222) * 21ba47d8c7b86b1c92a311e216c4b85dc17ed046 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9140] Fix log block io type and other rollback strategy fixes for table version 6 [hudi]
hudi-bot commented on PR #12992: URL: https://github.com/apache/hudi/pull/12992#issuecomment-2733436576 ## CI report: * 99916b679c4915e811845ae6261e9b3263c0feea Azure: [SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4244) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9120] Fix delete ordering comparison issue [hudi]
linliu-code commented on PR #12979: URL: https://github.com/apache/hudi/pull/12979#issuecomment-2733434279 This is not needed since we have found a better fix: https://github.com/apache/hudi/pull/12991 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9120] Fix delete ordering comparison issue [hudi]
linliu-code closed pull request #12979: [HUDI-9120] Fix delete ordering comparison issue URL: https://github.com/apache/hudi/pull/12979 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]
hudi-bot commented on PR #12991: URL: https://github.com/apache/hudi/pull/12991#issuecomment-2732038143 ## CI report: * d85952ac252b3a9cc7677188dac340ece0efcc1d UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9013] Add backwards compatible MDT writer support and reader support with tbl v6 [hudi]
hudi-bot commented on PR #12948: URL: https://github.com/apache/hudi/pull/12948#issuecomment-2731725478 ## CI report: * 047885b4286dae609122e6573117cfd5dcdca572 Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4221) * 293d1a47c619237651041b9182f414a272f7c5ed UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]
hudi-bot commented on PR #12991: URL: https://github.com/apache/hudi/pull/12991#issuecomment-2732053104 ## CI report: * d85952ac252b3a9cc7677188dac340ece0efcc1d Azure: [CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4242) * 87512b51170f102d612c11b44bea7534a684c51d UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]
hudi-bot commented on PR #12991: URL: https://github.com/apache/hudi/pull/12991#issuecomment-2732041000 ## CI report: * d85952ac252b3a9cc7677188dac340ece0efcc1d Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4242) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]
hudi-bot commented on PR #12991: URL: https://github.com/apache/hudi/pull/12991#issuecomment-2732049001 ## CI report: * d85952ac252b3a9cc7677188dac340ece0efcc1d Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4242) * 87512b51170f102d612c11b44bea7534a684c51d UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] why HoodieDatasetBulkInsertHelper bulkInsert method no BucketBulkInsertDataInternalWriterHelper [hudi]
danny0405 commented on issue #12989: URL: https://github.com/apache/hudi/issues/12989#issuecomment-2731982548 Do you want to do some code refactoring, or did you encounter an issue with your use case? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] why HoodieDatasetBulkInsertHelper bulkInsert method no BucketBulkInsertDataInternalWriterHelper [hudi]
leeseven1211 commented on issue #12989: URL: https://github.com/apache/hudi/issues/12989#issuecomment-2731993287 While using bulk insert to batch-write data into Hudi, I noticed that the written files were not bucketed according to the bucket index. After adding this case, `case HoodieIndex.IndexType.BUCKET if writeConfig.getBucketIndexEngineType == BucketIndexEngineType.SIMPLE => new BucketBulkInsertDataInternalWriterHelper`, I found that static bucketing could be achieved. I would like to ask why this case was not supported. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9083] Fixing flakiness with multi writer test [hudi]
hudi-bot commented on PR #12987: URL: https://github.com/apache/hudi/pull/12987#issuecomment-2731998178 ## CI report: * 1969b9f2ad75790d1058e6b66ae0995793c3082d Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4240) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] Hudi DELETE operation in Flink scans all partitions despite partition predicate [hudi]
danny0405 commented on issue #12988: URL: https://github.com/apache/hudi/issues/12988#issuecomment-2732019149 Are there any other failures in the JM log? Can you also show me the Flink UI operator DAG? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]
hudi-bot commented on PR #12990: URL: https://github.com/apache/hudi/pull/12990#issuecomment-2732029371 ## CI report: * daa0efee55176b3f6441a960a796322b6adec941 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] [HUDI-9120] Add precombine field if possible [hudi]
linliu-code opened a new pull request, #12991: URL: https://github.com/apache/hudi/pull/12991 ### Change Logs Previously, when the configuration "hoodie.record.merge.mode" was null (the default), we did not add the precombine field to the required schema, since we assumed commit-time ordering. But we should actually treat a null "hoodie.record.merge.mode" as unknown, and try to add the precombine field to the required schema whenever possible. Otherwise, for event_time_ordering, it can cause an ordering value comparison failure. ### Impact Fix a bug. ### Risk level (write none, low medium or high below) Medium. ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
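A hedged sketch of the behavior change described in the Change Logs above; the helper is illustrative and only shows the decision, not the actual HoodieFileGroupReaderSchemaHandler code.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hudi.common.config.RecordMergeMode;

// Illustrative only: treat a null merge mode as "unknown" and keep projecting the
// precombine (ordering) field, instead of assuming commit-time ordering and dropping it.
final class MandatoryMergeFieldsSketch {
  static List<String> mandatoryFields(RecordMergeMode mergeMode, String precombineField) {
    List<String> fields = new ArrayList<>();
    boolean mayNeedOrderingValue =
        mergeMode == null || mergeMode == RecordMergeMode.EVENT_TIME_ORDERING;
    if (mayNeedOrderingValue && precombineField != null && !precombineField.isEmpty()) {
      fields.add(precombineField);
    }
    return fields;
  }
}
```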
Re: [I] [SUPPORT] why HoodieDatasetBulkInsertHelper bulkInsert method no BucketBulkInsertDataInternalWriterHelper [hudi]
danny0405 commented on issue #12989: URL: https://github.com/apache/hudi/issues/12989#issuecomment-2732033960 Can you share your configurations and a link to the code where you want to put a patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-9195) rowdata write handle builds data block using record iterator if there is no delete record
Shuo Cheng created HUDI-9195: Summary: rowdata write handle builds data block using record iterator if there is no delete record Key: HUDI-9195 URL: https://issues.apache.org/jira/browse/HUDI-9195 Project: Apache Hudi Issue Type: Sub-task Components: flink-sql Reporter: Shuo Cheng If there is no delete record, the log write handle only writes data blocks; there is then no need to divide records into upsert records and delete records, and we can build the data block directly from the record iterator. -- This message was sent by Atlassian Jira (v8.20.10#820010)
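A short, hedged sketch of the optimization HUDI-9195 describes, reusing the HandleRecords builder shown earlier in this thread; the split helper and the call site are hypothetical.

```java
import java.util.Iterator;

import org.apache.hudi.common.model.HoodieRecord;
import org.apache.hudi.io.v2.HandleRecords;

// Illustrative only: when there are no delete records, skip the upsert/delete split and
// feed the record iterator straight into the data block via HandleRecords.
final class DataBlockRecordsSketch {
  static HandleRecords prepare(Iterator<HoodieRecord> records, boolean hasDeleteRecords) {
    if (!hasDeleteRecords) {
      return HandleRecords.builder().withRecordItr(records).build();
    }
    return splitIntoDataAndDeleteRecords(records); // hypothetical helper for the existing path
  }

  private static HandleRecords splitIntoDataAndDeleteRecords(Iterator<HoodieRecord> records) {
    throw new UnsupportedOperationException("existing split logic elided in this sketch");
  }
}
```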
[I] [SUPPORT] why HoodieDatasetBulkInsertHelper bulkInsert method no BucketBulkInsertDataInternalWriterHelper [hudi]
leeseven1211 opened a new issue, #12989: URL: https://github.com/apache/hudi/issues/12989 When using bulk insert, only ConsistentBucketBulkInsertDataInternalWriterHelper and BulkInsertDataInternalWriterHelper are used; why is BucketBulkInsertDataInternalWriterHelper not supported? Below is the code snippet:
val writer = writeConfig.getIndexType match {
  case HoodieIndex.IndexType.BUCKET if writeConfig.getBucketIndexEngineType == BucketIndexEngineType.CONSISTENT_HASHING =>
    new ConsistentBucketBulkInsertDataInternalWriterHelper(
      table, writeConfig, instantTime, taskPartitionId, taskId, taskEpochId, schema,
      writeConfig.populateMetaFields, arePartitionRecordsSorted, shouldPreserveHoodieMetadata)
  // Is it possible to add support here?
  case _ =>
    new BulkInsertDataInternalWriterHelper(
      table, writeConfig, instantTime, taskPartitionId, taskId, taskEpochId, schema,
      writeConfig.populateMetaFields, arePartitionRecordsSorted, shouldPreserveHoodieMetadata)
}
**Expected behavior** add: `case HoodieIndex.IndexType.BUCKET if writeConfig.getBucketIndexEngineType == BucketIndexEngineType.SIMPLE => new BucketBulkInsertDataInternalWriterHelper ( xxx )` **Environment Description** * Hudi version : 0.14.0 * Spark version : 3.1.1 * Hive version : 3.1.1 * Hadoop version : 3.1.1 * Storage (HDFS/S3/GCS..) : hdfs * Running on Docker? (yes/no) : no -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9170] Fixing schema projection with file group reader [hudi]
hudi-bot commented on PR #12970: URL: https://github.com/apache/hudi/pull/12970#issuecomment-2731183347 ## CI report: * 09b4ba83b5d61cd777c577e483bfe21098725ecc UNKNOWN * b31778b0dd6ccaf619321c6f9b397f7a388c8717 UNKNOWN * d8264536a187f6e213ed1eb08d941c0fc86a1e55 UNKNOWN * 07b8d68c8ebe12c5e8d29d7964f2d82a1f8f1519 Azure: [SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4230) * 27bdb40132d1be6aa68732a2f147be35a1b03945 Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4236) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] org.apache.hudi.exception.HoodieException: org.apache.avro.AvroTypeException: Cannot encode decimal with precision 14 as max precision 13 [hudi]
imonteroq commented on issue #11335: URL: https://github.com/apache/hudi/issues/11335#issuecomment-2734199198 I am also getting the same issue, running Hudi 0.15 on EMR Serverless using Spark/Scala. I have saved the incoming data to a new table and it has absolutely NO decimal fields with precision 9; the fields were created with precision 8. ``` Caused by: org.apache.hudi.exception.HoodieException: org.apache.avro.AvroTypeException: Cannot encode decimal with precision 9 as max precision 8 Caused by: org.apache.hudi.exception.HoodieException: org.apache.hudi.exception.HoodieException: org.apache.avro.AvroTypeException: Cannot encode decimal with precision 9 as max precision 8 ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9013] Add backwards compatible MDT writer support and reader support with tbl v6 [hudi]
nsivabalan commented on PR #12948: URL: https://github.com/apache/hudi/pull/12948#issuecomment-2734247170 CI is failing due to a known flaky test: ITTestHoodieDataSource.testIncrementalReadArchivedCommits -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9083] Fixing flakiness with multi writer test [hudi]
hudi-bot commented on PR #12987: URL: https://github.com/apache/hudi/pull/12987#issuecomment-2734539112 ## CI report: * 1969b9f2ad75790d1058e6b66ae0995793c3082d Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4240) * 8f98d0ff87fd8d21365696b22af77caac421cdd5 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-7037) Column Stats for Decimal Field From Metadata table is read as Bytes
[ https://issues.apache.org/jira/browse/HUDI-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7037: - Labels: pull-request-available (was: ) > Column Stats for Decimal Field From Metadata table is read as Bytes > --- > > Key: HUDI-7037 > URL: https://issues.apache.org/jira/browse/HUDI-7037 > Project: Apache Hudi > Issue Type: Sub-task > Components: metadata >Affects Versions: 0.14.1 >Reporter: Vamshi Gudavarthi >Assignee: Sagar Sumit >Priority: Blocker > Labels: pull-request-available > Fix For: 1.0.2 > > > During the Onetable project, it was found that column stats for a Decimal field read from > the metadata table are returned as BytesWrapper instead of DecimalWrapper, essentially losing > the actual type. The write side is verified to be fine (i.e. writing as DecimalWrapper), but > the read side is where the problem is. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]
hudi-bot commented on PR #12991: URL: https://github.com/apache/hudi/pull/12991#issuecomment-2734200114 ## CI report: * 87512b51170f102d612c11b44bea7534a684c51d Azure: [SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4243) * 5eb43137bc60a32dee13000bff27b3b08e3694d3 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]
yihua commented on code in PR #12991: URL: https://github.com/apache/hudi/pull/12991#discussion_r2001740210 ## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReaderSchemaHandler.java: ## @@ -199,7 +199,8 @@ private static String[] getMandatoryFieldsForMerging(HoodieTableConfig cfg, Type } } -if (cfg.getRecordMergeMode() == RecordMergeMode.EVENT_TIME_ORDERING) { +if (cfg.getRecordMergeMode() == null +|| cfg.getRecordMergeMode() == RecordMergeMode.EVENT_TIME_ORDERING) { Review Comment: A better way is to return `EVENT_TIME_ORDERING` from `cfg.getRecordMergeMode()` for table version 6. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]
yihua commented on code in PR #12991: URL: https://github.com/apache/hudi/pull/12991#discussion_r2001740210 ## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReaderSchemaHandler.java: ## @@ -199,7 +199,8 @@ private static String[] getMandatoryFieldsForMerging(HoodieTableConfig cfg, Type } } -if (cfg.getRecordMergeMode() == RecordMergeMode.EVENT_TIME_ORDERING) { +if (cfg.getRecordMergeMode() == null +|| cfg.getRecordMergeMode() == RecordMergeMode.EVENT_TIME_ORDERING) { Review Comment: A better way is to return the inferred merge mode, `EVENT_TIME_ORDERING` or `COMMIT_TIME_ORDERING`, from `cfg.getRecordMergeMode()` for table version 6. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
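A hedged sketch of the inference suggested in the review comment above; where the inference lives and the fallback rule (a configured precombine/ordering field implying event-time ordering, commit-time ordering otherwise) are assumptions for illustration, not the merged implementation.

```java
import org.apache.hudi.common.config.RecordMergeMode;
import org.apache.hudi.common.table.HoodieTableConfig;

// Illustrative only: for table version 6 the merge mode was never persisted, so instead of
// returning null, resolve it from the table's precombine (ordering) field configuration.
final class MergeModeInferenceSketch {
  static RecordMergeMode inferredMergeMode(HoodieTableConfig cfg) {
    RecordMergeMode explicit = cfg.getRecordMergeMode();
    if (explicit != null) {
      return explicit;
    }
    String precombine = cfg.getPreCombineField(); // assumed accessor for the ordering field
    return (precombine != null && !precombine.isEmpty())
        ? RecordMergeMode.EVENT_TIME_ORDERING
        : RecordMergeMode.COMMIT_TIME_ORDERING;
  }
}
```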
Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]
hudi-bot commented on PR #12991: URL: https://github.com/apache/hudi/pull/12991#issuecomment-2734455487 ## CI report: * 5eb43137bc60a32dee13000bff27b3b08e3694d3 Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4249) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9120] Fix merge mode inference for table version 6 in file group reader [hudi]
hudi-bot commented on PR #12991: URL: https://github.com/apache/hudi/pull/12991#issuecomment-2735187205 ## CI report: * 6330602b196e68b6fe9f2e1612dec8590dce073c Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4251) * b5e64be8b802c3d2cb048b11c3c83d1296dc2d41 Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4257) * 411b6b0bd0238e770d9454c88d5a1daca0af41a6 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[I] [DISCUSSION] Should we treat `COMMIT_TIME_ORDERING` as a special case of `EVENT_TIME_ORDERING` ? [hudi]
TheR1sing3un opened a new issue, #12997: URL: https://github.com/apache/hudi/issues/12997 In the current code structure we treat the merge policy of `COMMIT_TIME_ORDERING` as separate logic, but from a business perspective, should we treat it as a special case of `EVENT_TIME_ORDERING` where the ordering value for each record is the same? For example, right now it would be represented by `int: 0`. This way we don't need to maintain two merge policies: we default to `EVENT_TIME_ORDERING`, and records with the same record_key are merged in the order of `transaction_time` and `event_time`. This would make the code easier to maintain in the future and help us deal with the various merge problems we have encountered. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
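A tiny, hedged sketch of the idea: if commit-time ordering is modeled as event-time ordering with a constant ordering value, a single comparison covers both modes. This is conceptual only, not Hudi's actual merger implementation.

```java
// Conceptual sketch: with a constant ordering value (e.g. 0) for every record,
// the event-time comparison degenerates to "newer commit wins", which is exactly
// commit-time ordering, so one merge path could serve both modes.
final class OrderingMergeSketch {
  static <T, O extends Comparable<O>> T merge(T older, O olderOrdering, T newer, O newerOrdering) {
    // ties (including the constant-ordering case) resolve to the newer record
    return newerOrdering.compareTo(olderOrdering) >= 0 ? newer : older;
  }
}
```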
Re: [I] [DISCUSSION] Should we treat `COMMIT_TIME_ORDERING` as a special case of `EVENT_TIME_ORDERING` ? [hudi]
TheR1sing3un commented on issue #12997: URL: https://github.com/apache/hudi/issues/12997#issuecomment-2735198627 @yihua @danny0405 I'd love to hear what you think. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9120] Fix merge mode inference for table version 6 in file group reader [hudi]
hudi-bot commented on PR #12991: URL: https://github.com/apache/hudi/pull/12991#issuecomment-2735188604 ## CI report: * b5e64be8b802c3d2cb048b11c3c83d1296dc2d41 Azure: [CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4257) * 411b6b0bd0238e770d9454c88d5a1daca0af41a6 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9120] Fix merge mode inference for table version 6 in file group reader [hudi]
hudi-bot commented on PR #12991: URL: https://github.com/apache/hudi/pull/12991#issuecomment-2735185144 ## CI report: * 6330602b196e68b6fe9f2e1612dec8590dce073c Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4251) * b5e64be8b802c3d2cb048b11c3c83d1296dc2d41 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9120] Fix merge mode inference for table version 6 in file group reader [hudi]
linliu-code commented on code in PR #12991: URL: https://github.com/apache/hudi/pull/12991#discussion_r2002315944 ## hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderSchemaHandler.java: ## @@ -0,0 +1,123 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.hudi.common.table.read; + +import org.apache.hudi.common.config.RecordMergeMode; +import org.apache.hudi.common.config.TypedProperties; +import org.apache.hudi.common.model.HoodieRecord; +import org.apache.hudi.common.model.HoodieRecordMerger; +import org.apache.hudi.common.table.HoodieTableConfig; +import org.apache.hudi.common.util.Option; +import org.apache.hudi.common.util.collection.Triple; + +import org.apache.avro.Schema; +import org.junit.jupiter.api.Test; +import org.junit.jupiter.api.extension.ExtendWith; +import org.mockito.Mock; +import org.mockito.junit.jupiter.MockitoExtension; + +import static org.apache.hudi.common.table.read.HoodieFileGroupReaderSchemaHandler.getMandatoryFieldsForMerging; +import static org.junit.jupiter.api.Assertions.assertArrayEquals; +import static org.mockito.Mockito.any; +import static org.mockito.Mockito.mockStatic; +import static org.mockito.Mockito.times; +import static org.mockito.Mockito.verify; +import static org.mockito.Mockito.when; + +@ExtendWith(MockitoExtension.class) +class TestHoodieFileGroupReaderSchemaHandler { Review Comment: Done. Added a few new test cases there. ## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReaderSchemaHandler.java: ## @@ -199,7 +201,14 @@ private static String[] getMandatoryFieldsForMerging(HoodieTableConfig cfg, Type } } -if (cfg.getRecordMergeMode() == RecordMergeMode.EVENT_TIME_ORDERING) { +Triple mergingConfigs = Review Comment: Done. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] When writing to a Hudi MOR table using Flink, data merging did not occur based on the expected value of "precombine.field". [hudi]
Toroidals closed issue #12996: [SUPPORT] When writing to a Hudi MOR table using Flink, data merging did not occur based on the expected value of "precombine.field". URL: https://github.com/apache/hudi/issues/12996 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9197] Fix flaky test for flink: testDynamicPartitionPrune [hudi]
hudi-bot commented on PR #12995: URL: https://github.com/apache/hudi/pull/12995#issuecomment-2735191670 ## CI report: * a27ffd4b4687f9fb983d5914f2d060d0ce4f6956 Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4254) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] docker demo not working: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/parquet/format/TypeDefinedOrder [hudi]
nsivabalan commented on issue #12946: URL: https://github.com/apache/hudi/issues/12946#issuecomment-2735210997 hey @rangareddy : did you try the docker demo with the 0.15.0 branch? Can you report back once you get it working successfully on your end? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] How to Suppress the HoodieWriterCommitMessage on each Parquet file it Writes HoodieWriterCommitMessage [hudi]
nsivabalan commented on issue #12854: URL: https://github.com/apache/hudi/issues/12854#issuecomment-2735212873 hey @rangareddy : can you try suppressing the logging for the given class of interest and report back? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT]Caused by: org.apache.hudi.exception.HoodieIOException: Exception create input stream from file: HoodieLogFile{pathStr='hdfs://nameservice1/xxx/.00000056-15ec-459f-bb67-5f8c2b319203_2
nsivabalan commented on issue #12554: URL: https://github.com/apache/hudi/issues/12554#issuecomment-2735221004 hey @ad1happy2go @rangareddy : who is following up here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] - Records deleted with via "hard delete" appear after next commit [hudi]
nsivabalan commented on issue #12833: URL: https://github.com/apache/hudi/issues/12833#issuecomment-2735213212 hey @RuyRoaV : gentle ping. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] Does Hudi re-create record level index during an upsert operation? [hudi]
nsivabalan commented on issue #12783: URL: https://github.com/apache/hudi/issues/12783#issuecomment-2735216669 you should see this only in the first batch after you enable RLI. Once it's fully initialized, subsequent batches should use RLI instead of the global simple index. But the instantiation of RLI itself could be deferred if there are pending instants in the data table. We can gauge that from the driver logs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] docker demo not working: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/parquet/format/TypeDefinedOrder [hudi]
Souldiv commented on issue #12946: URL: https://github.com/apache/hudi/issues/12946#issuecomment-2735217288 hey @rangareddy I have followed the steps outlined [here](https://hudi.apache.org/docs/docker_demo/) I get that error when I try to run the sync tool for hive. I believe it might be an issue with the env var $HUDI_CLASSPATH not being set. I tried running it on prem as well with individual services and it worked when I set that var. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] Log files in Hudi MOR table are not getting deleted [hudi]
nsivabalan commented on issue #12702: URL: https://github.com/apache/hudi/issues/12702#issuecomment-2735217750 hey @ad1happy2go : what's the status on this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] Hoodie Custom Merge Paylod results in UnsupportedOperationException [hudi]
nsivabalan commented on issue #12571: URL: https://github.com/apache/hudi/issues/12571#issuecomment-2735220428 hey folks, what's the status here? Did we find the root cause, or is it not reproducible? We are trying to collect issues to be targeted for 1.0.2, so I'm trying to gauge the status of this issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] Queries are very memory intensive due to low read parallelism in HoodieMergeOnReadRDD [hudi]
nsivabalan commented on issue #12434: URL: https://github.com/apache/hudi/issues/12434#issuecomment-2735221825 any update on this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] Slow commit times with Spark Structured Streaming from Kinesis to MOR Hudi table [hudi]
nsivabalan commented on issue #12412: URL: https://github.com/apache/hudi/issues/12412#issuecomment-2735222177 hey @ad1happy2go : what's the latest on this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9120] Enable File Group reader by default for table version 6 [hudi]
hudi-bot commented on PR #12935: URL: https://github.com/apache/hudi/pull/12935#issuecomment-2734044775 ## CI report: * 5154a0d5f9b8adecd1c675f05405e732b2f1e9fe Azure: [CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4246) * 3498922c1b7993f4919f9bb4400fc8a8565ccdac Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4247) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-7037) Column Stats for Decimal Field From Metadata table is read as Bytes
[ https://issues.apache.org/jira/browse/HUDI-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-7037: -- Status: Patch Available (was: In Progress) > Column Stats for Decimal Field From Metadata table is read as Bytes > --- > > Key: HUDI-7037 > URL: https://issues.apache.org/jira/browse/HUDI-7037 > Project: Apache Hudi > Issue Type: Sub-task > Components: metadata >Affects Versions: 0.14.1 >Reporter: Vamshi Gudavarthi >Assignee: Sagar Sumit >Priority: Blocker > Labels: pull-request-available > Fix For: 1.0.2 > > > During the Onetable project, it was found that column stats for a Decimal field read from > the metadata table are returned as BytesWrapper instead of DecimalWrapper, essentially losing > the actual type. The write side is verified to be fine (i.e. writing as DecimalWrapper), but > the read side is where the problem is. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-7037] Fix colstats reading for Decimal field [hudi]
hudi-bot commented on PR #12993: URL: https://github.com/apache/hudi/pull/12993#issuecomment-2734070157 ## CI report: * c81a532851ea54fcb58262145d016323a1e42ac7 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] [HUDI-7037] Fix colstats reading for Decimal field [hudi]
codope opened a new pull request, #12993: URL: https://github.com/apache/hudi/pull/12993 ### Change Logs When reading column statistics for Decimal fields from the metadata table, the unwrapped decimal values are incorrectly handled. Specifically: - **Type Loss in Unwrapping:** The `tryUnpackValueWrapper` method was matching a `BytesWrapper` before a `DecimalWrapper`, causing values written as `DecimalWrapper` (which are actually decimals) to be interpreted as raw bytes. - **Incorrect Deserialization:** In the `deserialize` method, the decimal conversion was not robust enough: - It only handled the case where the unwrapped value is a `ByteBuffer`. - It did not enforce the correct scale/precision when the value was already a decimal (either as a Scala `BigDecimal` or a `java.math.BigDecimal`). - Additionally, it was using a private Avro constructor for `Decimal`, which prevented proper conversion. This PR addresses these issues with the following changes: 1. **Reordering in `tryUnpackValueWrapper`:** - The `DecimalWrapper` case is moved before the `BytesWrapper` case. This ensures that values written as decimals are unwrapped correctly. 2. **Enhancements to `deserialize`:** - **ByteBuffer Handling:** The conversion now uses Avro’s public factory method (`org.apache.avro.LogicalTypes.decimal(precision, scale)`) to create a Decimal logical type with the correct precision and scale. - **Direct Decimal Values:** The method now properly handles cases where the unwrapped value is already a `scala.math.BigDecimal` or a `java.math.BigDecimal`. In both cases, it enforces the target scale using `.setScale(dt.scale, java.math.RoundingMode.UNNECESSARY)`. 3. **Unit Tests:** - New unit tests have been added to cover: - Decimal values unwrapped as a `ByteBuffer`. - Decimal values unwrapped as a `java.math.BigDecimal`. - Decimal values unwrapped as a Scala `BigDecimal`. - Additionally, we reuse the existing utility method to generate a DataFrame with decimals to ensure that our logic works correctly in an integrated scenario. ### Impact Decimal values retain their intended semantics when read from the metadata table, ensuring that comparisons and filtering, and hence data skipping, based on these values work correctly. ### Risk level (write none, low medium or high below) low ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
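A hedged sketch of the deserialization behavior this PR describes: decoding a bytes-encoded decimal statistic with Avro's public decimal logical type, and enforcing the target scale on an already-decoded value. It is illustrative only and does not claim to be the actual Hudi column-stats code; only the Avro calls are standard library API.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.nio.ByteBuffer;

import org.apache.avro.Conversions;
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;

// Illustrative only: decode a decimal column statistic that was unwrapped either as raw
// bytes (two's-complement unscaled value) or as an already-built BigDecimal.
final class DecimalStatDecodeSketch {
  static BigDecimal toDecimal(Object unwrapped, int precision, int scale) {
    if (unwrapped instanceof ByteBuffer) {
      // Avro's public factory gives us a Decimal logical type carrying precision/scale
      Schema bytesSchema = Schema.create(Schema.Type.BYTES);
      return new Conversions.DecimalConversion()
          .fromBytes((ByteBuffer) unwrapped, bytesSchema, LogicalTypes.decimal(precision, scale));
    }
    if (unwrapped instanceof BigDecimal) {
      // already a decimal: only enforce the target scale, failing rather than rounding
      return ((BigDecimal) unwrapped).setScale(scale, RoundingMode.UNNECESSARY);
    }
    throw new IllegalArgumentException("Unsupported decimal stat representation: " + unwrapped);
  }
}
```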
Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]
hudi-bot commented on PR #12991: URL: https://github.com/apache/hudi/pull/12991#issuecomment-2734820536 ## CI report: * 5eb43137bc60a32dee13000bff27b3b08e3694d3 Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4249) * 6330602b196e68b6fe9f2e1612dec8590dce073c UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]
hudi-bot commented on PR #12991: URL: https://github.com/apache/hudi/pull/12991#issuecomment-2734822641 ## CI report: * 5eb43137bc60a32dee13000bff27b3b08e3694d3 Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4249) * 6330602b196e68b6fe9f2e1612dec8590dce073c Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4251) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-8655) Create Tests for Filegroup reader for Schema Cache and for Spillable Map
[ https://issues.apache.org/jira/browse/HUDI-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-8655: -- Fix Version/s: 1.0.1 (was: 1.0.2) > Create Tests for Filegroup reader for Schema Cache and for Spillable Map > > > Key: HUDI-8655 > URL: https://issues.apache.org/jira/browse/HUDI-8655 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Critical > Labels: pull-request-available > Fix For: 1.0.1 > > Original Estimate: 2h > Remaining Estimate: 2h > > We need unit tests for schema cache > For spillable map, we need to add test cases for how the fg reader will use > it and ensure we test spilling to disk in the test -- This message was sent by Atlassian Jira (v8.20.10#820010)