hudi.git: Error while running github feature from master:.asf.yaml

2025-03-18 Thread Apache Infrastructure


An error occurred while processing the github feature in .asf.yaml: 

GitHub discussions can only be enabled if a mailing list target exists for it.

---
With regards, ASF Infra.



Re: [I] [SUPPORT] why HoodieDatasetBulkInsertHelper bulkInsert method no BucketBulkInsertDataInternalWriterHelper [hudi]

2025-03-18 Thread via GitHub


leeseven1211 commented on issue #12989:
URL: https://github.com/apache/hudi/issues/12989#issuecomment-2735508732

   The code only matches ConsistentBucketBulkInsertDataInternalWriterHelper; for 
all cases that do not match, it falls back to BulkInsertDataInternalWriterHelper. 
When using BucketIndexEngineType.SIMPLE, why can't 
BucketBulkInsertDataInternalWriterHelper be used?
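
   (For illustration, a minimal self-contained sketch of the dispatch described above; the enum and class names are stand-ins taken from this comment, not the actual Hudi source.)

```java
// Only the consistent-hashing bucket engine gets a dedicated bulk-insert
// writer helper; SIMPLE, like every other case, falls through to the
// generic helper, which is the behavior being questioned.
enum BucketIndexEngineType { SIMPLE, CONSISTENT_HASHING }

class WriterHelperDispatch {
  static String helperFor(BucketIndexEngineType engineType) {
    if (engineType == BucketIndexEngineType.CONSISTENT_HASHING) {
      return "ConsistentBucketBulkInsertDataInternalWriterHelper";
    }
    return "BulkInsertDataInternalWriterHelper"; // SIMPLE lands here
  }

  public static void main(String[] args) {
    System.out.println(helperFor(BucketIndexEngineType.SIMPLE));
  }
}
```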


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7037] Fix colstats reading for Decimal field [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12993:
URL: https://github.com/apache/hudi/pull/12993#issuecomment-2735251387

   
   ## CI report:
   
   * ca52bb6677971593da5f246468ce260096c88d8a Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4256)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-9120] Fix merge mode inference for table version 6 in file group reader [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12991:
URL: https://github.com/apache/hudi/pull/12991#issuecomment-2735342081

   
   ## CI report:
   
   * 411b6b0bd0238e770d9454c88d5a1daca0af41a6 Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4258)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[PR] [RFC-92] Pluggable Table Format Support [hudi]

2025-03-18 Thread via GitHub


bvaradar opened a new pull request, #12998:
URL: https://github.com/apache/hudi/pull/12998

   ### Change Logs
   
   Pluggable Table Format Support in Hudi 
   
   ### Impact
   
   Pluggable Table Format Support in Hudi 
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-9188] Fixing RLI record generation to account for deletes with lower ordering values in MOR log files [hudi]

2025-03-18 Thread via GitHub


yihua commented on code in PR #12984:
URL: https://github.com/apache/hudi/pull/12984#discussion_r2002025587


##
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##
@@ -901,6 +919,53 @@ public static HoodieData 
convertMetadataToRecordIndexRecords(Hoodi
 }
   }
 
+  static Set<String> getValidRecordKeysForFileSlice(HoodieTableMetaClient metaClient,

Review Comment:
   Does the file group reading now add additional latency compared to before? 
Should we consider optimizations for `EVENT_TIME_ORDERING` that can avoid such 
merging? Also, is the behavior consistent with the global index, i.e., once a 
record is deleted through a log file in a MOR table, the record no longer belongs 
to the file group even though it still exists in the base file?
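
   (A hedged sketch of the ordering semantics in question, using stand-in types rather than Hudi's merger API: under event-time ordering, a delete whose ordering value is lower than the base record's loses the merge, so the key would still be live.)

```java
// Stand-in record type; Hudi's actual merging goes through HoodieRecordMerger.
record Versioned(String key, long orderingValue, boolean isDelete) {}

public class EventTimeMergeSketch {
  static boolean keyIsLiveAfterMerge(Versioned base, Versioned logDelete) {
    // Event-time ordering: the higher ordering value wins; ties go to the newer write.
    Versioned winner = logDelete.orderingValue() >= base.orderingValue() ? logDelete : base;
    return !winner.isDelete();
  }

  public static void main(String[] args) {
    Versioned base = new Versioned("k1", 5L, false);
    Versioned del = new Versioned("k1", 3L, true); // delete with a lower ordering value
    System.out.println(keyIsLiveAfterMerge(base, del)); // true: the delete loses
  }
}
```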






Re: [PR] [HUDI-9188] Fixing RLI record generation to account for deletes with lower ordering values in MOR log files [hudi]

2025-03-18 Thread via GitHub


yihua commented on code in PR #12984:
URL: https://github.com/apache/hudi/pull/12984#discussion_r2002015927


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala:
##
@@ -541,6 +541,66 @@ class TestMORDataSource extends HoodieSparkClientTestBase 
with SparkDatasetMixin
 assertEquals(0, hudiSnapshotDF3.count()) // 100 records were deleted, 0 
record to load
   }
 
+  @Test
+  def testDeletesWithLowerOrderingValue() : Unit = {

Review Comment:
   Should this test be added to `TestRecordLevelIndex` since it's record index 
specific?



##
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/testutils/DataSourceTestUtils.java:
##
@@ -130,13 +133,24 @@ public static List 
generateRandomRowsEvolvedSchema(int count) {
   }
 
   public static List<Row> updateRowsWithHigherTs(Dataset<Row> inputDf) {
+    return updateRowsWithUpdatedTs(inputDf, false, false);
+  }
+
+  public static List<Row> updateRowsWithUpdatedTs(Dataset<Row> inputDf, Boolean lowerTs, Boolean updatePartitionPath) {
     List<Row> input = inputDf.collectAsList();
     List<Row> rows = new ArrayList<>();
     for (Row row : input) {
-      Object[] values = new Object[3];
+      Object[] values = new Object[4];

Review Comment:
   The changes should already be merged to master.






Re: [PR] [HUDI-9083] Fixing flakiness with multi writer test [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12987:
URL: https://github.com/apache/hudi/pull/12987#issuecomment-2734751442

   
   ## CI report:
   
   * 8f98d0ff87fd8d21365696b22af77caac421cdd5 Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4250)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7037] Fix colstats reading for Decimal field [hudi]

2025-03-18 Thread via GitHub


yihua commented on code in PR #12993:
URL: https://github.com/apache/hudi/pull/12993#discussion_r2001887764


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/ColumnStatsIndexSupport.scala:
##
@@ -490,13 +491,19 @@ object ColumnStatsIndexSupport {
   case ShortType => value.asInstanceOf[Int].toShort
   case ByteType => value.asInstanceOf[Int].toByte
 
-  // TODO fix
-  case _: DecimalType =>
+  case dt: DecimalType =>
 value match {
   case buffer: ByteBuffer =>
-val logicalType = 
DecimalWrapper.SCHEMA$.getField("value").schema().getLogicalType
-decConv.fromBytes(buffer, null, logicalType)
-  case _ => value
+// Use the DecimalType's precision and scale (instead of using the 
schema from DecimalWrapper)

Review Comment:
   My understanding is that this only affects reading the column stats from 
MDT, not writing, so there is no storage byte change.  Correct?
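
   (A minimal sketch of the fix under discussion, assuming Avro's standard DecimalConversion; this is an illustration, not the actual patch. The null schema argument mirrors the original snippet, since the conversion ignores it for the bytes case.)

```java
import java.math.BigDecimal;
import java.nio.ByteBuffer;

import org.apache.avro.Conversions;
import org.apache.avro.LogicalType;
import org.apache.avro.LogicalTypes;

public class DecimalStatDecodeSketch {
  // Decode the unscaled big-endian bytes stored in column stats using the
  // query-side DecimalType's precision and scale, not a fixed wrapper schema.
  static BigDecimal decode(ByteBuffer buffer, int precision, int scale) {
    LogicalType logicalType = LogicalTypes.decimal(precision, scale);
    return new Conversions.DecimalConversion().fromBytes(buffer, null, logicalType);
  }

  public static void main(String[] args) {
    ByteBuffer bytes = ByteBuffer.wrap(new BigDecimal("12.34").unscaledValue().toByteArray());
    System.out.println(decode(bytes, 4, 2)); // prints 12.34
  }
}
```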



##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/ColumnStatsIndexSupport.scala:
##
@@ -455,10 +456,10 @@ object ColumnStatsIndexSupport {
   case w: LongWrapper => w.getValue
   case w: FloatWrapper => w.getValue
   case w: DoubleWrapper => w.getValue
+  case w: DecimalWrapper => w.getValue  // Moved above BytesWrapper to 
ensure proper matching

Review Comment:
   Do we have functional tests covering the data skipping on a decimal column 
using column stats?






[PR] [HUDI-9140] Fix log block io type and other rollback strategy fixes for table version 6 [hudi]

2025-03-18 Thread via GitHub


lokeshj1703 opened a new pull request, #12992:
URL: https://github.com/apache/hudi/pull/12992

   ### Change Logs
   
   The PR fixes the iotype as APPEND for log blocks in table version 6. It also 
reverts some changes made to MarkerBasedRollbackStrategy for table version 6 in 
HUDI-9030.
   
   ### Impact
   
   NA
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   NA
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-8178] Fix CI failures for partition-stats enablement [hudi]

2025-03-18 Thread via GitHub


nsivabalan commented on PR #12081:
URL: https://github.com/apache/hudi/pull/12081#issuecomment-2734569017

   We fixed both Date and LocalDate with col stats and partition stats. 
   
   
https://github.com/apache/hudi/blob/1f43b231763a978bef8d340a654e9f6287241ec9/hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java#L152C55-L152C86
 
   
   





Re: [PR] [HUDI-9188] Fixing RLI record generation to account for deletes with lower ordering values in MOR log files [hudi]

2025-03-18 Thread via GitHub


yihua commented on code in PR #12984:
URL: https://github.com/apache/hudi/pull/12984#discussion_r2001950535


##
hudi-common/src/main/java/org/apache/hudi/metadata/RecordIndexRecordKeyParsingUtils.java:
##
@@ -33,20 +33,17 @@
 import org.apache.hadoop.fs.Path;
 
 import java.util.ArrayList;
-import java.util.Collection;
 import java.util.Collections;
 import java.util.HashMap;
 import java.util.HashSet;
 import java.util.Iterator;
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
-import java.util.function.Function;
-import java.util.stream.Stream;
 
 import static java.util.stream.Collectors.toList;
 
-public class BaseFileRecordParsingUtils {
+public class RecordIndexRecordKeyParsingUtils {

Review Comment:
   nit: rename to `RecordIndexUtils`



##
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/client/functional/TestMetadataUtilRLIandSIRecordGeneration.java:
##
@@ -281,9 +282,16 @@ public void testRecordGenerationAPIsForMOR() throws 
IOException {
   assertTrue(compactionInstantOpt.isPresent());
   HoodieWriteMetadata compactionWriteMetadata = 
client.compact(compactionInstantOpt.get());
   HoodieCommitMetadata compactionCommitMetadata = (HoodieCommitMetadata) 
compactionWriteMetadata.getCommitMetadata().get();
-  // no RLI records should be generated for compaction operation.
-  assertTrue(convertMetadataToRecordIndexRecords(context, 
compactionCommitMetadata, writeConfig.getMetadataConfig(),
-  metaClient, writeConfig.getWritesFileIdEncoding(), 
compactionInstantOpt.get(), EngineType.SPARK).isEmpty());
+
+  HoodieBackedTableMetadata tableMetadata = new 
HoodieBackedTableMetadata(engineContext, metaClient.getStorage(), 
writeConfig.getMetadataConfig(), writeConfig.getBasePath(), true);
+  HoodieTableFileSystemView fsView = new 
HoodieTableFileSystemView(tableMetadata, metaClient, 
metaClient.getActiveTimeline());
+  try {

Review Comment:
   try with resources for both `tableMetadata` and `fsView`?
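
   (A self-contained sketch of the suggested shape; `Res` stands in for `tableMetadata` and `fsView`, both of which would need to implement AutoCloseable.)

```java
class Res implements AutoCloseable {
  private final String name;
  Res(String name) { this.name = name; }
  @Override
  public void close() { System.out.println("closed " + name); }
}

public class TryWithResourcesSketch {
  public static void main(String[] args) {
    // Both resources close automatically, in reverse declaration order
    // (fsView first, then tableMetadata), even if the body throws.
    try (Res tableMetadata = new Res("tableMetadata");
         Res fsView = new Res("fsView")) {
      System.out.println("using both resources");
    }
  }
}
```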



##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java:
##
@@ -1087,20 +1088,27 @@ engineContext, dataWriteConfig, commitMetadata, 
instantTime, dataMetaClient, get
   getMetadataPartitionsToUpdate(), 
dataWriteConfig.getBloomFilterType(),
   dataWriteConfig.getBloomIndexParallelism(), 
dataWriteConfig.getWritesFileIdEncoding(), getEngineType(),
   Option.of(dataWriteConfig.getRecordMerger().getRecordType()));
-
-  // Updates for record index are created by parsing the WriteStatus which 
is a hudi-client object. Hence, we cannot yet move this code
-  // to the HoodieTableMetadataUtil class in hudi-common.
-  if 
(getMetadataPartitionsToUpdate().contains(RECORD_INDEX.getPartitionPath())) {
-HoodieData additionalUpdates = 
getRecordIndexAdditionalUpserts(partitionToRecordMap.get(RECORD_INDEX.getPartitionPath()),
 commitMetadata);
-partitionToRecordMap.put(RECORD_INDEX.getPartitionPath(), 
partitionToRecordMap.get(RECORD_INDEX.getPartitionPath()).union(additionalUpdates));
-  }
+  updateRecordIndexRecordsIfPresent(commitMetadata, instantTime, 
partitionToRecordMap);
   updateExpressionIndexIfPresent(commitMetadata, instantTime, 
partitionToRecordMap);
   updateSecondaryIndexIfPresent(commitMetadata, partitionToRecordMap, 
instantTime);
   return partitionToRecordMap;
 });
 closeInternal();
   }
 
+  private void updateRecordIndexRecordsIfPresent(HoodieCommitMetadata commitMetadata, String instantTime, Map<String, HoodieData<HoodieRecord>> partitionToRecordMap) {
+if (!RECORD_INDEX.isMetadataPartitionAvailable(dataMetaClient)) {

Review Comment:
   Should this still follow the same check as before: 
`getMetadataPartitionsToUpdate().contains(RECORD_INDEX.getPartitionPath())`?






[jira] [Closed] (HUDI-9086) Master is broken Feb 27, 2025

2025-03-18 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan closed HUDI-9086.
-
Resolution: Fixed

> Master is broken Feb 27, 2025
> -
>
> Key: HUDI-9086
> URL: https://issues.apache.org/jira/browse/HUDI-9086
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: dev-experience
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.2
>
>   Original Estimate: 4h
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Master is broken as of now. 
> {code:java}
> 2025-02-28T00:34:18.4012123Z [ERROR] Tests run: 1, Failures: 0, Errors: 1, 
> Skipped: 0, Time elapsed: 0.293 s <<< FAILURE! - in 
> org.apache.hudi.TestDataSourceUtils
> 2025-02-28T00:34:18.4012821Z [ERROR] 
> testDeduplicationAgainstRecordsAlreadyInTable  Time elapsed: 0.282 s  <<< 
> ERROR!
> 2025-02-28T00:34:18.4013202Z org.apache.spark.SparkException: 
> 2025-02-28T00:34:18.4041057Z Only one SparkContext should be running in this 
> JVM (see SPARK-2243).The currently running SparkContext was created at:
> 2025-02-28T00:34:18.4077789Z 
> org.apache.spark.sql.hive.TestHiveClientUtils.setUp(TestHiveClientUtils.scala:43)
> 2025-02-28T00:34:18.4078473Z 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 2025-02-28T00:34:18.4081698Z 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 2025-02-28T00:34:18.4082148Z 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 2025-02-28T00:34:18.4082478Z java.lang.reflect.Method.invoke(Method.java:498)
> 2025-02-28T00:34:18.4087413Z 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
> 2025-02-28T00:34:18.4088031Z 
> org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
> 2025-02-28T00:34:18.4089261Z 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
> 2025-02-28T00:34:18.4089580Z 
> org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
> 2025-02-28T00:34:18.4089917Z 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptLifecycleMethod(TimeoutExtension.java:126)
> 2025-02-28T00:34:18.4090239Z 
> org.junit.jupiter.engine.extension.TimeoutExtension.interceptBeforeAllMethod(TimeoutExtension.java:68)
> 2025-02-28T00:34:18.4090561Z 
> org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
> 2025-02-28T00:34:18.4090892Z 
> org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
> 2025-02-28T00:34:18.4091376Z 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
> 2025-02-28T00:34:18.4091705Z 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
> 2025-02-28T00:34:18.4092021Z 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
> 2025-02-28T00:34:18.4092329Z 
> org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
> 2025-02-28T00:34:18.4092617Z 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
> 2025-02-28T00:34:18.4092898Z 
> org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
> 2025-02-28T00:34:18.4093216Z 
> org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.lambda$invokeBeforeAllMethods$11(ClassBasedTestDescriptor.java:397)
> 2025-02-28T00:34:18.4093542Z  at 
> org.apache.spark.SparkContext$.$anonfun$assertNoOtherContextIsRunning$2(SparkContext.scala:2840)
> 2025-02-28T00:34:18.4093794Z  at scala.Option.foreach(Option.scala:407)
> 2025-02-28T00:34:18.4094033Z  at 
> org.apache.spark.SparkContext$.assertNoOtherContextIsRunning(SparkContext.scala:2837)
> 2025-02-28T00:34:18.4094305Z  at 
> org.apache.spark.SparkContext$.markPartiallyConstructed(SparkContext.scala:2927)
> 2025-02-28T00:34:18.4094559Z  at 
> org.apache.spark.SparkContext.(SparkContext.scala:99)
> 2025-02-28T00:34:18.4094836Z  at 
> org.apache.hudi.testutils.HoodieSparkClientTestHarness.initSparkContexts(HoodieSparkClientTestHarness.java:203)
> 2025-02-28T00:34:18.4095249Z  at 
> org.apache.hudi.testutils.HoodieSparkClientTestHarness.initSparkContexts(HoodieSparkClientTestHarness.java:229)
> 2025-02-28T00:34:18.4095557Z  at 
> org.apache.hudi.testutils.HoodieSparkClientTestHarness.initResources(HoodieSparkClientTestHarness.java:159)
> 2025-02-28T00:34:18.4095851Z  at 
> org.apache.hudi.testutil

[jira] [Closed] (HUDI-9127) Fix completion time generation to honor the time zone set in table config

2025-03-18 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan closed HUDI-9127.
-
Resolution: Fixed

[https://github.com/apache/hudi/commit/9baaed9409a1a5b654e88d50ac6826a96d6169bb]
 

> Fix completion time generation to honor the time zone set in table config
> -
>
> Key: HUDI-9127
> URL: https://issues.apache.org/jira/browse/HUDI-9127
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: writer-core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.2
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-9119) Hudi 1.0.1 cannot write MOR tables

2025-03-18 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-9119:
--
Parent: HUDI-8724
Issue Type: Sub-task  (was: Bug)

> Hudi 1.0.1 cannot write MOR tables
> --
>
> Key: HUDI-9119
> URL: https://issues.apache.org/jira/browse/HUDI-9119
> Project: Apache Hudi
>  Issue Type: Sub-task
>Affects Versions: 1.0.1
>Reporter: Shawn Chang
>Priority: Critical
> Fix For: 1.0.2
>
>
> When testing Hudi 1.0.1 on EMR 7.8, I can see issues like below:
> {code:java}
> Exception in thread "main" org.apache.hudi.exception.HoodieException: Failed 
> to update metadata  at 
> org.apache.hudi.client.BaseHoodieClient.writeTableMetadata(BaseHoodieClient.java:282)
>   at 
> org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:293)
>   at 
> org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:253)
>   at 
> org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:94)
>   at 
> org.apache.hudi.HoodieSparkSqlWriterInternal.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:999)
>   at 
> org.apache.hudi.HoodieSparkSqlWriterInternal.writeInternal(HoodieSparkSqlWriter.scala:538)
>   at 
> org.apache.hudi.HoodieSparkSqlWriterInternal.$anonfun$write$1(HoodieSparkSqlWriter.scala:193)
>   at 
> org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:384)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:157)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$10(SQLExecution.scala:220)
>   at 
> org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:384)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$9(SQLExecution.scala:220)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:405)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:219)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901)  at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:83)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
>   at 
> org.apache.spark.sql.adapter.BaseSpark3Adapter.sqlExecutionWithNewExecutionId(BaseSpark3Adapter.scala:105)
>   at 
> org.apache.hudi.HoodieSparkSqlWriterInternal.write(HoodieSparkSqlWriter.scala:215)
>   at 
> org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:130)  
> at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:185)  at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:126)
>   at 
> org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:384)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:157)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$10(SQLExecution.scala:220)
>   at 
> org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:384)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$9(SQLExecution.scala:220)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:405)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:219)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901)  at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:83)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:123)
>   at 
> org.apache.spar

Re: [PR] [HUDI-9083] Fixing flakiness with multi writer test [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12987:
URL: https://github.com/apache/hudi/pull/12987#issuecomment-2734875401

   
   ## CI report:
   
   * 8f98d0ff87fd8d21365696b22af77caac421cdd5 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-9083] Fixing flakiness with multi writer test [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12987:
URL: https://github.com/apache/hudi/pull/12987#issuecomment-2734877234

   
   ## CI report:
   
   * 8f98d0ff87fd8d21365696b22af77caac421cdd5 Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4250)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Closed] (HUDI-8655) Create Tests for Filegroup reader for Schema Cache and for Spillable Map

2025-03-18 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan closed HUDI-8655.
-
Resolution: Fixed

> Create Tests for Filegroup reader for Schema Cache and for Spillable Map
> 
>
> Key: HUDI-8655
> URL: https://issues.apache.org/jira/browse/HUDI-8655
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.2
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> We need unit tests for schema cache
> For spillable map, we need to add test cases for how the fg reader will use 
> it and ensure we test spilling to disk in the test





[I] [SUPPORT] When writing to a Hudi MOR table using Flink, data merging did not occur based on the expected value of "precombine.field". [hudi]

2025-03-18 Thread via GitHub


Toroidals opened a new issue, #12996:
URL: https://github.com/apache/hudi/issues/12996

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? y
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   Primary Key: id
   precombine.field: ts_ms (ts_ms is a 13-digit timestamp in milliseconds)
   
   Scenario 1:
   The Hudi MOR table contains a record: id=1, version=1, ts_ms=1741022687053.
   A new record is submitted: id=1, version=2, ts_ms=1741022687053 (same ts_ms 
as the existing record).
   The merge behaves as expected, and the final result is:
   id=1, version=2, ts_ms=1741022687053
   Scenario 2:
   The Hudi MOR table contains a record: id=1, version=1, ts_ms=1741022687053.
   Two new records are submitted:
   id=1, version=2, ts_ms=1741022687054 (ts_ms is 1 millisecond greater than 
the existing record).
   id=1, version=3, ts_ms=1741022687054 (same ts_ms as the first new record).
   Expected merge result:
   id=1, version=3, ts_ms=1741022687054 (latest version with the same ts_ms 
should be retained).
   However, sometimes the result is:
   id=1, version=2, ts_ms=1741022687054, which is not expected.
   Issue:
   When multiple records with the same primary key (id) and the same ts_ms are 
submitted in a batch, the merge process does not strictly follow the arrival 
order of the messages. Instead, it appears to randomly pick one of the records 
from the batch.
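
   A minimal sketch of the nondeterminism described above (stand-in types, not Hudi's payload classes): a strictly-greater-than comparison on the precombine field keeps whichever tied record the reduce visits first, so the winner depends on input order rather than on version.

```java
import java.util.List;

public class PrecombineTieSketch {
  record Rec(int id, int version, long tsMs) {}

  public static void main(String[] args) {
    List<Rec> batch = List.of(
        new Rec(1, 2, 1741022687054L),
        new Rec(1, 3, 1741022687054L));
    // Keep the record with the greater ts_ms; on a tie the earlier element
    // survives, so any batching or shuffling reorder changes the outcome.
    Rec winner = batch.stream()
        .reduce((a, b) -> b.tsMs() > a.tsMs() ? b : a)
        .orElseThrow();
    System.out.println(winner); // Rec[id=1, version=2, tsMs=1741022687054] here
  }
}
```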
   
   
   flink conf:
   HoodiePipeline.Builder builder = 
HoodiePipeline.builder(infoMap.get("hudi_table_name"));
   Map<String, String> options = new HashMap<>();
   options.put(FlinkOptions.DATABASE_NAME.key(), 
infoMap.get("hudi_database_name"));
   options.put(FlinkOptions.TABLE_NAME.key(), infoMap.get("hudi_table_name"));
   options.put(FlinkOptions.PATH.key(), infoMap.get("hudi_hdfs_path"));
   options.put("catalog.path", "hdfs:///apps/hudi/catalog/");
   String hudiFieldMap = infoMap.get("hudi_field_map").toLowerCase(Locale.ROOT);
   ArrayList<ArrayList<String>> fieldList = JSON.parseObject(hudiFieldMap, new TypeReference<ArrayList<ArrayList<String>>>() {});
   log.info("fieldList: {}", fieldList.toString());
   for (ArrayList<String> columnList : fieldList) {
       builder.column("`" + columnList.get(0) + "` " + columnList.get(1));
   }
   String[] hudiPrimaryKeys = infoMap.get("hudi_primary_key").split(",");
   builder.pk(hudiPrimaryKeys);
   
   options.put(FlinkOptions.PRECOMBINE_FIELD.key(), "ts_ms");
   **options.put(FlinkOptions.PAYLOAD_CLASS_NAME.key(), 
EventTimeAvroPayload.class.getName());
   options.put(FlinkOptions.RECORD_MERGER_IMPLS.key(), 
HoodieAvroRecordMerger.class.getName());**
   
   options.put(FlinkOptions.TABLE_TYPE.key(), 
HoodieTableType.MERGE_ON_READ.name());
   options.put(FlinkOptions.INDEX_TYPE.key(), 
HoodieIndex.IndexType.BUCKET.name());
   options.put(FlinkOptions.BUCKET_INDEX_NUM_BUCKETS.key(), 
infoMap.get("hudi_bucket_index_num_buckets"));
   options.put(FlinkOptions.BUCKET_INDEX_ENGINE_TYPE.key(), 
infoMap.get("hudi_bucket_index_engine_type"));
   
   options.put(FlinkOptions.COMPACTION_TRIGGER_STRATEGY.key(), 
infoMap.get("hudi_compaction_trigger_strategy"));
   options.put(FlinkOptions.COMPACTION_DELTA_COMMITS.key(), 
infoMap.get("hudi_compaction_delta_commits"));
   options.put(FlinkOptions.COMPACTION_DELTA_SECONDS.key(), 
infoMap.get("hudi_compaction_delta_seconds"));
   options.put(FlinkOptions.COMPACTION_MAX_MEMORY.key(), 
infoMap.get("hudi_compaction_max_memory"));
   
   options.put(HoodieWriteConfig.ALLOW_EMPTY_COMMIT.key(), "true");
   options.put(FlinkOptions.CLEAN_RETAIN_COMMITS.key(), "150");
   options.put(FlinkOptions.HIVE_SYNC_ENABLED.key(), "true");
   options.put(FlinkOptions.HIVE_SYNC_MODE.key(), "hms");
   options.put(FlinkOptions.HIVE_SYNC_DB.key(), "hudi");
   options.put(FlinkOptions.HIVE_SYNC_TABLE.key(), "mor_test_01");
   options.put(FlinkOptions.HIVE_SYNC_CONF_DIR.key(), "/etc/hive/conf");
   options.put(FlinkOptions.HIVE_SYNC_METASTORE_URIS.key(), 
"thrift://xx01:9083,thrift://xx02:9083,thrift://xx03:9083");
   options.put(FlinkOptions.HIVE_SYNC_JDBC_URL.key(), 
"jdbc:hive2://xx01:21181,xx02:21181,xx03:21181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2");
   options.put(FlinkOptions.HIVE_SYNC_SUPPORT_TIMESTAMP.key(), "true");
   options.put(FlinkOptions.HIVE_SYNC_SKIP_RO_SUFFIX.key(), "true");
   
   options.put(FlinkOptions.PARTITION_PATH_FIELD.key(), "part_dt");
   options.put(FlinkOptions.HIVE_SYNC_PARTITION_FIELDS.key(), "part_dt");
   
   options.put(FlinkOptions.WRITE_RATE_LIMIT.key(), "2");
   
   options.put(FlinkOptions.WRITE_TASKS.key(), "8");
   
   options.put(FlinkOptions.OPERATION.key(), WriteOperationType.UPSERT.value());
   
   builder.options(options);
   return builder;
   
   
   **To Reproduce**
   

Re: [PR] [HUDI-9120] Enable File Group reader by default for table version 6 [hudi]

2025-03-18 Thread via GitHub


lokeshj1703 commented on PR #12935:
URL: https://github.com/apache/hudi/pull/12935#issuecomment-2733818315

   @linliu-code @yihua @nsivabalan The PR now cherry-picks Lin's fix and 
removes all the older fixes that were added earlier. It also reverts the 
changes made in HUDI-9030 that removed the FGR.
   The PR needs Lin's fix; otherwise the tests would fail.
   





Re: [PR] [HUDI-9120] Enable File Group reader by default for table version 6 [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12935:
URL: https://github.com/apache/hudi/pull/12935#issuecomment-2733830603

   
   ## CI report:
   
   * d5f94e7afb5865449c7796b03cc4b9c786061ec2 Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4152)
 
   * 5154a0d5f9b8adecd1c675f05405e732b2f1e9fe Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4246)
 
   * 3498922c1b7993f4919f9bb4400fc8a8565ccdac UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-9120] Enable File Group reader by default for table version 6 [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12935:
URL: https://github.com/apache/hudi/pull/12935#issuecomment-2733835632

   
   ## CI report:
   
   * 5154a0d5f9b8adecd1c675f05405e732b2f1e9fe Azure: 
[CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4246)
 
   * 3498922c1b7993f4919f9bb4400fc8a8565ccdac UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-8581] Test schema handler in fg reader and some refactoring to prevent bugs in the future [hudi]

2025-03-18 Thread via GitHub


nsivabalan commented on code in PR #12340:
URL: https://github.com/apache/hudi/pull/12340#discussion_r2001378215


##
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala:
##
@@ -165,7 +165,7 @@ class 
SparkFileFormatInternalRowReaderContext(parquetFileReader: SparkParquetRea
 HoodieAvroUtils.removeFields(skeletonRequiredSchema, rowIndexColumn))
 
   //If we need to do position based merging with log files we will leave 
the row index column at the end
-  val dataProjection = if (getHasLogFiles && 
getShouldMergeUseRecordPosition) {
+  val dataProjection = if (getShouldMergeUseRecordPosition) {

Review Comment:
   why was the log files check removed? 



##
hudi-common/src/test/java/org/apache/hudi/common/table/read/TestSchemaHandler.java:
##
@@ -0,0 +1,464 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.common.table.read;
+
+import org.apache.hudi.avro.HoodieAvroUtils;
+import org.apache.hudi.common.config.RecordMergeMode;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.engine.HoodieReaderContext;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieRecordMerger;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.testutils.HoodieTestDataGenerator;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.collection.ClosableIterator;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.storage.HoodieStorage;
+import org.apache.hudi.storage.StoragePath;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+import org.apache.avro.generic.IndexedRecord;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.Arguments;
+import org.junit.jupiter.params.provider.MethodSource;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.function.UnaryOperator;
+import java.util.stream.Stream;
+
+import static 
org.apache.hudi.common.config.RecordMergeMode.COMMIT_TIME_ORDERING;
+import static org.apache.hudi.common.config.RecordMergeMode.CUSTOM;
+import static 
org.apache.hudi.common.config.RecordMergeMode.EVENT_TIME_ORDERING;
+import static 
org.apache.hudi.common.table.read.HoodiePositionBasedSchemaHandler.addPositionalMergeCol;
+import static 
org.apache.hudi.common.table.read.HoodiePositionBasedSchemaHandler.getPositionalMergeField;
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.when;
+
+public class TestSchemaHandler {
+
+  protected static final Schema DATA_SCHEMA = 
HoodieAvroUtils.addMetadataFields(HoodieTestDataGenerator.AVRO_SCHEMA);
+  protected static final Schema DATA_COLS_ONLY_SCHEMA = 
generateProjectionSchema("begin_lat", "tip_history", "rider");
+  protected static final Schema META_COLS_ONLY_SCHEMA = 
generateProjectionSchema("_hoodie_commit_seqno", "_hoodie_record_key");
+
+  @Test
+  public void testCow() {
+HoodieReaderContext readerContext = new MockReaderContext(false);
+readerContext.setHasLogFiles(false);
+readerContext.setHasBootstrapBaseFile(false);
+readerContext.setShouldMergeUseRecordPosition(false);
+HoodieTableConfig hoodieTableConfig = mock(HoodieTableConfig.class);
+Schema requestedSchema = DATA_SCHEMA;
+HoodieFileGroupReaderSchemaHandler schemaHandler = new 
HoodieFileGroupReaderSchemaHandler(readerContext, DATA_SCHEMA,
+requestedSchema, Option.empty(), hoodieTableConfig, new 
TypedProperties());
+assertEquals(requestedSchema, schemaHandler.getRequiredSchema());
+
+//read subset of columns
+requestedSchema = generateProjectionSchema("begin_lat", "tip_history", 
"rider");
+schemaHandler =
+new HoodieFileGroupReaderSchemaHandler(readerContext, DATA_SCHEMA, 
requestedSchema,
+Option.empty(), hoodieTableConfig, new TypedP

Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12991:
URL: https://github.com/apache/hudi/pull/12991#issuecomment-2732226321

   
   ## CI report:
   
   * d85952ac252b3a9cc7677188dac340ece0efcc1d Azure: 
[CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4242)
 
   * 87512b51170f102d612c11b44bea7534a684c51d Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4243)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-9013] Add backwards compatible MDT writer support and reader support with tbl v6 [hudi]

2025-03-18 Thread via GitHub


lokeshj1703 commented on PR #12948:
URL: https://github.com/apache/hudi/pull/12948#issuecomment-2732856286

   Azure CI passed (screenshot): 
   https://github.com/user-attachments/assets/05ba161a-8999-42a8-b125-f9e2a5a9cef6
   





[jira] [Closed] (HUDI-8969) Analyze how to write `RowData` directly

2025-03-18 Thread Geser Dugarov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Geser Dugarov closed HUDI-8969.
---
Resolution: Fixed

This task will be done under RFC-87.

> Analyze how to write `RowData` directly
> ---
>
> Key: HUDI-8969
> URL: https://issues.apache.org/jira/browse/HUDI-8969
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Geser Dugarov
>Assignee: Geser Dugarov
>Priority: Major
>






Re: [PR] [HUDI-9120] Enable File Group reader by default for table version 6 [hudi]

2025-03-18 Thread via GitHub


lokeshj1703 commented on code in PR #12935:
URL: https://github.com/apache/hudi/pull/12935#discussion_r2001484794


##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/SparkBroadcastManager.java:
##
@@ -71,6 +73,7 @@ public class SparkBroadcastManager extends 
EngineBroadcastManager {
   public SparkBroadcastManager(HoodieEngineContext context, 
HoodieTableMetaClient metaClient) {
 this.context = context;
 this.metaClient = metaClient;
+this.tableVersion = metaClient.getTableConfig().getTableVersion();

Review Comment:
   This has been removed now after Lin's fix.






Re: [PR] [HUDI-9144] Flink writer for MOR table supports writing RowData … [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12967:
URL: https://github.com/apache/hudi/pull/12967#issuecomment-2733608704

   
   ## CI report:
   
   * 21ba47d8c7b86b1c92a311e216c4b85dc17ed046 Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4245)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-9120] Enable File Group reader by default for table version 6 [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12935:
URL: https://github.com/apache/hudi/pull/12935#issuecomment-2733817403

   
   ## CI report:
   
   * d5f94e7afb5865449c7796b03cc4b9c786061ec2 Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4152)
 
   * 5154a0d5f9b8adecd1c675f05405e732b2f1e9fe Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4246)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-9120] Enable File Group reader by default for table version 6 [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12935:
URL: https://github.com/apache/hudi/pull/12935#issuecomment-2733813222

   
   ## CI report:
   
   * d5f94e7afb5865449c7796b03cc4b9c786061ec2 Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4152)
 
   * 5154a0d5f9b8adecd1c675f05405e732b2f1e9fe UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[jira] [Updated] (HUDI-9140) Follow up from 9030

2025-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-9140:
-
Labels: pull-request-available  (was: )

> Follow up from 9030
> ---
>
> Key: HUDI-9140
> URL: https://issues.apache.org/jira/browse/HUDI-9140
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: writer-core
>Reporter: sivabalan narayanan
>Assignee: Lokesh Jain
>Priority: Major
>  Labels: pull-request-available
>
> Filing a tracking ticket for all follow ups from HUDI-9030
>  
> [https://github.com/apache/hudi/pull/12888/files#r1985816074]
> [https://github.com/apache/hudi/pull/12888/files#r1985858219]
> [https://github.com/apache/hudi/pull/12888/files#r1985859138]
> 1. Any changes to LogRecordScanner classes; not really required to be fixed 
> right away. 
> 2. Listing based rollback strategy. 
> 3. Add tests to restore to a commit w/ a long history w/ a mix of DC, 
> compaction, and clustering N number of times, covering everything the inline 
> table services would do. Restore should succeed and data validation should 
> remain intact. 
> 4. Fix the IOType as APPEND for log files with table version 6.
> 5. We will need to check if configs added in version 1.0 are required for 
> tbl version 6.
> 6. Ensure these are accounted for in FGR: 
> [https://github.com/apache/hudi/pull/12888/files#r1976856670] 
>  
>  
> WIP patch: 
> [https://github.com/nsivabalan/hudi/tree/fixTableVersion6FixesAbstraction]
>  





Re: [PR] [HUDI-9140] Fix log block io type and other rollback strategy fixes for table version 6 [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12992:
URL: https://github.com/apache/hudi/pull/12992#issuecomment-2733101341

   
   ## CI report:
   
   * 99916b679c4915e811845ae6261e9b3263c0feea UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-9140] Fix log block io type and other rollback strategy fixes for table version 6 [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12992:
URL: https://github.com/apache/hudi/pull/12992#issuecomment-2733104919

   
   ## CI report:
   
   * 99916b679c4915e811845ae6261e9b3263c0feea Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4244)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[I] [SUPPORT] Hudi DELETE operation in Flink scans all partitions despite partition predicate [hudi]

2025-03-18 Thread via GitHub


maheshguptags opened a new issue, #12988:
URL: https://github.com/apache/hudi/issues/12988

   
   I am experiencing an issue when trying to delete records from a Hudi table 
where data is ingested using Flink streaming, and deletion is attempted using a 
Hudi batch processing job. Despite specifying a partition condition in the 
DELETE query, Hudi scans all partitions, which causes high resource usage and 
timeouts.
   
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   1. Continuously ingest the data using a hudi-flink streaming job.
   2. Create the Hudi table with the config below (a separate batch job deletes 
data from the same table):
   ```
   CREATE TABLE IF NOT EXISTS hudi_temp (
     x STRING,
     _date STRING,
     _count BIGINT,
     type STRING,
     update_date TIMESTAMP(3)
   ) PARTITIONED BY (`x`) WITH (
     'connector' = 'hudi',
     'hoodie.datasource.write.recordkey.field' = 'x,_date',
     'path' = '${bucket_path_daily}',
     'table.type' = 'COPY_ON_WRITE',
     'hoodie.datasource.write.precombine.field' = 'updated_date',
     'write.operation' = 'delete',
     'hoodie.datasource.write.partitionpath.field' = 'x',
     'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control',
     'hoodie.write.lock.provider' = 'org.apache.hudi.client.transaction.lock.InProcessLockProvider',
     'hoodie.cleaner.policy.failed.writes' = 'LAZY'
   );
   
   EnvironmentSettings settings = 
EnvironmentSettings.newInstance().inBatchMode().build();
   
   TableEnvironment tEnv = TableEnvironment.create(settings);
   
   tEnv.executeSql(createDeleteTableDDL);
   tEnv.executeSql("DELETE FROM daily_activity_summary where x 
='cl-278'").await();
   tEnv.executeSql("SELECT * FROM Orders where x='cl-278'").print();
   ``` 
   
   3. Deploy the delete jobs.
   4. The document below was followed for this: 
https://github.com/apache/flink/blob/release-1.20/docs/content/docs/dev/table/sql/delete.md
   
   **Expected behavior**
   
   Hudi should only scan the relevant partition (x = 'cl-278') when performing 
a DELETE operation, thereby reducing resource usage and preventing timeouts.
   It should delete only the specific partition, or the rows matching the 
conditions given in the query above.
   
   **Environment Description**
   
   * Hudi version : 0.15.0
   
   * Spark version : NO
   * Flink version : 1.18.1
   * ENV : k8s
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : k8s
   
   
   **Additional context**
   
   I found multiple issues:
   1. Why is it scanning all partitions even though the partition and condition 
are given in the query itself?
   2. The partition pruner is getting called, yet all the data is still scanned:
   ```
   2025-03-03 10:47:33,191 INFO  org.apache.hudi.util.StreamerUtil  [] - Table 
option [hoodie.datasource.write.keygenerator.class] is reset to 
org.apache.hudi.keygen.ComplexAvroKeyGenerator because record key or partition 
path has two or more fields
   2025-03-03 10:47:36,293 INFO  org.apache.hudi.table.HoodieTableSource [] - 
Partition pruner for hoodie source, condition is:
   equals(client_id, 'cl-278')
   ```
   
   **Stacktrace**
   
   
   For a single record in cl-278 it takes 10 minutes, still does not delete, 
and fails with the exception below:
   ```
   Caused by: org.apache.hudi.exception.HoodieException: Timeout(601000ms) 
while waiting for instant initialize
at org.apache.hudi.sink.utils.TimeWait.waitFor(TimeWait.java:57)
at 
org.apache.hudi.sink.common.AbstractStreamWriteFunction.instantToWrite(AbstractStreamWriteFunction.java:269)
at 
org.apache.hudi.sink.StreamWriteFunction.flushRemaining(StreamWriteFunction.java:452)
at 
org.apache.hudi.sink.StreamWriteFunction.endInput(StreamWriteFunction.java:157)
at 
org.apache.hudi.sink.common.AbstractWriteOperator.endInput(AbstractWriteOperator.java:48)
at 
org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.endOperatorInput(StreamOperatorWrapper.java:96)
at 
org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.endInput(RegularOperatorChain.java:97)
at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:68)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:562)
at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:858)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:807)
at 
org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:953)
at 
org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:932)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:746)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
at java.base/java.lang.Thread.run(Unknown Source)
   ```
   Fail

Re: [PR] [HUDI-9120] Enable File Group reader by default for table version 6 [hudi]

2025-03-18 Thread via GitHub


lokeshj1703 commented on code in PR #12935:
URL: https://github.com/apache/hudi/pull/12935#discussion_r2001501987


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java:
##
@@ -112,6 +112,7 @@ public HoodieFileGroupReader(HoodieReaderContext 
readerContext,
   mergeStrategyId, null, tableConfig.getTableVersion());
   recordMergeMode = triple.getLeft();
   mergeStrategyId = triple.getRight();
+  tableConfig.setValue(HoodieTableConfig.RECORD_MERGE_MODE.key(), 
recordMergeMode.name());

Review Comment:
   This change has been removed after Lin's fix.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7037) Column Stats for Decimal Field From Metadata table is read as Bytes

2025-03-18 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7037:
--
Priority: Blocker  (was: Critical)

> Column Stats for Decimal Field From Metadata table is read as Bytes
> ---
>
> Key: HUDI-7037
> URL: https://issues.apache.org/jira/browse/HUDI-7037
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: metadata
>Affects Versions: 0.14.1
>Reporter: Vamshi Gudavarthi
>Assignee: Sagar Sumit
>Priority: Blocker
> Fix For: 1.0.2
>
>
> During Onetable project, found that for Decimal field column stats read from 
> metadata table is read as BytesWrapper instead of DecimalWrapper essentially 
> the actual type got lost. Verified write side is fine (i.e. writing as 
> DecimalWrapper) but read side is where the problem is.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7037) Column Stats for Decimal Field From Metadata table is read as Bytes

2025-03-18 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7037:
--
Status: In Progress  (was: Open)

> Column Stats for Decimal Field From Metadata table is read as Bytes
> ---
>
> Key: HUDI-7037
> URL: https://issues.apache.org/jira/browse/HUDI-7037
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: metadata
>Affects Versions: 0.14.1
>Reporter: Vamshi Gudavarthi
>Assignee: Sagar Sumit
>Priority: Blocker
> Fix For: 1.0.2
>
>
> During Onetable project, found that for Decimal field column stats read from 
> metadata table is read as BytesWrapper instead of DecimalWrapper essentially 
> the actual type got lost. Verified write side is fine (i.e. writing as 
> DecimalWrapper) but read side is where the problem is.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7037] Fix colstats reading for Decimal field [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12993:
URL: https://github.com/apache/hudi/pull/12993#issuecomment-2734076458

   
   ## CI report:
   
   * c81a532851ea54fcb58262145d016323a1e42ac7 Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4248)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-9144] Flink writer for MOR table supports writing RowData … [hudi]

2025-03-18 Thread via GitHub


cshuo commented on code in PR #12967:
URL: https://github.com/apache/hudi/pull/12967#discussion_r2000890050


##
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/io/v2/HandleRecords.java:
##
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.io.v2;
+
+import org.apache.hudi.common.model.DeleteRecord;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.util.Option;
+
+import java.util.Collections;
+import java.util.Iterator;
+
+/**
+ * {@code HandleRecords} is a holder containing the records iterator for {@code HoodieDataBlock}
+ * and the delete records iterator for {@code HoodieDeleteBlock}.
+ *
+ * Insert records and delete records are separated using two iterators for more efficient
+ * memory utilization. For example, the data bytes in the iterator are reused based on the
+ * Flink managed memory pool, and the RowData wrapper is also a singleton reusable object that
+ * minimizes on-heap memory costs, thus being more GC friendly for massive data scenarios.
+ */
+public class HandleRecords {
+  private final Iterator<HoodieRecord> recordItr;
+  private final Option<Iterator<DeleteRecord>> deleteRecordItr;
+
+  public HandleRecords(Iterator<HoodieRecord> recordItr, Iterator<DeleteRecord> deleteItr) {
+    this.recordItr = recordItr;
+    this.deleteRecordItr = Option.ofNullable(deleteItr);
+  }
+
+  public Iterator<HoodieRecord> getRecordItr() {
+    return this.recordItr;
+  }
+
+  public Iterator<DeleteRecord> getDeleteRecordItr() {
+    return this.deleteRecordItr.orElse(Collections.emptyIterator());
+  }
+
+  public static Builder builder() {
+    return new Builder();
+  }
+
+  public static class Builder {
+    private Iterator<HoodieRecord> recordItr;
+    private Iterator<DeleteRecord> deleteRecordItr;
+
+    public Builder() {
+    }
+
+    public Builder withRecordItr(Iterator<HoodieRecord> recordItr) {
+      this.recordItr = recordItr;
+      return this;
+    }
+
+    public Builder withDeleteRecordItr(Iterator<DeleteRecord> deleteRecordItr) {
+      this.deleteRecordItr = deleteRecordItr;
+      return this;
+    }
+
+    public HandleRecords build() {
+      return new HandleRecords(recordItr, deleteRecordItr);
+    }
+  }

Review Comment:
   `HandleRecords` will be removed after discussing with Danny, see detail 
[here](https://github.com/apache/hudi/pull/12967#discussion_r2000357105).
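
   For readers following the diff above, a minimal usage sketch of the quoted 
builder API (the iterators here are empty stand-ins for what the write handle 
would supply):
   ```
   Iterator<HoodieRecord> upserts = Collections.emptyIterator();   // stand-in source
   Iterator<DeleteRecord> deletes = Collections.emptyIterator();   // stand-in source
   HandleRecords handleRecords = HandleRecords.builder()
       .withRecordItr(upserts)
       .withDeleteRecordItr(deletes)
       .build();
   // getDeleteRecordItr() falls back to an empty iterator when no deletes were
   // set, so callers can always iterate without a null check.
   handleRecords.getRecordItr().forEachRemaining(r -> { /* goes to HoodieDataBlock */ });
   handleRecords.getDeleteRecordItr().forEachRemaining(d -> { /* goes to HoodieDeleteBlock */ });
   ```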



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7930] Flink Support for Array of Row and Map of Row value [hudi]

2025-03-18 Thread via GitHub


David-N-Perkins commented on PR #11727:
URL: https://github.com/apache/hudi/pull/11727#issuecomment-2732958024

   @empcl If I remember correctly, it was needed to get consistent names and 
structure in the Parquet files. I was seeing differences depending on whether 
the operation was "insert", "upsert", or "bulk_insert". 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-9120] Enable File Group reader by default for table version 6 [hudi]

2025-03-18 Thread via GitHub


lokeshj1703 commented on code in PR #12935:
URL: https://github.com/apache/hudi/pull/12935#discussion_r2001485609


##
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala:
##
@@ -59,17 +60,16 @@ import scala.collection.mutable
  * @param filters   spark filters that might be pushed down into the 
reader
  * @param requiredFilters   filters that are required and should always be 
used, even in merging situations
  */
-class SparkFileFormatInternalRowReaderContext(parquetFileReader: 
SparkParquetReader,
-  filters: Seq[Filter],
-  requiredFilters: Seq[Filter]) 
extends BaseSparkInternalRowReaderContext {
+class SparkFileFormatInternalRowReaderContext(parquetFileReader: 
SparkParquetReader, filters: Seq[Filter],
+  requiredFilters: Seq[Filter], 
tableVersion: HoodieTableVersion) extends BaseSparkInternalRowReaderContext {
   lazy val sparkAdapter: SparkAdapter = SparkAdapterSupport.sparkAdapter
   private lazy val bootstrapSafeFilters: Seq[Filter] = 
filters.filter(filterIsSafeForBootstrap) ++ requiredFilters
   private val deserializerMap: mutable.Map[Schema, HoodieAvroDeserializer] = 
mutable.Map()
   private val serializerMap: mutable.Map[Schema, HoodieAvroSerializer] = 
mutable.Map()
   private lazy val allFilters = filters ++ requiredFilters
 
   override def supportsParquetRowIndex: Boolean = {
-HoodieSparkUtils.gteqSpark3_5
+HoodieSparkUtils.gteqSpark3_5 && 
tableVersion.greaterThanOrEquals(HoodieTableVersion.EIGHT)

Review Comment:
   This change has been removed after Lin's fix



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (HUDI-9043) Analyze possibility to optimize `FlinkWriteHelper::deduplicateRecords`

2025-03-18 Thread Geser Dugarov (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17936425#comment-17936425
 ] 

Geser Dugarov commented on HUDI-9043:
-

`RowDataStreamWriteFunction::deduplicateRecordsIfNeeded` should be completed 
first now, and then we could check costs.

> Analyze possibility to optimize `FlinkWriteHelper::deduplicateRecords`
> --
>
> Key: HUDI-9043
> URL: https://issues.apache.org/jira/browse/HUDI-9043
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Geser Dugarov
>Assignee: Geser Dugarov
>Priority: Major
>
> `FlinkWriteHelper::deduplicateRecords` looks too costly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-9043) Analyze possibility to optimize `FlinkWriteHelper::deduplicateRecords`

2025-03-18 Thread Geser Dugarov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-9043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Geser Dugarov updated HUDI-9043:

Status: Open  (was: In Progress)

> Analyze possibility to optimize `FlinkWriteHelper::deduplicateRecords`
> --
>
> Key: HUDI-9043
> URL: https://issues.apache.org/jira/browse/HUDI-9043
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Geser Dugarov
>Assignee: Geser Dugarov
>Priority: Major
>
> `FlinkWriteHelper::deduplicateRecords` looks too costly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12991:
URL: https://github.com/apache/hudi/pull/12991#issuecomment-2732680960

   
   ## CI report:
   
   * 87512b51170f102d612c11b44bea7534a684c51d Azure: 
[SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4243)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-8796) Silent ignoring of bucket index in Flink append mode

2025-03-18 Thread Geser Dugarov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Geser Dugarov updated HUDI-8796:

Status: Open  (was: In Progress)

> Silent ignoring of bucket index in Flink append mode
> 
>
> Key: HUDI-8796
> URL: https://issues.apache.org/jira/browse/HUDI-8796
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Geser Dugarov
>Assignee: Geser Dugarov
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.1.0
>
>
> Currently, there is no exception when we try to write data in Flink append 
> mode using bucket index. Data will be written, but in parquet files without 
> bucket IDs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-9144] Flink writer for MOR table supports writing RowData … [hudi]

2025-03-18 Thread via GitHub


Alowator commented on code in PR #12967:
URL: https://github.com/apache/hudi/pull/12967#discussion_r2000181537


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/RowDataStreamWriteFunction.java:
##
@@ -0,0 +1,563 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.sink;
+
+import org.apache.hudi.client.FlinkTaskContextSupplier;
+import org.apache.hudi.client.WriteStatus;
+import org.apache.hudi.client.model.HoodieFlinkInternalRow;
+import org.apache.hudi.client.model.HoodieFlinkRecord;
+import org.apache.hudi.common.model.DeleteRecord;
+import org.apache.hudi.common.model.HoodieKey;
+import org.apache.hudi.common.model.HoodieOperation;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieRecordMerger;
+import org.apache.hudi.common.model.WriteOperationType;
+import org.apache.hudi.common.util.HoodieRecordUtils;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.util.ValidationUtils;
+import org.apache.hudi.common.util.VisibleForTesting;
+import org.apache.hudi.common.util.collection.MappingIterator;
+import org.apache.hudi.configuration.FlinkOptions;
+import org.apache.hudi.configuration.OptionsResolver;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.io.v2.HandleRecords;
+import org.apache.hudi.metrics.FlinkStreamWriteMetrics;
+import org.apache.hudi.sink.buffer.MemorySegmentPoolFactory;
+import org.apache.hudi.sink.buffer.RowDataBucket;
+import org.apache.hudi.sink.buffer.TotalSizeTracer;
+import org.apache.hudi.sink.bulk.RowDataKeyGen;
+import org.apache.hudi.sink.common.AbstractStreamWriteFunction;
+import org.apache.hudi.sink.event.WriteMetadataEvent;
+import org.apache.hudi.sink.exception.MemoryPagesExhaustedException;
+import org.apache.hudi.sink.utils.BufferUtils;
+import org.apache.hudi.table.action.commit.BucketInfo;
+import org.apache.hudi.table.action.commit.BucketType;
+import org.apache.hudi.util.MutableIteratorWrapperIterator;
+import org.apache.hudi.util.PreCombineFieldExtractor;
+import org.apache.hudi.util.StreamerUtil;
+
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.metrics.MetricGroup;
+import org.apache.flink.streaming.api.functions.ProcessFunction;
+import org.apache.flink.table.data.GenericRowData;
+import org.apache.flink.table.data.RowData;
+import org.apache.flink.table.data.StringData;
+import org.apache.flink.table.data.TimestampData;
+import org.apache.flink.table.data.binary.BinaryRowData;
+import org.apache.flink.table.data.utils.JoinedRowData;
+import org.apache.flink.table.runtime.operators.sort.BinaryInMemorySortBuffer;
+import org.apache.flink.table.runtime.util.MemorySegmentPool;
+import org.apache.flink.table.types.logical.LogicalType;
+import org.apache.flink.table.types.logical.RowType;
+import org.apache.flink.types.RowKind;
+import org.apache.flink.util.Collector;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.NoSuchElementException;
+import java.util.concurrent.atomic.AtomicLong;
+
+/**
+ * Sink function to write the data to the underneath filesystem.
+ *
+ * Work Flow
+ *
+ * The function firstly buffers the data (RowData) in a binary buffer based on
+ * {@code BinaryInMemorySortBuffer}. It flushes (writes) the buffered records when the batch
+ * size exceeds the configured size {@link FlinkOptions#WRITE_BATCH_SIZE}, when the memory of
+ * the binary buffer is exhausted and can not append any more data, or when a Flink checkpoint
+ * starts.
+ * After a batch has been written successfully, the function notifies its 
operator coordinator {@link StreamWriteOperatorCoordinator}
+ * to mark a successful write.
+ *
+ * The Semantics
+ *
+ * The task implements exactly-once semantics by buffering the data between 
checkpoints. The operator coordinator
+ * starts a new instant on the timeline when a checkpoint triggers, the 
coordinator checkpoints alwa

Re: [PR] [HUDI-8796] Restrict insert operation with bucket index for Flink [hudi]

2025-03-18 Thread via GitHub


geserdugarov commented on PR #12545:
URL: https://github.com/apache/hudi/pull/12545#issuecomment-2732158325

   I will revisit this issue after major changes in Flink write into Hudi by 
RFC-87.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-8969) Analyze how to write `RowData` directly

2025-03-18 Thread Geser Dugarov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Geser Dugarov updated HUDI-8969:

Status: Open  (was: In Progress)

> Analyze how to write `RowData` directly
> ---
>
> Key: HUDI-8969
> URL: https://issues.apache.org/jira/browse/HUDI-8969
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Geser Dugarov
>Assignee: Geser Dugarov
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-9144] Flink writer for MOR table supports writing RowData … [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12967:
URL: https://github.com/apache/hudi/pull/12967#issuecomment-2733240438

   
   ## CI report:
   
   * b591ad3b0092eec900475590089dd05f58570d5d Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4222)
 
   * 21ba47d8c7b86b1c92a311e216c4b85dc17ed046 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-9140] Fix log block io type and other rollback strategy fixes for table version 6 [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12992:
URL: https://github.com/apache/hudi/pull/12992#issuecomment-2733436576

   
   ## CI report:
   
   * 99916b679c4915e811845ae6261e9b3263c0feea Azure: 
[SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4244)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-9120] Fix delete ordering comparison issue [hudi]

2025-03-18 Thread via GitHub


linliu-code commented on PR #12979:
URL: https://github.com/apache/hudi/pull/12979#issuecomment-2733434279

   This is not needed since we have found a better fix: 
https://github.com/apache/hudi/pull/12991


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-9120] Fix delete ordering comparison issue [hudi]

2025-03-18 Thread via GitHub


linliu-code closed pull request #12979: [HUDI-9120] Fix delete ordering 
comparison issue
URL: https://github.com/apache/hudi/pull/12979


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12991:
URL: https://github.com/apache/hudi/pull/12991#issuecomment-2732038143

   
   ## CI report:
   
   * d85952ac252b3a9cc7677188dac340ece0efcc1d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-9013] Add backwards compatible MDT writer support and reader support with tbl v6 [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12948:
URL: https://github.com/apache/hudi/pull/12948#issuecomment-2731725478

   
   ## CI report:
   
   * 047885b4286dae609122e6573117cfd5dcdca572 Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4221)
 
   * 293d1a47c619237651041b9182f414a272f7c5ed UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12991:
URL: https://github.com/apache/hudi/pull/12991#issuecomment-2732053104

   
   ## CI report:
   
   * d85952ac252b3a9cc7677188dac340ece0efcc1d Azure: 
[CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4242)
 
   * 87512b51170f102d612c11b44bea7534a684c51d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12991:
URL: https://github.com/apache/hudi/pull/12991#issuecomment-2732041000

   
   ## CI report:
   
   * d85952ac252b3a9cc7677188dac340ece0efcc1d Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4242)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12991:
URL: https://github.com/apache/hudi/pull/12991#issuecomment-2732049001

   
   ## CI report:
   
   * d85952ac252b3a9cc7677188dac340ece0efcc1d Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4242)
 
   * 87512b51170f102d612c11b44bea7534a684c51d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] why HoodieDatasetBulkInsertHelper bulkInsert method no BucketBulkInsertDataInternalWriterHelper [hudi]

2025-03-18 Thread via GitHub


danny0405 commented on issue #12989:
URL: https://github.com/apache/hudi/issues/12989#issuecomment-2731982548

   Do you want to do some code refactoring, or did you encounter some issues in 
your use case?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] why HoodieDatasetBulkInsertHelper bulkInsert method no BucketBulkInsertDataInternalWriterHelper [hudi]

2025-03-18 Thread via GitHub


leeseven1211 commented on issue #12989:
URL: https://github.com/apache/hudi/issues/12989#issuecomment-2731993287

   While using bulk insert to batch write data into Hudi, I noticed that the 
written files were not bucketed according to the bucket index. After adding 
this piece of code,
   
   case HoodieIndex.IndexType.BUCKET if writeConfig.getBucketIndexEngineType
       == BucketIndexEngineType.SIMPLE =>
     new BucketBulkInsertDataInternalWriterHelper
   
   I found that static bucketing could be achieved. I would like to ask why 
this case was not supported out of the box.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-9083] Fixing flakiness with multi writer test [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12987:
URL: https://github.com/apache/hudi/pull/12987#issuecomment-2731998178

   
   ## CI report:
   
   * 1969b9f2ad75790d1058e6b66ae0995793c3082d Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4240)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Hudi DELETE operation in Flink scans all partitions despite partition predicate [hudi]

2025-03-18 Thread via GitHub


danny0405 commented on issue #12988:
URL: https://github.com/apache/hudi/issues/12988#issuecomment-2732019149

   Are there any other failures in the JM log? Can you also show me the Flink 
UI operator DAG?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12990:
URL: https://github.com/apache/hudi/pull/12990#issuecomment-2732029371

   
   ## CI report:
   
   * daa0efee55176b3f6441a960a796322b6adec941 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [HUDI-9120] Add precombine field if possible [hudi]

2025-03-18 Thread via GitHub


linliu-code opened a new pull request, #12991:
URL: https://github.com/apache/hudi/pull/12991

   ### Change Logs
   
   Previously, when the configuration "hoodie.record.merge.mode" was null (the 
default), we did not add the precombine field to the required schema, since we 
assumed commit time ordering.
   
   But we should actually treat a null "hoodie.record.merge.mode" as unknown, 
and then try to add the precombine field to the required schema whenever 
possible. Otherwise, for event_time_ordering, it could cause an ordering value 
comparison failure.
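   
   A minimal sketch of the intended logic (a hypothetical helper, not the 
exact patch):
   ```
   // Hypothetical helper: a null merge mode is treated as unknown, so the
   // precombine field stays in the required schema whenever the table has one.
   static String[] mandatoryMergeFields(RecordMergeMode mode, Option<String> precombineField) {
     boolean mayNeedOrderingValue = mode == null || mode == RecordMergeMode.EVENT_TIME_ORDERING;
     return mayNeedOrderingValue && precombineField.isPresent()
         ? new String[] {precombineField.get()}
         : new String[0];
   }
   ```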
   
   ### Impact
   
   Fix a bug.
   
   ### Risk level (write none, low medium or high below)
   
   Medium.
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] why HoodieDatasetBulkInsertHelper bulkInsert method no BucketBulkInsertDataInternalWriterHelper [hudi]

2025-03-18 Thread via GitHub


danny0405 commented on issue #12989:
URL: https://github.com/apache/hudi/issues/12989#issuecomment-2732033960

   Can you share your configuration and a link to the code that you want to 
patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-9195) rowdata write handle builds data block using record iterator if there is no delete record

2025-03-18 Thread Shuo Cheng (Jira)
Shuo Cheng created HUDI-9195:


 Summary: rowdata write handle builds data block using record 
iterator if there is no delete record
 Key: HUDI-9195
 URL: https://issues.apache.org/jira/browse/HUDI-9195
 Project: Apache Hudi
  Issue Type: Sub-task
  Components: flink-sql
Reporter: Shuo Cheng


If there is no delete record, the log write handle only writes data blocks. In 
that case there is no need to divide records into upsert records and delete 
records; we can build the data block directly from the record iterator.
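
A rough sketch of the idea (method names are hypothetical stand-ins for the 
write handle's block construction, not the actual API):

```
// Sketch only: skip the upsert/delete split when no DeleteRecord exists.
List<HoodieLogBlock> buildBlocks(Iterator<HoodieRecord> recordItr,
                                 Iterator<DeleteRecord> deleteItr) {
  List<HoodieLogBlock> blocks = new ArrayList<>();
  // the data block streams straight from the record iterator
  blocks.add(newDataBlock(recordItr));
  if (deleteItr.hasNext()) {
    // only emit a delete block when delete records actually exist
    blocks.add(newDeleteBlock(deleteItr));
  }
  return blocks;
}
```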



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[I] [SUPPORT] why HoodieDatasetBulkInsertHelper bulkInsert method no BucketBulkInsertDataInternalWriterHelper [hudi]

2025-03-18 Thread via GitHub


leeseven1211 opened a new issue, #12989:
URL: https://github.com/apache/hudi/issues/12989

   When using bulk insert, only ConsistentBucketBulkInsertDataInternalWriterHelper 
and BulkInsertDataInternalWriterHelper are available. Why is 
BucketBulkInsertDataInternalWriterHelper not supported?
   
   Below is the code snippet:
   val writer = writeConfig.getIndexType match {
 case HoodieIndex.IndexType.BUCKET if 
writeConfig.getBucketIndexEngineType
   == BucketIndexEngineType.CONSISTENT_HASHING =>
   new ConsistentBucketBulkInsertDataInternalWriterHelper(
 table,
 writeConfig,
 instantTime,
 taskPartitionId,
 taskId,
 taskEpochId,
 schema,
 writeConfig.populateMetaFields,
 arePartitionRecordsSorted,
 shouldPreserveHoodieMetadata)
   
// Is it possible to add support here?
   
 case _ =>
   new BulkInsertDataInternalWriterHelper(
 table,
 writeConfig,
 instantTime,
 taskPartitionId,
 taskId,
 taskEpochId,
 schema,
 writeConfig.populateMetaFields,
 arePartitionRecordsSorted,
 shouldPreserveHoodieMetadata)
   }
   
   
   
   **Expected behavior**
   
   add:
   case HoodieIndex.IndexType.BUCKET if writeConfig.getBucketIndexEngineType
       == BucketIndexEngineType.SIMPLE =>
     new BucketBulkInsertDataInternalWriterHelper(
       xxx
     )
   
   
   
   **Environment Description**
   
   * Hudi version : 0.14.0
   
   * Spark version : 3.1.1
   
   * Hive version : 3.1.1
   
   * Hadoop version : 3.1.1
   
   * Storage (HDFS/S3/GCS..) : hdfs
   
   * Running on Docker? (yes/no) : no
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-9170] Fixing schema projection with file group reader [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12970:
URL: https://github.com/apache/hudi/pull/12970#issuecomment-2731183347

   
   ## CI report:
   
   * 09b4ba83b5d61cd777c577e483bfe21098725ecc UNKNOWN
   * b31778b0dd6ccaf619321c6f9b397f7a388c8717 UNKNOWN
   * d8264536a187f6e213ed1eb08d941c0fc86a1e55 UNKNOWN
   * 07b8d68c8ebe12c5e8d29d7964f2d82a1f8f1519 Azure: 
[SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4230)
 
   * 27bdb40132d1be6aa68732a2f147be35a1b03945 Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4236)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] org.apache.hudi.exception.HoodieException: org.apache.avro.AvroTypeException: Cannot encode decimal with precision 14 as max precision 13 [hudi]

2025-03-18 Thread via GitHub


imonteroq commented on issue #11335:
URL: https://github.com/apache/hudi/issues/11335#issuecomment-2734199198

   I am also getting the same issue, running Hudi 0.15 on EMR Serverless with 
Spark/Scala. I saved the incoming data to a new table, and it has absolutely 
NO decimal fields with precision 9; they were all created with precision 8.
   ```
   Caused by: org.apache.hudi.exception.HoodieException: 
org.apache.avro.AvroTypeException: Cannot encode decimal with precision 9 as 
max precision 8
   Caused by: org.apache.hudi.exception.HoodieException: 
org.apache.hudi.exception.HoodieException: org.apache.avro.AvroTypeException: 
Cannot encode decimal with precision 9 as max precision 8
   ```
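   
   A hedged workaround sketch (Spark Java API; the column name and target 
precision below are placeholders for your schema) that casts incoming decimal 
columns to the table's declared type before writing, so the Avro encoder never 
sees a wider precision than the table allows:
   ```
   import static org.apache.spark.sql.functions.col;

   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.types.DataTypes;

   // "amount" and (8, 2) are placeholders -- match your table's declared type
   // exactly. Note: values overflowing the target precision become null under
   // Spark's default (non-ANSI) cast behavior.
   Dataset<Row> aligned = incoming.withColumn(
       "amount", col("amount").cast(DataTypes.createDecimalType(8, 2)));
   aligned.write().format("hudi").mode("append").save(basePath);  // incoming/basePath are placeholders
   ```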


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-9013] Add backwards compatible MDT writer support and reader support with tbl v6 [hudi]

2025-03-18 Thread via GitHub


nsivabalan commented on PR #12948:
URL: https://github.com/apache/hudi/pull/12948#issuecomment-2734247170

   CI is failing due to the known flaky test 
ITTestHoodieDataSource.testIncrementalReadArchivedCommits.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-9083] Fixing flakiness with multi writer test [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12987:
URL: https://github.com/apache/hudi/pull/12987#issuecomment-2734539112

   
   ## CI report:
   
   * 1969b9f2ad75790d1058e6b66ae0995793c3082d Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4240)
 
   * 8f98d0ff87fd8d21365696b22af77caac421cdd5 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-7037) Column Stats for Decimal Field From Metadata table is read as Bytes

2025-03-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7037:
-
Labels: pull-request-available  (was: )

> Column Stats for Decimal Field From Metadata table is read as Bytes
> ---
>
> Key: HUDI-7037
> URL: https://issues.apache.org/jira/browse/HUDI-7037
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: metadata
>Affects Versions: 0.14.1
>Reporter: Vamshi Gudavarthi
>Assignee: Sagar Sumit
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.2
>
>
> During Onetable project, found that for Decimal field column stats read from 
> metadata table is read as BytesWrapper instead of DecimalWrapper essentially 
> the actual type got lost. Verified write side is fine (i.e. writing as 
> DecimalWrapper) but read side is where the problem is.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12991:
URL: https://github.com/apache/hudi/pull/12991#issuecomment-2734200114

   
   ## CI report:
   
   * 87512b51170f102d612c11b44bea7534a684c51d Azure: 
[SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4243)
 
   * 5eb43137bc60a32dee13000bff27b3b08e3694d3 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]

2025-03-18 Thread via GitHub


yihua commented on code in PR #12991:
URL: https://github.com/apache/hudi/pull/12991#discussion_r2001740210


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReaderSchemaHandler.java:
##
@@ -199,7 +199,8 @@ private static String[] 
getMandatoryFieldsForMerging(HoodieTableConfig cfg, Type
   }
 }
 
-if (cfg.getRecordMergeMode() == RecordMergeMode.EVENT_TIME_ORDERING) {
+if (cfg.getRecordMergeMode() == null
+|| cfg.getRecordMergeMode() == RecordMergeMode.EVENT_TIME_ORDERING) {

Review Comment:
   A better way is to return `EVENT_TIME_ORDERING` from 
`cfg.getRecordMergeMode()` for table version 6.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]

2025-03-18 Thread via GitHub


yihua commented on code in PR #12991:
URL: https://github.com/apache/hudi/pull/12991#discussion_r2001740210


##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReaderSchemaHandler.java:
##
@@ -199,7 +199,8 @@ private static String[] 
getMandatoryFieldsForMerging(HoodieTableConfig cfg, Type
   }
 }
 
-if (cfg.getRecordMergeMode() == RecordMergeMode.EVENT_TIME_ORDERING) {
+if (cfg.getRecordMergeMode() == null
+|| cfg.getRecordMergeMode() == RecordMergeMode.EVENT_TIME_ORDERING) {

Review Comment:
   A better way is to return the inferred merge mode, `EVENT_TIME_ORDERING` or 
`COMMIT_TIME_ORDERING`, from `cfg.getRecordMergeMode()` for table version 6.
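   
   A minimal sketch of that suggestion (a hypothetical helper, not the actual 
`HoodieTableConfig` code):
   ```
   // Hypothetical inference for table version 6, which has no persisted merge
   // mode: a precombine field implies event-time ordering.
   static RecordMergeMode effectiveMergeMode(RecordMergeMode configured,
                                             String precombineField,
                                             HoodieTableVersion version) {
     if (configured != null || version.greaterThanOrEquals(HoodieTableVersion.EIGHT)) {
       return configured;
     }
     return StringUtils.isNullOrEmpty(precombineField)
         ? RecordMergeMode.COMMIT_TIME_ORDERING
         : RecordMergeMode.EVENT_TIME_ORDERING;
   }
   ```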



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12991:
URL: https://github.com/apache/hudi/pull/12991#issuecomment-2734455487

   
   ## CI report:
   
   * 5eb43137bc60a32dee13000bff27b3b08e3694d3 Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4249)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-9120] Fix merge mode inference for table version 6 in file group reader [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12991:
URL: https://github.com/apache/hudi/pull/12991#issuecomment-2735187205

   
   ## CI report:
   
   * 6330602b196e68b6fe9f2e1612dec8590dce073c Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4251)
 
   * b5e64be8b802c3d2cb048b11c3c83d1296dc2d41 Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4257)
 
   * 411b6b0bd0238e770d9454c88d5a1daca0af41a6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[I] [DISCUSSION] Should we treat `COMMIT_TIME_ORDERING` as a special case of `EVENT_TIME_ORDERING` ? [hudi]

2025-03-18 Thread via GitHub


TheR1sing3un opened a new issue, #12997:
URL: https://github.com/apache/hudi/issues/12997

   From the current code structure, we treat the merge policy of 
`COMMIT_TIME_ORDERING` as separate logic. But from a business perspective, 
should we treat it as a special case of `EVENT_TIME_ORDERING` where the 
ORDERING VALUE for each record is the same? For example, it could be 
represented by `int: 0`. This way we would not need to maintain two merge 
policies: we default to `EVENT_TIME_ORDERING`, and records with the same 
record_key are merged in the order of `transaction_time` and `event_time`. 
This would help us maintain the code in the future and deal with the various 
merge problems we have encountered.
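   
   A minimal sketch of the idea (hypothetical, not existing Hudi code): with a 
constant ordering value, the event-time comparison degenerates to commit-time 
ordering, because every comparison ties and the later write wins.
   ```
   // Hypothetical: model commit-time ordering as event-time ordering with a
   // constant ordering value, so a single merge path can serve both modes.
   static Comparable<?> orderingValue(RecordMergeMode mode, HoodieRecord record,
                                      Schema schema, TypedProperties props) {
     return mode == RecordMergeMode.COMMIT_TIME_ORDERING
         ? 0  // identical for all records; ties resolve by transaction (commit) time
         : record.getOrderingValue(schema, props);
   }
   ```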
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [DISCUSSION] Should we treat `COMMIT_TIME_ORDERING` as a special case of `EVENT_TIME_ORDERING` ? [hudi]

2025-03-18 Thread via GitHub


TheR1sing3un commented on issue #12997:
URL: https://github.com/apache/hudi/issues/12997#issuecomment-2735198627

   @yihua @danny0405 I'd love to hear what you think.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-9120] Fix merge mode inference for table version 6 in file group reader [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12991:
URL: https://github.com/apache/hudi/pull/12991#issuecomment-2735188604

   
   ## CI report:
   
   * b5e64be8b802c3d2cb048b11c3c83d1296dc2d41 Azure: 
[CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4257)
 
   * 411b6b0bd0238e770d9454c88d5a1daca0af41a6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-9120] Fix merge mode inference for table version 6 in file group reader [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12991:
URL: https://github.com/apache/hudi/pull/12991#issuecomment-2735185144

   
   ## CI report:
   
   * 6330602b196e68b6fe9f2e1612dec8590dce073c Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4251)
 
   * b5e64be8b802c3d2cb048b11c3c83d1296dc2d41 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-9120] Fix merge mode inference for table version 6 in file group reader [hudi]

2025-03-18 Thread via GitHub


linliu-code commented on code in PR #12991:
URL: https://github.com/apache/hudi/pull/12991#discussion_r2002315944


##
hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderSchemaHandler.java:
##
@@ -0,0 +1,123 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.hudi.common.table.read;
+
+import org.apache.hudi.common.config.RecordMergeMode;
+import org.apache.hudi.common.config.TypedProperties;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.model.HoodieRecordMerger;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.collection.Triple;
+
+import org.apache.avro.Schema;
+import org.junit.jupiter.api.Test;
+import org.junit.jupiter.api.extension.ExtendWith;
+import org.mockito.Mock;
+import org.mockito.junit.jupiter.MockitoExtension;
+
+import static 
org.apache.hudi.common.table.read.HoodieFileGroupReaderSchemaHandler.getMandatoryFieldsForMerging;
+import static org.junit.jupiter.api.Assertions.assertArrayEquals;
+import static org.mockito.Mockito.any;
+import static org.mockito.Mockito.mockStatic;
+import static org.mockito.Mockito.times;
+import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.when;
+
+@ExtendWith(MockitoExtension.class)
+class TestHoodieFileGroupReaderSchemaHandler {

Review Comment:
   Done. Added a few new test cases there.



##
hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReaderSchemaHandler.java:
##
@@ -199,7 +201,14 @@ private static String[] 
getMandatoryFieldsForMerging(HoodieTableConfig cfg, Type
   }
 }
 
-if (cfg.getRecordMergeMode() == RecordMergeMode.EVENT_TIME_ORDERING) {
+Triple<RecordMergeMode, String, String> mergingConfigs =

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] When writing to a Hudi MOR table using Flink, data merging did not occur based on the expected value of "precombine.field". [hudi]

2025-03-18 Thread via GitHub


Toroidals closed issue #12996: [SUPPORT] When writing to a Hudi MOR table using 
Flink, data merging did not occur based on the expected value of 
"precombine.field".
URL: https://github.com/apache/hudi/issues/12996


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-9197] Fix flaky test for flink: testDynamicPartitionPrune [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12995:
URL: https://github.com/apache/hudi/pull/12995#issuecomment-2735191670

   
   ## CI report:
   
   * a27ffd4b4687f9fb983d5914f2d060d0ce4f6956 Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4254)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] docker demo not working: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/parquet/format/TypeDefinedOrder [hudi]

2025-03-18 Thread via GitHub


nsivabalan commented on issue #12946:
URL: https://github.com/apache/hudi/issues/12946#issuecomment-2735210997

   hey @rangareddy : did you try the docker demo with the 0.15.0 branch? Can 
you report back once you get it working successfully on your end?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] How to Suppress the HoodieWriterCommitMessage on each Parquet file it Writes HoodieWriterCommitMessage [hudi]

2025-03-18 Thread via GitHub


nsivabalan commented on issue #12854:
URL: https://github.com/apache/hudi/issues/12854#issuecomment-2735212873

   hey @rangareddy : can you try suppressing the logging for the class of 
interest and report back?
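   
   For example, a hedged log4j2 sketch (the logger name below is an 
assumption; match it to whichever class actually prints 
HoodieWriterCommitMessage in your logs):
   ```
   import org.apache.logging.log4j.Level;
   import org.apache.logging.log4j.core.config.Configurator;

   // Raise the level for the package emitting the per-file commit messages.
   // "org.apache.hudi.internal" is a guess -- verify against your log output.
   Configurator.setLevel("org.apache.hudi.internal", Level.WARN);
   ```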
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT]Caused by: org.apache.hudi.exception.HoodieIOException: Exception create input stream from file: HoodieLogFile{pathStr='hdfs://nameservice1/xxx/.00000056-15ec-459f-bb67-5f8c2b319203_2

2025-03-18 Thread via GitHub


nsivabalan commented on issue #12554:
URL: https://github.com/apache/hudi/issues/12554#issuecomment-2735221004

   hey @ad1happy2go @rangareddy : who is following up here? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] - Records deleted with via "hard delete" appear after next commit [hudi]

2025-03-18 Thread via GitHub


nsivabalan commented on issue #12833:
URL: https://github.com/apache/hudi/issues/12833#issuecomment-2735213212

   hey @RuyRoaV : gentle ping. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Does Hudi re-create record level index during an upsert operation? [hudi]

2025-03-18 Thread via GitHub


nsivabalan commented on issue #12783:
URL: https://github.com/apache/hudi/issues/12783#issuecomment-2735216669

   you should see this only in the first batch after you enable RLI. Once it's fully initialized, subsequent batches should use RLI instead of the global simple index.
   
   but the initialization of RLI itself could be deferred if there are pending instants in the data table. We can gauge that from the driver logs.
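
   As a hedged sketch, one way to check for pending instants on the data table (the builder and timeline method names follow the 0.14/0.15 APIs and may differ slightly across releases; the base path is a placeholder):

   ```scala
   import org.apache.hadoop.conf.Configuration
   import org.apache.hudi.common.table.HoodieTableMetaClient

   val metaClient = HoodieTableMetaClient.builder()
     .setConf(new Configuration())             // Hadoop conf; newer releases wrap this type
     .setBasePath("s3://bucket/path/to/table") // placeholder base path
     .build()

   // Any inflight or requested instants listed here can defer RLI initialization.
   metaClient.getActiveTimeline
     .filterInflightsAndRequested()
     .getInstants
     .forEach(instant => println(s"pending: $instant"))
   ```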
   




Re: [I] [SUPPORT] docker demo not working: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/parquet/format/TypeDefinedOrder [hudi]

2025-03-18 Thread via GitHub


Souldiv commented on issue #12946:
URL: https://github.com/apache/hudi/issues/12946#issuecomment-2735217288

   hey @rangareddy I have followed the steps outlined 
[here](https://hudi.apache.org/docs/docker_demo/). I get that error when I try 
to run the Hive sync tool. I believe it might be an issue with the env var 
$HUDI_CLASSPATH not being set. I tried running it on-prem as well with the 
individual services, and it worked once I set that var.
   




Re: [I] Log files in Hudi MOR table are not getting deleted [hudi]

2025-03-18 Thread via GitHub


nsivabalan commented on issue #12702:
URL: https://github.com/apache/hudi/issues/12702#issuecomment-2735217750

   hey @ad1happy2go : what's the status on this?




Re: [I] [SUPPORT] Hoodie Custom Merge Paylod results in UnsupportedOperationException [hudi]

2025-03-18 Thread via GitHub


nsivabalan commented on issue #12571:
URL: https://github.com/apache/hudi/issues/12571#issuecomment-2735220428

   hey folks, what's the status here? Did we find the root cause, or is it not 
reproducible?
   We are trying to collect issues to be targeted for 1.0.2, so we are trying to 
gauge the status of this issue.




Re: [I] [SUPPORT] Queries are very memory intensive due to low read parallelism in HoodieMergeOnReadRDD [hudi]

2025-03-18 Thread via GitHub


nsivabalan commented on issue #12434:
URL: https://github.com/apache/hudi/issues/12434#issuecomment-2735221825

   Any update on this?




Re: [I] [SUPPORT] Slow commit times with Spark Structured Streaming from Kinesis to MOR Hudi table [hudi]

2025-03-18 Thread via GitHub


nsivabalan commented on issue #12412:
URL: https://github.com/apache/hudi/issues/12412#issuecomment-2735222177

   hey @ad1happy2go : what's the latest on this?




Re: [PR] [HUDI-9120] Enable File Group reader by default for table version 6 [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12935:
URL: https://github.com/apache/hudi/pull/12935#issuecomment-2734044775

   
   ## CI report:
   
   * 5154a0d5f9b8adecd1c675f05405e732b2f1e9fe Azure: 
[CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4246)
 
   * 3498922c1b7993f4919f9bb4400fc8a8565ccdac Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4247)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   




[jira] [Updated] (HUDI-7037) Column Stats for Decimal Field From Metadata table is read as Bytes

2025-03-18 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-7037:
--
Status: Patch Available  (was: In Progress)

> Column Stats for Decimal Field From Metadata table is read as Bytes
> ---
>
> Key: HUDI-7037
> URL: https://issues.apache.org/jira/browse/HUDI-7037
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: metadata
>Affects Versions: 0.14.1
>Reporter: Vamshi Gudavarthi
>Assignee: Sagar Sumit
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.2
>
>
> During Onetable project, found that for Decimal field column stats read from 
> metadata table is read as BytesWrapper instead of DecimalWrapper essentially 
> the actual type got lost. Verified write side is fine (i.e. writing as 
> DecimalWrapper) but read side is where the problem is.





Re: [PR] [HUDI-7037] Fix colstats reading for Decimal field [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12993:
URL: https://github.com/apache/hudi/pull/12993#issuecomment-2734070157

   
   ## CI report:
   
   * c81a532851ea54fcb58262145d016323a1e42ac7 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   




[PR] [HUDI-7037] Fix colstats reading for Decimal field [hudi]

2025-03-18 Thread via GitHub


codope opened a new pull request, #12993:
URL: https://github.com/apache/hudi/pull/12993

   ### Change Logs
   
   When reading column statistics for Decimal fields from the metadata table, 
the unwrapped decimal values are incorrectly handled. Specifically:
   
   - **Type Loss in Unwrapping:**  
 The `tryUnpackValueWrapper` method was matching a `BytesWrapper` before a 
`DecimalWrapper`, causing values written as `DecimalWrapper` (which are 
actually decimals) to be interpreted as raw bytes.
   
   - **Incorrect Deserialization:**  
 In the `deserialize` method, the decimal conversion was not robust enough:
 - It only handled the case where the unwrapped value is a `ByteBuffer`.
 - It did not enforce the correct scale/precision when the value was 
already a decimal (either as a Scala `BigDecimal` or a `java.math.BigDecimal`).
 - Additionally, it was using a private Avro constructor for `Decimal`, 
which prevented proper conversion.
   
   This PR addresses these issues with the following changes:
   
   1. **Reordering in `tryUnpackValueWrapper`:**  
  - The `DecimalWrapper` case is moved before the `BytesWrapper` case. This 
ensures that values written as decimals are unwrapped correctly.
   
   2. **Enhancements to `deserialize`:**  
  - **ByteBuffer Handling:**  
The conversion now uses Avro’s public factory method 
(`org.apache.avro.LogicalTypes.decimal(precision, scale)`) to create a Decimal 
logical type with the correct precision and scale.
  - **Direct Decimal Values:**  
The method now properly handles cases where the unwrapped value is 
already a `scala.math.BigDecimal` or a `java.math.BigDecimal`. In both cases, 
it enforces the target scale using `.setScale(dt.scale, 
java.math.RoundingMode.UNNECESSARY)`.

   3. **Unit Tests:**  
  - New unit tests have been added to cover:
- Decimal values unwrapped as a `ByteBuffer`.
- Decimal values unwrapped as a `java.math.BigDecimal`.
- Decimal values unwrapped as a Scala `BigDecimal`.
  - Additionally, we reuse the existing utility method to generate a 
DataFrame with decimals to ensure that our logic works correctly in an 
integrated scenario.
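
   To make the change concrete, here is a minimal Scala sketch of the two fixes. The wrapper types are simplified stand-ins for Hudi's Avro-generated records, and the method shapes follow the description above rather than the exact signatures in the codebase:

   ```scala
   import java.math.{BigDecimal => JBigDecimal, RoundingMode}
   import java.nio.ByteBuffer
   import org.apache.avro.{Conversions, LogicalTypes, Schema}

   object DecimalStatsSketch {
     // Simplified stand-ins for the Avro-generated wrapper records.
     sealed trait ValueWrapper
     final case class DecimalWrapper(value: ByteBuffer) extends ValueWrapper
     final case class BytesWrapper(value: ByteBuffer) extends ValueWrapper

     // Fix 1: match DecimalWrapper BEFORE BytesWrapper, so decimal stats are
     // not misread as raw bytes.
     def tryUnpackValueWrapper(wrapper: ValueWrapper): Any = wrapper match {
       case d: DecimalWrapper => d.value // still a ByteBuffer; decoded below
       case b: BytesWrapper   => b.value
     }

     // Fix 2: decode via Avro's public factory and enforce the target scale.
     def deserializeDecimal(unwrapped: Any, precision: Int, scale: Int): JBigDecimal =
       unwrapped match {
         case buf: ByteBuffer =>
           val logicalType = LogicalTypes.decimal(precision, scale) // public factory
           val schema = logicalType.addToSchema(Schema.create(Schema.Type.BYTES))
           new Conversions.DecimalConversion().fromBytes(buf, schema, logicalType)
         case d: JBigDecimal =>
           d.setScale(scale, RoundingMode.UNNECESSARY)
         case d: scala.math.BigDecimal =>
           d.bigDecimal.setScale(scale, RoundingMode.UNNECESSARY)
         case other =>
           throw new IllegalArgumentException(s"Unexpected decimal stat value: $other")
       }
   }
   ```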
   
   ### Impact
   
   Decimal values retain their intended semantics when read from the metadata 
table, ensuring that comparisons and filtering, and hence data skipping, based 
on these values work correctly.
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   




Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12991:
URL: https://github.com/apache/hudi/pull/12991#issuecomment-2734820536

   
   ## CI report:
   
   * 5eb43137bc60a32dee13000bff27b3b08e3694d3 Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4249)
 
   * 6330602b196e68b6fe9f2e1612dec8590dce073c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   




Re: [PR] [HUDI-9120] Add precombine field if possible [hudi]

2025-03-18 Thread via GitHub


hudi-bot commented on PR #12991:
URL: https://github.com/apache/hudi/pull/12991#issuecomment-2734822641

   
   ## CI report:
   
   * 5eb43137bc60a32dee13000bff27b3b08e3694d3 Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4249)
 
   * 6330602b196e68b6fe9f2e1612dec8590dce073c Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=4251)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   




[jira] [Updated] (HUDI-8655) Create Tests for Filegroup reader for Schema Cache and for Spillable Map

2025-03-18 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-8655:
--
Fix Version/s: 1.0.1
   (was: 1.0.2)

> Create Tests for Filegroup reader for Schema Cache and for Spillable Map
> 
>
> Key: HUDI-8655
> URL: https://issues.apache.org/jira/browse/HUDI-8655
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.1
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> We need unit tests for schema cache
> For spillable map, we need to add test cases for how the fg reader will use 
> it and ensure we test spilling to disk in the test




