Re: [PR] [HUDI-8775] Expression index on a column should get tracked at partition level if partition stats index is turned on [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12558:
URL: https://github.com/apache/hudi/pull/12558#issuecomment-2579389493

   
   ## CI report:
   
   * 4b54a2deb80ccce01ec6560927c0d143a8b5c6ba Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2756)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8841] Fix schema validating exception during flink async cluste… [hudi]

2025-01-09 Thread via GitHub


danny0405 commented on code in PR #12598:
URL: https://github.com/apache/hudi/pull/12598#discussion_r1908311946


##
hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java:
##
@@ -605,14 +605,16 @@ public static String createSchemaErrorString(String 
errorMessage, Schema writerS
* @param nullable nullability of column type
* @return a new schema with the nullabilities of the given columns updated
*/
-  public static Schema createSchemaWithNullabilityUpdate(
+  public static Schema forceNullableColumns(
   Schema schema, List nullableUpdateCols, boolean nullable) {

Review Comment:
   nullableUpdateCols -> columns. We can also eliminate the `nullable` flag because it is always true.
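
   A minimal sketch of the simplified shape being suggested here (parameter renamed to `columns`, the always-true `nullable` flag dropped), written against plain Avro APIs (assuming Avro 1.9+). It is an illustration only, not the actual Hudi implementation; fields that are already unions, and fields carrying non-null defaults, are left out of scope for brevity.

```java
import org.apache.avro.Schema;

import java.util.List;
import java.util.stream.Collectors;

public class ForceNullableColumnsSketch {

  // Wrap each requested column's type in a ["null", type] union so the column becomes nullable.
  public static Schema forceNullableColumns(Schema schema, List<String> columns) {
    List<Schema.Field> fields = schema.getFields().stream()
        .map(f -> {
          Schema fieldSchema = f.schema();
          if (columns.contains(f.name()) && fieldSchema.getType() != Schema.Type.UNION) {
            fieldSchema = Schema.createUnion(Schema.create(Schema.Type.NULL), fieldSchema);
          }
          // Copy the field; defaultVal() carries over an existing default, or null when none was set.
          // (Fields with non-null defaults would need the union branches reordered; ignored in this sketch.)
          return new Schema.Field(f.name(), fieldSchema, f.doc(), f.defaultVal());
        })
        .collect(Collectors.toList());
    return Schema.createRecord(schema.getName(), schema.getDoc(), schema.getNamespace(),
        schema.isError(), fields);
  }
}
```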



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [WIP] [HUDI-8796] Silent ignoring of simple bucket index in Flink append mode [hudi]

2025-01-09 Thread via GitHub


geserdugarov commented on code in PR #12545:
URL: https://github.com/apache/hudi/pull/12545#discussion_r1908303305


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/utils/Pipelines.java:
##
@@ -207,11 +207,30 @@ public static DataStream append(
   Configuration conf,
   RowType rowType,
   DataStream dataStream) {
-WriteOperatorFactory operatorFactory = 
AppendWriteOperator.getFactory(conf, rowType);
+boolean isBucketIndex = OptionsResolver.isBucketIndexType(conf);
+if (isBucketIndex) {

Review Comment:
   Should we support use cases like running `insert` on some data any number of times, and then switching to `upsert` for another batch of data? Because with my proposed changes we will face another problem if we want to switch to the `upsert` operation here:
   
https://github.com/apache/hudi/blob/e67d0aa71e2253a5b5cf95028cdf95482ffeca6a/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bucket/BucketStreamWriteFunction.java#L181-L184
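
   For readers following along, a minimal sketch of the kind of fail-fast guard being discussed for the append pipeline. This is an illustration only, not the actual change in PR #12545; the `OptionsResolver` import path is assumed.

```java
import org.apache.flink.configuration.Configuration;

import org.apache.hudi.configuration.OptionsResolver;          // assumed package for the resolver used in the diff above
import org.apache.hudi.exception.HoodieNotSupportedException;

public final class AppendModeBucketIndexGuard {

  private AppendModeBucketIndexGuard() {
  }

  // Fail fast instead of silently writing parquet files without bucket IDs.
  public static void validate(Configuration conf) {
    if (OptionsResolver.isBucketIndexType(conf)) {
      throw new HoodieNotSupportedException(
          "Bucket index is not honored by the Flink append (insert) pipeline; "
              + "use upsert or disable the bucket index for this write.");
    }
  }
}
```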



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8800] Introduce SingleSparkConsistentBucketClusteringExecutionStrategy to improve performance [hudi]

2025-01-09 Thread via GitHub


TheR1sing3un commented on code in PR #12537:
URL: https://github.com/apache/hudi/pull/12537#discussion_r1908330556


##
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/functional/TestSparkConsistentBucketClustering.java:
##
@@ -110,7 +115,7 @@ public void setup(int maxFileSize, Map 
options) throws IOExcepti
 
.withStorageConfig(HoodieStorageConfig.newBuilder().parquetMaxFileSize(maxFileSize).build())
 .withClusteringConfig(HoodieClusteringConfig.newBuilder()
 
.withClusteringPlanStrategyClass(SparkConsistentBucketClusteringPlanStrategy.class.getName())
-
.withClusteringExecutionStrategyClass(SparkConsistentBucketClusteringExecutionStrategy.class.getName())
+.withClusteringExecutionStrategyClass(singleJob ? 
SINGLE_SPARK_JOB_CONSISTENT_HASHING_EXECUTION_STRATEGY : 
SPARK_CONSISTENT_BUCKET_EXECUTION_STRATEGY)

Review Comment:
   > Is `SINGLE_SPARK_JOB_CONSISTENT_HASHING_EXECUTION_STRATEGY` always better than `SPARK_CONSISTENT_BUCKET_EXECUTION_STRATEGY`? Why do we need two execution strategies?
   
   I'm not sure whether any user is already using `SPARK_CONSISTENT_BUCKET_EXECUTION_STRATEGY`; if so, should we keep it for compatibility? If not, we can deprecate it.
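
   As a side note for readers, a hedged sketch of selecting between the two strategies through the write config, mirroring the test setup above; the fully-qualified class names below are assumptions inferred from this thread rather than verified constants.

```java
import org.apache.hudi.config.HoodieClusteringConfig;
import org.apache.hudi.config.HoodieWriteConfig;

public final class ClusteringStrategySelectionSketch {

  private ClusteringStrategySelectionSketch() {
  }

  public static HoodieWriteConfig buildWriteConfig(String basePath, boolean singleJob) {
    // Hypothetical fully-qualified names; only the simple class names appear in this thread.
    String executionStrategy = singleJob
        ? "org.apache.hudi.client.clustering.run.strategy.SingleSparkConsistentBucketClusteringExecutionStrategy"
        : "org.apache.hudi.client.clustering.run.strategy.SparkConsistentBucketClusteringExecutionStrategy";
    return HoodieWriteConfig.newBuilder()
        .withPath(basePath)
        .withClusteringConfig(HoodieClusteringConfig.newBuilder()
            .withClusteringExecutionStrategyClass(executionStrategy)
            .build())
        .build();
  }
}
```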



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [WIP] [HUDI-8796] Silent ignoring of simple bucket index in Flink append mode [hudi]

2025-01-09 Thread via GitHub


geserdugarov commented on code in PR #12545:
URL: https://github.com/apache/hudi/pull/12545#discussion_r1908303305


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/utils/Pipelines.java:
##
@@ -207,11 +207,30 @@ public static DataStream append(
   Configuration conf,
   RowType rowType,
   DataStream dataStream) {
-WriteOperatorFactory operatorFactory = 
AppendWriteOperator.getFactory(conf, rowType);
+boolean isBucketIndex = OptionsResolver.isBucketIndexType(conf);
+if (isBucketIndex) {

Review Comment:
   Should we support use cases like running `insert` on some data any number of times, and then switching to `upsert` for another batch of data? Because with my proposed changes we will face another problem if we want to switch to the `upsert` operation here:
   
https://github.com/apache/hudi/blob/master/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bucket/BucketStreamWriteFunction.java#L181-L184



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8800] Introduce SingleSparkConsistentBucketClusteringExecutionStrategy to improve performance [hudi]

2025-01-09 Thread via GitHub


TheR1sing3un commented on code in PR #12537:
URL: https://github.com/apache/hudi/pull/12537#discussion_r1908331603


##
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/SparkJobExecutionStrategy.java:
##
@@ -0,0 +1,150 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.client.clustering.run.strategy;
+
+import org.apache.hudi.avro.HoodieAvroUtils;
+import org.apache.hudi.common.engine.HoodieEngineContext;
+import org.apache.hudi.common.model.ClusteringOperation;
+import org.apache.hudi.common.model.HoodieRecord;
+import org.apache.hudi.common.table.HoodieTableConfig;
+import org.apache.hudi.common.table.log.HoodieFileSliceReader;
+import org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner;
+import org.apache.hudi.common.util.Option;
+import org.apache.hudi.common.util.StringUtils;
+import org.apache.hudi.common.util.collection.ClosableIterator;
+import org.apache.hudi.common.util.collection.CloseableMappingIterator;
+import org.apache.hudi.common.util.collection.Pair;
+import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.exception.HoodieClusteringException;
+import org.apache.hudi.io.storage.HoodieFileReader;
+import org.apache.hudi.keygen.BaseKeyGenerator;
+import org.apache.hudi.keygen.factory.HoodieSparkKeyGeneratorFactory;
+import org.apache.hudi.storage.HoodieStorage;
+import org.apache.hudi.storage.StorageConfiguration;
+import org.apache.hudi.storage.StoragePath;
+import org.apache.hudi.storage.hadoop.HoodieHadoopStorage;
+import org.apache.hudi.table.HoodieTable;
+import 
org.apache.hudi.table.action.cluster.strategy.ClusteringExecutionStrategy;
+
+import org.apache.avro.Schema;
+import org.apache.hadoop.conf.Configuration;
+
+import java.io.IOException;
+
+import static 
org.apache.hudi.client.utils.SparkPartitionUtils.getPartitionFieldVals;
+import static 
org.apache.hudi.io.storage.HoodieSparkIOFactory.getHoodieSparkIOFactory;
+
+public abstract class SparkJobExecutionStrategy extends 
ClusteringExecutionStrategy {

Review Comment:
   > Let's eliminate this class: `SparkJobExecutionStrategy `
   
   Removed~



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8800] Introduce SingleSparkConsistentBucketClusteringExecutionStrategy to improve performance [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12537:
URL: https://github.com/apache/hudi/pull/12537#issuecomment-2579417150

   
   ## CI report:
   
   * ef470351aa6e521b57e3f3c5e65aa6b9b77f8634 Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2755)
 
   * 16eb3c6e8ed05902b3e1142e3b8a8e58e5b76b42 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8800] Introduce SingleSparkConsistentBucketClusteringExecutionStrategy to improve performance [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12537:
URL: https://github.com/apache/hudi/pull/12537#issuecomment-2579421668

   
   ## CI report:
   
   * ef470351aa6e521b57e3f3c5e65aa6b9b77f8634 Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2755)
 
   * 16eb3c6e8ed05902b3e1142e3b8a8e58e5b76b42 Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2757)
 
   * 34e41613300f619cf8c2d797f80c43df4ee8ea73 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8775] Expression index on a column should get tracked at partition level if partition stats index is turned on [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12558:
URL: https://github.com/apache/hudi/pull/12558#issuecomment-2579441606

   
   ## CI report:
   
   * 4b54a2deb80ccce01ec6560927c0d143a8b5c6ba Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2756)
 
   * bd321ce1851f20cfeb87b5107b8d14fc849f453c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8775] Expression index on a column should get tracked at partition level if partition stats index is turned on [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12558:
URL: https://github.com/apache/hudi/pull/12558#issuecomment-2579444837

   
   ## CI report:
   
   * 4b54a2deb80ccce01ec6560927c0d143a8b5c6ba Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2756)
 
   * bd321ce1851f20cfeb87b5107b8d14fc849f453c Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2758)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8775] Expression index on a column should get tracked at partition level if partition stats index is turned on [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12558:
URL: https://github.com/apache/hudi/pull/12558#issuecomment-2581900400

   
   ## CI report:
   
   * e1e0ea9214a05cd585989e228639f213cc8f033f Azure: 
[SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2792)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8766] Enabling cols stats by default with writer [hudi]

2025-01-09 Thread via GitHub


codope commented on code in PR #12596:
URL: https://github.com/apache/hudi/pull/12596#discussion_r1909901588


##
azure-pipelines-20230430.yml:
##
@@ -214,7 +214,7 @@ stages:
 displayName: Top 100 long-running testcases
   - job: UT_FT_3
 displayName: UT spark-datasource Java Tests & DDL
-timeoutInMinutes: '90'
+timeoutInMinutes: '120'

Review Comment:
   Let's make sure that adding col stats support for the test table, and lowering this test timeout back down, are tracked somewhere.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8602] Fix a bug for incremental query [hudi]

2025-01-09 Thread via GitHub


linliu-code commented on code in PR #12385:
URL: https://github.com/apache/hudi/pull/12385#discussion_r1909901739


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala:
##
@@ -209,7 +209,14 @@ trait HoodieIncrementalRelationTrait extends 
HoodieBaseRelation {
 
   protected lazy val includedCommits: immutable.Seq[HoodieInstant] = 
queryContext.getInstants.asScala.toList
 
-  protected lazy val commitsMetadata = 
includedCommits.map(getCommitMetadata(_, super.timeline)).asJava
+  protected lazy val commitsMetadata = includedCommits.map(
+i => {

Review Comment:
   @danny0405, I have checked, and the reader did not fall back to the snapshot query because the configuration is false by default.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8824] MIT should error out for some assignment clause patterns [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12584:
URL: https://github.com/apache/hudi/pull/12584#issuecomment-2581218611

   
   ## CI report:
   
   * 631494d4f6e8389bf8c7a7d90a360fc1ea2d159d Azure: 
[SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2770)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-8851) MOR delete query hits NPE when fetching ordering value

2025-01-09 Thread Davis Zhang (Jira)
Davis Zhang created HUDI-8851:
-

 Summary: MOR delete query hits NPE when fetching ordering value
 Key: HUDI-8851
 URL: https://issues.apache.org/jira/browse/HUDI-8851
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Davis Zhang


https://github.com/apache/hudi/pull/12610



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-8762) Fix issues around incremental query

2025-01-09 Thread Y Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Y Ethan Guo reassigned HUDI-8762:
-

Assignee: Lin Liu  (was: Y Ethan Guo)

> Fix issues around incremental query
> ---
>
> Key: HUDI-8762
> URL: https://issues.apache.org/jira/browse/HUDI-8762
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Y Ethan Guo
>Assignee: Lin Liu
>Priority: Blocker
> Fix For: 1.0.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[I] Upgrade pyo3, arrow-rs, datafusion [hudi-rs]

2025-01-09 Thread via GitHub


xushiyan opened a new issue, #242:
URL: https://github.com/apache/hudi-rs/issues/242

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8832] Add merge mode test coverage for DML [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12610:
URL: https://github.com/apache/hudi/pull/12610#issuecomment-2581324422

   
   ## CI report:
   
   * 4c14c955871ea88e3ff6ccfab667fe434a16a833 Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2772)
 
   * 6142abfcebbf84d3bf32097c7499b60ff11ae0a1 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8832] Add merge mode test coverage for DML [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12610:
URL: https://github.com/apache/hudi/pull/12610#issuecomment-2581326888

   
   ## CI report:
   
   * 4c14c955871ea88e3ff6ccfab667fe434a16a833 Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2772)
 
   * 6142abfcebbf84d3bf32097c7499b60ff11ae0a1 Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2774)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-8553) Spark SQL UPDATE and DELETE should write record positions

2025-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-8553:
-
Labels: pull-request-available  (was: )

> Spark SQL UPDATE and DELETE should write record positions
> -
>
> Key: HUDI-8553
> URL: https://issues.apache.org/jira/browse/HUDI-8553
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Y Ethan Guo
>Assignee: Y Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.1
>
>   Original Estimate: 6h
>  Time Spent: 5h
>  Remaining Estimate: 8h
>
> Though there is no read and write error, Spark SQL UPDATE and DELETE do not 
> write record positions to the log files.
> {code:java}
> spark-sql (default)> CREATE TABLE testing_positions.table2 (
>                    >     ts BIGINT,
>                    >     uuid STRING,
>                    >     rider STRING,
>                    >     driver STRING,
>                    >     fare DOUBLE,
>                    >     city STRING
>                    > ) USING HUDI
>                    > LOCATION 
> 'file:///Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2'
>                    > TBLPROPERTIES (
>                    >   type = 'mor',
>                    >   primaryKey = 'uuid',
>                    >   preCombineField = 'ts'
>                    > )
>                    > PARTITIONED BY (city);
> 24/11/16 12:03:26 WARN TableSchemaResolver: Could not find any data file 
> written for commit, so could not get schema for table 
> file:/Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2
> Time taken: 0.4 seconds
> spark-sql (default)> INSERT INTO testing_positions.table2
>                    > VALUES
>                    > 
> (1695159649087,'334e26e9-8355-45cc-97c6-c31daf0df330','rider-A','driver-K',19.10,'san_francisco'),
>                    > 
> (1695091554788,'e96c4396-3fad-413a-a942-4cb36106d721','rider-C','driver-M',27.70
>  ,'san_francisco'),
>                    > 
> (1695046462179,'9909a8b1-2d15-4d3d-8ec9-efc48c536a00','rider-D','driver-L',33.90
>  ,'san_francisco'),
>                    > 
> (1695332066204,'1dced545-862b-4ceb-8b43-d2a568f6616b','rider-E','driver-O',93.50,'san_francisco'),
>                    > 
> (1695516137016,'e3cf430c-889d-4015-bc98-59bdce1e530c','rider-F','driver-P',34.15,'sao_paulo'
>     ),
>                    > 
> (1695376420876,'7a84095f-737f-40bc-b62f-6b69664712d2','rider-G','driver-Q',43.40
>  ,'sao_paulo'    ),
>                    > 
> (1695173887231,'3eeb61f7-c2b0-4636-99bd-5d7a5a1d2c04','rider-I','driver-S',41.06
>  ,'chennai'      ),
>                    > 
> (169511511,'c8abbe79-8d89-47ea-b4ce-4d224bae5bfa','rider-J','driver-T',17.85,'chennai');
> 24/11/16 12:03:26 WARN TableSchemaResolver: Could not find any data file 
> written for commit, so could not get schema for table 
> file:/Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2
> 24/11/16 12:03:26 WARN TableSchemaResolver: Could not find any data file 
> written for commit, so could not get schema for table 
> file:/Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2
> 24/11/16 12:03:29 WARN log: Updating partition stats fast for: table2_ro
> 24/11/16 12:03:29 WARN log: Updated size to 436166
> 24/11/16 12:03:29 WARN log: Updating partition stats fast for: table2_ro
> 24/11/16 12:03:29 WARN log: Updating partition stats fast for: table2_ro
> 24/11/16 12:03:29 WARN log: Updated size to 436185
> 24/11/16 12:03:29 WARN log: Updated size to 436386
> 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2_rt
> 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2_rt
> 24/11/16 12:03:30 WARN log: Updated size to 436166
> 24/11/16 12:03:30 WARN log: Updated size to 436386
> 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2_rt
> 24/11/16 12:03:30 WARN log: Updated size to 436185
> 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2
> 24/11/16 12:03:30 WARN log: Updated size to 436166
> 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2
> 24/11/16 12:03:30 WARN log: Updated size to 436386
> 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2
> 24/11/16 12:03:30 WARN log: Updated size to 436185
> 24/11/16 12:03:30 WARN HiveConf: HiveConf of name 
> hive.internal.ss.authz.settings.applied.marker does not exist
> 24/11/16 12:03:30 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout 
> does not exist
> 24/11/16 12:03:30 WARN HiveConf: HiveConf of name hive.stats.retries.wait 
> does not exist
> Time taken: 4.843 seconds
> spark-sql (default)> 
>                    > SET hoodie.merge.small.file.group.candidates.limit = 0;
> hoodie.merge.small.file.group.candidates.limit    0
> Time taken: 0.018 seconds

[PR] [HUDI-8553] Support writing record positions to log blocks from Spark SQL UPDATE and DELETE statements [hudi]

2025-01-09 Thread via GitHub


yihua opened a new pull request, #12612:
URL: https://github.com/apache/hudi/pull/12612

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8553] Support writing record positions to log blocks from Spark SQL UPDATE and DELETE statements [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12612:
URL: https://github.com/apache/hudi/pull/12612#issuecomment-2581581182

   
   ## CI report:
   
   * 099eea2fba303c305950fad54010c503aff5c41e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8553] Support writing record positions to log blocks from Spark SQL UPDATE and DELETE statements [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12612:
URL: https://github.com/apache/hudi/pull/12612#issuecomment-2581582716

   
   ## CI report:
   
   * 099eea2fba303c305950fad54010c503aff5c41e Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2784)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (HUDI-8553) Spark SQL UPDATE and DELETE should write record positions

2025-01-09 Thread Y Ethan Guo (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17911738#comment-17911738
 ] 

Y Ethan Guo commented on HUDI-8553:
---

I have a draft PR up which makes the prepped upsert flow write record positions to the log blocks from the Spark SQL UPDATE statement. I'm going to fix a few issues before opening it up for review.

> Spark SQL UPDATE and DELETE should write record positions
> -
>
> Key: HUDI-8553
> URL: https://issues.apache.org/jira/browse/HUDI-8553
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Y Ethan Guo
>Assignee: Y Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.1
>
>   Original Estimate: 6h
>  Time Spent: 5h
>  Remaining Estimate: 8h
>
> Though there is no read and write error, Spark SQL UPDATE and DELETE do not 
> write record positions to the log files.
> {code:java}
> spark-sql (default)> CREATE TABLE testing_positions.table2 (
>                    >     ts BIGINT,
>                    >     uuid STRING,
>                    >     rider STRING,
>                    >     driver STRING,
>                    >     fare DOUBLE,
>                    >     city STRING
>                    > ) USING HUDI
>                    > LOCATION 
> 'file:///Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2'
>                    > TBLPROPERTIES (
>                    >   type = 'mor',
>                    >   primaryKey = 'uuid',
>                    >   preCombineField = 'ts'
>                    > )
>                    > PARTITIONED BY (city);
> 24/11/16 12:03:26 WARN TableSchemaResolver: Could not find any data file 
> written for commit, so could not get schema for table 
> file:/Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2
> Time taken: 0.4 seconds
> spark-sql (default)> INSERT INTO testing_positions.table2
>                    > VALUES
>                    > 
> (1695159649087,'334e26e9-8355-45cc-97c6-c31daf0df330','rider-A','driver-K',19.10,'san_francisco'),
>                    > 
> (1695091554788,'e96c4396-3fad-413a-a942-4cb36106d721','rider-C','driver-M',27.70
>  ,'san_francisco'),
>                    > 
> (1695046462179,'9909a8b1-2d15-4d3d-8ec9-efc48c536a00','rider-D','driver-L',33.90
>  ,'san_francisco'),
>                    > 
> (1695332066204,'1dced545-862b-4ceb-8b43-d2a568f6616b','rider-E','driver-O',93.50,'san_francisco'),
>                    > 
> (1695516137016,'e3cf430c-889d-4015-bc98-59bdce1e530c','rider-F','driver-P',34.15,'sao_paulo'
>     ),
>                    > 
> (1695376420876,'7a84095f-737f-40bc-b62f-6b69664712d2','rider-G','driver-Q',43.40
>  ,'sao_paulo'    ),
>                    > 
> (1695173887231,'3eeb61f7-c2b0-4636-99bd-5d7a5a1d2c04','rider-I','driver-S',41.06
>  ,'chennai'      ),
>                    > 
> (169511511,'c8abbe79-8d89-47ea-b4ce-4d224bae5bfa','rider-J','driver-T',17.85,'chennai');
> 24/11/16 12:03:26 WARN TableSchemaResolver: Could not find any data file 
> written for commit, so could not get schema for table 
> file:/Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2
> 24/11/16 12:03:26 WARN TableSchemaResolver: Could not find any data file 
> written for commit, so could not get schema for table 
> file:/Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2
> 24/11/16 12:03:29 WARN log: Updating partition stats fast for: table2_ro
> 24/11/16 12:03:29 WARN log: Updated size to 436166
> 24/11/16 12:03:29 WARN log: Updating partition stats fast for: table2_ro
> 24/11/16 12:03:29 WARN log: Updating partition stats fast for: table2_ro
> 24/11/16 12:03:29 WARN log: Updated size to 436185
> 24/11/16 12:03:29 WARN log: Updated size to 436386
> 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2_rt
> 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2_rt
> 24/11/16 12:03:30 WARN log: Updated size to 436166
> 24/11/16 12:03:30 WARN log: Updated size to 436386
> 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2_rt
> 24/11/16 12:03:30 WARN log: Updated size to 436185
> 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2
> 24/11/16 12:03:30 WARN log: Updated size to 436166
> 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2
> 24/11/16 12:03:30 WARN log: Updated size to 436386
> 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2
> 24/11/16 12:03:30 WARN log: Updated size to 436185
> 24/11/16 12:03:30 WARN HiveConf: HiveConf of name 
> hive.internal.ss.authz.settings.applied.marker does not exist
> 24/11/16 12:03:30 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout 
> does not exist
> 24/11/16 12:03:30 WARN HiveConf: HiveConf of name hive.stats.retries.wait 
> does not exist
> Time 

[PR] [HUDI-8624] Avoid check metadata for archived commits in incremental queries [hudi]

2025-01-09 Thread via GitHub


linliu-code opened a new pull request, #12613:
URL: https://github.com/apache/hudi/pull/12613

   ### Change Logs
   
   When the start commit is archived, we fall back to a full scan.
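
   A minimal sketch of that decision (an illustration under stated assumptions, not this PR's actual code): the start commit is considered archived when it is older than the earliest instant remaining in the active timeline, in which case the reader degrades to a full scan instead of probing archived metadata.

```java
import org.apache.hudi.common.table.HoodieTableMetaClient;
import org.apache.hudi.common.table.timeline.HoodieTimeline;

final class IncrementalQueryFallbackSketch {

  private IncrementalQueryFallbackSketch() {
  }

  // True when the incremental query should fall back to a full scan.
  static boolean shouldFallBackToFullScan(HoodieTableMetaClient metaClient, String startCommitTime) {
    HoodieTimeline completedCommits =
        metaClient.getActiveTimeline().getCommitsTimeline().filterCompletedInstants();
    // isBeforeTimelineStarts(...) is true when the instant is older than the earliest
    // instant still present in the active timeline, i.e. it has been archived.
    return completedCommits.isBeforeTimelineStarts(startCommitTime);
  }
}
```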
   
   ### Impact
   
   Avoid expensive metadata fetching for archived instants.
   
   ### Risk level (write none, low medium or high below)
   
   Medium.
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-8624) Revisit commitsMetadata fetching from timeline history in MergeOnReadIncrementalRelation

2025-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-8624:
-
Labels: pull-request-available  (was: )

> Revisit commitsMetadata fetching from timeline history in 
> MergeOnReadIncrementalRelation
> 
>
> Key: HUDI-8624
> URL: https://issues.apache.org/jira/browse/HUDI-8624
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Y Ethan Guo
>Assignee: Lin Liu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.1
>
>
> [https://github.com/apache/hudi/pull/12385/files#r1865249449]
> We need to revisit why we need commit metadata from the timeline history.
> Reading the timeline history (the archival timeline, in old terms) is expensive and
> should not be incurred in incremental queries except for completion-time lookup.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-8624] Avoid check metadata for archived commits in incremental queries [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12613:
URL: https://github.com/apache/hudi/pull/12613#issuecomment-2581592281

   
   ## CI report:
   
   * 39ca7fae423367a6f48c5139b257176d22beac02 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-8624) Revisit commitsMetadata fetching from timeline history in MergeOnReadIncrementalRelation

2025-01-09 Thread Lin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu updated HUDI-8624:
--
Status: Patch Available  (was: In Progress)

> Revisit commitsMetadata fetching from timeline history in 
> MergeOnReadIncrementalRelation
> 
>
> Key: HUDI-8624
> URL: https://issues.apache.org/jira/browse/HUDI-8624
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Y Ethan Guo
>Assignee: Lin Liu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 1.0.1
>
>
> [https://github.com/apache/hudi/pull/12385/files#r1865249449]
> We need to revisit why we need commit metadata from the timeline history.
> Reading the timeline history (the archival timeline, in old terms) is expensive and
> should not be incurred in incremental queries except for completion-time lookup.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-8635) Revisit stats generated in HoodieSparkFileGroupReaderBasedMergeHandle

2025-01-09 Thread Y Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Y Ethan Guo updated HUDI-8635:
--
Status: In Progress  (was: Open)

> Revisit stats generated in HoodieSparkFileGroupReaderBasedMergeHandle
> -
>
> Key: HUDI-8635
> URL: https://issues.apache.org/jira/browse/HUDI-8635
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Y Ethan Guo
>Assignee: Y Ethan Guo
>Priority: Blocker
> Fix For: 1.0.1
>
>
> We need to make sure the write stats generated by the new file group
> reader-based merge handle for compaction (HoodieSparkFileGroupReaderBasedMergeHandle)
> are intact in all cases.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-8762) Fix issues around incremental query

2025-01-09 Thread Lin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu updated HUDI-8762:
--
Status: Patch Available  (was: In Progress)

> Fix issues around incremental query
> ---
>
> Key: HUDI-8762
> URL: https://issues.apache.org/jira/browse/HUDI-8762
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Y Ethan Guo
>Assignee: Lin Liu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-8775] Expression index on a column should get tracked at partition level if partition stats index is turned on [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12558:
URL: https://github.com/apache/hudi/pull/12558#issuecomment-2581813248

   
   ## CI report:
   
   * c5912b6788b23621a4dcc609a4d5b4e6ae0af6da Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2767)
 
   * e1e0ea9214a05cd585989e228639f213cc8f033f Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2792)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8775] Expression index on a column should get tracked at partition level if partition stats index is turned on [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12558:
URL: https://github.com/apache/hudi/pull/12558#issuecomment-2581811657

   
   ## CI report:
   
   * c5912b6788b23621a4dcc609a4d5b4e6ae0af6da Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2767)
 
   * e1e0ea9214a05cd585989e228639f213cc8f033f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8766] Enabling cols stats by default with writer [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12596:
URL: https://github.com/apache/hudi/pull/12596#issuecomment-2581822443

   
   ## CI report:
   
   * ae2ca606c6cd125f31b7ed029968d0993b1bb0bd UNKNOWN
   * 71b6a13890909b81c74ce7b138237ab695a08782 UNKNOWN
   * 15866ae0099c3b58d22329be0e5008b3149cb95f Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2791)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-8796) Silent ignoring of bucket index in Flink append mode

2025-01-09 Thread Geser Dugarov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Geser Dugarov updated HUDI-8796:

Description: Currently, there is no exception when we try to write data in 
Flink append mode using bucket index. Data will be written, but in parquet 
files without bucket IDs.  (was: Currently, there is no exception when we try 
to write data in Flink append mode (COW, insert) using bucket index. Data will 
be written, but in parquet files without bucket IDs.)

> Silent ignoring of bucket index in Flink append mode
> 
>
> Key: HUDI-8796
> URL: https://issues.apache.org/jira/browse/HUDI-8796
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Geser Dugarov
>Assignee: Geser Dugarov
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.1
>
>
> Currently, there is no exception when we try to write data in Flink append 
> mode using bucket index. Data will be written, but in parquet files without 
> bucket IDs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-8796) Silent ignoring of bucket index in Flink append mode

2025-01-09 Thread Geser Dugarov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Geser Dugarov updated HUDI-8796:

Summary: Silent ignoring of bucket index in Flink append mode  (was: Silent 
ignoring of simple bucket index in Flink append mode)

> Silent ignoring of bucket index in Flink append mode
> 
>
> Key: HUDI-8796
> URL: https://issues.apache.org/jira/browse/HUDI-8796
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Geser Dugarov
>Assignee: Geser Dugarov
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.0.1
>
>
> Currently, there is no exception when we try to write data in Flink append 
> mode (COW, insert) using bucket index. Data will be written, but in parquet 
> files without bucket IDs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


(hudi) branch master updated (5f591eec223 -> dc001ea4828)

2025-01-09 Thread codope
This is an automated email from the ASF dual-hosted git repository.

codope pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 5f591eec223 [HUDI-8762] Fix a typo in TestIncrementalQueryWithArchivedInstants (#12611)
 add dc001ea4828 [HUDI-8775] Expression index on a column should get 
tracked at partition level if partition stats index is turned on (#12558)

No new revisions were added by this update.

Summary of changes:
 .../metadata/HoodieBackedTableMetadataWriter.java  |  39 +-
 .../client/utils/SparkMetadataWriterUtils.java | 278 +++--
 .../org/apache/hudi/data/HoodieJavaPairRDD.java|  10 +
 .../expression/HoodieSparkExpressionIndex.java |  27 +
 .../SparkHoodieBackedTableMetadataWriter.java  | 118 ++--
 .../hudi/common/data/HoodieListPairData.java   |   8 +
 .../apache/hudi/common/data/HoodiePairData.java|  11 +
 .../common/model/HoodieColumnRangeMetadata.java|   4 +
 .../index/expression/HoodieExpressionIndex.java|   2 +
 .../hudi/metadata/HoodieMetadataPayload.java   |  16 +-
 .../hudi/metadata/HoodieTableMetadataUtil.java | 165 --
 .../hudi/metadata/TestHoodieMetadataPayload.java   |  12 +-
 .../scala/org/apache/hudi/BucketIndexSupport.scala |   2 +-
 .../org/apache/hudi/ExpressionIndexSupport.scala   |  67 ++-
 .../scala/org/apache/hudi/HoodieFileIndex.scala|  20 +-
 .../apache/hudi/PartitionStatsIndexSupport.scala   |  44 +-
 .../hudi/command/index/TestExpressionIndex.scala   | 626 -
 .../utilities/HoodieMetadataTableValidator.java|   2 +-
 18 files changed, 1241 insertions(+), 210 deletions(-)



Re: [PR] [HUDI-8775] Expression index on a column should get tracked at partition level if partition stats index is turned on [hudi]

2025-01-09 Thread via GitHub


codope merged PR #12558:
URL: https://github.com/apache/hudi/pull/12558


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Closed] (HUDI-8775) Expression index on a column should get tracked at partition level if partition stats index is turned on

2025-01-09 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-8775.
-
Resolution: Fixed

> Expression index on a column should get tracked at partition level if 
> partition stats index is turned on
> 
>
> Key: HUDI-8775
> URL: https://issues.apache.org/jira/browse/HUDI-8775
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Assignee: Lokesh Jain
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.1
>
>
> Use case: I have the partition stats index enabled, and then I create an
> expression index using col stats on a {{ts}} column, mapping it to a date. In
> this case, the stats based on the value derived from the expression should be
> tracked at the partition level too.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-8766] Enabling cols stats by default with writer [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12596:
URL: https://github.com/apache/hudi/pull/12596#issuecomment-2581888527

   
   ## CI report:
   
   * ae2ca606c6cd125f31b7ed029968d0993b1bb0bd UNKNOWN
   * 71b6a13890909b81c74ce7b138237ab695a08782 UNKNOWN
   * 15866ae0099c3b58d22329be0e5008b3149cb95f Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2791)
 
   * 2710a96832046a764b7125c1152c788d96c6e1f9 Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2793)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-8839) [Ethan pls check worklog] CDC query: The beforeImageRecords and afterImageRecords are both in-memory hash map, they should be changes to spillable map.

2025-01-09 Thread Davis Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davis Zhang updated HUDI-8839:
--
Summary: [Ethan pls check worklog] CDC query: The beforeImageRecords and 
afterImageRecords are both in-memory hash map, they should be changes to 
spillable map.  (was: CDC query: The beforeImageRecords and afterImageRecords 
are both in-memory hash map, they should be changes to spillable map.)

> [Ethan pls check worklog] CDC query: The beforeImageRecords and 
> afterImageRecords are both in-memory hash map, they should be changes to 
> spillable map.
> ---
>
> Key: HUDI-8839
> URL: https://issues.apache.org/jira/browse/HUDI-8839
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Davis Zhang
>Assignee: Davis Zhang
>Priority: Major
> Fix For: 1.0.1
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
>  
> [https://github.com/apache/hudi/pull/12592]
>  
> Acceptance criteria: the local test scenario that previously hit OOM no longer OOMs.
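
A hedged sketch of the swap described above: replacing the in-memory HashMaps that hold the CDC before/after images with Hudi's spillable map so they can overflow to disk. The four-argument `ExternalSpillableMap` constructor and the `DefaultSizeEstimator` helper are assumptions from memory; the exact signature varies across Hudi versions.

```java
import java.io.IOException;
import java.io.Serializable;

import org.apache.hudi.common.util.DefaultSizeEstimator;
import org.apache.hudi.common.util.collection.ExternalSpillableMap;

public final class SpillableImageMapSketch {

  private SpillableImageMapSketch() {
  }

  // Build a map for before/after image records that spills to disk once the in-memory budget is hit.
  public static <V extends Serializable> ExternalSpillableMap<String, V> newImageMap(
      long maxInMemoryBytes, String spillBasePath) throws IOException {
    return new ExternalSpillableMap<>(
        maxInMemoryBytes,              // in-memory budget before spilling
        spillBasePath,                 // local directory for the spill file
        new DefaultSizeEstimator<>(),  // key size estimator
        new DefaultSizeEstimator<>()); // value size estimator
  }
}
```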



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-8839) [Ethan pls check worklog] CDC query: The beforeImageRecords and afterImageRecords are both in-memory hash map, they should be changes to spillable map.

2025-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-8839:
-
Labels: pull-request-available  (was: )

> [Ethan pls check worklog] CDC query: The beforeImageRecords and 
> afterImageRecords are both in-memory hash map, they should be changes to 
> spillable map.
> ---
>
> Key: HUDI-8839
> URL: https://issues.apache.org/jira/browse/HUDI-8839
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Davis Zhang
>Assignee: Davis Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.1
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
>  
> [https://github.com/apache/hudi/pull/12592]
>  
> acceptance criteria local testing oom no longer ooms



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-8839] CdcFileGroupIterator use spillable hashmap [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12592:
URL: https://github.com/apache/hudi/pull/12592#issuecomment-2581471756

   
   ## CI report:
   
   * 28247026a78dda613a41ed2f039cbf11bb7d5d95 Azure: 
[CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2779)
 
   * 423421ec00e72021f081c901ac74891a266b8aa5 UNKNOWN
   * a5544e3e3d5aa734348b7bfd63820d5b8d98cc33 UNKNOWN
   * 7ca86e570e17a7db2c7394d62f9d95bda8f439db UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (HUDI-8624) Revisit commitsMetadata fetching from timeline history in MergeOnReadIncrementalRelation

2025-01-09 Thread Lin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu reassigned HUDI-8624:
-

Assignee: Lin Liu

> Revisit commitsMetadata fetching from timeline history in 
> MergeOnReadIncrementalRelation
> 
>
> Key: HUDI-8624
> URL: https://issues.apache.org/jira/browse/HUDI-8624
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Y Ethan Guo
>Assignee: Lin Liu
>Priority: Critical
> Fix For: 1.0.1
>
>
> [https://github.com/apache/hudi/pull/12385/files#r1865249449]
> We need to revisit why we need commit metadata from the timeline history.
> Reading the timeline history (the archival timeline, in old terms) is expensive and
> should not be incurred in incremental queries except for completion-time lookup.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-8172) Make primaryKey and other column configs case insensitive

2025-01-09 Thread Vova Kolmakov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov reassigned HUDI-8172:
---

Assignee: Vova Kolmakov

> Make primaryKey and other column configs case insensitive
> -
>
> Key: HUDI-8172
> URL: https://issues.apache.org/jira/browse/HUDI-8172
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: Aditya Goenka
>Assignee: Vova Kolmakov
>Priority: Critical
> Fix For: 1.1.0
>
>
> The primaryKey and other configs should be case insensitive.
>  
> Test case to reproduce - 
>   test("Test primary key case sensitive") {
>     withTempDir { tmp =>
>       val tableName = generateTableName
>       // Create a partitioned table
>       spark.sql(
>         s"""
>            |create table $tableName (
>            |  id int,
>            |  name string,
>            |  price double,
>            |  ts long,
>            |  dt string
>            |) using hudi
>            | tblproperties (primaryKey = 'ID'
>            | )
>            | partitioned by (dt)
>            | location '${tmp.getCanonicalPath}'""".stripMargin)
>       spark.sql(
>         s"""
>            | insert into $tableName
>            | select 1 as id, 'a1' as name, 10 as price, 1000 as ts, '2021-01-05' as dt""".stripMargin)
>       checkAnswer(s"select id, name, price, ts, dt from $tableName")(
>         Seq(1, "a1", 10.0, 1000, "2021-01-05")
>       )
>     }
>   }
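
A minimal, generic sketch of the behaviour this ticket asks for: resolving a configured column name such as 'ID' against the schema column 'id' without regard to case. This is an illustration only, not Hudi's actual option-resolution code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public final class CaseInsensitiveColumnResolver {

  private CaseInsensitiveColumnResolver() {
  }

  // Returns the schema's canonical column name for a user-supplied name, ignoring case.
  public static Optional<String> resolve(String configuredName, List<String> schemaColumns) {
    return schemaColumns.stream()
        .filter(column -> column.equalsIgnoreCase(configuredName))
        .findFirst();
  }

  public static void main(String[] args) {
    // The ticket's scenario: tblproperties declares primaryKey = 'ID' while the schema column is 'id'.
    System.out.println(resolve("ID", Arrays.asList("id", "name", "price", "ts", "dt"))); // Optional[id]
  }
}
```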



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-8172) Make primaryKey and other column configs case insensitive

2025-01-09 Thread Vova Kolmakov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov updated HUDI-8172:

Status: In Progress  (was: Open)

> Make primaryKey and other column configs case insensitive
> -
>
> Key: HUDI-8172
> URL: https://issues.apache.org/jira/browse/HUDI-8172
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: spark-sql
>Reporter: Aditya Goenka
>Assignee: Vova Kolmakov
>Priority: Critical
> Fix For: 1.1.0
>
>
> The primaryKey and other configs should be case insensitive.
>  
> Test case to reproduce - 
>   test("Test primary key case sensitive") {
>     withTempDir { tmp =>
>       val tableName = generateTableName
>       // Create a partitioned table
>       spark.sql(
>         s"""
>            |create table $tableName (
>            |  id int,
>            |  name string,
>            |  price double,
>            |  ts long,
>            |  dt string
>            |) using hudi
>            | tblproperties (primaryKey = 'ID'
>            | )
>            | partitioned by (dt)
>            | location '${tmp.getCanonicalPath}'""".stripMargin)
>       spark.sql(
>         s"""
>            | insert into $tableName
>            | select 1 as id, 'a1' as name, 10 as price, 1000 as ts, '2021-01-05' as dt""".stripMargin)
>       checkAnswer(s"select id, name, price, ts, dt from $tableName")(
>         Seq(1, "a1", 10.0, 1000, "2021-01-05")
>       )
>     }
>   }



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-8796] Silent ignoring of simple bucket index in Flink append mode [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12545:
URL: https://github.com/apache/hudi/pull/12545#issuecomment-2581794386

   
   ## CI report:
   
   * 3efc78274b41c22ac6d2695e715fd157a9b9a9b8 Azure: 
[SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2789)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8800] Introduce SingleSparkConsistentBucketClusteringExecutionStrategy to improve performance [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12537:
URL: https://github.com/apache/hudi/pull/12537#issuecomment-2581802070

   
   ## CI report:
   
   * 64ad84f40ff6a47df76979a382525fee0cc67d2e Azure: 
[SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2790)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8775] Expression index on a column should get tracked at partition level if partition stats index is turned on [hudi]

2025-01-09 Thread via GitHub


codope commented on code in PR #12558:
URL: https://github.com/apache/hudi/pull/12558#discussion_r1909890911


##
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##
@@ -395,7 +392,9 @@ public static Map> 
convertMetadataToRecords(Hoo
 if (enabledPartitionTypes.contains(MetadataPartitionType.PARTITION_STATS)) 
{
   
checkState(MetadataPartitionType.COLUMN_STATS.isMetadataPartitionAvailable(dataMetaClient),
   "Column stats partition must be enabled to generate partition stats. 
Please enable: " + 
HoodieMetadataConfig.ENABLE_METADATA_INDEX_COLUMN_STATS.key());
-  final HoodieData partitionStatsRDD = 
convertMetadataToPartitionStatsRecords(commitMetadata, context, dataMetaClient, 
metadataConfig);
+  // Generate Hoodie Pair data of partition name and list of column range 
metadata for all the files in that partition

Review Comment:
   nit: also fix the comment



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (HUDI-8837) Fix reading partition path field on metadata bootstrap table

2025-01-09 Thread Y Ethan Guo (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-8837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17911710#comment-17911710
 ] 

Y Ethan Guo commented on HUDI-8837:
---

The test is added in https://github.com/apache/hudi/pull/12490. Right now the 
validation excludes the partition column. When the partition column is included 
in the validation, it fails.

 
{code:java}
def assertDfEquals(df1: DataFrame, df2: DataFrame): Unit = {
    assertEquals(df1.count, df2.count)
    // TODO(HUDI-8723): fix reading partition path field on metadata bootstrap table
    assertEquals(0, df1.drop(partitionColName).except(df2.drop(partitionColName)).count)
    assertEquals(0, df2.drop(partitionColName).except(df1.drop(partitionColName)).count)
} {code}
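
For reference, a minimal sketch of the stricter check once the partition path reading is fixed, assuming the same test helpers as the snippet above (not merged anywhere yet):

{code:scala}
def assertDfEquals(df1: DataFrame, df2: DataFrame): Unit = {
  assertEquals(df1.count, df2.count)
  // Compare all columns, including the partition path column, once HUDI-8837 is fixed.
  assertEquals(0, df1.except(df2).count)
  assertEquals(0, df2.except(df1).count)
}
{code}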
 

 

> Fix reading partition path field on metadata bootstrap table
> 
>
> Key: HUDI-8837
> URL: https://issues.apache.org/jira/browse/HUDI-8837
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Y Ethan Guo
>Assignee: Davis Zhang
>Priority: Blocker
> Fix For: 1.0.1
>
>
> When adding strict data validation within 
> testMetadataBootstrapMORPartitionedInlineCompactionOn, the validation reveals 
> that the partition path field reading fails (returns null) for some update 
> records. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HUDI-8553) Spark SQL UPDATE and DELETE should write record positions

2025-01-09 Thread Y Ethan Guo (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17911713#comment-17911713
 ] 

Y Ethan Guo commented on HUDI-8553:
---

In the UPDATE and DELETE commands, we'll try creating the relation with a schema 
that has the row index meta column (or a new Hudi meta column) so that the row 
index column is attached to the returned DataFrame; this also requires fixing the 
wiring so that the file group reader and parquet reader keep the new row index 
column. That way, we can pass the positions down to the prepped write flow and 
prepare the HoodieRecords with the current record location.
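
A rough illustration of the idea, assuming a recent Spark version (e.g., 3.5) where the Parquet file source exposes the hidden `_metadata.row_index` column; the path and column alias below are assumptions, and the actual wiring through Hudi's file group reader is what still needs fixing:

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("row-index-sketch").getOrCreate()

// Read base files with the hidden file metadata column so each row carries its
// position within its file; that position could then be threaded through the
// prepped write flow as the current record location.
val withPositions = spark.read.format("parquet")
  .load("/path/to/base/files")
  .select(col("*"), col("_metadata.row_index").as("_hoodie_row_index"))
{code}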

> Spark SQL UPDATE and DELETE should write record positions
> -
>
> Key: HUDI-8553
> URL: https://issues.apache.org/jira/browse/HUDI-8553
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Y Ethan Guo
>Assignee: Y Ethan Guo
>Priority: Blocker
> Fix For: 1.0.1
>
>   Original Estimate: 6h
>  Time Spent: 5h
>  Remaining Estimate: 8h
>
> Though there is no read and write error, Spark SQL UPDATE and DELETE do not 
> write record positions to the log files.
> {code:java}
> spark-sql (default)> CREATE TABLE testing_positions.table2 (
>                    >     ts BIGINT,
>                    >     uuid STRING,
>                    >     rider STRING,
>                    >     driver STRING,
>                    >     fare DOUBLE,
>                    >     city STRING
>                    > ) USING HUDI
>                    > LOCATION 
> 'file:///Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2'
>                    > TBLPROPERTIES (
>                    >   type = 'mor',
>                    >   primaryKey = 'uuid',
>                    >   preCombineField = 'ts'
>                    > )
>                    > PARTITIONED BY (city);
> 24/11/16 12:03:26 WARN TableSchemaResolver: Could not find any data file 
> written for commit, so could not get schema for table 
> file:/Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2
> Time taken: 0.4 seconds
> spark-sql (default)> INSERT INTO testing_positions.table2
>                    > VALUES
>                    > 
> (1695159649087,'334e26e9-8355-45cc-97c6-c31daf0df330','rider-A','driver-K',19.10,'san_francisco'),
>                    > 
> (1695091554788,'e96c4396-3fad-413a-a942-4cb36106d721','rider-C','driver-M',27.70
>  ,'san_francisco'),
>                    > 
> (1695046462179,'9909a8b1-2d15-4d3d-8ec9-efc48c536a00','rider-D','driver-L',33.90
>  ,'san_francisco'),
>                    > 
> (1695332066204,'1dced545-862b-4ceb-8b43-d2a568f6616b','rider-E','driver-O',93.50,'san_francisco'),
>                    > 
> (1695516137016,'e3cf430c-889d-4015-bc98-59bdce1e530c','rider-F','driver-P',34.15,'sao_paulo'
>     ),
>                    > 
> (1695376420876,'7a84095f-737f-40bc-b62f-6b69664712d2','rider-G','driver-Q',43.40
>  ,'sao_paulo'    ),
>                    > 
> (1695173887231,'3eeb61f7-c2b0-4636-99bd-5d7a5a1d2c04','rider-I','driver-S',41.06
>  ,'chennai'      ),
>                    > 
> (169511511,'c8abbe79-8d89-47ea-b4ce-4d224bae5bfa','rider-J','driver-T',17.85,'chennai');
> 24/11/16 12:03:26 WARN TableSchemaResolver: Could not find any data file 
> written for commit, so could not get schema for table 
> file:/Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2
> 24/11/16 12:03:26 WARN TableSchemaResolver: Could not find any data file 
> written for commit, so could not get schema for table 
> file:/Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2
> 24/11/16 12:03:29 WARN log: Updating partition stats fast for: table2_ro
> 24/11/16 12:03:29 WARN log: Updated size to 436166
> 24/11/16 12:03:29 WARN log: Updating partition stats fast for: table2_ro
> 24/11/16 12:03:29 WARN log: Updating partition stats fast for: table2_ro
> 24/11/16 12:03:29 WARN log: Updated size to 436185
> 24/11/16 12:03:29 WARN log: Updated size to 436386
> 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2_rt
> 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2_rt
> 24/11/16 12:03:30 WARN log: Updated size to 436166
> 24/11/16 12:03:30 WARN log: Updated size to 436386
> 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2_rt
> 24/11/16 12:03:30 WARN log: Updated size to 436185
> 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2
> 24/11/16 12:03:30 WARN log: Updated size to 436166
> 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2
> 24/11/16 12:03:30 WARN log: Updated size to 436386
> 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2
> 24/11/16 12:03:30 WARN log: Updated size to 436185
> 24/11/16 12:03:30 WARN HiveConf: HiveConf of name 
> hive.internal.ss.authz.settings.applied.marker does 

[jira] [Updated] (HUDI-8762) Fix issues around incremental query

2025-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-8762:
-
Labels: pull-request-available  (was: )

> Fix issues around incremental query
> ---
>
> Key: HUDI-8762
> URL: https://issues.apache.org/jira/browse/HUDI-8762
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Y Ethan Guo
>Assignee: Lin Liu
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-8762] Fix a typo in a test [hudi]

2025-01-09 Thread via GitHub


linliu-code opened a new pull request, #12611:
URL: https://github.com/apache/hudi/pull/12611

   ### Change Logs
   
   The config was not set correctly.
   
   ### Impact
   
   Fixed a typo.
   
   ### Risk level (write none, low medium or high below)
   
   None.
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8762] Fix a typo in a test [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12611:
URL: https://github.com/apache/hudi/pull/12611#issuecomment-2581543130

   
   ## CI report:
   
   * 441dfd77c5036cfac3ce7a84cd7984408f5b6b64 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8762] Fix a typo in a test [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12611:
URL: https://github.com/apache/hudi/pull/12611#issuecomment-2581544722

   
   ## CI report:
   
   * 441dfd77c5036cfac3ce7a84cd7984408f5b6b64 Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2783)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8839] CdcFileGroupIterator use spillable hashmap [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12592:
URL: https://github.com/apache/hudi/pull/12592#issuecomment-2581548015

   
   ## CI report:
   
   * 423421ec00e72021f081c901ac74891a266b8aa5 UNKNOWN
   * a5544e3e3d5aa734348b7bfd63820d5b8d98cc33 UNKNOWN
   * 7ca86e570e17a7db2c7394d62f9d95bda8f439db Azure: 
[SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2782)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8824] MIT should error out for some assignment clause patterns [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12584:
URL: https://github.com/apache/hudi/pull/12584#issuecomment-2581546292

   
   ## CI report:
   
   * ffad81180c72f871a9677549e38f1915e5668adb Azure: 
[SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2781)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8796] Silent ignoring of simple bucket index in Flink append mode [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12545:
URL: https://github.com/apache/hudi/pull/12545#issuecomment-2581691059

   
   ## CI report:
   
   * 20a6a8c042d092026fbed250e5b313e366d2cf61 Azure: 
[CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2786)
 
   * 3efc78274b41c22ac6d2695e715fd157a9b9a9b8 Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2789)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (HUDI-8854) Support LocalDate with ordering value in DeleteRecord

2025-01-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-8854:
-

Assignee: sivabalan narayanan

> Support LocalDate with ordering value in DeleteRecord
> -
>
> Key: HUDI-8854
> URL: https://issues.apache.org/jira/browse/HUDI-8854
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: writer-core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 1.0.2
>
>
> We are removing LocalDate support for ordering value in this patch 
> [https://github.com/apache/hudi/pull/12596] 
>  
> We want to add it back. Filing this tracking ticket. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HUDI-8854) Support LocalDate with ordering value in DeleteRecord

2025-01-09 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-8854:
-

 Summary: Support LocalDate with ordering value in DeleteRecord
 Key: HUDI-8854
 URL: https://issues.apache.org/jira/browse/HUDI-8854
 Project: Apache Hudi
  Issue Type: Improvement
  Components: writer-core
Reporter: sivabalan narayanan


We are removing LocalDate support for ordering value in this patch 
[https://github.com/apache/hudi/pull/12596] 

 

We want to add it back. Filing this tracking ticket. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-8854) Support LocalDate with ordering value in DeleteRecord

2025-01-09 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-8854:
--
Fix Version/s: 1.0.2

> Support LocalDate with ordering value in DeleteRecord
> -
>
> Key: HUDI-8854
> URL: https://issues.apache.org/jira/browse/HUDI-8854
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: writer-core
>Reporter: sivabalan narayanan
>Priority: Major
> Fix For: 1.0.2
>
>
> We are removing LocalDate support for ordering value in this patch 
> [https://github.com/apache/hudi/pull/12596] 
>  
> We want to add it back. Filing this tracking ticket. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-8800] Introduce SingleSparkConsistentBucketClusteringExecutionStrategy to improve performance [hudi]

2025-01-09 Thread via GitHub


TheR1sing3un commented on code in PR #12537:
URL: https://github.com/apache/hudi/pull/12537#discussion_r1909786316


##
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/strategy/ClusteringExecutionStrategy.java:
##
@@ -67,4 +85,69 @@ protected HoodieEngineContext getEngineContext() {
   protected HoodieWriteConfig getWriteConfig() {
 return this.writeConfig;
   }
+
+  protected ClosableIterator> 
getRecordIteratorWithLogFiles(ClusteringOperation operation, String 
instantTime, long maxMemory) {
+HoodieWriteConfig config = getWriteConfig();
+HoodieTable table = getHoodieTable();
+StorageConfiguration storageConf = table.getStorageConf();
+HoodieTableConfig tableConfig = table.getMetaClient().getTableConfig();
+String bootstrapBasePath = tableConfig.getBootstrapBasePath().orElse(null);
+Option partitionFields = tableConfig.getPartitionFields();
+HoodieMergedLogRecordScanner scanner = 
HoodieMergedLogRecordScanner.newBuilder()
+.withStorage(table.getStorage())
+.withBasePath(table.getMetaClient().getBasePath())
+.withLogFilePaths(operation.getDeltaFilePaths())
+.withReaderSchema(readerSchemaWithMetaFields)
+.withLatestInstantTime(instantTime)
+.withMaxMemorySizeInBytes(maxMemory)
+.withReverseReader(config.getCompactionReverseLogReadEnabled())
+.withBufferSize(config.getMaxDFSStreamBufferSize())
+.withSpillableMapBasePath(config.getSpillableMapBasePath())
+.withPartition(operation.getPartitionPath())
+.withOptimizedLogBlocksScan(config.enableOptimizedLogBlocksScan())
+.withDiskMapType(config.getCommonConfig().getSpillableDiskMapType())
+
.withBitCaskDiskMapCompressionEnabled(config.getCommonConfig().isBitCaskDiskMapCompressionEnabled())
+.withRecordMerger(config.getRecordMerger())
+.withTableMetaClient(table.getMetaClient())
+.build();
+
+Option baseFileReader = 
StringUtils.isNullOrEmpty(operation.getDataFilePath())
+? Option.empty()
+: Option.of(getBaseOrBootstrapFileReader(storageConf, 
bootstrapBasePath, partitionFields, operation));
+Option keyGeneratorOp = getKeyGenerator();
+try {
+  return new HoodieFileSliceReader(baseFileReader, scanner, 
readerSchemaWithMetaFields, tableConfig.getPreCombineField(), 
config.getRecordMerger(),
+  tableConfig.getProps(),
+  tableConfig.populateMetaFields() ? Option.empty() : 
Option.of(Pair.of(tableConfig.getRecordKeyFieldProp(),
+  tableConfig.getPartitionFieldProp())), keyGeneratorOp);
+} catch (IOException e) {
+  throw new HoodieClusteringException("Error reading file slices", e);
+}
+  }
+
+  protected ClosableIterator> 
getRecordIteratorWithBaseFileOnly(ClusteringOperation operation) {
+StorageConfiguration storageConf = getHoodieTable().getStorageConf();
+HoodieTableConfig tableConfig = 
getHoodieTable().getMetaClient().getTableConfig();
+String bootstrapBasePath = tableConfig.getBootstrapBasePath().orElse(null);
+Option partitionFields = tableConfig.getPartitionFields();
+HoodieFileReader baseFileReader = 
getBaseOrBootstrapFileReader(storageConf, bootstrapBasePath, partitionFields, 
operation);
+
+Option keyGeneratorOp = getKeyGenerator();
+// NOTE: Record have to be cloned here to make sure if it holds low-level 
engine-specific
+//   payload pointing into a shared, mutable (underlying) buffer we 
get a clean copy of
+//   it since these records will be shuffled later.
+ClosableIterator baseRecordsIterator;
+try {
+  baseRecordsIterator = 
baseFileReader.getRecordIterator(readerSchemaWithMetaFields);
+} catch (IOException e) {
+  throw new HoodieClusteringException("Error reading base file", e);
+}
+return new CloseableMappingIterator(
+baseRecordsIterator,
+rec -> ((HoodieRecord) 
rec).copy().wrapIntoHoodieRecordPayloadWithKeyGen(readerSchemaWithMetaFields, 
writeConfig.getProps(), keyGeneratorOp));
+  }
+
+  protected abstract Option getKeyGenerator();

Review Comment:
   > Let's remove the interface for `getKeyGenerator` and 
`getBaseOrBootstrapFileReader`, they are just utilities.
   
   done~



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8766] Enabling cols stats by default with writer [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12596:
URL: https://github.com/apache/hudi/pull/12596#issuecomment-2581698677

   
   ## CI report:
   
   * ae2ca606c6cd125f31b7ed029968d0993b1bb0bd UNKNOWN
   * 71b6a13890909b81c74ce7b138237ab695a08782 UNKNOWN
   * a0efb5a7f12042228a5444aeab00f98827dfad3a Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2777)
 
   * 15866ae0099c3b58d22329be0e5008b3149cb95f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8800] Introduce SingleSparkConsistentBucketClusteringExecutionStrategy to improve performance [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12537:
URL: https://github.com/apache/hudi/pull/12537#issuecomment-2581698449

   
   ## CI report:
   
   * 6198247de5d01f8edaf4976efffdffa6e6674b64 Azure: 
[SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2763)
 
   * 64ad84f40ff6a47df76979a382525fee0cc67d2e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8800] Introduce SingleSparkConsistentBucketClusteringExecutionStrategy to improve performance [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12537:
URL: https://github.com/apache/hudi/pull/12537#issuecomment-2581699957

   
   ## CI report:
   
   * 6198247de5d01f8edaf4976efffdffa6e6674b64 Azure: 
[SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2763)
 
   * 64ad84f40ff6a47df76979a382525fee0cc67d2e Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2790)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-8851) MOR delete query hits NPE when fetching ordering value

2025-01-09 Thread Davis Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davis Zhang updated HUDI-8851:
--
Description: 
[https://github.com/apache/hudi/pull/12610]

When running the delete statement of the test, we got:

 
Job aborted due to stage failure: Task 0 in stage 440.0 failed 1 times, most 
recent failure: Lost task 0.0 in stage 440.0 (TID 610) (daviss-mbp.attlocal.net 
executor driver): org.apache.hudi.exception.HoodieUpsertException: Error 
upserting bucketType UPDATE for partition :0
at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:319)
at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:252)
at 
org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
at 
org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:908)
at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:908)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:380)
at 
org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1548)
at 
org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1458)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1522)
at 
org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:378)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.NullPointerException
at 
org.apache.spark.sql.HoodieUnsafeRowUtils$.getNestedInternalRowValue(HoodieUnsafeRowUtils.scala:69)
at 
org.apache.spark.sql.HoodieUnsafeRowUtils.getNestedInternalRowValue(HoodieUnsafeRowUtils.scala)
at 
org.apache.hudi.common.model.HoodieSparkRecord.getOrderingValue(HoodieSparkRecord.java:322)
at 
org.apache.hudi.io.HoodieAppendHandle.writeToBuffer(HoodieAppendHandle.java:608)
at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:465)
at 
org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:83)
at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:312)
... 29 more
 
Driver stacktrace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 440.0 failed 1 times, most recent failure: Lost task 0.0 in stage 440.0 
(TID 610) (daviss-mbp.attlocal.net executor driver): 
org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType 
UPDATE for partition :0
at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:319)
at 
org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:252)
at 
org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
at 
org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:908)
at 
org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:908)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache

[jira] (HUDI-8624) Revisit commitsMetadata fetching from timeline history in MergeOnReadIncrementalRelation

2025-01-09 Thread Lin Liu (Jira)


[ https://issues.apache.org/jira/browse/HUDI-8624 ]


Lin Liu deleted comment on HUDI-8624:
---

was (Author: JIRAUSER301185):
What is the issue here?

> Revisit commitsMetadata fetching from timeline history in 
> MergeOnReadIncrementalRelation
> 
>
> Key: HUDI-8624
> URL: https://issues.apache.org/jira/browse/HUDI-8624
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Y Ethan Guo
>Priority: Critical
> Fix For: 1.0.1
>
>
> [https://github.com/apache/hudi/pull/12385/files#r1865249449]
> We need to revisit why we need commit metadata from timeline history.  
> Reading timeline history (archival timeline in old term) is expensive and 
> should not be incurred in incremental query except for completion time lookup.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-8624) Revisit commitsMetadata fetching from timeline history in MergeOnReadIncrementalRelation

2025-01-09 Thread Lin Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Liu updated HUDI-8624:
--
Status: In Progress  (was: Open)

> Revisit commitsMetadata fetching from timeline history in 
> MergeOnReadIncrementalRelation
> 
>
> Key: HUDI-8624
> URL: https://issues.apache.org/jira/browse/HUDI-8624
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Y Ethan Guo
>Priority: Critical
> Fix For: 1.0.1
>
>
> [https://github.com/apache/hudi/pull/12385/files#r1865249449]
> We need to revisit why we need commit metadata from timeline history.  
> Reading timeline history (archival timeline in old term) is expensive and 
> should not be incurred in incremental query except for completion time lookup.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [Hudi-8839] CdcFileGroupIterator use spillable hashmap [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12592:
URL: https://github.com/apache/hudi/pull/12592#issuecomment-2581423654

   
   ## CI report:
   
   * e720dcfa5656730d01e5f22e5f9a890c08c60e0d Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2738)
 
   * 28247026a78dda613a41ed2f039cbf11bb7d5d95 Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2779)
 
   * 423421ec00e72021f081c901ac74891a266b8aa5 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [Hudi-8839] CdcFileGroupIterator use spillable hashmap [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12592:
URL: https://github.com/apache/hudi/pull/12592#issuecomment-2581425493

   
   ## CI report:
   
   * 28247026a78dda613a41ed2f039cbf11bb7d5d95 Azure: 
[CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2779)
 
   * 423421ec00e72021f081c901ac74891a266b8aa5 UNKNOWN
   * a5544e3e3d5aa734348b7bfd63820d5b8d98cc33 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-8850) COW DML does not honor Commit

2025-01-09 Thread Davis Zhang (Jira)
Davis Zhang created HUDI-8850:
-

 Summary: COW DML does not honor Commit
 Key: HUDI-8850
 URL: https://issues.apache.org/jira/browse/HUDI-8850
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Davis Zhang


[https://github.com/apache/hudi/pull/12610]

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-8828] Test coverage of MIT partial update [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12583:
URL: https://github.com/apache/hudi/pull/12583#issuecomment-2581382057

   
   ## CI report:
   
   * 5912957233547cef72a3427e482c176537a164b2 Azure: 
[CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2776)
 
   * 757290d3cf1ab9027f2f14f3cd22097f50939a56 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8766] Enabling cols stats by default with writer [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12596:
URL: https://github.com/apache/hudi/pull/12596#issuecomment-2581382215

   
   ## CI report:
   
   * ae2ca606c6cd125f31b7ed029968d0993b1bb0bd UNKNOWN
   * 71b6a13890909b81c74ce7b138237ab695a08782 UNKNOWN
   * a0efb5a7f12042228a5444aeab00f98827dfad3a Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2777)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (HUDI-8837) Fix reading partition path field on metadata bootstrap table

2025-01-09 Thread Davis Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davis Zhang reassigned HUDI-8837:
-

Assignee: Y Ethan Guo  (was: Davis Zhang)

> Fix reading partition path field on metadata bootstrap table
> 
>
> Key: HUDI-8837
> URL: https://issues.apache.org/jira/browse/HUDI-8837
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Y Ethan Guo
>Assignee: Y Ethan Guo
>Priority: Blocker
> Fix For: 1.0.1
>
>
> When adding strict data validation within 
> testMetadataBootstrapMORPartitionedInlineCompactionOn, the validation reveals 
> that the partition path field reading fails (returns null) for some update 
> records. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HUDI-8837) Fix reading partition path field on metadata bootstrap table

2025-01-09 Thread Davis Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-8837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17911714#comment-17911714
 ] 

Davis Zhang commented on HUDI-8837:
---

So we can remove the .drop(partitionColName) in the validation function you 
mentioned. I ran all tests in the test suite and they are all green. Assigned back to you.

> Fix reading partition path field on metadata bootstrap table
> 
>
> Key: HUDI-8837
> URL: https://issues.apache.org/jira/browse/HUDI-8837
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Y Ethan Guo
>Assignee: Y Ethan Guo
>Priority: Blocker
> Fix For: 1.0.1
>
>
> When adding strict data validation within 
> testMetadataBootstrapMORPartitionedInlineCompactionOn, the validation reveals 
> that the partition path field reading fails (returns null) for some update 
> records. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-8624] Avoid check metadata for archived commits in incremental queries [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12613:
URL: https://github.com/apache/hudi/pull/12613#issuecomment-2581719260

   
   ## CI report:
   
   * 8fe93c788b78c9239f8feb90d3d78a90b8153914 Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2788)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8766] Enabling cols stats by default with writer [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12596:
URL: https://github.com/apache/hudi/pull/12596#issuecomment-2581720604

   
   ## CI report:
   
   * ae2ca606c6cd125f31b7ed029968d0993b1bb0bd UNKNOWN
   * 71b6a13890909b81c74ce7b138237ab695a08782 UNKNOWN
   * a0efb5a7f12042228a5444aeab00f98827dfad3a Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2777)
 
   * 15866ae0099c3b58d22329be0e5008b3149cb95f Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2791)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-8762) Fix issues around incremental query

2025-01-09 Thread Y Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Y Ethan Guo updated HUDI-8762:
--
Status: In Progress  (was: Open)

> Fix issues around incremental query
> ---
>
> Key: HUDI-8762
> URL: https://issues.apache.org/jira/browse/HUDI-8762
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Y Ethan Guo
>Assignee: Y Ethan Guo
>Priority: Blocker
> Fix For: 1.0.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-8766] Enabling cols stats by default with writer [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12596:
URL: https://github.com/apache/hudi/pull/12596#issuecomment-2581279856

   
   ## CI report:
   
   * 04faca8ac2311fce83d759a6dbd8efb697ccbb6a Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2773)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Updated] (HUDI-8828) merge into partial update on all kinds of table should work [Ethan to check the latest comment on new issues]

2025-01-09 Thread Davis Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davis Zhang updated HUDI-8828:
--
Status: Patch Available  (was: In Progress)

> merge into partial update on all kinds of table should work [Ethan to check 
> the latest comment on new issues]
> -
>
> Key: HUDI-8828
> URL: https://issues.apache.org/jira/browse/HUDI-8828
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Davis Zhang
>Assignee: Davis Zhang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 1.0.1
>
>   Original Estimate: 4h
>  Time Spent: 1h
>  Remaining Estimate: 3h
>
> MOR, COW, partitioned, non-partitioned, with/without precombine key. 
> Global/local index.
> 0.5 days for testing + unknowns if issues spotted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-8828] Test coverage of MIT partial update [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12583:
URL: https://github.com/apache/hudi/pull/12583#issuecomment-2581345336

   
   ## CI report:
   
   * fff9de91a5b865e6c07ea9bf9b8672cff90bd243 Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2706)
 
   * 5912957233547cef72a3427e482c176537a164b2 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8828] Test coverage of MIT partial update [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12583:
URL: https://github.com/apache/hudi/pull/12583#issuecomment-2581347876

   
   ## CI report:
   
   * fff9de91a5b865e6c07ea9bf9b8672cff90bd243 Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2706)
 
   * 5912957233547cef72a3427e482c176537a164b2 Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2776)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (HUDI-8853) Spark sql ALTER TABLE queries are failing on EMR

2025-01-09 Thread Mansi Patel (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17911702#comment-17911702
 ] 

Mansi Patel commented on HUDI-8853:
---

ALTER COLUMN is also causing an issue.
{code:java}
spark.sql("ALTER TABLE mansipp_hudi_fgac_table3 ALTER COLUMN id TYPE string");
org.apache.spark.sql.AnalysisException: [NOT_SUPPORTED_CHANGE_COLUMN] ALTER 
TABLE ALTER/CHANGE COLUMN is not supported for changing 
`spark_catalog`.`default`.`mansipp_hudi_fgac_table3`'s column `id` with type 
"INT" to `id` with type "STRING". {code}
According to this table, we should be able to convert "int -> string".
[https://hudi.apache.org/docs/next/schema_evolution/#:~:text=DROP%20NOT%20NULL-,column%20type%20change,-Source%5CTarget]

Reproduction steps:

 
{code:java}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions._
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.DataSourceReadOptions
import org.apache.hudi.config.HoodieWriteConfig
import org.apache.hudi.hive.MultiPartKeysValueExtractor
import org.apache.hudi.hive.HiveSyncConfig
import org.apache.hudi.sync.common.HoodieSyncConfig
// Create a DataFrame
val inputDF = Seq(
 (100, "2015-01-01", "2015-01-01T13:51:39.340396Z"),
 (101, "2015-01-01", "2015-01-01T12:14:58.597216Z"),
 (102, "2015-01-01", "2015-01-01T13:51:40.417052Z"),
 (103, "2015-01-01", "2015-01-01T13:51:40.519832Z"),
 (104, "2015-01-02", "2015-01-01T12:15:00.512679Z"),
 (105, "2015-01-02", "2015-01-01T13:51:42.248818Z")
 ).toDF("id", "creation_date", "last_update_time")
//Specify common DataSourceWriteOptions in the single hudiOptions variable 
val hudiOptions = Map[String,String](
  HoodieWriteConfig.TBL_NAME.key -> "mansipp_hudi_fgac_table3",
  DataSourceWriteOptions.TABLE_TYPE.key -> "COPY_ON_WRITE", 
  DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> "id",
  DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "creation_date",
  DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "last_update_time",
  DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
  DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> "mansipp_hudi_fgac_table3",
  DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "creation_date",
  HoodieSyncConfig.META_SYNC_PARTITION_EXTRACTOR_CLASS.key -> 
"org.apache.hudi.hive.MultiPartKeysValueExtractor",
  HoodieSyncConfig.META_SYNC_ENABLED.key -> "true",
  HiveSyncConfig.HIVE_SYNC_MODE.key -> "hms",
  HoodieSyncConfig.META_SYNC_TABLE_NAME.key -> "mansipp_hudi_fgac_table3",
  HoodieSyncConfig.META_SYNC_PARTITION_FIELDS.key -> "creation_date"
)
// Write the DataFrame as a Hudi dataset
(inputDF.write
    .format("hudi")
    .options(hudiOptions)
    .option(DataSourceWriteOptions.OPERATION_OPT_KEY,"insert")
    .option("hoodie.schema.on.read.enable","true")
    .mode(SaveMode.Overwrite)
    .save("s3://mansipp-emr-dev/hudi/mansipp_hudi_fgac_table3/"))
{code}
{code:java}
spark.sql("ALTER TABLE mansipp_hudi_fgac_table3 ALTER COLUMN id TYPE string"); 
{code}
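
For context, a minimal sketch of how the type change is usually attempted with schema-on-read evolution enabled in the SQL session first (an assumption about the intended setup, based on the schema evolution docs linked above, not a confirmed fix for this failure):

{code:scala}
// Assumption: schema-on-read evolution is enabled in the session before running DDL.
spark.sql("set hoodie.schema.on.read.enable=true")
spark.sql("ALTER TABLE mansipp_hudi_fgac_table3 ALTER COLUMN id TYPE string")
{code}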

> Spark sql ALTER TABLE queries are failing on EMR
> 
>
> Key: HUDI-8853
> URL: https://issues.apache.org/jira/browse/HUDI-8853
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark-sql
>Affects Versions: 0.15.0
>Reporter: Mansi Patel
>Priority: Major
> Fix For: 1.0.1
>
>
> Some of the spark sql DDL queries are failing on EMR. Failed queries are 
> listed here
> 1. ALTER TABLE DROP COLUMN
> 2. ALTER TABLE REPLACE COLUMN
> 3. ALTER TABLE RENAME COLUMN
> {code:java}
> scala> spark.sql("ALTER TABLE mansipp_hudi_fgac_table DROP COLUMN 
> creation_date"); org.apache.spark.sql.AnalysisException: 
> [UNSUPPORTED_FEATURE.TABLE_OPERATION] The feature is not supported: Table 
> `spark_catalog`.`default`.`mansipp_hudi_fgac_table` does not support DROP 
> COLUMN. Please check the current catalog and namespace to make sure the 
> qualified table name is expected, and also check the catalog implementation 
> which is configured by "spark.sql.catalog". at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.unsupportedTableOperationError(QueryCompilationErrors.scala:847)
>  at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.unsupportedTableOperationError(QueryCompilationErrors.scala:837)
>  at 
> org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:110)
> {code}
> {code:java}
> scala> spark.sql("ALTER TABLE mansipp_hudi_fgac_table REPLACE COLUMNS (id 
> int, name varchar(10), city string)");
> org.apache.spark.sql.AnalysisException: [UNSUPPORTED_FEATURE.TABLE_OPERATION] 
> The feature is not supported: Table 
> `spark_catalog`.`default`.`mansipp_hudi_fgac_table` does not support REPLACE 
> COLUMNS. Please check the current catalog and namespace to make sure the 
> qualified table name is expec

Re: [PR] [HUDI-8824] MIT should error out for some assignment clause patterns [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12584:
URL: https://github.com/apache/hudi/pull/12584#issuecomment-2581479281

   
   ## CI report:
   
   * 631494d4f6e8389bf8c7a7d90a360fc1ea2d159d Azure: 
[SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2770)
 
   * ffad81180c72f871a9677549e38f1915e5668adb UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8824] MIT should error out for some assignment clause patterns [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12584:
URL: https://github.com/apache/hudi/pull/12584#issuecomment-2581481057

   
   ## CI report:
   
   * 631494d4f6e8389bf8c7a7d90a360fc1ea2d159d Azure: 
[SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2770)
 
   * ffad81180c72f871a9677549e38f1915e5668adb Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2781)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8839] CdcFileGroupIterator use spillable hashmap [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12592:
URL: https://github.com/apache/hudi/pull/12592#issuecomment-2581481123

   
   ## CI report:
   
   * 28247026a78dda613a41ed2f039cbf11bb7d5d95 Azure: 
[CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2779)
 
   * 423421ec00e72021f081c901ac74891a266b8aa5 UNKNOWN
   * a5544e3e3d5aa734348b7bfd63820d5b8d98cc33 UNKNOWN
   * 7ca86e570e17a7db2c7394d62f9d95bda8f439db Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2782)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Assigned] (HUDI-8837) Fix reading partition path field on metadata bootstrap table

2025-01-09 Thread Y Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Y Ethan Guo reassigned HUDI-8837:
-

Assignee: Davis Zhang

> Fix reading partition path field on metadata bootstrap table
> 
>
> Key: HUDI-8837
> URL: https://issues.apache.org/jira/browse/HUDI-8837
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Y Ethan Guo
>Assignee: Davis Zhang
>Priority: Blocker
> Fix For: 1.0.1
>
>
> When adding strict data validation within 
> testMetadataBootstrapMORPartitionedInlineCompactionOn, the validation reveals 
> that the partition path field reading fails (returns null) for some update 
> records. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-8837) Fix reading partition path field on metadata bootstrap table

2025-01-09 Thread Davis Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-8837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davis Zhang updated HUDI-8837:
--
Status: In Progress  (was: Open)

> Fix reading partition path field on metadata bootstrap table
> 
>
> Key: HUDI-8837
> URL: https://issues.apache.org/jira/browse/HUDI-8837
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Y Ethan Guo
>Assignee: Davis Zhang
>Priority: Blocker
> Fix For: 1.0.1
>
>
> When adding strict data validation within 
> testMetadataBootstrapMORPartitionedInlineCompactionOn, the validation reveals 
> that the partition path field reading fails (returns null) for some update 
> records. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-8832] Add merge mode test coverage for DML [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12610:
URL: https://github.com/apache/hudi/pull/12610#issuecomment-2581484606

   
   ## CI report:
   
   * 5fbd4a15950f9d2b214ce3617164f68ac96fdc4b Azure: 
[SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2778)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8796] Silent ignoring of simple bucket index in Flink append mode [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12545:
URL: https://github.com/apache/hudi/pull/12545#issuecomment-2581649185

   
   ## CI report:
   
   * 20a6a8c042d092026fbed250e5b313e366d2cf61 Azure: 
[CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2786)
 
   * 3efc78274b41c22ac6d2695e715fd157a9b9a9b8 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8796] Silent ignoring of simple bucket index in Flink append mode [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12545:
URL: https://github.com/apache/hudi/pull/12545#issuecomment-2581645223

   
   ## CI report:
   
   * 1a9b2ad8ba31a4bfb0c41f65af7d76841a946720 Azure: 
[SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2576)
 
   * 20a6a8c042d092026fbed250e5b313e366d2cf61 Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2786)
 
   * 3efc78274b41c22ac6d2695e715fd157a9b9a9b8 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8624] Avoid check metadata for archived commits in incremental queries [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12613:
URL: https://github.com/apache/hudi/pull/12613#issuecomment-2581651747

   
   ## CI report:
   
   * 39ca7fae423367a6f48c5139b257176d22beac02 Azure: 
[CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2785)
 
   * 8fe93c788b78c9239f8feb90d3d78a90b8153914 Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2788)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8796] Silent ignoring of simple bucket index in Flink append mode [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12545:
URL: https://github.com/apache/hudi/pull/12545#issuecomment-2581639192

   
   ## CI report:
   
   * 1a9b2ad8ba31a4bfb0c41f65af7d76841a946720 Azure: 
[SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2576)
 
   * 20a6a8c042d092026fbed250e5b313e366d2cf61 Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2786)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8796] Silent ignoring of simple bucket index in Flink append mode [hudi]

2025-01-09 Thread via GitHub


geserdugarov commented on PR #12545:
URL: https://github.com/apache/hudi/pull/12545#issuecomment-2581656971

   @zhangyue19921010, @danny0405,
   I've switched the fix for bucket index support in append mode to an explicit 
restriction, because of the fundamental problem that a bucket index expects 
exactly one base file per bucket. 3efc78274b41c22ac6d2695e715fd157a9b9a9b8 throws an 
exception if the user tries to insert data using a bucket index, to prevent a 
silent write with an unexpected layout.
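
   A minimal sketch of the kind of fail-fast guard described above, assuming 
hypothetical option keys and message text rather than the actual Flink pipeline 
change in this PR:

   ```scala
   // Hypothetical guard: the keys "index.type" and "write.operation" and the error
   // message are assumptions for illustration, not the exact Flink-Hudi configuration.
   object BucketIndexAppendGuard {
     def validate(options: Map[String, String]): Unit = {
       val isBucketIndex = options.get("index.type").exists(_.equalsIgnoreCase("BUCKET"))
       val isInsert = options.get("write.operation").exists(_.equalsIgnoreCase("insert"))
       if (isBucketIndex && isInsert) {
         // Fail fast instead of silently appending: an append-mode insert can create
         // multiple base files per bucket, violating the bucket index expectation of
         // exactly one base file per bucket.
         throw new IllegalArgumentException(
           "Bucket index is not supported with append-mode insert; use upsert instead.")
       }
     }

     def main(args: Array[String]): Unit = {
       validate(Map("index.type" -> "FLINK_STATE", "write.operation" -> "upsert")) // passes
       validate(Map("index.type" -> "BUCKET", "write.operation" -> "insert"))      // throws
     }
   }
   ```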


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8766] Enabling cols stats by default with writer [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12596:
URL: https://github.com/apache/hudi/pull/12596#issuecomment-2581274438

   
   ## CI report:
   
   * da34ecaa061dd1f0bce93c213c43f40b810d Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2748)
 
   * 04faca8ac2311fce83d759a6dbd8efb697ccbb6a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8832] Add merge mode test coverage for DML [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12610:
URL: https://github.com/apache/hudi/pull/12610#issuecomment-2581373307

   
   ## CI report:
   
   * 4c14c955871ea88e3ff6ccfab667fe434a16a833 Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2772)
 
   * 6142abfcebbf84d3bf32097c7499b60ff11ae0a1 Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2774)
 
   * 5fbd4a15950f9d2b214ce3617164f68ac96fdc4b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8832] Add merge mode test coverage for DML [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12610:
URL: https://github.com/apache/hudi/pull/12610#issuecomment-2581375583

   
   ## CI report:
   
   * 6142abfcebbf84d3bf32097c7499b60ff11ae0a1 Azure: 
[CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2774)
 
   * 5fbd4a15950f9d2b214ce3617164f68ac96fdc4b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8828] Test coverage of MIT partial update [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12583:
URL: https://github.com/apache/hudi/pull/12583#issuecomment-2581379791

   
   ## CI report:
   
   * fff9de91a5b865e6c07ea9bf9b8672cff90bd243 Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2706)
 
   * 5912957233547cef72a3427e482c176537a164b2 Azure: 
[PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2776)
 
   * 757290d3cf1ab9027f2f14f3cd22097f50939a56 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-8828] Test coverage of MIT partial update [hudi]

2025-01-09 Thread via GitHub


hudi-bot commented on PR #12583:
URL: https://github.com/apache/hudi/pull/12583#issuecomment-2581510528

   
   ## CI report:
   
   * 757290d3cf1ab9027f2f14f3cd22097f50939a56 Azure: 
[FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2780)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (HUDI-8852) merge into partial update should not need precombine field assignment for partial update

2025-01-09 Thread Davis Zhang (Jira)
Davis Zhang created HUDI-8852:
-

 Summary: merge into partial update should not need precombine 
field assignment for partial update
 Key: HUDI-8852
 URL: https://issues.apache.org/jira/browse/HUDI-8852
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Davis Zhang


We should allow the MERGE INTO (MIT) delete clause to operate even if no precombine 
key is specified in the source table, in the commit-time-ordering case.

 

Similarly, for MIT partial update, regardless of the merge mode, if the precombine 
key is absent from the source, we should fall back to commit-time ordering and 
apply the change.
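
For illustration, a hedged sketch of the kind of statement this ticket targets: the 
source table exposes no precombine/ordering column, yet the delete and partial-update 
clauses should still apply under commit-time ordering. The table names, column names, 
and the pre-existing Hudi target table are assumptions, not code from this thread.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example only: target_hudi_tbl is assumed to be an existing Hudi
// table with 'id' as the record key; source_tbl carries no precombine/ordering field.
object MergeWithoutPrecombine {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("mit-without-precombine").getOrCreate()
    spark.sql(
      """
        |MERGE INTO target_hudi_tbl t
        |USING source_tbl s
        |ON t.id = s.id
        |WHEN MATCHED AND s.deleted = true THEN DELETE
        |WHEN MATCHED THEN UPDATE SET t.price = s.price
        |WHEN NOT MATCHED THEN INSERT (id, name, price) VALUES (s.id, s.name, s.price)
        |""".stripMargin)
    spark.stop()
  }
}
```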



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-8828] Test coverage of MIT partial update [hudi]

2025-01-09 Thread via GitHub


Davis-Zhang-Onehouse commented on code in PR #12583:
URL: https://github.com/apache/hudi/pull/12583#discussion_r1909484960


##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/TestMergeIntoTable.scala:
##
@@ -1336,44 +1339,59 @@ class TestMergeIntoTable extends HoodieSparkSqlTestBase 
with ScalaAssertionSuppo
 
   test("Test MergeInto with partial insert") {
 spark.sql(s"set ${MERGE_SMALL_FILE_GROUP_CANDIDATES_LIMIT.key} = 0")
-Seq(true, false).foreach { sparkSqlOptimizedWrites =>
+
+// Test combinations: (tableType, sparkSqlOptimizedWrites)
+val testConfigs = Seq(
+  ("mor", true),
+  ("mor", false),
+  ("cow", true),
+  ("cow", false)
+)
+
+testConfigs.foreach { case (tableType, sparkSqlOptimizedWrites) =>
+  log.info(s"=== Testing MergeInto with partial insert: 
tableType=$tableType, sparkSqlOptimizedWrites=$sparkSqlOptimizedWrites ===")
   withRecordType()(withTempDir { tmp =>
 spark.sql("set hoodie.payload.combined.schema.validate = true")
-// Create a partitioned mor table
+// Create a partitioned table
 val tableName = generateTableName
 spark.sql(
   s"""
  | create table $tableName (
  |  id bigint,
  |  name string,
  |  price double,
+ |  ts bigint,
  |  dt string
  | ) using hudi
  | tblproperties (
- |  type = 'mor',
- |  primaryKey = 'id'
+ |  type = '$tableType',
+ |  primaryKey = 'id',
+ |  precombineKey = 'ts'

Review Comment:
   Not required; it is a workaround for 
https://issues.apache.org/jira/browse/HUDI-8835.
   In general, making MERGE INTO operate independently of the precombine 
field requires more work.



##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/TestMergeIntoTable.scala:
##
@@ -842,7 +844,8 @@ class TestMergeIntoTable extends HoodieSparkSqlTestBase 
with ScalaAssertionSuppo
   )
 
   checkAnswer(s"select id,name,price,v,dt from $tableName1 order by id")(
-Seq(1, "a1", 10, 1000, "2021-03-21")
+Seq(1, "a1", 10, 1000, "2021-03-21"),
+Seq(3, "a3", 30, 3000, "2021-03-21")

Review Comment:
   unintentional changes, reverted



##
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/TestMergeIntoTable.scala:
##
@@ -22,11 +22,13 @@ import 
org.apache.hudi.DataSourceWriteOptions.SPARK_SQL_OPTIMIZED_WRITES
 import 
org.apache.hudi.config.HoodieWriteConfig.MERGE_SMALL_FILE_GROUP_CANDIDATES_LIMIT
 import org.apache.hudi.hadoop.fs.HadoopFSUtils
 import org.apache.hudi.testutils.DataSourceTestUtils
-
+import org.apache.spark.sql.hudi.ProvidesHoodieConfig.getClass

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


