Re: [PR] [HUDI-8775] Expression index on a column should get tracked at partition level if partition stats index is turned on [hudi]
hudi-bot commented on PR #12558: URL: https://github.com/apache/hudi/pull/12558#issuecomment-2579389493 ## CI report: * 4b54a2deb80ccce01ec6560927c0d143a8b5c6ba Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2756) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-8841] Fix schema validating exception during flink async cluste… [hudi]
danny0405 commented on code in PR #12598: URL: https://github.com/apache/hudi/pull/12598#discussion_r1908311946 ## hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java: ## @@ -605,14 +605,16 @@ public static String createSchemaErrorString(String errorMessage, Schema writerS * @param nullable nullability of column type * @return a new schema with the nullabilities of the given columns updated */ - public static Schema createSchemaWithNullabilityUpdate( + public static Schema forceNullableColumns( Schema schema, List<String> nullableUpdateCols, boolean nullable) { Review Comment: nullableUpdateCols -> columns. We can eliminate the flag `nullable` because it is always true.
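The method under review rewrites an Avro schema so the listed columns become nullable, i.e. their types become unions containing `"null"`. A minimal sketch of that transformation on the JSON form of a schema — with hypothetical names, operating on plain dicts rather than Hudi's actual `AvroSchemaUtils`/Avro API — might look like:

```python
import json

def force_nullable_columns(schema_json: str, columns: list) -> str:
    """Wrap the type of each listed field in a ["null", type] union
    (illustrative only; not Hudi's implementation)."""
    schema = json.loads(schema_json)
    for field in schema.get("fields", []):
        if field["name"] in columns:
            t = field["type"]
            if isinstance(t, list):
                # Already a union: add "null" only if it is missing.
                if "null" not in t:
                    field["type"] = ["null"] + t
            elif t != "null":
                field["type"] = ["null", t]
    return json.dumps(schema)

record = json.dumps({
    "type": "record", "name": "r",
    "fields": [{"name": "id", "type": "string"},
               {"name": "ts", "type": "long"}],
})
updated = json.loads(force_nullable_columns(record, ["ts"]))
print(updated["fields"][1]["type"])  # ['null', 'long']
```

This also illustrates why the `nullable` flag can be dropped, as suggested above: the only direction ever needed is forcing columns *to* nullable.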
Re: [PR] [WIP] [HUDI-8796] Silent ignoring of simple bucket index in Flink append mode [hudi]
geserdugarov commented on code in PR #12545: URL: https://github.com/apache/hudi/pull/12545#discussion_r1908303305 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/utils/Pipelines.java: ## @@ -207,11 +207,30 @@ public static DataStream append( Configuration conf, RowType rowType, DataStream dataStream) { -WriteOperatorFactory operatorFactory = AppendWriteOperator.getFactory(conf, rowType); +boolean isBucketIndex = OptionsResolver.isBucketIndexType(conf); +if (isBucketIndex) { Review Comment: Should we support use cases like inserting some data several times and then switching to `upsert` for another batch of data? With my proposed changes we would face another problem if we want to switch to the `upsert` operation here: https://github.com/apache/hudi/blob/e67d0aa71e2253a5b5cf95028cdf95482ffeca6a/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bucket/BucketStreamWriteFunction.java#L181-L184
Re: [PR] [HUDI-8800] Introduce SingleSparkConsistentBucketClusteringExecutionStrategy to improve performance [hudi]
TheR1sing3un commented on code in PR #12537: URL: https://github.com/apache/hudi/pull/12537#discussion_r1908330556 ## hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/functional/TestSparkConsistentBucketClustering.java: ## @@ -110,7 +115,7 @@ public void setup(int maxFileSize, Map options) throws IOExcepti .withStorageConfig(HoodieStorageConfig.newBuilder().parquetMaxFileSize(maxFileSize).build()) .withClusteringConfig(HoodieClusteringConfig.newBuilder() .withClusteringPlanStrategyClass(SparkConsistentBucketClusteringPlanStrategy.class.getName()) - .withClusteringExecutionStrategyClass(SparkConsistentBucketClusteringExecutionStrategy.class.getName()) +.withClusteringExecutionStrategyClass(singleJob ? SINGLE_SPARK_JOB_CONSISTENT_HASHING_EXECUTION_STRATEGY : SPARK_CONSISTENT_BUCKET_EXECUTION_STRATEGY) Review Comment: > Is `SINGLE_SPARK_JOB_CONSISTENT_HASHING_EXECUTION_STRATEGY` always better than `SPARK_CONSISTENT_BUCKET_EXECUTION_STRATEGY`? Why do we need two execution strategies? I'm not sure whether any users are already using `SPARK_CONSISTENT_BUCKET_EXECUTION_STRATEGY`; if so, should we keep it for compatibility? If not, we can deprecate it.
Re: [PR] [HUDI-8800] Introduce SingleSparkConsistentBucketClusteringExecutionStrategy to improve performance [hudi]
TheR1sing3un commented on code in PR #12537: URL: https://github.com/apache/hudi/pull/12537#discussion_r1908331603 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/SparkJobExecutionStrategy.java: ## @@ -0,0 +1,150 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hudi.client.clustering.run.strategy; + +import org.apache.hudi.avro.HoodieAvroUtils; +import org.apache.hudi.common.engine.HoodieEngineContext; +import org.apache.hudi.common.model.ClusteringOperation; +import org.apache.hudi.common.model.HoodieRecord; +import org.apache.hudi.common.table.HoodieTableConfig; +import org.apache.hudi.common.table.log.HoodieFileSliceReader; +import org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner; +import org.apache.hudi.common.util.Option; +import org.apache.hudi.common.util.StringUtils; +import org.apache.hudi.common.util.collection.ClosableIterator; +import org.apache.hudi.common.util.collection.CloseableMappingIterator; +import org.apache.hudi.common.util.collection.Pair; +import org.apache.hudi.config.HoodieWriteConfig; +import org.apache.hudi.exception.HoodieClusteringException; +import org.apache.hudi.io.storage.HoodieFileReader; +import org.apache.hudi.keygen.BaseKeyGenerator; +import org.apache.hudi.keygen.factory.HoodieSparkKeyGeneratorFactory; +import org.apache.hudi.storage.HoodieStorage; +import org.apache.hudi.storage.StorageConfiguration; +import org.apache.hudi.storage.StoragePath; +import org.apache.hudi.storage.hadoop.HoodieHadoopStorage; +import org.apache.hudi.table.HoodieTable; +import org.apache.hudi.table.action.cluster.strategy.ClusteringExecutionStrategy; + +import org.apache.avro.Schema; +import org.apache.hadoop.conf.Configuration; + +import java.io.IOException; + +import static org.apache.hudi.client.utils.SparkPartitionUtils.getPartitionFieldVals; +import static org.apache.hudi.io.storage.HoodieSparkIOFactory.getHoodieSparkIOFactory; + +public abstract class SparkJobExecutionStrategy extends ClusteringExecutionStrategy { Review Comment: > Let's eliminate this class: `SparkJobExecutionStrategy` Removed~
Re: [PR] [HUDI-8800] Introduce SingleSparkConsistentBucketClusteringExecutionStrategy to improve performance [hudi]
hudi-bot commented on PR #12537: URL: https://github.com/apache/hudi/pull/12537#issuecomment-2579417150 ## CI report: * ef470351aa6e521b57e3f3c5e65aa6b9b77f8634 Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2755) * 16eb3c6e8ed05902b3e1142e3b8a8e58e5b76b42 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-8800] Introduce SingleSparkConsistentBucketClusteringExecutionStrategy to improve performance [hudi]
hudi-bot commented on PR #12537: URL: https://github.com/apache/hudi/pull/12537#issuecomment-2579421668 ## CI report: * ef470351aa6e521b57e3f3c5e65aa6b9b77f8634 Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2755) * 16eb3c6e8ed05902b3e1142e3b8a8e58e5b76b42 Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2757) * 34e41613300f619cf8c2d797f80c43df4ee8ea73 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-8775] Expression index on a column should get tracked at partition level if partition stats index is turned on [hudi]
hudi-bot commented on PR #12558: URL: https://github.com/apache/hudi/pull/12558#issuecomment-2579441606 ## CI report: * 4b54a2deb80ccce01ec6560927c0d143a8b5c6ba Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2756) * bd321ce1851f20cfeb87b5107b8d14fc849f453c UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-8775] Expression index on a column should get tracked at partition level if partition stats index is turned on [hudi]
hudi-bot commented on PR #12558: URL: https://github.com/apache/hudi/pull/12558#issuecomment-2579444837 ## CI report: * 4b54a2deb80ccce01ec6560927c0d143a8b5c6ba Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2756) * bd321ce1851f20cfeb87b5107b8d14fc849f453c Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2758) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-8775] Expression index on a column should get tracked at partition level if partition stats index is turned on [hudi]
hudi-bot commented on PR #12558: URL: https://github.com/apache/hudi/pull/12558#issuecomment-2581900400 ## CI report: * e1e0ea9214a05cd585989e228639f213cc8f033f Azure: [SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2792) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-8766] Enabling cols stats by default with writer [hudi]
codope commented on code in PR #12596: URL: https://github.com/apache/hudi/pull/12596#discussion_r1909901588 ## azure-pipelines-20230430.yml: ## @@ -214,7 +214,7 @@ stages: displayName: Top 100 long-running testcases - job: UT_FT_3 displayName: UT spark-datasource Java Tests & DDL -timeoutInMinutes: '90' +timeoutInMinutes: '120' Review Comment: Let's make sure that col stats support for the test table, and lowering this test timeout again, are tracked somewhere.
Re: [PR] [HUDI-8602] Fix a bug for incremental query [hudi]
linliu-code commented on code in PR #12385: URL: https://github.com/apache/hudi/pull/12385#discussion_r1909901739 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala: ## @@ -209,7 +209,14 @@ trait HoodieIncrementalRelationTrait extends HoodieBaseRelation { protected lazy val includedCommits: immutable.Seq[HoodieInstant] = queryContext.getInstants.asScala.toList - protected lazy val commitsMetadata = includedCommits.map(getCommitMetadata(_, super.timeline)).asJava + protected lazy val commitsMetadata = includedCommits.map( +i => { Review Comment: @danny0405 , I have checked, and the reader did not fall back to a snapshot query because the configuration is false by default.
Re: [PR] [HUDI-8824] MIT should error out for some assignment clause patterns [hudi]
hudi-bot commented on PR #12584: URL: https://github.com/apache/hudi/pull/12584#issuecomment-2581218611 ## CI report: * 631494d4f6e8389bf8c7a7d90a360fc1ea2d159d Azure: [SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2770) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Created] (HUDI-8851) MOR delete query hits NPE when fetching ordering value
Davis Zhang created HUDI-8851: - Summary: MOR delete query hits NPE when fetching ordering value Key: HUDI-8851 URL: https://issues.apache.org/jira/browse/HUDI-8851 Project: Apache Hudi Issue Type: Bug Reporter: Davis Zhang https://github.com/apache/hudi/pull/12610 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-8762) Fix issues around incremental query
[ https://issues.apache.org/jira/browse/HUDI-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Y Ethan Guo reassigned HUDI-8762: - Assignee: Lin Liu (was: Y Ethan Guo) > Fix issues around incremental query > --- > > Key: HUDI-8762 > URL: https://issues.apache.org/jira/browse/HUDI-8762 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Y Ethan Guo >Assignee: Lin Liu >Priority: Blocker > Fix For: 1.0.1 > >
[I] Upgrade pyo3, arrow-rs, datafusion [hudi-rs]
xushiyan opened a new issue, #242: URL: https://github.com/apache/hudi-rs/issues/242 (no comment)
Re: [PR] [HUDI-8832] Add merge mode test coverage for DML [hudi]
hudi-bot commented on PR #12610: URL: https://github.com/apache/hudi/pull/12610#issuecomment-2581324422 ## CI report: * 4c14c955871ea88e3ff6ccfab667fe434a16a833 Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2772) * 6142abfcebbf84d3bf32097c7499b60ff11ae0a1 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-8832] Add merge mode test coverage for DML [hudi]
hudi-bot commented on PR #12610: URL: https://github.com/apache/hudi/pull/12610#issuecomment-2581326888 ## CI report: * 4c14c955871ea88e3ff6ccfab667fe434a16a833 Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2772) * 6142abfcebbf84d3bf32097c7499b60ff11ae0a1 Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2774) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Updated] (HUDI-8553) Spark SQL UPDATE and DELETE should write record positions
[ https://issues.apache.org/jira/browse/HUDI-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-8553: - Labels: pull-request-available (was: ) > Spark SQL UPDATE and DELETE should write record positions > - > > Key: HUDI-8553 > URL: https://issues.apache.org/jira/browse/HUDI-8553 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Y Ethan Guo >Assignee: Y Ethan Guo >Priority: Blocker > Labels: pull-request-available > Fix For: 1.0.1 > > Original Estimate: 6h > Time Spent: 5h > Remaining Estimate: 8h > > Though there is no read and write error, Spark SQL UPDATE and DELETE do not > write record positions to the log files. > {code:java} > spark-sql (default)> CREATE TABLE testing_positions.table2 ( > > ts BIGINT, > > uuid STRING, > > rider STRING, > > driver STRING, > > fare DOUBLE, > > city STRING > > ) USING HUDI > > LOCATION > 'file:///Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2' > > TBLPROPERTIES ( > > type = 'mor', > > primaryKey = 'uuid', > > preCombineField = 'ts' > > ) > > PARTITIONED BY (city); > 24/11/16 12:03:26 WARN TableSchemaResolver: Could not find any data file > written for commit, so could not get schema for table > file:/Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2 > Time taken: 0.4 seconds > spark-sql (default)> INSERT INTO testing_positions.table2 > > VALUES > > > (1695159649087,'334e26e9-8355-45cc-97c6-c31daf0df330','rider-A','driver-K',19.10,'san_francisco'), > > > (1695091554788,'e96c4396-3fad-413a-a942-4cb36106d721','rider-C','driver-M',27.70 > ,'san_francisco'), > > > (1695046462179,'9909a8b1-2d15-4d3d-8ec9-efc48c536a00','rider-D','driver-L',33.90 > ,'san_francisco'), > > > (1695332066204,'1dced545-862b-4ceb-8b43-d2a568f6616b','rider-E','driver-O',93.50,'san_francisco'), > > > (1695516137016,'e3cf430c-889d-4015-bc98-59bdce1e530c','rider-F','driver-P',34.15,'sao_paulo' > ), > > > (1695376420876,'7a84095f-737f-40bc-b62f-6b69664712d2','rider-G','driver-Q',43.40 > 
,'sao_paulo' ), > > > (1695173887231,'3eeb61f7-c2b0-4636-99bd-5d7a5a1d2c04','rider-I','driver-S',41.06 > ,'chennai' ), > > > (169511511,'c8abbe79-8d89-47ea-b4ce-4d224bae5bfa','rider-J','driver-T',17.85,'chennai'); > 24/11/16 12:03:26 WARN TableSchemaResolver: Could not find any data file > written for commit, so could not get schema for table > file:/Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2 > 24/11/16 12:03:26 WARN TableSchemaResolver: Could not find any data file > written for commit, so could not get schema for table > file:/Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2 > 24/11/16 12:03:29 WARN log: Updating partition stats fast for: table2_ro > 24/11/16 12:03:29 WARN log: Updated size to 436166 > 24/11/16 12:03:29 WARN log: Updating partition stats fast for: table2_ro > 24/11/16 12:03:29 WARN log: Updating partition stats fast for: table2_ro > 24/11/16 12:03:29 WARN log: Updated size to 436185 > 24/11/16 12:03:29 WARN log: Updated size to 436386 > 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2_rt > 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2_rt > 24/11/16 12:03:30 WARN log: Updated size to 436166 > 24/11/16 12:03:30 WARN log: Updated size to 436386 > 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2_rt > 24/11/16 12:03:30 WARN log: Updated size to 436185 > 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2 > 24/11/16 12:03:30 WARN log: Updated size to 436166 > 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2 > 24/11/16 12:03:30 WARN log: Updated size to 436386 > 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2 > 24/11/16 12:03:30 WARN log: Updated size to 436185 > 24/11/16 12:03:30 WARN HiveConf: HiveConf of name > hive.internal.ss.authz.settings.applied.marker does not exist > 24/11/16 12:03:30 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout > does not exist > 24/11/16 12:03:30 WARN HiveConf: 
HiveConf of name hive.stats.retries.wait > does not exist > Time taken: 4.843 seconds > spark-sql (default)> > > SET hoodie.merge.small.file.group.candidates.limit = 0; > hoodie.merge.small.file.group.candidates.limit 0 > Time taken: 0.018 seconds
[PR] [HUDI-8553] Support writing record positions to log blocks from Spark SQL UPDATE and DELETE statements [hudi]
yihua opened a new pull request, #12612: URL: https://github.com/apache/hudi/pull/12612 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance impact._ ### Risk level (write none, low medium or high below) _If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
Re: [PR] [HUDI-8553] Support writing record positions to log blocks from Spark SQL UPDATE and DELETE statements [hudi]
hudi-bot commented on PR #12612: URL: https://github.com/apache/hudi/pull/12612#issuecomment-2581581182 ## CI report: * 099eea2fba303c305950fad54010c503aff5c41e UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-8553] Support writing record positions to log blocks from Spark SQL UPDATE and DELETE statements [hudi]
hudi-bot commented on PR #12612: URL: https://github.com/apache/hudi/pull/12612#issuecomment-2581582716 ## CI report: * 099eea2fba303c305950fad54010c503aff5c41e Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2784) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[jira] [Commented] (HUDI-8553) Spark SQL UPDATE and DELETE should write record positions
[ https://issues.apache.org/jira/browse/HUDI-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17911738#comment-17911738 ] Y Ethan Guo commented on HUDI-8553: --- I have a draft PR up which makes the prepped upsert flow write record positions to the log blocks from Spark SQL UPDATE statement. I'm going to fix a few issues before opening it up for review.
[PR] [HUDI-8624] Avoid check metadata for archived commits in incremental queries [hudi]
linliu-code opened a new pull request, #12613: URL: https://github.com/apache/hudi/pull/12613 ### Change Logs When start commit is archived, we fall back to full scan. ### Impact Avoid expensive metadata fetching for archived instants. ### Risk level (write none, low medium or high below) Medium. ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
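The change log above describes a guard: prune incrementally only while the start commit is still on the active timeline, and fall back to a full scan once it has been archived. A simplified sketch of that decision — hypothetical names and plain string-ordered commit times, not Hudi's actual API — could look like:

```python
def choose_scan_mode(start_commit: str, active_timeline: list) -> str:
    """Return "incremental" when the start commit is still on the active
    timeline, "full_scan" when it has been archived (illustrative only)."""
    if not active_timeline:
        return "full_scan"
    earliest_active = min(active_timeline)
    # Commit times are lexicographically ordered timestamps, so a start
    # commit older than the earliest active instant must be archived.
    if start_commit < earliest_active:
        return "full_scan"
    return "incremental"

active = ["20250105080000", "20250106080000", "20250107080000"]
print(choose_scan_mode("20250101000000", active))  # full_scan
print(choose_scan_mode("20250106080000", active))  # incremental
```

The point of the guard is the impact stated in the PR body: the expensive commit-metadata lookups against timeline history are skipped entirely once the fallback is taken.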
[jira] [Updated] (HUDI-8624) Revisit commitsMetadata fetching from timeline history in MergeOnReadIncrementalRelation
[ https://issues.apache.org/jira/browse/HUDI-8624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-8624: - Labels: pull-request-available (was: ) > Revisit commitsMetadata fetching from timeline history in > MergeOnReadIncrementalRelation > > > Key: HUDI-8624 > URL: https://issues.apache.org/jira/browse/HUDI-8624 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Y Ethan Guo >Assignee: Lin Liu >Priority: Critical > Labels: pull-request-available > Fix For: 1.0.1 > > > [https://github.com/apache/hudi/pull/12385/files#r1865249449] > We need to revisit why we need commit metadata from timeline history. > Reading timeline history (archival timeline in old term) is expensive and > should not be incurred in incremental query except for completion time lookup. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-8624] Avoid checking metadata for archived commits in incremental queries [hudi]
hudi-bot commented on PR #12613: URL: https://github.com/apache/hudi/pull/12613#issuecomment-2581592281 ## CI report: * 39ca7fae423367a6f48c5139b257176d22beac02 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-8624) Revisit commitsMetadata fetching from timeline history in MergeOnReadIncrementalRelation
[ https://issues.apache.org/jira/browse/HUDI-8624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Liu updated HUDI-8624: -- Status: Patch Available (was: In Progress) > Revisit commitsMetadata fetching from timeline history in > MergeOnReadIncrementalRelation > > > Key: HUDI-8624 > URL: https://issues.apache.org/jira/browse/HUDI-8624 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Y Ethan Guo >Assignee: Lin Liu >Priority: Critical > Labels: pull-request-available > Fix For: 1.0.1 > > > [https://github.com/apache/hudi/pull/12385/files#r1865249449] > We need to revisit why we need commit metadata from timeline history. > Reading timeline history (archival timeline in old term) is expensive and > should not be incurred in incremental query except for completion time lookup. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-8635) Revisit stats generated in HoodieSparkFileGroupReaderBasedMergeHandle
[ https://issues.apache.org/jira/browse/HUDI-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Y Ethan Guo updated HUDI-8635: -- Status: In Progress (was: Open) > Revisit stats generated in HoodieSparkFileGroupReaderBasedMergeHandle > - > > Key: HUDI-8635 > URL: https://issues.apache.org/jira/browse/HUDI-8635 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Y Ethan Guo >Assignee: Y Ethan Guo >Priority: Blocker > Fix For: 1.0.1 > > > We need to make sure the write stats generated by the new file group > reader-based merge handle for compaction ( > HoodieSparkFileGroupReaderBasedMergeHandle) are intact in all cases. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-8762) Fix issues around incremental query
[ https://issues.apache.org/jira/browse/HUDI-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Liu updated HUDI-8762: -- Status: Patch Available (was: In Progress) > Fix issues around incremental query > --- > > Key: HUDI-8762 > URL: https://issues.apache.org/jira/browse/HUDI-8762 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Y Ethan Guo >Assignee: Lin Liu >Priority: Blocker > Labels: pull-request-available > Fix For: 1.0.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-8775] Expression index on a column should get tracked at partition level if partition stats index is turned on [hudi]
hudi-bot commented on PR #12558: URL: https://github.com/apache/hudi/pull/12558#issuecomment-2581813248 ## CI report: * c5912b6788b23621a4dcc609a4d5b4e6ae0af6da Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2767) * e1e0ea9214a05cd585989e228639f213cc8f033f Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2792) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-8775] Expression index on a column should get tracked at partition level if partition stats index is turned on [hudi]
hudi-bot commented on PR #12558: URL: https://github.com/apache/hudi/pull/12558#issuecomment-2581811657 ## CI report: * c5912b6788b23621a4dcc609a4d5b4e6ae0af6da Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2767) * e1e0ea9214a05cd585989e228639f213cc8f033f UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-8766] Enabling cols stats by default with writer [hudi]
hudi-bot commented on PR #12596: URL: https://github.com/apache/hudi/pull/12596#issuecomment-2581822443 ## CI report: * ae2ca606c6cd125f31b7ed029968d0993b1bb0bd UNKNOWN * 71b6a13890909b81c74ce7b138237ab695a08782 UNKNOWN * 15866ae0099c3b58d22329be0e5008b3149cb95f Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2791) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-8796) Silent ignoring of bucket index in Flink append mode
[ https://issues.apache.org/jira/browse/HUDI-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geser Dugarov updated HUDI-8796: Description: Currently, there is no exception when we try to write data in Flink append mode using bucket index. Data will be written, but in parquet files without bucket IDs. (was: Currently, there is no exception when we try to write data in Flink append mode (COW, insert) using bucket index. Data will be written, but in parquet files without bucket IDs.) > Silent ignoring of bucket index in Flink append mode > > > Key: HUDI-8796 > URL: https://issues.apache.org/jira/browse/HUDI-8796 > Project: Apache Hudi > Issue Type: Bug >Reporter: Geser Dugarov >Assignee: Geser Dugarov >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.1 > > > Currently, there is no exception when we try to write data in Flink append > mode using bucket index. Data will be written, but in parquet files without > bucket IDs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
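[Editor's note] The fix for the silent-ignore behavior described above amounts to fail-fast config validation. A minimal sketch, assuming hypothetical option keys (`write.operation`, `index.type`) rather than the real Flink/Hudi configuration API: reject the conflicting combination at write-config time instead of silently writing parquet files without bucket IDs.

```java
import java.util.HashMap;
import java.util.Map;

public class WriteConfigValidator {

    // Throws when append mode is combined with a bucket index, instead of
    // silently ignoring the index.
    public static void validate(Map<String, String> options) {
        boolean appendMode = "insert".equalsIgnoreCase(options.getOrDefault("write.operation", ""));
        boolean bucketIndex = "BUCKET".equalsIgnoreCase(options.getOrDefault("index.type", ""));
        if (appendMode && bucketIndex) {
            throw new IllegalArgumentException(
                "Bucket index is not supported in append mode; "
                + "data would be written without bucket IDs");
        }
    }

    public static boolean isValid(Map<String, String> options) {
        try {
            validate(options);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> opts = new HashMap<>();
        opts.put("write.operation", "insert");
        opts.put("index.type", "BUCKET");
        System.out.println(isValid(opts)); // conflicting combination is rejected
    }
}
```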
[jira] [Updated] (HUDI-8796) Silent ignoring of bucket index in Flink append mode
[ https://issues.apache.org/jira/browse/HUDI-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Geser Dugarov updated HUDI-8796: Summary: Silent ignoring of bucket index in Flink append mode (was: Silent ignoring of simple bucket index in Flink append mode) > Silent ignoring of bucket index in Flink append mode > > > Key: HUDI-8796 > URL: https://issues.apache.org/jira/browse/HUDI-8796 > Project: Apache Hudi > Issue Type: Bug >Reporter: Geser Dugarov >Assignee: Geser Dugarov >Priority: Minor > Labels: pull-request-available > Fix For: 1.0.1 > > > Currently, there is no exception when we try to write data in Flink append > mode (COW, insert) using bucket index. Data will be written, but in parquet > files without bucket IDs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
(hudi) branch master updated (5f591eec223 -> dc001ea4828)
This is an automated email from the ASF dual-hosted git repository. codope pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 5f591eec223 [HUDI-8762] Fix a typo in Fix a typo in TestIncrementalQueryWithArchivedInstants (#12611) add dc001ea4828 [HUDI-8775] Expression index on a column should get tracked at partition level if partition stats index is turned on (#12558) No new revisions were added by this update. Summary of changes: .../metadata/HoodieBackedTableMetadataWriter.java | 39 +- .../client/utils/SparkMetadataWriterUtils.java | 278 +++-- .../org/apache/hudi/data/HoodieJavaPairRDD.java| 10 + .../expression/HoodieSparkExpressionIndex.java | 27 + .../SparkHoodieBackedTableMetadataWriter.java | 118 ++-- .../hudi/common/data/HoodieListPairData.java | 8 + .../apache/hudi/common/data/HoodiePairData.java| 11 + .../common/model/HoodieColumnRangeMetadata.java| 4 + .../index/expression/HoodieExpressionIndex.java| 2 + .../hudi/metadata/HoodieMetadataPayload.java | 16 +- .../hudi/metadata/HoodieTableMetadataUtil.java | 165 -- .../hudi/metadata/TestHoodieMetadataPayload.java | 12 +- .../scala/org/apache/hudi/BucketIndexSupport.scala | 2 +- .../org/apache/hudi/ExpressionIndexSupport.scala | 67 ++- .../scala/org/apache/hudi/HoodieFileIndex.scala| 20 +- .../apache/hudi/PartitionStatsIndexSupport.scala | 44 +- .../hudi/command/index/TestExpressionIndex.scala | 626 - .../utilities/HoodieMetadataTableValidator.java| 2 +- 18 files changed, 1241 insertions(+), 210 deletions(-)
Re: [PR] [HUDI-8775] Expression index on a column should get tracked at partition level if partition stats index is turned on [hudi]
codope merged PR #12558: URL: https://github.com/apache/hudi/pull/12558 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (HUDI-8775) Expression index on a column should get tracked at partition level if partition stats index is turned on
[ https://issues.apache.org/jira/browse/HUDI-8775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit closed HUDI-8775. - Resolution: Fixed > Expression index on a column should get tracked at partition level if > partition stats index is turned on > > > Key: HUDI-8775 > URL: https://issues.apache.org/jira/browse/HUDI-8775 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Sagar Sumit >Assignee: Lokesh Jain >Priority: Blocker > Labels: pull-request-available > Fix For: 1.0.1 > > > Use case: I have partition stats index enabled, and then I create an > expression index using col stats on a {{ts}} column, mapping it to a date. In > this case, the stats based on derived value from expression should be tracked > at partition level too. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] [HUDI-8766] Enabling cols stats by default with writer [hudi]
hudi-bot commented on PR #12596: URL: https://github.com/apache/hudi/pull/12596#issuecomment-2581888527 ## CI report: * ae2ca606c6cd125f31b7ed029968d0993b1bb0bd UNKNOWN * 71b6a13890909b81c74ce7b138237ab695a08782 UNKNOWN * 15866ae0099c3b58d22329be0e5008b3149cb95f Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2791) * 2710a96832046a764b7125c1152c788d96c6e1f9 Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2793) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-8839) [Ethan pls check worklog] CDC query: The beforeImageRecords and afterImageRecords are both in-memory hash maps; they should be changed to spillable maps.
[ https://issues.apache.org/jira/browse/HUDI-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davis Zhang updated HUDI-8839: -- Summary: [Ethan pls check worklog] CDC query: The beforeImageRecords and afterImageRecords are both in-memory hash maps; they should be changed to spillable maps. (was: CDC query: The beforeImageRecords and afterImageRecords are both in-memory hash maps; they should be changed to spillable maps.) > [Ethan pls check worklog] CDC query: The beforeImageRecords and > afterImageRecords are both in-memory hash maps; they should be changed to > spillable maps. > --- > > Key: HUDI-8839 > URL: https://issues.apache.org/jira/browse/HUDI-8839 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Davis Zhang >Assignee: Davis Zhang >Priority: Major > Fix For: 1.0.1 > > Time Spent: 3h > Remaining Estimate: 0h > > > [https://github.com/apache/hudi/pull/12592] > > Acceptance criteria: local testing that previously hit OOM no longer OOMs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-8839) [Ethan pls check worklog] CDC query: The beforeImageRecords and afterImageRecords are both in-memory hash maps; they should be changed to spillable maps.
[ https://issues.apache.org/jira/browse/HUDI-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-8839: - Labels: pull-request-available (was: ) > [Ethan pls check worklog] CDC query: The beforeImageRecords and > afterImageRecords are both in-memory hash maps; they should be changed to > spillable maps. > --- > > Key: HUDI-8839 > URL: https://issues.apache.org/jira/browse/HUDI-8839 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Davis Zhang >Assignee: Davis Zhang >Priority: Major > Labels: pull-request-available > Fix For: 1.0.1 > > Time Spent: 3h > Remaining Estimate: 0h > > > [https://github.com/apache/hudi/pull/12592] > > Acceptance criteria: local testing that previously hit OOM no longer OOMs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
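[Editor's note] The "spillable map" the ticket above asks for can be illustrated with a toy sketch. Hudi's actual structure is `ExternalSpillableMap`; this simplified stand-in only shows the memory-bounding principle the fix relies on: keep entries in memory up to a budget and spill the overflow to a temp file, so a large CDC file group cannot OOM the reader.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;

public class TinySpillableMap implements Closeable {

  private final int inMemoryLimit;
  private final Map<String, String> inMemory = new HashMap<>();
  // key -> {file offset, byte length} for entries spilled to disk
  private final Map<String, long[]> spilled = new HashMap<>();
  private final RandomAccessFile spillFile;

  public TinySpillableMap(int inMemoryLimit) {
    this.inMemoryLimit = inMemoryLimit;
    try {
      File f = Files.createTempFile("spill", ".bin").toFile();
      f.deleteOnExit();
      this.spillFile = new RandomAccessFile(f, "rw");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public void put(String key, String value) {
    if (inMemory.containsKey(key) || inMemory.size() < inMemoryLimit) {
      inMemory.put(key, value); // still within the memory budget
      return;
    }
    try {
      byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
      long offset = spillFile.length();
      spillFile.seek(offset);
      spillFile.write(bytes);
      spilled.put(key, new long[] {offset, bytes.length}); // replaces any stale entry
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public String get(String key) {
    if (inMemory.containsKey(key)) {
      return inMemory.get(key);
    }
    long[] loc = spilled.get(key);
    if (loc == null) {
      return null;
    }
    try {
      byte[] buf = new byte[(int) loc[1]];
      spillFile.seek(loc[0]);
      spillFile.readFully(buf);
      return new String(buf, StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public int size() {
    return inMemory.size() + spilled.size();
  }

  @Override
  public void close() {
    try {
      spillFile.close();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    try (TinySpillableMap map = new TinySpillableMap(2)) {
      map.put("k1", "before-image-1");
      map.put("k2", "before-image-2");
      map.put("k3", "before-image-3"); // exceeds the budget, goes to disk
      System.out.println(map.get("k3"));
      System.out.println(map.size());
    }
  }
}
```

The real implementation additionally estimates per-entry size and serializes arbitrary payloads; the sketch hard-codes strings for brevity.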
Re: [PR] [HUDI-8839] CdcFileGroupIterator use spillable hashmap [hudi]
hudi-bot commented on PR #12592: URL: https://github.com/apache/hudi/pull/12592#issuecomment-2581471756 ## CI report: * 28247026a78dda613a41ed2f039cbf11bb7d5d95 Azure: [CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2779) * 423421ec00e72021f081c901ac74891a266b8aa5 UNKNOWN * a5544e3e3d5aa734348b7bfd63820d5b8d98cc33 UNKNOWN * 7ca86e570e17a7db2c7394d62f9d95bda8f439db UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (HUDI-8624) Revisit commitsMetadata fetching from timeline history in MergeOnReadIncrementalRelation
[ https://issues.apache.org/jira/browse/HUDI-8624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Liu reassigned HUDI-8624: - Assignee: Lin Liu > Revisit commitsMetadata fetching from timeline history in > MergeOnReadIncrementalRelation > > > Key: HUDI-8624 > URL: https://issues.apache.org/jira/browse/HUDI-8624 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Y Ethan Guo >Assignee: Lin Liu >Priority: Critical > Fix For: 1.0.1 > > > [https://github.com/apache/hudi/pull/12385/files#r1865249449] > We need to revisit why we need commit metadata from timeline history. > Reading timeline history (archival timeline in old term) is expensive and > should not be incurred in incremental query except for completion time lookup. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HUDI-8172) Make primaryKey and other column configs case insensitive
[ https://issues.apache.org/jira/browse/HUDI-8172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vova Kolmakov reassigned HUDI-8172: --- Assignee: Vova Kolmakov > Make primaryKey and other column configs case insensitive > - > > Key: HUDI-8172 > URL: https://issues.apache.org/jira/browse/HUDI-8172 > Project: Apache Hudi > Issue Type: Improvement > Components: spark-sql >Reporter: Aditya Goenka >Assignee: Vova Kolmakov >Priority: Critical > Fix For: 1.1.0 > > > The primaryKey and other configs should be case insensitive. > > Test case to reproduce:
> {code:scala}
> test("Test primary key case sensitive") {
>   withTempDir { tmp =>
>     val tableName = generateTableName
>     // Create a partitioned table
>     spark.sql(
>       s"""
>          |create table $tableName (
>          |  id int,
>          |  name string,
>          |  price double,
>          |  ts long,
>          |  dt string
>          |) using hudi
>          |tblproperties (
>          |  primaryKey = 'ID'
>          |)
>          |partitioned by (dt)
>          |location '${tmp.getCanonicalPath}'
>          |""".stripMargin)
>     spark.sql(
>       s"""
>          |insert into $tableName
>          |select 1 as id, 'a1' as name, 10 as price, 1000 as ts, '2021-01-05' as dt
>          |""".stripMargin)
>     checkAnswer(s"select id, name, price, ts, dt from $tableName")(
>       Seq(1, "a1", 10.0, 1000, "2021-01-05")
>     )
>   }
> }
> {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-8172) Make primaryKey and other column configs case insensitive
[ https://issues.apache.org/jira/browse/HUDI-8172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vova Kolmakov updated HUDI-8172: Status: In Progress (was: Open) > Make primaryKey and other column configs case insensitive > - > > Key: HUDI-8172 > URL: https://issues.apache.org/jira/browse/HUDI-8172 > Project: Apache Hudi > Issue Type: Improvement > Components: spark-sql >Reporter: Aditya Goenka >Assignee: Vova Kolmakov >Priority: Critical > Fix For: 1.1.0 > > > The primaryKey and other configs should be case insensitive. > > Test case to reproduce:
> {code:scala}
> test("Test primary key case sensitive") {
>   withTempDir { tmp =>
>     val tableName = generateTableName
>     // Create a partitioned table
>     spark.sql(
>       s"""
>          |create table $tableName (
>          |  id int,
>          |  name string,
>          |  price double,
>          |  ts long,
>          |  dt string
>          |) using hudi
>          |tblproperties (
>          |  primaryKey = 'ID'
>          |)
>          |partitioned by (dt)
>          |location '${tmp.getCanonicalPath}'
>          |""".stripMargin)
>     spark.sql(
>       s"""
>          |insert into $tableName
>          |select 1 as id, 'a1' as name, 10 as price, 1000 as ts, '2021-01-05' as dt
>          |""".stripMargin)
>     checkAnswer(s"select id, name, price, ts, dt from $tableName")(
>       Seq(1, "a1", 10.0, 1000, "2021-01-05")
>     )
>   }
> }
> {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
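[Editor's note] The core of the improvement requested above is resolving a configured column name against the schema case-insensitively, so `primaryKey = 'ID'` matches a column named `id`. A minimal sketch with invented names (`KeyColumnResolver`, `resolve`), not Hudi's actual API:

```java
import java.util.List;
import java.util.Optional;

public class KeyColumnResolver {

    // Returns the schema's canonical column name matching the configured one,
    // ignoring case; empty if no column matches.
    public static Optional<String> resolve(String configured, List<String> schemaColumns) {
        return schemaColumns.stream()
            .filter(c -> c.equalsIgnoreCase(configured))
            .findFirst();
    }

    public static void main(String[] args) {
        List<String> cols = List.of("id", "name", "price", "ts", "dt");
        System.out.println(resolve("ID", cols));   // matches column "id"
        System.out.println(resolve("uuid", cols)); // no match
    }
}
```

Resolving to the schema's canonical spelling (rather than keeping the configured spelling) also keeps downstream key generation consistent.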
Re: [PR] [HUDI-8796] Silent ignoring of simple bucket index in Flink append mode [hudi]
hudi-bot commented on PR #12545: URL: https://github.com/apache/hudi/pull/12545#issuecomment-2581794386 ## CI report: * 3efc78274b41c22ac6d2695e715fd157a9b9a9b8 Azure: [SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2789) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-8800] Introduce SingleSparkConsistentBucketClusteringExecutionStrategy to improve performance [hudi]
hudi-bot commented on PR #12537: URL: https://github.com/apache/hudi/pull/12537#issuecomment-2581802070 ## CI report: * 64ad84f40ff6a47df76979a382525fee0cc67d2e Azure: [SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2790) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-8775] Expression index on a column should get tracked at partition level if partition stats index is turned on [hudi]
codope commented on code in PR #12558: URL: https://github.com/apache/hudi/pull/12558#discussion_r1909890911 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java: ## @@ -395,7 +392,9 @@ public static Map> convertMetadataToRecords(Hoo if (enabledPartitionTypes.contains(MetadataPartitionType.PARTITION_STATS)) { checkState(MetadataPartitionType.COLUMN_STATS.isMetadataPartitionAvailable(dataMetaClient), "Column stats partition must be enabled to generate partition stats. Please enable: " + HoodieMetadataConfig.ENABLE_METADATA_INDEX_COLUMN_STATS.key()); - final HoodieData partitionStatsRDD = convertMetadataToPartitionStatsRecords(commitMetadata, context, dataMetaClient, metadataConfig); + // Generate Hoodie Pair data of partition name and list of column range metadata for all the files in that partition Review Comment: nit: also fix the comment -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
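[Editor's note] The review comment above concerns rolling per-file column range metadata up to partition-level stats. A simplified sketch of that aggregation (Hudi tracks ranges in `HoodieColumnRangeMetadata`; the types here are illustrative): for each partition, the partition-level range of a column is the minimum of the file-level minimums and the maximum of the file-level maximums.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionStatsRollup {

    // fileRangesByPartition: partition -> list of {min, max} ranges, one per file,
    // all for the same column. Returns partition -> merged {min, max}.
    public static Map<String, long[]> rollup(Map<String, List<long[]>> fileRangesByPartition) {
        Map<String, long[]> stats = new HashMap<>();
        for (Map.Entry<String, List<long[]>> e : fileRangesByPartition.entrySet()) {
            for (long[] range : e.getValue()) {
                // Merge each file's range into the partition's running range.
                stats.merge(e.getKey(), new long[] {range[0], range[1]}, (a, b) ->
                    new long[] {Math.min(a[0], b[0]), Math.max(a[1], b[1])});
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        Map<String, List<long[]>> input = new HashMap<>();
        input.put("san_francisco", List.of(new long[] {10, 30}, new long[] {5, 25}));
        input.put("chennai", List.of(new long[] {40, 50}));
        Map<String, long[]> stats = rollup(input);
        long[] sf = stats.get("san_francisco");
        System.out.println(sf[0] + ".." + sf[1]); // merged range across both files
    }
}
```

The same shape extends to an expression index: the ranges fed in are computed over the derived expression values rather than the raw column, which is exactly why the PR tracks them at the partition level too.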
[jira] [Commented] (HUDI-8837) Fix reading partition path field on metadata bootstrap table
[ https://issues.apache.org/jira/browse/HUDI-8837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17911710#comment-17911710 ] Y Ethan Guo commented on HUDI-8837: --- The test is added in https://github.com/apache/hudi/pull/12490. Right now the validation excludes partition column. When adding that in the validation, the validation fails. {code:java} def assertDfEquals(df1: DataFrame, df2: DataFrame): Unit = { assertEquals(df1.count, df2.count) // TODO(HUDI-8723): fix reading partition path field on metadata bootstrap table assertEquals(0, df1.drop(partitionColName).except(df2.drop(partitionColName)).count) assertEquals(0, df2.drop(partitionColName).except(df1.drop(partitionColName)).count) } {code} > Fix reading partition path field on metadata bootstrap table > > > Key: HUDI-8837 > URL: https://issues.apache.org/jira/browse/HUDI-8837 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Y Ethan Guo >Assignee: Davis Zhang >Priority: Blocker > Fix For: 1.0.1 > > > When adding strict data validation within > testMetadataBootstrapMORPartitionedInlineCompactionOn, the validation reveals > that the partition path field reading fails (returns null) for some update > records. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HUDI-8553) Spark SQL UPDATE and DELETE should write record positions
[ https://issues.apache.org/jira/browse/HUDI-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17911713#comment-17911713 ] Y Ethan Guo commented on HUDI-8553: --- In the UPDATE and DELETE command, we'll try creating the relation with a schema that has the row index meta column or a new hoodie meta column to attach the row index column to the return DF (this also requires the file group reader and parquet reader to keep the new row index column by fixing the wiring). In that way, we can pass the positions down to the prepped write flow and prepare the HoodieRecords with the current record location. > Spark SQL UPDATE and DELETE should write record positions > - > > Key: HUDI-8553 > URL: https://issues.apache.org/jira/browse/HUDI-8553 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Y Ethan Guo >Assignee: Y Ethan Guo >Priority: Blocker > Fix For: 1.0.1 > > Original Estimate: 6h > Time Spent: 5h > Remaining Estimate: 8h > > Though there is no read and write error, Spark SQL UPDATE and DELETE do not > write record positions to the log files. 
> {code:java} > spark-sql (default)> CREATE TABLE testing_positions.table2 ( > > ts BIGINT, > > uuid STRING, > > rider STRING, > > driver STRING, > > fare DOUBLE, > > city STRING > > ) USING HUDI > > LOCATION > 'file:///Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2' > > TBLPROPERTIES ( > > type = 'mor', > > primaryKey = 'uuid', > > preCombineField = 'ts' > > ) > > PARTITIONED BY (city); > 24/11/16 12:03:26 WARN TableSchemaResolver: Could not find any data file > written for commit, so could not get schema for table > file:/Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2 > Time taken: 0.4 seconds > spark-sql (default)> INSERT INTO testing_positions.table2 > > VALUES > > > (1695159649087,'334e26e9-8355-45cc-97c6-c31daf0df330','rider-A','driver-K',19.10,'san_francisco'), > > > (1695091554788,'e96c4396-3fad-413a-a942-4cb36106d721','rider-C','driver-M',27.70 > ,'san_francisco'), > > > (1695046462179,'9909a8b1-2d15-4d3d-8ec9-efc48c536a00','rider-D','driver-L',33.90 > ,'san_francisco'), > > > (1695332066204,'1dced545-862b-4ceb-8b43-d2a568f6616b','rider-E','driver-O',93.50,'san_francisco'), > > > (1695516137016,'e3cf430c-889d-4015-bc98-59bdce1e530c','rider-F','driver-P',34.15,'sao_paulo' > ), > > > (1695376420876,'7a84095f-737f-40bc-b62f-6b69664712d2','rider-G','driver-Q',43.40 > ,'sao_paulo' ), > > > (1695173887231,'3eeb61f7-c2b0-4636-99bd-5d7a5a1d2c04','rider-I','driver-S',41.06 > ,'chennai' ), > > > (169511511,'c8abbe79-8d89-47ea-b4ce-4d224bae5bfa','rider-J','driver-T',17.85,'chennai'); > 24/11/16 12:03:26 WARN TableSchemaResolver: Could not find any data file > written for commit, so could not get schema for table > file:/Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2 > 24/11/16 12:03:26 WARN TableSchemaResolver: Could not find any data file > written for commit, so could not get schema for table > file:/Users/ethan/Work/tmp/hudi-1.0.0-testing/positional/table2 > 24/11/16 12:03:29 WARN log: Updating partition stats fast for: table2_ro > 
24/11/16 12:03:29 WARN log: Updated size to 436166 > 24/11/16 12:03:29 WARN log: Updating partition stats fast for: table2_ro > 24/11/16 12:03:29 WARN log: Updating partition stats fast for: table2_ro > 24/11/16 12:03:29 WARN log: Updated size to 436185 > 24/11/16 12:03:29 WARN log: Updated size to 436386 > 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2_rt > 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2_rt > 24/11/16 12:03:30 WARN log: Updated size to 436166 > 24/11/16 12:03:30 WARN log: Updated size to 436386 > 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2_rt > 24/11/16 12:03:30 WARN log: Updated size to 436185 > 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2 > 24/11/16 12:03:30 WARN log: Updated size to 436166 > 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2 > 24/11/16 12:03:30 WARN log: Updated size to 436386 > 24/11/16 12:03:30 WARN log: Updating partition stats fast for: table2 > 24/11/16 12:03:30 WARN log: Updated size to 436185 > 24/11/16 12:03:30 WARN HiveConf: HiveConf of name > hive.internal.ss.authz.settings.applied.marker does
[jira] [Updated] (HUDI-8762) Fix issues around incremental query
[ https://issues.apache.org/jira/browse/HUDI-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-8762: - Labels: pull-request-available (was: ) > Fix issues around incremental query > --- > > Key: HUDI-8762 > URL: https://issues.apache.org/jira/browse/HUDI-8762 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Y Ethan Guo >Assignee: Lin Liu >Priority: Blocker > Labels: pull-request-available > Fix For: 1.0.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [HUDI-8762] Fix a typo in a test [hudi]
linliu-code opened a new pull request, #12611: URL: https://github.com/apache/hudi/pull/12611 ### Change Logs The config was not set correctly. ### Impact Fixed a typo. ### Risk level (write none, low medium or high below) None. ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-8762] Fix a typo in a test [hudi]
hudi-bot commented on PR #12611: URL: https://github.com/apache/hudi/pull/12611#issuecomment-2581543130 ## CI report: * 441dfd77c5036cfac3ce7a84cd7984408f5b6b64 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-8762] Fix a typo in a test [hudi]
hudi-bot commented on PR #12611: URL: https://github.com/apache/hudi/pull/12611#issuecomment-2581544722 ## CI report: * 441dfd77c5036cfac3ce7a84cd7984408f5b6b64 Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2783) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-8839] CdcFileGroupIterator use spillable hashmap [hudi]
hudi-bot commented on PR #12592: URL: https://github.com/apache/hudi/pull/12592#issuecomment-2581548015 ## CI report: * 423421ec00e72021f081c901ac74891a266b8aa5 UNKNOWN * a5544e3e3d5aa734348b7bfd63820d5b8d98cc33 UNKNOWN * 7ca86e570e17a7db2c7394d62f9d95bda8f439db Azure: [SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2782) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-8824] MIT should error out for some assignment clause patterns [hudi]
hudi-bot commented on PR #12584: URL: https://github.com/apache/hudi/pull/12584#issuecomment-2581546292 ## CI report: * ffad81180c72f871a9677549e38f1915e5668adb Azure: [SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2781)
Re: [PR] [HUDI-8796] Silent ignoring of simple bucket index in Flink append mode [hudi]
hudi-bot commented on PR #12545: URL: https://github.com/apache/hudi/pull/12545#issuecomment-2581691059 ## CI report: * 20a6a8c042d092026fbed250e5b313e366d2cf61 Azure: [CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2786) * 3efc78274b41c22ac6d2695e715fd157a9b9a9b8 Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2789)
[jira] [Assigned] (HUDI-8854) Support LocalDate with ordering value in DeleteRecord
[ https://issues.apache.org/jira/browse/HUDI-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-8854: - Assignee: sivabalan narayanan > Support LocalDate with ordering value in DeleteRecord > - > > Key: HUDI-8854 > URL: https://issues.apache.org/jira/browse/HUDI-8854 > Project: Apache Hudi > Issue Type: Improvement > Components: writer-core >Reporter: sivabalan narayanan >Assignee: sivabalan narayanan >Priority: Major > Fix For: 1.0.2 > > > We are removing LocalDate support for ordering value in this patch > [https://github.com/apache/hudi/pull/12596] > > We wanted to add it back. Filing this tracking ticket. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HUDI-8854) Support LocalDate with ordering value in DeleteRecord
sivabalan narayanan created HUDI-8854: - Summary: Support LocalDate with ordering value in DeleteRecord Key: HUDI-8854 URL: https://issues.apache.org/jira/browse/HUDI-8854 Project: Apache Hudi Issue Type: Improvement Components: writer-core Reporter: sivabalan narayanan We are removing LocalDate support for ordering value in this patch [https://github.com/apache/hudi/pull/12596] We wanted to add it back. Filing this tracking ticket.
[jira] [Updated] (HUDI-8854) Support LocalDate with ordering value in DeleteRecord
[ https://issues.apache.org/jira/browse/HUDI-8854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-8854: -- Fix Version/s: 1.0.2 > Support LocalDate with ordering value in DeleteRecord > - > > Key: HUDI-8854 > URL: https://issues.apache.org/jira/browse/HUDI-8854 > Project: Apache Hudi > Issue Type: Improvement > Components: writer-core >Reporter: sivabalan narayanan >Priority: Major > Fix For: 1.0.2 > > > We are removing LocalDate support for ordering value in this patch > [https://github.com/apache/hudi/pull/12596] > > We wanted to add it back. Filing this tracking ticket.
Re: [PR] [HUDI-8800] Introduce SingleSparkConsistentBucketClusteringExecutionStrategy to improve performance [hudi]
TheR1sing3un commented on code in PR #12537: URL: https://github.com/apache/hudi/pull/12537#discussion_r1909786316

## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/strategy/ClusteringExecutionStrategy.java:

```java
@@ -67,4 +85,69 @@ protected HoodieEngineContext getEngineContext() {
   protected HoodieWriteConfig getWriteConfig() {
     return this.writeConfig;
   }
+
+  protected ClosableIterator<HoodieRecord<T>> getRecordIteratorWithLogFiles(ClusteringOperation operation, String instantTime, long maxMemory) {
+    HoodieWriteConfig config = getWriteConfig();
+    HoodieTable table = getHoodieTable();
+    StorageConfiguration<?> storageConf = table.getStorageConf();
+    HoodieTableConfig tableConfig = table.getMetaClient().getTableConfig();
+    String bootstrapBasePath = tableConfig.getBootstrapBasePath().orElse(null);
+    Option<String[]> partitionFields = tableConfig.getPartitionFields();
+    HoodieMergedLogRecordScanner scanner = HoodieMergedLogRecordScanner.newBuilder()
+        .withStorage(table.getStorage())
+        .withBasePath(table.getMetaClient().getBasePath())
+        .withLogFilePaths(operation.getDeltaFilePaths())
+        .withReaderSchema(readerSchemaWithMetaFields)
+        .withLatestInstantTime(instantTime)
+        .withMaxMemorySizeInBytes(maxMemory)
+        .withReverseReader(config.getCompactionReverseLogReadEnabled())
+        .withBufferSize(config.getMaxDFSStreamBufferSize())
+        .withSpillableMapBasePath(config.getSpillableMapBasePath())
+        .withPartition(operation.getPartitionPath())
+        .withOptimizedLogBlocksScan(config.enableOptimizedLogBlocksScan())
+        .withDiskMapType(config.getCommonConfig().getSpillableDiskMapType())
+        .withBitCaskDiskMapCompressionEnabled(config.getCommonConfig().isBitCaskDiskMapCompressionEnabled())
+        .withRecordMerger(config.getRecordMerger())
+        .withTableMetaClient(table.getMetaClient())
+        .build();
+
+    Option<HoodieFileReader> baseFileReader = StringUtils.isNullOrEmpty(operation.getDataFilePath())
+        ? Option.empty()
+        : Option.of(getBaseOrBootstrapFileReader(storageConf, bootstrapBasePath, partitionFields, operation));
+    Option<BaseKeyGenerator> keyGeneratorOp = getKeyGenerator();
+    try {
+      return new HoodieFileSliceReader(baseFileReader, scanner, readerSchemaWithMetaFields, tableConfig.getPreCombineField(), config.getRecordMerger(),
+          tableConfig.getProps(),
+          tableConfig.populateMetaFields() ? Option.empty() : Option.of(Pair.of(tableConfig.getRecordKeyFieldProp(),
+              tableConfig.getPartitionFieldProp())), keyGeneratorOp);
+    } catch (IOException e) {
+      throw new HoodieClusteringException("Error reading file slices", e);
+    }
+  }
+
+  protected ClosableIterator<HoodieRecord<T>> getRecordIteratorWithBaseFileOnly(ClusteringOperation operation) {
+    StorageConfiguration<?> storageConf = getHoodieTable().getStorageConf();
+    HoodieTableConfig tableConfig = getHoodieTable().getMetaClient().getTableConfig();
+    String bootstrapBasePath = tableConfig.getBootstrapBasePath().orElse(null);
+    Option<String[]> partitionFields = tableConfig.getPartitionFields();
+    HoodieFileReader baseFileReader = getBaseOrBootstrapFileReader(storageConf, bootstrapBasePath, partitionFields, operation);
+
+    Option<BaseKeyGenerator> keyGeneratorOp = getKeyGenerator();
+    // NOTE: Records have to be cloned here to make sure that, if they hold a low-level engine-specific
+    //       payload pointing into a shared, mutable (underlying) buffer, we get a clean copy,
+    //       since these records will be shuffled later.
+    ClosableIterator<HoodieRecord> baseRecordsIterator;
+    try {
+      baseRecordsIterator = baseFileReader.getRecordIterator(readerSchemaWithMetaFields);
+    } catch (IOException e) {
+      throw new HoodieClusteringException("Error reading base file", e);
+    }
+    return new CloseableMappingIterator(
+        baseRecordsIterator,
+        rec -> ((HoodieRecord) rec).copy().wrapIntoHoodieRecordPayloadWithKeyGen(readerSchemaWithMetaFields, writeConfig.getProps(), keyGeneratorOp));
+  }
+
+  protected abstract Option<BaseKeyGenerator> getKeyGenerator();
```

Review Comment:

> Let's remove the interface for `getKeyGenerator` and `getBaseOrBootstrapFileReader`, they are just utilities.

done~
Re: [PR] [HUDI-8766] Enabling cols stats by default with writer [hudi]
hudi-bot commented on PR #12596: URL: https://github.com/apache/hudi/pull/12596#issuecomment-2581698677 ## CI report: * ae2ca606c6cd125f31b7ed029968d0993b1bb0bd UNKNOWN * 71b6a13890909b81c74ce7b138237ab695a08782 UNKNOWN * a0efb5a7f12042228a5444aeab00f98827dfad3a Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2777) * 15866ae0099c3b58d22329be0e5008b3149cb95f UNKNOWN
Re: [PR] [HUDI-8800] Introduce SingleSparkConsistentBucketClusteringExecutionStrategy to improve performance [hudi]
hudi-bot commented on PR #12537: URL: https://github.com/apache/hudi/pull/12537#issuecomment-2581698449 ## CI report: * 6198247de5d01f8edaf4976efffdffa6e6674b64 Azure: [SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2763) * 64ad84f40ff6a47df76979a382525fee0cc67d2e UNKNOWN
Re: [PR] [HUDI-8800] Introduce SingleSparkConsistentBucketClusteringExecutionStrategy to improve performance [hudi]
hudi-bot commented on PR #12537: URL: https://github.com/apache/hudi/pull/12537#issuecomment-2581699957 ## CI report: * 6198247de5d01f8edaf4976efffdffa6e6674b64 Azure: [SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2763) * 64ad84f40ff6a47df76979a382525fee0cc67d2e Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2790)
[jira] [Updated] (HUDI-8851) MOR delete query hits NPE when fetching ordering value
[ https://issues.apache.org/jira/browse/HUDI-8851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davis Zhang updated HUDI-8851: -- Description: [https://github.com/apache/hudi/pull/12610] when running the delete statement of the test, we got Job aborted due to stage failure: Task 0 in stage 440.0 failed 1 times, most recent failure: Lost task 0.0 in stage 440.0 (TID 610) (daviss-mbp.attlocal.net executor driver): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0 at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:319) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:252) at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102) at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:908) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:908) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367) at org.apache.spark.rdd.RDD.iterator(RDD.scala:331) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367) at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:380) at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1548) at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1458) at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1522) at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349) at 
org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:378) at org.apache.spark.rdd.RDD.iterator(RDD.scala:329) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367) at org.apache.spark.rdd.RDD.iterator(RDD.scala:331) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92) at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) at org.apache.spark.scheduler.Task.run(Task.scala:139) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) Caused by: java.lang.NullPointerException at org.apache.spark.sql.HoodieUnsafeRowUtils$.getNestedInternalRowValue(HoodieUnsafeRowUtils.scala:69) at org.apache.spark.sql.HoodieUnsafeRowUtils.getNestedInternalRowValue(HoodieUnsafeRowUtils.scala) at org.apache.hudi.common.model.HoodieSparkRecord.getOrderingValue(HoodieSparkRecord.java:322) at org.apache.hudi.io.HoodieAppendHandle.writeToBuffer(HoodieAppendHandle.java:608) at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:465) at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:83) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:312) ... 
29 more Driver stacktrace: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 440.0 failed 1 times, most recent failure: Lost task 0.0 in stage 440.0 (TID 610) (daviss-mbp.attlocal.net executor driver): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0 at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:319) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:252) at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102) at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:908) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:908) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367) at org.apache.spark.rdd.RDD.iterator(RDD.scala:331) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache
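The stack trace above bottoms out in `HoodieSparkRecord.getOrderingValue` dereferencing a nested row field that is null for delete records. A minimal sketch of the null-safe lookup shape a fix would need — the map-backed "row", the dotted-path walk, and the zero sentinel are illustrative assumptions, not Hudi's actual types or its eventual fix:

```java
import java.util.Map;

// Hypothetical null-safe ordering-value lookup: a null at any level of the
// field path returns a default instead of throwing a NullPointerException.
class OrderingValueSketch {

  // Sentinel returned when the ordering field is absent, e.g. for a delete
  // record whose payload carries no data columns (illustrative choice).
  static final Comparable<?> DEFAULT_ORDERING_VALUE = 0;

  // Walks a dotted field path through nested maps, bailing out to the
  // default as soon as any intermediate or leaf value is missing.
  static Comparable<?> getOrderingValue(Map<String, Object> row, String path) {
    Object cur = row;
    for (String part : path.split("\\.")) {
      if (!(cur instanceof Map)) {
        return DEFAULT_ORDERING_VALUE;
      }
      cur = ((Map<?, ?>) cur).get(part);
      if (cur == null) {
        return DEFAULT_ORDERING_VALUE;
      }
    }
    return (Comparable<?>) cur;
  }

  public static void main(String[] args) {
    Map<String, Object> row = Map.of("meta", Map.of("ts", 5));
    System.out.println(getOrderingValue(row, "meta.ts"));      // prints 5
    System.out.println(getOrderingValue(row, "meta.missing")); // prints 0
  }
}
```

The contrast with the failing code path is that `getNestedInternalRowValue` in the trace assumes the nested value exists; for a delete statement the payload columns are absent, so any lookup along that path must tolerate null.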
[jira] (HUDI-8624) Revisit commitsMetadata fetching from timeline history in MergeOnReadIncrementalRelation
[ https://issues.apache.org/jira/browse/HUDI-8624 ] Lin Liu deleted comment on HUDI-8624: --- was (Author: JIRAUSER301185): What is the issue here? > Revisit commitsMetadata fetching from timeline history in > MergeOnReadIncrementalRelation > > > Key: HUDI-8624 > URL: https://issues.apache.org/jira/browse/HUDI-8624 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Y Ethan Guo >Priority: Critical > Fix For: 1.0.1 > > > [https://github.com/apache/hudi/pull/12385/files#r1865249449] > We need to revisit why we need commit metadata from timeline history. > Reading timeline history (archival timeline in old term) is expensive and > should not be incurred in incremental query except for completion time lookup.
[jira] [Updated] (HUDI-8624) Revisit commitsMetadata fetching from timeline history in MergeOnReadIncrementalRelation
[ https://issues.apache.org/jira/browse/HUDI-8624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Liu updated HUDI-8624: -- Status: In Progress (was: Open) > Revisit commitsMetadata fetching from timeline history in > MergeOnReadIncrementalRelation > > > Key: HUDI-8624 > URL: https://issues.apache.org/jira/browse/HUDI-8624 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Y Ethan Guo >Priority: Critical > Fix For: 1.0.1 > > > [https://github.com/apache/hudi/pull/12385/files#r1865249449] > We need to revisit why we need commit metadata from timeline history. > Reading timeline history (archival timeline in old term) is expensive and > should not be incurred in incremental query except for completion time lookup.
Re: [PR] [HUDI-8839] CdcFileGroupIterator use spillable hashmap [hudi]
hudi-bot commented on PR #12592: URL: https://github.com/apache/hudi/pull/12592#issuecomment-2581423654 ## CI report: * e720dcfa5656730d01e5f22e5f9a890c08c60e0d Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2738) * 28247026a78dda613a41ed2f039cbf11bb7d5d95 Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2779) * 423421ec00e72021f081c901ac74891a266b8aa5 UNKNOWN
Re: [PR] [HUDI-8839] CdcFileGroupIterator use spillable hashmap [hudi]
hudi-bot commented on PR #12592: URL: https://github.com/apache/hudi/pull/12592#issuecomment-2581425493 ## CI report: * 28247026a78dda613a41ed2f039cbf11bb7d5d95 Azure: [CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2779) * 423421ec00e72021f081c901ac74891a266b8aa5 UNKNOWN * a5544e3e3d5aa734348b7bfd63820d5b8d98cc33 UNKNOWN
[jira] [Created] (HUDI-8850) COW DML does not honor Commit
Davis Zhang created HUDI-8850: - Summary: COW DML does not honor Commit Key: HUDI-8850 URL: https://issues.apache.org/jira/browse/HUDI-8850 Project: Apache Hudi Issue Type: Bug Reporter: Davis Zhang [https://github.com/apache/hudi/pull/12610]
Re: [PR] [HUDI-8828] Test coverage of MIT partial update [hudi]
hudi-bot commented on PR #12583: URL: https://github.com/apache/hudi/pull/12583#issuecomment-2581382057 ## CI report: * 5912957233547cef72a3427e482c176537a164b2 Azure: [CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2776) * 757290d3cf1ab9027f2f14f3cd22097f50939a56 UNKNOWN
Re: [PR] [HUDI-8766] Enabling cols stats by default with writer [hudi]
hudi-bot commented on PR #12596: URL: https://github.com/apache/hudi/pull/12596#issuecomment-2581382215 ## CI report: * ae2ca606c6cd125f31b7ed029968d0993b1bb0bd UNKNOWN * 71b6a13890909b81c74ce7b138237ab695a08782 UNKNOWN * a0efb5a7f12042228a5444aeab00f98827dfad3a Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2777)
[jira] [Assigned] (HUDI-8837) Fix reading partition path field on metadata bootstrap table
[ https://issues.apache.org/jira/browse/HUDI-8837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davis Zhang reassigned HUDI-8837: - Assignee: Y Ethan Guo (was: Davis Zhang) > Fix reading partition path field on metadata bootstrap table > > > Key: HUDI-8837 > URL: https://issues.apache.org/jira/browse/HUDI-8837 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Y Ethan Guo >Assignee: Y Ethan Guo >Priority: Blocker > Fix For: 1.0.1 > > > When adding strict data validation within > testMetadataBootstrapMORPartitionedInlineCompactionOn, the validation reveals > that the partition path field reading fails (returns null) for some update > records.
[jira] [Commented] (HUDI-8837) Fix reading partition path field on metadata bootstrap table
[ https://issues.apache.org/jira/browse/HUDI-8837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17911714#comment-17911714 ] Davis Zhang commented on HUDI-8837: --- so we can remove the .drop(partitionColName) in the validation func you mentioned, ran all tests in the test suite, all green. Assigned back to you > Fix reading partition path field on metadata bootstrap table > > > Key: HUDI-8837 > URL: https://issues.apache.org/jira/browse/HUDI-8837 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Y Ethan Guo >Assignee: Y Ethan Guo >Priority: Blocker > Fix For: 1.0.1 > > > When adding strict data validation within > testMetadataBootstrapMORPartitionedInlineCompactionOn, the validation reveals > that the partition path field reading fails (returns null) for some update > records.
Re: [PR] [HUDI-8624] Avoid check metadata for archived commits in incremental queries [hudi]
hudi-bot commented on PR #12613: URL: https://github.com/apache/hudi/pull/12613#issuecomment-2581719260 ## CI report: * 8fe93c788b78c9239f8feb90d3d78a90b8153914 Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2788)
Re: [PR] [HUDI-8766] Enabling cols stats by default with writer [hudi]
hudi-bot commented on PR #12596: URL: https://github.com/apache/hudi/pull/12596#issuecomment-2581720604 ## CI report: * ae2ca606c6cd125f31b7ed029968d0993b1bb0bd UNKNOWN * 71b6a13890909b81c74ce7b138237ab695a08782 UNKNOWN * a0efb5a7f12042228a5444aeab00f98827dfad3a Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2777) * 15866ae0099c3b58d22329be0e5008b3149cb95f Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2791)
[jira] [Updated] (HUDI-8762) Fix issues around incremental query
[ https://issues.apache.org/jira/browse/HUDI-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Y Ethan Guo updated HUDI-8762: -- Status: In Progress (was: Open) > Fix issues around incremental query > --- > > Key: HUDI-8762 > URL: https://issues.apache.org/jira/browse/HUDI-8762 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Y Ethan Guo >Assignee: Y Ethan Guo >Priority: Blocker > Fix For: 1.0.1 > >
Re: [PR] [HUDI-8766] Enabling cols stats by default with writer [hudi]
hudi-bot commented on PR #12596: URL: https://github.com/apache/hudi/pull/12596#issuecomment-2581279856 ## CI report: * 04faca8ac2311fce83d759a6dbd8efb697ccbb6a Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2773)
[jira] [Updated] (HUDI-8828) merge into partial update on all kinds of table should work [Ethan to check the latest comment on new issues]
[ https://issues.apache.org/jira/browse/HUDI-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davis Zhang updated HUDI-8828: -- Status: Patch Available (was: In Progress) > merge into partial update on all kinds of table should work [Ethan to check > the latest comment on new issues] > - > > Key: HUDI-8828 > URL: https://issues.apache.org/jira/browse/HUDI-8828 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Davis Zhang >Assignee: Davis Zhang >Priority: Blocker > Labels: pull-request-available > Fix For: 1.0.1 > > Original Estimate: 4h > Time Spent: 1h > Remaining Estimate: 3h > > MOR, COW, partitioned, non partitioned, with without precombine key. > Global/local index. > 0.5 days for testing + unknowns if issues spotted.
Re: [PR] [HUDI-8828] Test coverage of MIT partial update [hudi]
hudi-bot commented on PR #12583: URL: https://github.com/apache/hudi/pull/12583#issuecomment-2581345336 ## CI report: * fff9de91a5b865e6c07ea9bf9b8672cff90bd243 Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2706) * 5912957233547cef72a3427e482c176537a164b2 UNKNOWN
Re: [PR] [HUDI-8828] Test coverage of MIT partial update [hudi]
hudi-bot commented on PR #12583: URL: https://github.com/apache/hudi/pull/12583#issuecomment-2581347876 ## CI report: * fff9de91a5b865e6c07ea9bf9b8672cff90bd243 Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2706) * 5912957233547cef72a3427e482c176537a164b2 Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2776)
[jira] [Commented] (HUDI-8853) Spark sql ALTER TABLE queries are failing on EMR
[ https://issues.apache.org/jira/browse/HUDI-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17911702#comment-17911702 ]

Mansi Patel commented on HUDI-8853:
-----------------------------------

ALTER COLUMN is also causing an issue.
{code:java}
spark.sql("ALTER TABLE mansipp_hudi_fgac_table3 ALTER COLUMN id TYPE string");

org.apache.spark.sql.AnalysisException: [NOT_SUPPORTED_CHANGE_COLUMN] ALTER TABLE ALTER/CHANGE COLUMN is not supported for changing `spark_catalog`.`default`.`mansipp_hudi_fgac_table3`'s column `id` with type "INT" to `id` with type "STRING".
{code}
According to this table, we should be able to convert "int -> string": [https://hudi.apache.org/docs/next/schema_evolution/#:~:text=DROP%20NOT%20NULL-,column%20type%20change,-Source%5CTarget]

Reproduction steps:
{code:java}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions._
import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.DataSourceReadOptions
import org.apache.hudi.config.HoodieWriteConfig
import org.apache.hudi.hive.MultiPartKeysValueExtractor
import org.apache.hudi.hive.HiveSyncConfig
import org.apache.hudi.sync.common.HoodieSyncConfig

// Create a DataFrame
val inputDF = Seq(
  (100, "2015-01-01", "2015-01-01T13:51:39.340396Z"),
  (101, "2015-01-01", "2015-01-01T12:14:58.597216Z"),
  (102, "2015-01-01", "2015-01-01T13:51:40.417052Z"),
  (103, "2015-01-01", "2015-01-01T13:51:40.519832Z"),
  (104, "2015-01-02", "2015-01-01T12:15:00.512679Z"),
  (105, "2015-01-02", "2015-01-01T13:51:42.248818Z")
).toDF("id", "creation_date", "last_update_time")

// Specify common DataSourceWriteOptions in the single hudiOptions variable
val hudiOptions = Map[String, String](
  HoodieWriteConfig.TBL_NAME.key -> "mansipp_hudi_fgac_table3",
  DataSourceWriteOptions.TABLE_TYPE.key -> "COPY_ON_WRITE",
  DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> "id",
  DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "creation_date",
  DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "last_update_time",
  DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
  DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> "mansipp_hudi_fgac_table3",
  DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "creation_date",
  HoodieSyncConfig.META_SYNC_PARTITION_EXTRACTOR_CLASS.key -> "org.apache.hudi.hive.MultiPartKeysValueExtractor",
  HoodieSyncConfig.META_SYNC_ENABLED.key -> "true",
  HiveSyncConfig.HIVE_SYNC_MODE.key -> "hms",
  HoodieSyncConfig.META_SYNC_TABLE_NAME.key -> "mansipp_hudi_fgac_table3",
  HoodieSyncConfig.META_SYNC_PARTITION_FIELDS.key -> "creation_date"
)

// Write the DataFrame as a Hudi dataset
(inputDF.write
  .format("hudi")
  .options(hudiOptions)
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, "insert")
  .option("hoodie.schema.on.read.enable", "true")
  .mode(SaveMode.Overwrite)
  .save("s3://mansipp-emr-dev/hudi/mansipp_hudi_fgac_table3/"))
{code}
{code:java}
spark.sql("ALTER TABLE mansipp_hudi_fgac_table3 ALTER COLUMN id TYPE string");
{code}

> Spark sql ALTER TABLE queries are failing on EMR
> ------------------------------------------------
>
>                 Key: HUDI-8853
>                 URL: https://issues.apache.org/jira/browse/HUDI-8853
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: spark-sql
>    Affects Versions: 0.15.0
>            Reporter: Mansi Patel
>            Priority: Major
>             Fix For: 1.0.1
>
> Some of the spark sql DDL queries are failing on EMR. Failed queries are listed here:
> 1. ALTER TABLE DROP COLUMN
> 2. ALTER TABLE REPLACE COLUMN
> 3. ALTER TABLE RENAME COLUMN
> {code:java}
> scala> spark.sql("ALTER TABLE mansipp_hudi_fgac_table DROP COLUMN creation_date");
> org.apache.spark.sql.AnalysisException: [UNSUPPORTED_FEATURE.TABLE_OPERATION] The feature is not supported: Table `spark_catalog`.`default`.`mansipp_hudi_fgac_table` does not support DROP COLUMN. Please check the current catalog and namespace to make sure the qualified table name is expected, and also check the catalog implementation which is configured by "spark.sql.catalog".
>   at org.apache.spark.sql.errors.QueryCompilationErrors$.unsupportedTableOperationError(QueryCompilationErrors.scala:847)
>   at org.apache.spark.sql.errors.QueryCompilationErrors$.unsupportedTableOperationError(QueryCompilationErrors.scala:837)
>   at org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog$$anonfun$apply$1.applyOrElse(ResolveSessionCatalog.scala:110)
> {code}
> {code:java}
> scala> spark.sql("ALTER TABLE mansipp_hudi_fgac_table REPLACE COLUMNS (id int, name varchar(10), city string)");
> org.apache.spark.sql.AnalysisException: [UNSUPPORTED_FEATURE.TABLE_OPERATION] The feature is not supported: Table `spark_catalog`.`default`.`mansipp_hudi_fgac_table` does not support REPLACE COLUMNS. Please check the current catalog and namespace to make sure the qualified table name is expec
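The docs' column-type-change table referenced above can be illustrated with a small lookup. This is a hypothetical sketch, not Hudi's API: only the `int` and `long` source rows are shown, following the promotions listed in the Hudi schema-evolution docs (int -> long/float/double/string, long -> float/double/string).

```java
import java.util.Map;
import java.util.Set;

public class TypeChangeCheck {
  // Partial, illustrative mirror of the docs' column-type-change matrix.
  // Only the int/long rows are sketched; this class does not exist in Hudi.
  static final Map<String, Set<String>> SUPPORTED_PROMOTIONS = Map.of(
      "int", Set.of("long", "float", "double", "string"),
      "long", Set.of("float", "double", "string"));

  static boolean isSupportedTypeChange(String from, String to) {
    // A no-op change is always allowed; otherwise look up the promotion set.
    return from.equals(to)
        || SUPPORTED_PROMOTIONS.getOrDefault(from, Set.of()).contains(to);
  }
}
```

Per this table, the `id INT -> STRING` change in the report should be accepted, which is why the `NOT_SUPPORTED_CHANGE_COLUMN` error looks like a bug rather than a documented restriction.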
Re: [PR] [HUDI-8824] MIT should error out for some assignment clause patterns [hudi]
hudi-bot commented on PR #12584: URL: https://github.com/apache/hudi/pull/12584#issuecomment-2581479281

## CI report:

* 631494d4f6e8389bf8c7a7d90a360fc1ea2d159d Azure: [SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2770)
* ffad81180c72f871a9677549e38f1915e5668adb UNKNOWN

Bot commands — @hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-8824] MIT should error out for some assignment clause patterns [hudi]
hudi-bot commented on PR #12584: URL: https://github.com/apache/hudi/pull/12584#issuecomment-2581481057

## CI report:

* 631494d4f6e8389bf8c7a7d90a360fc1ea2d159d Azure: [SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2770)
* ffad81180c72f871a9677549e38f1915e5668adb Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2781)
Re: [PR] [HUDI-8839] CdcFileGroupIterator use spillable hashmap [hudi]
hudi-bot commented on PR #12592: URL: https://github.com/apache/hudi/pull/12592#issuecomment-2581481123

## CI report:

* 28247026a78dda613a41ed2f039cbf11bb7d5d95 Azure: [CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2779)
* 423421ec00e72021f081c901ac74891a266b8aa5 UNKNOWN
* a5544e3e3d5aa734348b7bfd63820d5b8d98cc33 UNKNOWN
* 7ca86e570e17a7db2c7394d62f9d95bda8f439db Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2782)
[jira] [Assigned] (HUDI-8837) Fix reading partition path field on metadata bootstrap table
[ https://issues.apache.org/jira/browse/HUDI-8837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Y Ethan Guo reassigned HUDI-8837:
---------------------------------

    Assignee: Davis Zhang

> Fix reading partition path field on metadata bootstrap table
> ------------------------------------------------------------
>
>                 Key: HUDI-8837
>                 URL: https://issues.apache.org/jira/browse/HUDI-8837
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Y Ethan Guo
>            Assignee: Davis Zhang
>            Priority: Blocker
>             Fix For: 1.0.1
>
> When adding strict data validation within testMetadataBootstrapMORPartitionedInlineCompactionOn, the validation reveals that the partition path field reading fails (returns null) for some update records.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-8837) Fix reading partition path field on metadata bootstrap table
[ https://issues.apache.org/jira/browse/HUDI-8837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davis Zhang updated HUDI-8837:
------------------------------

    Status: In Progress (was: Open)

> Fix reading partition path field on metadata bootstrap table
> ------------------------------------------------------------
>
>                 Key: HUDI-8837
>                 URL: https://issues.apache.org/jira/browse/HUDI-8837
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Y Ethan Guo
>            Assignee: Davis Zhang
>            Priority: Blocker
>             Fix For: 1.0.1
>
> When adding strict data validation within testMetadataBootstrapMORPartitionedInlineCompactionOn, the validation reveals that the partition path field reading fails (returns null) for some update records.
Re: [PR] [HUDI-8832] Add merge mode test coverage for DML [hudi]
hudi-bot commented on PR #12610: URL: https://github.com/apache/hudi/pull/12610#issuecomment-2581484606

## CI report:

* 5fbd4a15950f9d2b214ce3617164f68ac96fdc4b Azure: [SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2778)
Re: [PR] [HUDI-8796] Silent ignoring of simple bucket index in Flink append mode [hudi]
hudi-bot commented on PR #12545: URL: https://github.com/apache/hudi/pull/12545#issuecomment-2581649185

## CI report:

* 20a6a8c042d092026fbed250e5b313e366d2cf61 Azure: [CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2786)
* 3efc78274b41c22ac6d2695e715fd157a9b9a9b8 UNKNOWN
Re: [PR] [HUDI-8796] Silent ignoring of simple bucket index in Flink append mode [hudi]
hudi-bot commented on PR #12545: URL: https://github.com/apache/hudi/pull/12545#issuecomment-2581645223

## CI report:

* 1a9b2ad8ba31a4bfb0c41f65af7d76841a946720 Azure: [SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2576)
* 20a6a8c042d092026fbed250e5b313e366d2cf61 Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2786)
* 3efc78274b41c22ac6d2695e715fd157a9b9a9b8 UNKNOWN
Re: [PR] [HUDI-8624] Avoid check metadata for archived commits in incremental queries [hudi]
hudi-bot commented on PR #12613: URL: https://github.com/apache/hudi/pull/12613#issuecomment-2581651747

## CI report:

* 39ca7fae423367a6f48c5139b257176d22beac02 Azure: [CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2785)
* 8fe93c788b78c9239f8feb90d3d78a90b8153914 Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2788)
Re: [PR] [HUDI-8796] Silent ignoring of simple bucket index in Flink append mode [hudi]
hudi-bot commented on PR #12545: URL: https://github.com/apache/hudi/pull/12545#issuecomment-2581639192

## CI report:

* 1a9b2ad8ba31a4bfb0c41f65af7d76841a946720 Azure: [SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2576)
* 20a6a8c042d092026fbed250e5b313e366d2cf61 Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2786)
Re: [PR] [HUDI-8796] Silent ignoring of simple bucket index in Flink append mode [hudi]
geserdugarov commented on PR #12545: URL: https://github.com/apache/hudi/pull/12545#issuecomment-2581656971

@zhangyue19921010, @danny0405, I've switched the fix for bucket index support in append mode over to restricting it instead, due to the major problem that a bucket index expects exactly one base file per bucket. 3efc78274b41c22ac6d2695e715fd157a9b9a9b8 throws an exception if a user tries to insert data using a bucket index, to prevent silently writing in an unexpected way.
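The restriction described above can be sketched as an eager configuration check that fails fast instead of silently ignoring the index. This is a hypothetical illustration only — the method, its parameters, and the error message are made up for this sketch and are not Hudi's Flink configuration API:

```java
public class BucketIndexGuard {
  // Hypothetical guard: reject a bucket-index write in append mode up front,
  // rather than silently falling back to a non-bucketed write path.
  static void validateWriteConfig(String indexType, boolean isAppendMode) {
    if ("BUCKET".equals(indexType) && isAppendMode) {
      throw new IllegalStateException(
          "Bucket index is not supported in append mode; "
              + "a bucket expects exactly one base file per file group.");
    }
  }
}
```

The design choice here mirrors the comment: an explicit exception makes the unsupported combination visible to the user at job setup, instead of producing data laid out in an unexpected way.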
Re: [PR] [HUDI-8766] Enabling cols stats by default with writer [hudi]
hudi-bot commented on PR #12596: URL: https://github.com/apache/hudi/pull/12596#issuecomment-2581274438

## CI report:

* da34ecaa061dd1f0bce93c213c43f40b810d Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2748)
* 04faca8ac2311fce83d759a6dbd8efb697ccbb6a UNKNOWN
Re: [PR] [HUDI-8832] Add merge mode test coverage for DML [hudi]
hudi-bot commented on PR #12610: URL: https://github.com/apache/hudi/pull/12610#issuecomment-2581373307

## CI report:

* 4c14c955871ea88e3ff6ccfab667fe434a16a833 Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2772)
* 6142abfcebbf84d3bf32097c7499b60ff11ae0a1 Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2774)
* 5fbd4a15950f9d2b214ce3617164f68ac96fdc4b UNKNOWN
Re: [PR] [HUDI-8832] Add merge mode test coverage for DML [hudi]
hudi-bot commented on PR #12610: URL: https://github.com/apache/hudi/pull/12610#issuecomment-2581375583

## CI report:

* 6142abfcebbf84d3bf32097c7499b60ff11ae0a1 Azure: [CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2774)
* 5fbd4a15950f9d2b214ce3617164f68ac96fdc4b UNKNOWN
Re: [PR] [HUDI-8828] Test coverage of MIT partial update [hudi]
hudi-bot commented on PR #12583: URL: https://github.com/apache/hudi/pull/12583#issuecomment-2581379791

## CI report:

* fff9de91a5b865e6c07ea9bf9b8672cff90bd243 Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2706)
* 5912957233547cef72a3427e482c176537a164b2 Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2776)
* 757290d3cf1ab9027f2f14f3cd22097f50939a56 UNKNOWN
Re: [PR] [HUDI-8828] Test coverage of MIT partial update [hudi]
hudi-bot commented on PR #12583: URL: https://github.com/apache/hudi/pull/12583#issuecomment-2581510528

## CI report:

* 757290d3cf1ab9027f2f14f3cd22097f50939a56 Azure: [FAILURE](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=2780)
[jira] [Created] (HUDI-8852) merge into partial update should not need precombine field assignment for partial update
Davis Zhang created HUDI-8852:
---------------------------------

             Summary: merge into partial update should not need precombine field assignment for partial update
                 Key: HUDI-8852
                 URL: https://issues.apache.org/jira/browse/HUDI-8852
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Davis Zhang

We should allow the MIT delete clause to operate even if no precombine key is specified in the source table, in the commit-time-ordering case. The same applies to MIT partial update: regardless of the merge mode, if the precombine key is absent from the source, we should fall back to commit-time ordering and apply the change.
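The fallback proposed in this ticket can be sketched as a tiny resolution function. This is a hypothetical illustration of the described behavior, not Hudi's actual merge-mode resolution code — the enum and method names are invented:

```java
public class MergeModeFallback {
  // Illustrative stand-ins for Hudi's record merge modes.
  enum MergeMode { EVENT_TIME_ORDERING, COMMIT_TIME_ORDERING }

  // Hypothetical resolution: when the source carries no precombine field,
  // fall back to commit-time ordering regardless of the configured mode,
  // so delete clauses and partial updates can still be applied.
  static MergeMode resolve(MergeMode configured, boolean sourceHasPrecombine) {
    return sourceHasPrecombine ? configured : MergeMode.COMMIT_TIME_ORDERING;
  }
}
```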
Re: [PR] [HUDI-8828] Test coverage of MIT partial update [hudi]
Davis-Zhang-Onehouse commented on code in PR #12583: URL: https://github.com/apache/hudi/pull/12583#discussion_r1909484960

## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/TestMergeIntoTable.scala:

```
@@ -1336,44 +1339,59 @@ class TestMergeIntoTable extends HoodieSparkSqlTestBase with ScalaAssertionSuppo
   test("Test MergeInto with partial insert") {
     spark.sql(s"set ${MERGE_SMALL_FILE_GROUP_CANDIDATES_LIMIT.key} = 0")
-    Seq(true, false).foreach { sparkSqlOptimizedWrites =>
+
+    // Test combinations: (tableType, sparkSqlOptimizedWrites)
+    val testConfigs = Seq(
+      ("mor", true),
+      ("mor", false),
+      ("cow", true),
+      ("cow", false)
+    )
+
+    testConfigs.foreach { case (tableType, sparkSqlOptimizedWrites) =>
+      log.info(s"=== Testing MergeInto with partial insert: tableType=$tableType, sparkSqlOptimizedWrites=$sparkSqlOptimizedWrites ===")
       withRecordType()(withTempDir { tmp =>
         spark.sql("set hoodie.payload.combined.schema.validate = true")
-        // Create a partitioned mor table
+        // Create a partitioned table
         val tableName = generateTableName
         spark.sql(
           s"""
              | create table $tableName (
              |  id bigint,
              |  name string,
              |  price double,
+             |  ts bigint,
              |  dt string
              | ) using hudi
              | tblproperties (
-             |  type = 'mor',
-             |  primaryKey = 'id'
+             |  type = '$tableType',
+             |  primaryKey = 'id',
+             |  precombineKey = 'ts'
```

Review Comment:
   Not required; it is a workaround for https://issues.apache.org/jira/browse/HUDI-8835. In general, enabling MIT to operate independently of the precombine field requires more work.

## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/TestMergeIntoTable.scala:

```
@@ -842,7 +844,8 @@ class TestMergeIntoTable extends HoodieSparkSqlTestBase with ScalaAssertionSuppo
       )
       checkAnswer(s"select id,name,price,v,dt from $tableName1 order by id")(
-        Seq(1, "a1", 10, 1000, "2021-03-21")
+        Seq(1, "a1", 10, 1000, "2021-03-21"),
+        Seq(3, "a3", 30, 3000, "2021-03-21")
```

Review Comment:
   Unintentional change, reverted.

## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/TestMergeIntoTable.scala:

```
@@ -22,11 +22,13 @@ import org.apache.hudi.DataSourceWriteOptions.SPARK_SQL_OPTIMIZED_WRITES
 import org.apache.hudi.config.HoodieWriteConfig.MERGE_SMALL_FILE_GROUP_CANDIDATES_LIMIT
 import org.apache.hudi.hadoop.fs.HadoopFSUtils
 import org.apache.hudi.testutils.DataSourceTestUtils
-
+import org.apache.spark.sql.hudi.ProvidesHoodieConfig.getClass
```

Review Comment:
   Done.