[GitHub] [hudi] huangweifeng7 commented on issue #143: Tracking ticket for folks to be added to slack group
huangweifeng7 commented on issue #143: URL: https://github.com/apache/hudi/issues/143#issuecomment-1006357688 Please add me to the slack group. Email: huangweifeng_n...@126.com Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4471: [HUDI-3125] spark-sql write timestamp directly
hudi-bot commented on pull request #4471: URL: https://github.com/apache/hudi/pull/4471#issuecomment-1006360315 ## CI report: * a5dcf171a39b236a74b9a70b0eb0b49e74ebc3b5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4934) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4471: [HUDI-3125] spark-sql write timestamp directly
hudi-bot removed a comment on pull request #4471: URL: https://github.com/apache/hudi/pull/4471#issuecomment-1006331887 ## CI report: * 29b1742747a4195db690d09f09de972ab7f409db Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4895) * a5dcf171a39b236a74b9a70b0eb0b49e74ebc3b5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4934) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] leesf commented on a change in pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation
leesf commented on a change in pull request #4514: URL: https://github.com/apache/hudi/pull/4514#discussion_r779368601

## File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/HoodieSqlCommonUtils.scala

## @@ -0,0 +1,316 @@

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.hudi

import scala.collection.JavaConverters._
import java.net.URI
import java.util.{Date, Locale, Properties}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

import org.apache.hudi.{AvroConversionUtils, SparkAdapterSupport}
import org.apache.hudi.client.common.HoodieSparkEngineContext
import org.apache.hudi.common.config.DFSPropertiesConfiguration
import org.apache.hudi.common.config.HoodieMetadataConfig
import org.apache.hudi.common.fs.FSUtils
import org.apache.hudi.common.model.HoodieRecord
import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver}
import org.apache.hudi.common.table.timeline.{HoodieActiveTimeline, HoodieInstantTimeGenerator}
import org.apache.spark.SPARK_VERSION
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTableType}
import org.apache.spark.sql.catalyst.expressions.{And, Attribute, Cast, Expression, Literal}
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SubqueryAlias}
import org.apache.spark.sql.execution.datasources.LogicalRelation
import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.sql.types.{DataType, NullType, StringType, StructField, StructType}

import java.text.SimpleDateFormat

import scala.collection.immutable.Map

object HoodieSqlCommonUtils extends SparkAdapterSupport {
```

Review comment: yes, the code is moved from HoodieSqlUtils to allow more reuse, and no new methods were added. Also, I intend to name it as a *Utils class to keep it aligned with HoodieSqlUtils, which also extends SparkAdapterSupport.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] leesf commented on pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation
leesf commented on pull request #4514: URL: https://github.com/apache/hudi/pull/4514#issuecomment-1006364350

> Let me make another pass at all the pom changes. That seems to be the main thing here. In the meantime, could you clarify these comments?
>
> Also have you tested these changes across spark 2.x and 3.1/3.2 bundles ?

@vinothchandar Yes, I have manually tested it with Spark 3.2.0 and Spark 3.1.2 on Spark SQL, and the CI tested it on Spark 2.4.x; it works well.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] leesf commented on a change in pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation
leesf commented on a change in pull request #4514: URL: https://github.com/apache/hudi/pull/4514#discussion_r779369728

## File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/HoodieSqlCommonUtils.scala

## @@ -0,0 +1,316 @@

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.hudi

import scala.collection.JavaConverters._
import java.net.URI
import java.util.{Date, Locale, Properties}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

import org.apache.hudi.{AvroConversionUtils, SparkAdapterSupport}
import org.apache.hudi.client.common.HoodieSparkEngineContext
import org.apache.hudi.common.config.DFSPropertiesConfiguration
import org.apache.hudi.common.config.HoodieMetadataConfig
import org.apache.hudi.common.fs.FSUtils
import org.apache.hudi.common.model.HoodieRecord
import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver}
import org.apache.hudi.common.table.timeline.{HoodieActiveTimeline, HoodieInstantTimeGenerator}
import org.apache.spark.SPARK_VERSION
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTableType}
import org.apache.spark.sql.catalyst.expressions.{And, Attribute, Cast, Expression, Literal}
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SubqueryAlias}
import org.apache.spark.sql.execution.datasources.LogicalRelation
import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.sql.types.{DataType, NullType, StringType, StructField, StructType}

import java.text.SimpleDateFormat

import scala.collection.immutable.Map

object HoodieSqlCommonUtils extends SparkAdapterSupport {
  // NOTE: {@code SimpleDateFormat} is NOT thread-safe
  // TODO replace w/ DateTimeFormatter
  private val defaultDateFormat =
    ThreadLocal.withInitial(new java.util.function.Supplier[SimpleDateFormat] {
      override def get() = new SimpleDateFormat("yyyy-MM-dd")
    })

  def isHoodieTable(table: CatalogTable): Boolean = {
    table.provider.map(_.toLowerCase(Locale.ROOT)).orNull == "hudi"
  }

  def isHoodieTable(tableId: TableIdentifier, spark: SparkSession): Boolean = {
    val table = spark.sessionState.catalog.getTableMetadata(tableId)
    isHoodieTable(table)
  }

  def isHoodieTable(table: LogicalPlan, spark: SparkSession): Boolean = {
    tripAlias(table) match {
      case LogicalRelation(_, _, Some(tbl), _) => isHoodieTable(tbl)
      case relation: UnresolvedRelation =>
        isHoodieTable(sparkAdapter.toTableIdentifier(relation), spark)
      case _ => false
    }
  }

  def getTableIdentify(table: LogicalPlan): TableIdentifier = {
```

Review comment: done

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation
hudi-bot commented on pull request #4514: URL: https://github.com/apache/hudi/pull/4514#issuecomment-1006367031 ## CI report: * ddc3af0c32bafef6b10c32c43132df32a5f7d83c UNKNOWN * e1ba726105dfa7ae07d802546c71a0cf1ad8b172 UNKNOWN * 306e7d462959e0249e230f60c2e9ea6602342e08 UNKNOWN * 15122772d9430d91807053555e12afaeda30e688 UNKNOWN * ac8d014a0602e3c499771f3313f0f88de57cdda1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4931) * 28d85528277ec6fbe72cd81dd69667495ec58165 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation
hudi-bot removed a comment on pull request #4514: URL: https://github.com/apache/hudi/pull/4514#issuecomment-1006323042 ## CI report: * ddc3af0c32bafef6b10c32c43132df32a5f7d83c UNKNOWN * e1ba726105dfa7ae07d802546c71a0cf1ad8b172 UNKNOWN * 306e7d462959e0249e230f60c2e9ea6602342e08 UNKNOWN * 15122772d9430d91807053555e12afaeda30e688 UNKNOWN * ac8d014a0602e3c499771f3313f0f88de57cdda1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4931) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter
hudi-bot removed a comment on pull request #4521: URL: https://github.com/apache/hudi/pull/4521#issuecomment-1006329173 ## CI report: * 16d5dc61ae5c7a9962fc3756720d8262bdadf6b9 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4932) * d708467de740637a394375335181979a343979bd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4933) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter
hudi-bot commented on pull request #4521: URL: https://github.com/apache/hudi/pull/4521#issuecomment-1006367102 ## CI report: * d708467de740637a394375335181979a343979bd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4933) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codope commented on a change in pull request #2768: [HUDI-485]: corrected the check for incremental sql
codope commented on a change in pull request #2768: URL: https://github.com/apache/hudi/pull/2768#discussion_r779376837

## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHiveIncrementalPuller.java

## @@ -41,4 +67,84 @@ public void testInitHiveIncrementalPuller()

```java
  private HiveIncrementalPuller.Config getHivePullerConfig(String incrementalSql) throws IOException {
    config.hiveJDBCUrl = hiveSyncConfig.jdbcUrl;
    config.hiveUsername = hiveSyncConfig.hiveUser;
    config.hivePassword = hiveSyncConfig.hivePass;
    config.hoodieTmpDir = Files.createTempDirectory("hivePullerTest").toUri().toString();
    config.sourceDb = hiveSyncConfig.databaseName;
    config.sourceTable = hiveSyncConfig.tableName;
    config.targetDb = "tgtdb";
    config.targetTable = "test2";
    config.tmpDb = "tmp_db";
    config.fromCommitTime = "100";
    createIncrementalSqlFile(incrementalSql, config);
    return config;
  }

  private void createIncrementalSqlFile(String text, HiveIncrementalPuller.Config cfg) throws IOException {
    java.nio.file.Path path = Paths.get(cfg.hoodieTmpDir + "/incremental_pull.txt");
    Files.createDirectories(path.getParent());
    Files.createFile(path);
    try (FileWriter fr = new FileWriter(new File(path.toUri()))) {
      fr.write(text);
    } catch (Exception e) {
      // no-op
    }
    cfg.incrementalSQLFile = path.toString();
  }

  private void createSourceTable() throws IOException, URISyntaxException {
    String instantTime = "101";
    HiveTestUtil.createCOWTable(instantTime, 5, true);
    hiveSyncConfig.syncMode = "jdbc";
    HiveTestUtil.hiveSyncConfig.batchSyncNum = 3;
    HiveSyncTool tool = new HiveSyncTool(hiveSyncConfig, HiveTestUtil.getHiveConf(), fileSystem);
    tool.syncHoodieTable();
  }

  private void createTargetTable() throws IOException, URISyntaxException {
    String instantTime = "100";
    targetBasePath = Files.createTempDirectory("hivesynctest1" + Instant.now().toEpochMilli()).toUri().toString();
    HiveTestUtil.createCOWTable(instantTime, 5, true, targetBasePath, "tgtdb", "test2");
    HiveSyncTool tool = new HiveSyncTool(getTargetHiveSyncConfig(targetBasePath), HiveTestUtil.getHiveConf(), fileSystem);
    tool.syncHoodieTable();
  }

  private HiveSyncConfig getTargetHiveSyncConfig(String basePath) {
    HiveSyncConfig config = HiveSyncConfig.copy(hiveSyncConfig);
    config.databaseName = "tgtdb";
    config.tableName = "test2";
    config.basePath = basePath;
    config.batchSyncNum = 3;
    config.syncMode = "jdbc";
    return config;
  }

  private void createTables() throws IOException, URISyntaxException {
    createSourceTable();
    createTargetTable();
  }

  @Test
  public void testPullerWithoutIncrementalClause() throws IOException, URISyntaxException {
```

Review comment: @pratyakshsharma Is the patch ready? If not, can you please update the happy-flow test case even if it's failing? I can take over and try to fix it.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] leesf commented on a change in pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation
leesf commented on a change in pull request #4514: URL: https://github.com/apache/hudi/pull/4514#discussion_r779377180

## File path: hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/HoodieSqlCommonUtils.scala

## @@ -0,0 +1,316 @@

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.hudi

import scala.collection.JavaConverters._
import java.net.URI
import java.util.{Date, Locale, Properties}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

import org.apache.hudi.{AvroConversionUtils, SparkAdapterSupport}
import org.apache.hudi.client.common.HoodieSparkEngineContext
import org.apache.hudi.common.config.DFSPropertiesConfiguration
import org.apache.hudi.common.config.HoodieMetadataConfig
import org.apache.hudi.common.fs.FSUtils
import org.apache.hudi.common.model.HoodieRecord
import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver}
import org.apache.hudi.common.table.timeline.{HoodieActiveTimeline, HoodieInstantTimeGenerator}
import org.apache.spark.SPARK_VERSION
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTableType}
import org.apache.spark.sql.catalyst.expressions.{And, Attribute, Cast, Expression, Literal}
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SubqueryAlias}
import org.apache.spark.sql.execution.datasources.LogicalRelation
import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.sql.types.{DataType, NullType, StringType, StructField, StructType}

import java.text.SimpleDateFormat

import scala.collection.immutable.Map

object HoodieSqlCommonUtils extends SparkAdapterSupport {
  // NOTE: {@code SimpleDateFormat} is NOT thread-safe
  // TODO replace w/ DateTimeFormatter
  private val defaultDateFormat =
    ThreadLocal.withInitial(new java.util.function.Supplier[SimpleDateFormat] {
      override def get() = new SimpleDateFormat("yyyy-MM-dd")
    })

  def isHoodieTable(table: CatalogTable): Boolean = {
    table.provider.map(_.toLowerCase(Locale.ROOT)).orNull == "hudi"
  }

  def isHoodieTable(tableId: TableIdentifier, spark: SparkSession): Boolean = {
    val table = spark.sessionState.catalog.getTableMetadata(tableId)
    isHoodieTable(table)
  }

  def isHoodieTable(table: LogicalPlan, spark: SparkSession): Boolean = {
    tripAlias(table) match {
      case LogicalRelation(_, _, Some(tbl), _) => isHoodieTable(tbl)
      case relation: UnresolvedRelation =>
        isHoodieTable(sparkAdapter.toTableIdentifier(relation), spark)
      case _ => false
    }
  }

  def getTableIdentify(table: LogicalPlan): TableIdentifier = {
```

Review comment: done

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] xushiyan commented on pull request #4515: [HUDI-3158] Reduce warn logs in Spark SQL INSERT OVERWRITE
xushiyan commented on pull request #4515: URL: https://github.com/apache/hudi/pull/4515#issuecomment-1006374242 @dongkelun the warn log comes from clustering planning. Can you help clarify how this change would avoid the repeated warn logs? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] dongkelun commented on pull request #4515: [HUDI-3158] Reduce warn logs in Spark SQL INSERT OVERWRITE
dongkelun commented on pull request #4515: URL: https://github.com/apache/hudi/pull/4515#issuecomment-1006384615

> @dongkelun the warn log comes from clustering planning. Can you help clarify how this change would avoid the repeated warn logs?

Hello, the warning is raised because the content of the requested replacecommit instant is empty. There are many places that call this method, for example `HoodieSparkTable.create`:

```scala
if (refreshTimeline) {
  hoodieSparkTable.getHoodieView().sync();
}
```

There are also many places that call `HoodieSparkTable.create`, so it is not easy to reduce the warning log at each call site; it is better to avoid the warning at the source. `INSERT_OVERWRITE`'s commitActionType is `REPLACE_COMMIT_ACTION`, and it creates an empty requested replacecommit instant in the `startCommitWithTime` method. We can avoid this warning at the source by making that instant's content non-empty.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
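For context, a minimal sketch of the read-side alternative discussed above: tolerating an empty requested instant instead of warning at every call site. `HoodieTimeline#getInstantDetails` is the real timeline API; the wrapper method and its placement are illustrative assumptions, not the change made in this PR.

```java
import org.apache.hudi.common.table.timeline.HoodieInstant;
import org.apache.hudi.common.table.timeline.HoodieTimeline;
import org.apache.hudi.common.util.Option;

public final class RequestedInstantGuard {
  // Illustrative helper (hypothetical name): return empty instead of warning
  // when a requested replacecommit has no content written yet.
  static Option<byte[]> tryReadContent(HoodieTimeline timeline, HoodieInstant instant) {
    Option<byte[]> content = timeline.getInstantDetails(instant);
    if (!content.isPresent() || content.get().length == 0) {
      return Option.empty(); // nothing serialized yet for this requested instant
    }
    return content;
  }
}
```

The PR instead takes the write-side approach described above: populate the requested replacecommit at `startCommitWithTime`, so every reader sees non-empty content and no per-call-site guard is needed.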
[GitHub] [hudi] hudi-bot commented on pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation
hudi-bot commented on pull request #4514: URL: https://github.com/apache/hudi/pull/4514#issuecomment-1006386632 ## CI report: * ddc3af0c32bafef6b10c32c43132df32a5f7d83c UNKNOWN * e1ba726105dfa7ae07d802546c71a0cf1ad8b172 UNKNOWN * 306e7d462959e0249e230f60c2e9ea6602342e08 UNKNOWN * 15122772d9430d91807053555e12afaeda30e688 UNKNOWN * ac8d014a0602e3c499771f3313f0f88de57cdda1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4931) * 28d85528277ec6fbe72cd81dd69667495ec58165 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4936) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation
hudi-bot removed a comment on pull request #4514: URL: https://github.com/apache/hudi/pull/4514#issuecomment-1006367031 ## CI report: * ddc3af0c32bafef6b10c32c43132df32a5f7d83c UNKNOWN * e1ba726105dfa7ae07d802546c71a0cf1ad8b172 UNKNOWN * 306e7d462959e0249e230f60c2e9ea6602342e08 UNKNOWN * 15122772d9430d91807053555e12afaeda30e688 UNKNOWN * ac8d014a0602e3c499771f3313f0f88de57cdda1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4931) * 28d85528277ec6fbe72cd81dd69667495ec58165 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4352: [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups
hudi-bot removed a comment on pull request #4352: URL: https://github.com/apache/hudi/pull/4352#issuecomment-1006153120 ## CI report: * 235981abd20a498a3e29e98ce0eda9de35018f99 UNKNOWN * 486c6886c5b0bd748e3db1c90c886a1b7f6d52e8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4915) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4352: [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups
hudi-bot commented on pull request #4352: URL: https://github.com/apache/hudi/pull/4352#issuecomment-1006400478 ## CI report: * 235981abd20a498a3e29e98ce0eda9de35018f99 UNKNOWN * 486c6886c5b0bd748e3db1c90c886a1b7f6d52e8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4915) * ce1b2b4eefdd2e0d46154b2c97dc93abf6982aa0 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4516: [WIP][HUDI-1295] Enabling metadata table based index by default for tests
hudi-bot removed a comment on pull request #4516: URL: https://github.com/apache/hudi/pull/4516#issuecomment-1006131415 ## CI report: * 97502fa31dda3b94645631303e134bf0d652c17e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4913) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4516: [WIP][HUDI-1295] Enabling metadata table based index by default for tests
hudi-bot commented on pull request #4516: URL: https://github.com/apache/hudi/pull/4516#issuecomment-1006400740 ## CI report: * 97502fa31dda3b94645631303e134bf0d652c17e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4913) * 7e2ec46af829fabeb506d639c54057d32f3c89fa UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4516: [WIP][HUDI-1295] Enabling metadata table based index by default for tests
hudi-bot removed a comment on pull request #4516: URL: https://github.com/apache/hudi/pull/4516#issuecomment-1006400740 ## CI report: * 97502fa31dda3b94645631303e134bf0d652c17e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4913) * 7e2ec46af829fabeb506d639c54057d32f3c89fa UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4516: [WIP][HUDI-1295] Enabling metadata table based index by default for tests
hudi-bot commented on pull request #4516: URL: https://github.com/apache/hudi/pull/4516#issuecomment-1006402795 ## CI report: * 97502fa31dda3b94645631303e134bf0d652c17e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4913) * 7e2ec46af829fabeb506d639c54057d32f3c89fa UNKNOWN * 280360f772b47ffab15655e4679b021e151783d7 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4516: [WIP][HUDI-1295] Enabling metadata table based index by default for tests
hudi-bot removed a comment on pull request #4516: URL: https://github.com/apache/hudi/pull/4516#issuecomment-1006402795 ## CI report: * 97502fa31dda3b94645631303e134bf0d652c17e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4913) * 7e2ec46af829fabeb506d639c54057d32f3c89fa UNKNOWN * 280360f772b47ffab15655e4679b021e151783d7 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4516: [WIP][HUDI-1295] Enabling metadata table based index by default for tests
hudi-bot commented on pull request #4516: URL: https://github.com/apache/hudi/pull/4516#issuecomment-1006404803 ## CI report: * 97502fa31dda3b94645631303e134bf0d652c17e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4913) * 7e2ec46af829fabeb506d639c54057d32f3c89fa UNKNOWN * 280360f772b47ffab15655e4679b021e151783d7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4937) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation
hudi-bot commented on pull request #4514: URL: https://github.com/apache/hudi/pull/4514#issuecomment-1006428061 ## CI report: * ddc3af0c32bafef6b10c32c43132df32a5f7d83c UNKNOWN * e1ba726105dfa7ae07d802546c71a0cf1ad8b172 UNKNOWN * 306e7d462959e0249e230f60c2e9ea6602342e08 UNKNOWN * 15122772d9430d91807053555e12afaeda30e688 UNKNOWN * 28d85528277ec6fbe72cd81dd69667495ec58165 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4936) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation
hudi-bot removed a comment on pull request #4514: URL: https://github.com/apache/hudi/pull/4514#issuecomment-1006386632 ## CI report: * ddc3af0c32bafef6b10c32c43132df32a5f7d83c UNKNOWN * e1ba726105dfa7ae07d802546c71a0cf1ad8b172 UNKNOWN * 306e7d462959e0249e230f60c2e9ea6602342e08 UNKNOWN * 15122772d9430d91807053555e12afaeda30e688 UNKNOWN * ac8d014a0602e3c499771f3313f0f88de57cdda1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4931) * 28d85528277ec6fbe72cd81dd69667495ec58165 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4936) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4352: [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups
hudi-bot commented on pull request #4352: URL: https://github.com/apache/hudi/pull/4352#issuecomment-1006435512 ## CI report: * 235981abd20a498a3e29e98ce0eda9de35018f99 UNKNOWN * 486c6886c5b0bd748e3db1c90c886a1b7f6d52e8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4915) * ce1b2b4eefdd2e0d46154b2c97dc93abf6982aa0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4938) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4352: [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups
hudi-bot removed a comment on pull request #4352: URL: https://github.com/apache/hudi/pull/4352#issuecomment-1006400478 ## CI report: * 235981abd20a498a3e29e98ce0eda9de35018f99 UNKNOWN * 486c6886c5b0bd748e3db1c90c886a1b7f6d52e8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4915) * ce1b2b4eefdd2e0d46154b2c97dc93abf6982aa0 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] AirToSupply opened a new issue #4522: [SUPPORT] hudi-flink support timestamp-micros
AirToSupply opened a new issue #4522: URL: https://github.com/apache/hudi/issues/4522

**To Reproduce**

Steps to reproduce the behavior:
1. The Spark engine is used to write data into the hoodie table (PS: there are timestamp-type columns in the dataset fields).
2. Use the Flink engine to read the hoodie table written in step 1.

**Expected behavior**

Caused by: java.lang.IllegalArgumentException: Avro does not support TIMESTAMP type with precision: 6, it only supports precision less than 3.
 at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:221) ~...
 at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:263) ~...
 at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:169) ~...
 at org.apache.hudi.table.HoodieTableFactory.inferAvroSchema(HoodieTableFactory.java:239) ~...
 at org.apache.hudi.table.HoodieTableFactory.setupConfOptions(HoodieTableFactory.java:155) ~...
 at org.apache.hudi.table.HoodieTableFactory.createDynamicTableSource(HoodieTableFactory.java:65) ~...

**Environment Description**
* Hudi version : 0.11.0-SNAPSHOT
* Spark version : 3.1.2
* Flink version : 1.13.1
* Hive version : None
* Hadoop version : 2.9.2
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : None

**Additional context**

We are using hoodie as a data lake to deliver projects to customers. We found the following application scenario: write data to the hoodie table through the Spark engine, and then read data from the hoodie table through the Flink engine. Note that the above exception is triggered when the written dataset contains a timestamp column. To simplify the description of the problem, we summarize it into the following steps:

【step-1】Mock data:

```shell
/home/deploy/spark-3.1.2-bin-hadoop2.7/bin/spark-shell \
  --driver-class-path /home/workflow/apache-hive-2.3.8-bin/conf/ \
  --master spark://2-120:7077 \
  --executor-memory 4g \
  --driver-memory 4g \
  --num-executors 4 \
  --total-executor-cores 4 \
  --name test \
  --jars /home/deploy/spark-3.1.2-bin-hadoop2.7/jars/hudi-spark3-bundle_2.12-0.11.0-SNAPSHOT.jar,/home/deploy/spark-3.1.2-bin-hadoop2.7/jars/spark-avro_2.12-3.1.2.jar \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \
  --conf spark.sql.hive.convertMetastoreParquet=false
```

```scala
val df = spark.sql("select 1 as id, 'A' as name, current_timestamp as dt")
df.write.format("hudi").
  option("hoodie.datasource.write.recordkey.field", "id").
  option("hoodie.datasource.write.precombine.field", "id").
  option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.NonpartitionedKeyGenerator").
  option("hoodie.upsert.shuffle.parallelism", "2").
  option("hoodie.table.name", "timestamp_table").
  mode("append").
  save("/hudi/suite/data_type_timestamp_table")
spark.read.format("hudi").load("/hudi/suite/data_type_timestamp_table").show(false)
```

【step-2】Consume data through Flink:

```shell
bin/sql-client.sh embedded -j lib/hudi-flink-bundle_2.12-0.11.0-SNAPSHOT.jar
```

```sql
create table data_type_timestamp_table (
  `id` INT,
  `name` STRING,
  `dt` TIMESTAMP(6)
) with (
  'connector' = 'hudi',
  'hoodie.table.name' = 'data_type_timestamp_table',
  'read.streaming.enabled' = 'true',
  'hoodie.datasource.write.recordkey.field' = 'id',
  'path' = '/hudi/suite/data_type_timestamp_table',
  'read.streaming.check-interval' = '10',
  'table.type' = 'COPY_ON_WRITE',
  'write.precombine.field' = 'id'
);

select * from data_type_timestamp_table;
```

As shown below: [screenshot: Flink SQL fails with the Avro precision exception]

If we change TIMESTAMP(6) to TIMESTAMP(3), the result is as follows: [screenshot: query runs but the timestamp value displays incorrectly]

The data can be found here, but the display is incorrect! Checking the hoodie directory shows that Spark writes the timestamp type as timestamp-micros: [screenshot: Parquet/Avro schema showing timestamp-micros]

However, Flink reads and writes hoodie data with timestamp-millis. Therefore, reading and writing timestamp types across the Spark and Flink engines is problematic. We hope the hudi-flink module will support timestamp-micros so that no time precision is lost.
[GitHub] [hudi] zhangyue19921010 commented on pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter
zhangyue19921010 commented on pull request #4521: URL: https://github.com/apache/hudi/pull/4521#issuecomment-1006441474 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-3184) hudi-flink support timestamp-micros
Well Tang created HUDI-3184: --- Summary: hudi-flink support timestamp-micros Key: HUDI-3184 URL: https://issues.apache.org/jira/browse/HUDI-3184 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: Well Tang -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-3184) hudi-flink support timestamp-micros
[ https://issues.apache.org/jira/browse/HUDI-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Well Tang updated HUDI-3184: Fix Version/s: 0.11.0 > hudi-flink support timestamp-micros > --- > > Key: HUDI-3184 > URL: https://issues.apache.org/jira/browse/HUDI-3184 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: Well Tang >Priority: Major > Fix For: 0.11.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] hudi-bot commented on pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter
hudi-bot commented on pull request #4521: URL: https://github.com/apache/hudi/pull/4521#issuecomment-1006443490 ## CI report: * d708467de740637a394375335181979a343979bd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4933) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4939) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter
hudi-bot removed a comment on pull request #4521: URL: https://github.com/apache/hudi/pull/4521#issuecomment-1006367102 ## CI report: * d708467de740637a394375335181979a343979bd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4933) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (HUDI-2779) Cache BaseDir if HudiTableNotFound Exception thrown
[ https://issues.apache.org/jira/browse/HUDI-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hui An reassigned HUDI-2779: Assignee: Hui An > Cache BaseDir if HudiTableNotFound Exception thrown > --- > > Key: HUDI-2779 > URL: https://issues.apache.org/jira/browse/HUDI-2779 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Hui An >Assignee: Hui An >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0, 0.10.1 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (HUDI-3162) Shade AWS dependencies for bundled packages
[ https://issues.apache.org/jira/browse/HUDI-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hui An closed HUDI-3162. Resolution: Duplicate > Shade AWS dependencies for bundled packages > --- > > Key: HUDI-3162 > URL: https://issues.apache.org/jira/browse/HUDI-3162 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Hui An >Assignee: Hui An >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HUDI-3184) hudi-flink support timestamp-micros
[ https://issues.apache.org/jira/browse/HUDI-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Well Tang reassigned HUDI-3184:
---
Attachment: 3.png, 2.png, 1.png
Assignee: Well Tang
Description:
{*}Problem overview{*}:

Steps to reproduce the behavior:
①The Spark engine is used to write data into the hoodie table (PS: there are timestamp-type columns in the dataset fields).
②Use the Flink engine to read the hoodie table written in step 1.

*Expected behavior*

Caused by: java.lang.IllegalArgumentException: Avro does not support TIMESTAMP type with precision: 6, it only supports precision less than 3.
 at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:221) ~...
 at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:263) ~...
 at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:169) ~...
 at org.apache.hudi.table.HoodieTableFactory.inferAvroSchema(HoodieTableFactory.java:239) ~...
 at org.apache.hudi.table.HoodieTableFactory.setupConfOptions(HoodieTableFactory.java:155) ~...
 at org.apache.hudi.table.HoodieTableFactory.createDynamicTableSource(HoodieTableFactory.java:65) ~...

*Environment Description*
Hudi version : 0.11.0-SNAPSHOT
Spark version : 3.1.2
Flink version : 1.13.1
Hive version : None
Hadoop version : 2.9.2
Storage (HDFS/S3/GCS..) : HDFS
Running on Docker? (yes/no) : None

*Additional context*

We are using hoodie as a data lake to deliver projects to customers. We found the following application scenario: write data to the hoodie table through the Spark engine, and then read data from the hoodie table through the Flink engine. Note that the above exception is triggered when the written dataset contains a timestamp column. To simplify the description of the problem, we summarize it into the following steps:

【step-1】Mock data:
{code:java}
/home/deploy/spark-3.1.2-bin-hadoop2.7/bin/spark-shell \
  --driver-class-path /home/workflow/apache-hive-2.3.8-bin/conf/ \
  --master spark://2-120:7077 \
  --executor-memory 4g \
  --driver-memory 4g \
  --num-executors 4 \
  --total-executor-cores 4 \
  --name test \
  --jars /home/deploy/spark-3.1.2-bin-hadoop2.7/jars/hudi-spark3-bundle_2.12-0.11.0-SNAPSHOT.jar,/home/deploy/spark-3.1.2-bin-hadoop2.7/jars/spark-avro_2.12-3.1.2.jar \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \
  --conf spark.sql.hive.convertMetastoreParquet=false {code}
{code:java}
val df = spark.sql("select 1 as id, 'A' as name, current_timestamp as dt")
df.write.format("hudi").
  option("hoodie.datasource.write.recordkey.field", "id").
  option("hoodie.datasource.write.precombine.field", "id").
  option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.NonpartitionedKeyGenerator").
  option("hoodie.upsert.shuffle.parallelism", "2").
  option("hoodie.table.name", "timestamp_table").
  mode("append").
  save("/hudi/suite/data_type_timestamp_table")
spark.read.format("hudi").load("/hudi/suite/data_type_timestamp_table").show(false) {code}

【step-2】Consume data through Flink:
{code:java}
bin/sql-client.sh embedded -j lib/hudi-flink-bundle_2.12-0.11.0-SNAPSHOT.jar {code}
{code:java}
create table data_type_timestamp_table (
  `id` INT,
  `name` STRING,
  `dt` TIMESTAMP(6)
) with (
  'connector' = 'hudi',
  'hoodie.table.name' = 'data_type_timestamp_table',
  'read.streaming.enabled' = 'true',
  'hoodie.datasource.write.recordkey.field' = 'id',
  'path' = '/hudi/suite/data_type_timestamp_table',
  'read.streaming.check-interval' = '10',
  'table.type' = 'COPY_ON_WRITE',
  'write.precombine.field' = 'id'
);
select * from data_type_timestamp_table; {code}

As shown below:
!1.png!
If we change TIMESTAMP(6) to TIMESTAMP(3), the result is as follows:
!2.png!
The data can be found here, but the display is incorrect! Checking the hoodie directory shows that Spark writes the timestamp type as timestamp-micros:
!3.png!
However, Flink reads and writes hoodie data with timestamp-millis. Therefore, reading and writing timestamp types across the Spark and Flink engines is problematic. We hope the hudi-flink module will support timestamp-micros so that no time precision is lost.

Labels: pull-request-available (was: )
Remaining Estimate: 120h
Original Estimate: 120h

> hudi-flink support timestamp-micros
> ---
>
> Key: HUDI-3184
> URL: https://issues.apache.org/jira/browse/HUDI-3184
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Flink Integration
>Re
[jira] [Updated] (HUDI-3184) hudi-flink support timestamp-micros
[ https://issues.apache.org/jira/browse/HUDI-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Well Tang updated HUDI-3184:
Description:
{*}Problem overview{*}:

Steps to reproduce the behavior:
①The Spark engine is used to write data into the hoodie table (PS: there are timestamp-type columns in the dataset fields).
②Use the Flink engine to read the hoodie table written in step 1.

*Expected behavior*

Caused by: java.lang.IllegalArgumentException: Avro does not support TIMESTAMP type with precision: 6, it only supports precision less than 3.
 at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:221) ~...
 at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:263) ~...
 at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:169) ~...
 at org.apache.hudi.table.HoodieTableFactory.inferAvroSchema(HoodieTableFactory.java:239) ~...
 at org.apache.hudi.table.HoodieTableFactory.setupConfOptions(HoodieTableFactory.java:155) ~...
 at org.apache.hudi.table.HoodieTableFactory.createDynamicTableSource(HoodieTableFactory.java:65) ~...

*Environment Description*
Hudi version : 0.11.0-SNAPSHOT
Spark version : 3.1.2
Flink version : 1.13.1
Hive version : None
Hadoop version : 2.9.2
Storage (HDFS/S3/GCS..) : HDFS
Running on Docker? (yes/no) : None

*Additional context*

We are using hoodie as a data lake to deliver projects to customers. We found the following application scenario: write data to the hoodie table through the Spark engine, and then read data from the hoodie table through the Flink engine. Note that the above exception is triggered when the written dataset contains a timestamp column. To simplify the description of the problem, we summarize it into the following steps:

【step-1】Mock data:
{code:java}
/home/deploy/spark-3.1.2-bin-hadoop2.7/bin/spark-shell \
  --driver-class-path /home/workflow/apache-hive-2.3.8-bin/conf/ \
  --master spark://2-120:7077 \
  --executor-memory 4g \
  --driver-memory 4g \
  --num-executors 4 \
  --total-executor-cores 4 \
  --name test \
  --jars /home/deploy/spark-3.1.2-bin-hadoop2.7/jars/hudi-spark3-bundle_2.12-0.11.0-SNAPSHOT.jar,/home/deploy/spark-3.1.2-bin-hadoop2.7/jars/spark-avro_2.12-3.1.2.jar \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \
  --conf spark.sql.hive.convertMetastoreParquet=false {code}
{code:java}
val df = spark.sql("select 1 as id, 'A' as name, current_timestamp as dt")
df.write.format("hudi").
  option("hoodie.datasource.write.recordkey.field", "id").
  option("hoodie.datasource.write.precombine.field", "id").
  option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.NonpartitionedKeyGenerator").
  option("hoodie.upsert.shuffle.parallelism", "2").
  option("hoodie.table.name", "timestamp_table").
  mode("append").
  save("/hudi/suite/data_type_timestamp_table")
spark.read.format("hudi").load("/hudi/suite/data_type_timestamp_table").show(false) {code}

【step-2】Consume data through Flink:
{code:java}
bin/sql-client.sh embedded -j lib/hudi-flink-bundle_2.12-0.11.0-SNAPSHOT.jar {code}
{code:java}
create table data_type_timestamp_table (
  `id` INT,
  `name` STRING,
  `dt` TIMESTAMP(6)
) with (
  'connector' = 'hudi',
  'hoodie.table.name' = 'data_type_timestamp_table',
  'read.streaming.enabled' = 'true',
  'hoodie.datasource.write.recordkey.field' = 'id',
  'path' = '/hudi/suite/data_type_timestamp_table',
  'read.streaming.check-interval' = '10',
  'table.type' = 'COPY_ON_WRITE',
  'write.precombine.field' = 'id'
);
select * from data_type_timestamp_table; {code}

As shown below:
!1.png!
If we change TIMESTAMP(6) to TIMESTAMP(3), the result is as follows:
!2.png!
The data can be found here, but the display is incorrect! Checking the hoodie directory shows that Spark writes the timestamp type as timestamp-micros:
!3.png!
However, Flink reads and writes hoodie data with timestamp-millis. Therefore, reading and writing timestamp types across the Spark and Flink engines is problematic. We hope the hudi-flink module will support timestamp-micros so that no time precision is lost.

was:
{*}Problem overview{*}:

Steps to reproduce the behavior:
①The Spark engine is used to write data into the hoodie table (PS: there are timestamp-type columns in the dataset fields).
②Use the Flink engine to read the hoodie table written in step 1.

*Expected behavior*

Caused by: java.lang.IllegalArgumentException: Avro does not support TIMESTAMP type with precision: 6, it only supports precision less than 3.
 at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:221) ~...
 at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(Avr
[jira] [Updated] (HUDI-3184) hudi-flink support timestamp-micros
[ https://issues.apache.org/jira/browse/HUDI-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Well Tang updated HUDI-3184: Status: In Progress (was: Open) > hudi-flink support timestamp-micros > --- > > Key: HUDI-3184 > URL: https://issues.apache.org/jira/browse/HUDI-3184 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: Well Tang >Assignee: Well Tang >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Attachments: 1.png, 2.png, 3.png > > Original Estimate: 5h > Remaining Estimate: 5h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] zhangyue19921010 commented on pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI
zhangyue19921010 commented on pull request #1274: URL: https://github.com/apache/hudi/pull/1274#issuecomment-1006465368 Hi guys, it seems that there's a little problem with the regex pattern ` private static final Pattern ARCHIVE_FILE_PATTERN = Pattern.compile("^\\.commits_\\.archive\\.([0-9]*)$");` I just raised PR https://github.com/apache/hudi/pull/4521 to try to fix it. Would you be interested in helping me review it? Thanks a lot. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
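For context, a minimal, self-contained sketch of why a fully anchored pattern can skip archive files; the sample file name with a suffix after the number is a hypothetical illustration, not taken from the PR:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArchivePatternCheck {
  // The fully anchored pattern quoted in the comment above.
  private static final Pattern ARCHIVE_FILE_PATTERN =
      Pattern.compile("^\\.commits_\\.archive\\.([0-9]*)$");

  public static void main(String[] args) {
    // Hypothetical archive file name carrying a suffix after the number.
    String fileName = ".commits_.archive.1_1-0-1";
    Matcher m = ARCHIVE_FILE_PATTERN.matcher(fileName);
    // The trailing "$" rejects anything after the digits, so this file is skipped.
    System.out.println(m.matches()); // prints: false
  }
}
```

A pattern that tolerates such a suffix, e.g. `^\\.commits_\\.archive\\.([0-9]+).*$`, would match these names; the actual fix is proposed in PR #4521.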
[GitHub] [hudi] zhangyue19921010 closed pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter
zhangyue19921010 closed pull request #4521: URL: https://github.com/apache/hudi/pull/4521 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] AirToSupply commented on issue #4522: [SUPPORT] hudi-flink support timestamp-micros
AirToSupply commented on issue #4522: URL: https://github.com/apache/hudi/issues/4522#issuecomment-1006466436 @AirToSupply Thanks, https://issues.apache.org/jira/browse/HUDI-3184 issue created here ~ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4516: [WIP][HUDI-1295] Enabling metadata table based index by default for tests
hudi-bot commented on pull request #4516: URL: https://github.com/apache/hudi/pull/4516#issuecomment-1006470119 ## CI report: * 7e2ec46af829fabeb506d639c54057d32f3c89fa UNKNOWN * 280360f772b47ffab15655e4679b021e151783d7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4937) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4516: [WIP][HUDI-1295] Enabling metadata table based index by default for tests
hudi-bot removed a comment on pull request #4516: URL: https://github.com/apache/hudi/pull/4516#issuecomment-1006404803 ## CI report: * 97502fa31dda3b94645631303e134bf0d652c17e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4913) * 7e2ec46af829fabeb506d639c54057d32f3c89fa UNKNOWN * 280360f772b47ffab15655e4679b021e151783d7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4937) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4352: [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups
hudi-bot commented on pull request #4352: URL: https://github.com/apache/hudi/pull/4352#issuecomment-1006484026 ## CI report: * 235981abd20a498a3e29e98ce0eda9de35018f99 UNKNOWN * ce1b2b4eefdd2e0d46154b2c97dc93abf6982aa0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4938) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4352: [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups
hudi-bot removed a comment on pull request #4352: URL: https://github.com/apache/hudi/pull/4352#issuecomment-1006435512 ## CI report: * 235981abd20a498a3e29e98ce0eda9de35018f99 UNKNOWN * 486c6886c5b0bd748e3db1c90c886a1b7f6d52e8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4915) * ce1b2b4eefdd2e0d46154b2c97dc93abf6982aa0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4938) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter
hudi-bot removed a comment on pull request #4521: URL: https://github.com/apache/hudi/pull/4521#issuecomment-1006443490 ## CI report: * d708467de740637a394375335181979a343979bd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4933) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4939) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter
hudi-bot commented on pull request #4521: URL: https://github.com/apache/hudi/pull/4521#issuecomment-1006493903 ## CI report: * d708467de740637a394375335181979a343979bd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4933) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4939) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] liujinhui1994 closed issue #4027: [SUPPORT] Structured streaming Async clustering IndexOutOfBoundsException
liujinhui1994 closed issue #4027: URL: https://github.com/apache/hudi/issues/4027 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codope opened a new pull request #4523: [WIP][HUDI-3173] Add INDEX action type and corresponding commit metadata
codope opened a new pull request #4523: URL: https://github.com/apache/hudi/pull/4523 ## What is the purpose of the pull request - Add top-level INDEX action type. - Add supporting methods in HoodieTimeline. - Add index commit metadata which contains the index plan. ## Brief change log *(for example:)* - *Modify AnnotationLocation checkstyle rule in checkstyle.xml* ## Verify this pull request *(Please pick either of the following options)* This pull request is a trivial rework / code cleanup without any test coverage. *(or)* This pull request is already covered by existing tests, such as *(please describe tests)*. (or) This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end.* - *Added HoodieClientWriteTest to verify the change.* - *Manually verified the change by running a job locally.* ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary doc changes done or have another open PR - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-3173) Introduce new INDEX action type
[ https://issues.apache.org/jira/browse/HUDI-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-3173: - Labels: pull-request-available (was: ) > Introduce new INDEX action type > --- > > Key: HUDI-3173 > URL: https://issues.apache.org/jira/browse/HUDI-3173 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Sagar Sumit >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > > Add a top level INDEX action type and supporting methods in HoodieTimeline. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] hudi-bot commented on pull request #4523: [WIP][HUDI-3173] Add INDEX action type and corresponding commit metadata
hudi-bot commented on pull request #4523: URL: https://github.com/apache/hudi/pull/4523#issuecomment-1006542672 ## CI report: * 700a87f4f67a1cac8f5b870882ab7b61628b4020 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4523: [WIP][HUDI-3173] Add INDEX action type and corresponding commit metadata
hudi-bot removed a comment on pull request #4523: URL: https://github.com/apache/hudi/pull/4523#issuecomment-1006542672 ## CI report: * 700a87f4f67a1cac8f5b870882ab7b61628b4020 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4523: [WIP][HUDI-3173] Add INDEX action type and corresponding commit metadata
hudi-bot commented on pull request #4523: URL: https://github.com/apache/hudi/pull/4523#issuecomment-1006545249 ## CI report: * 700a87f4f67a1cac8f5b870882ab7b61628b4020 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4940) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codope commented on pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator
codope commented on pull request #4203: URL: https://github.com/apache/hudi/pull/4203#issuecomment-1006547783 > @nsivabalan @codope I have a discussion related to this implement. In this pr, most of work is just to pass `isConsistentLogicalTimestampEnabled` to the method `HoodieAvroUtils.convertValueForAvroLogicalTypes`. What if we have another config need to do this in the future? @YannByron You bring up a good point. Adding another config in the future would be tedious. However, the intention behind adding a new config was to avoid discrepancies in existing pipelines. @nsivabalan has explained this in more detail on the jira HUDI-2909. I do not expect such changes to be frequent. Nevertheless, I'll try to avoid making incompatible changes to public APIs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nochimow commented on issue #4299: [SUPPORT] Upsert performance decreased after 3 years of data loading
nochimow commented on issue #4299: URL: https://github.com/apache/hudi/issues/4299#issuecomment-1006558081 Hi, Still waiting for some updates on this case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on pull request #4440: [HUDI-3100] Add config for hive conditional sync
nsivabalan commented on pull request #4440: URL: https://github.com/apache/hudi/pull/4440#issuecomment-1006567579 I am OK with adding it. I see this as filling in a gap we had previously. I understand it is debatable whether to consider it a bug fix or not, but I feel we can add it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-1850) Read on table fails if the first write to table failed
[ https://issues.apache.org/jira/browse/HUDI-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1850: -- Sprint: Hudi-Sprint-Jan-3 (was: Hudi 0.10.1 - 2021/01/03) > Read on table fails if the first write to table failed > -- > > Key: HUDI-1850 > URL: https://issues.apache.org/jira/browse/HUDI-1850 > Project: Apache Hudi > Issue Type: Bug > Components: Spark Integration >Affects Versions: 0.8.0 >Reporter: Vaibhav Sinha >Priority: Major > Labels: core-flow-ds, pull-request-available, release-blocker, > sev:high, spark > Fix For: 0.11.0, 0.10.1 > > Attachments: Screenshot 2021-04-24 at 7.53.22 PM.png > > > {code:java} > java.util.NoSuchElementException: No value present in Option > at org.apache.hudi.common.util.Option.get(Option.java:88) > ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0] > at > org.apache.hudi.common.table.TableSchemaResolver.getTableSchemaFromCommitMetadata(TableSchemaResolver.java:215) > ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0] > at > org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:166) > ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0] > at > org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:155) > ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0] > at > org.apache.hudi.MergeOnReadSnapshotRelation.<init>(MergeOnReadSnapshotRelation.scala:65) > ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0] > at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:99) > ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0] > at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:63) > ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0] > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354) > ~[spark-sql_2.12-3.1.1.jar:3.1.1] > at > org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:326) > ~[spark-sql_2.12-3.1.1.jar:3.1.1] > at > org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:308) > ~[spark-sql_2.12-3.1.1.jar:3.1.1] > at scala.Option.getOrElse(Option.scala:189) > ~[scala-library-2.12.10.jar:?] > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:308) > ~[spark-sql_2.12-3.1.1.jar:3.1.1] > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:240) > ~[spark-sql_2.12-3.1.1.jar:3.1.1] > {code} > The screenshot shows the files that got created before the write had failed. > > !Screenshot 2021-04-24 at 7.53.22 PM.png! -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4523: [WIP][HUDI-3173] Add INDEX action type and corresponding commit metadata
hudi-bot removed a comment on pull request #4523: URL: https://github.com/apache/hudi/pull/4523#issuecomment-1006545249 ## CI report: * 700a87f4f67a1cac8f5b870882ab7b61628b4020 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4940) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4523: [WIP][HUDI-3173] Add INDEX action type and corresponding commit metadata
hudi-bot commented on pull request #4523: URL: https://github.com/apache/hudi/pull/4523#issuecomment-1006610017 ## CI report: * 700a87f4f67a1cac8f5b870882ab7b61628b4020 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4940) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] garystafford opened a new issue #4524: [SUPPORT] Kafka Connect Sink for Hudi README has Incorrect Command
garystafford opened a new issue #4524: URL: https://github.com/apache/hudi/issues/4524 **Describe the problem you faced** In the current instructions for the [Kafka Connect Sink for Hudi](https://github.com/apache/hudi/blob/master/hudi-kafka-connect/README.md), the command, `cp confluentinc-kafka-connect-hdfs-10.1.0/* /usr/local/share/kafka/plugins/`, is incorrect. I believe it should be `cp confluentinc-kafka-connect-hdfs-10.1.0/lib/* /usr/local/share/kafka/plugins/lib`, per Ethan Guo. The current command results in the following error: ``` cp: omitting directory 'confluentinc-kafka-connect-hdfs-10.1.0/assets' cp: omitting directory 'confluentinc-kafka-connect-hdfs-10.1.0/doc' cp: omitting directory 'confluentinc-kafka-connect-hdfs-10.1.0/etc' cp: omitting directory 'confluentinc-kafka-connect-hdfs-10.1.0/lib' ``` **To Reproduce** Steps to reproduce the behavior: 1. Enter the command in the README: `cp confluentinc-kafka-connect-hdfs-10.1.0/* /usr/local/share/kafka/plugins/` **Expected behavior** The command works and JARs in the `lib` directory are copied to the appropriate directory. **Environment Description** * Hudi version : N/A * Spark version : N/A * Hive version : N/A * Hadoop version : N/A * Storage (HDFS/S3/GCS..) : N/A * Running on Docker? (yes/no) : no **Additional context** None. **Stacktrace** ``` cp: omitting directory 'confluentinc-kafka-connect-hdfs-10.1.0/assets' cp: omitting directory 'confluentinc-kafka-connect-hdfs-10.1.0/doc' cp: omitting directory 'confluentinc-kafka-connect-hdfs-10.1.0/etc' cp: omitting directory 'confluentinc-kafka-connect-hdfs-10.1.0/lib' ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4507: [HUDI-52] Enabling savepoint and restore for MOR table
hudi-bot removed a comment on pull request #4507: URL: https://github.com/apache/hudi/pull/4507#issuecomment-1005353560 ## CI report: * 2968b1793b9b3e339f3a5267984269e02bdf6c83 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4892) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4507: [HUDI-52] Enabling savepoint and restore for MOR table
hudi-bot commented on pull request #4507: URL: https://github.com/apache/hudi/pull/4507#issuecomment-1006618506 ## CI report: * 2968b1793b9b3e339f3a5267984269e02bdf6c83 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4892) * cb52a5afb8fdccd9aadcb50b541a207b1f543886 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4507: [HUDI-52] Enabling savepoint and restore for MOR table
hudi-bot removed a comment on pull request #4507: URL: https://github.com/apache/hudi/pull/4507#issuecomment-1006618506 ## CI report: * 2968b1793b9b3e339f3a5267984269e02bdf6c83 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4892) * cb52a5afb8fdccd9aadcb50b541a207b1f543886 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4507: [HUDI-52] Enabling savepoint and restore for MOR table
hudi-bot commented on pull request #4507: URL: https://github.com/apache/hudi/pull/4507#issuecomment-1006620786 ## CI report: * 2968b1793b9b3e339f3a5267984269e02bdf6c83 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4892) * cb52a5afb8fdccd9aadcb50b541a207b1f543886 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4941) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codope commented on a change in pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator
codope commented on a change in pull request #4203: URL: https://github.com/apache/hudi/pull/4203#discussion_r779577663 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/TimestampBasedAvroKeyGenerator.java ## @@ -125,7 +126,7 @@ public TimestampBasedAvroKeyGenerator(TypedProperties config) throws IOException @Override public String getPartitionPath(GenericRecord record) { -Object partitionVal = HoodieAvroUtils.getNestedFieldVal(record, getPartitionPathFields().get(0), true); +Object partitionVal = HoodieAvroUtils.getNestedFieldVal(record, getPartitionPathFields().get(0), true, isConsistentLogicalTimestampEnabled()); Review comment: Not changing the keygen API here. I am using the config and class hierarchy itself. `isConsistentLogicalTimestampEnabled()` is defined in `BaseKeyGenerator` superclass for reusability. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on pull request #4440: [HUDI-3100] Add config for hive conditional sync
nsivabalan commented on pull request #4440: URL: https://github.com/apache/hudi/pull/4440#issuecomment-1006625969 Once you rebase and CI succeeds, I can land this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on pull request #4428: [HUDI-44] Adding support to preserve commit metadata for compaction
nsivabalan commented on pull request #4428: URL: https://github.com/apache/hudi/pull/4428#issuecomment-1006642838 Probably we can skip adding it to the plan. Here is the use case: let's say a compaction was triggered with preserve commit metadata enabled, and midway the user decides they do not want it enabled, so they cancel the ongoing compaction, change the write config to disable preserve commit metadata, and restart. But since we serialized the value into the plan, we will re-execute the compaction from scratch with preserve commit metadata still enabled, right? I guess we can't do much there, so it is better not to serialize the value into the plan and to always honor the current write configs. Let me know what you think. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codope merged pull request #4428: [HUDI-44] Adding support to preserve commit metadata for compaction
codope merged pull request #4428: URL: https://github.com/apache/hudi/pull/4428 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch master updated (50fa5a6 -> b6891d2)
This is an automated email from the ASF dual-hosted git repository. codope pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from 50fa5a6 Update HiveIncrementalPuller to configure filesystem (#4431) add b6891d2 [HUDI-44] Adding support to preserve commit metadata for compaction (#4428) No new revisions were added by this update. Summary of changes: .../apache/hudi/config/HoodieCompactionConfig.java | 11 ++ .../org/apache/hudi/config/HoodieWriteConfig.java | 6 +- .../java/org/apache/hudi/io/HoodieMergeHandle.java | 8 ++- .../PartitionAwareClusteringPlanStrategy.java | 2 +- .../TestHoodieClientOnCopyOnWriteStorage.java | 2 +- .../hudi/table/TestHoodieMergeOnReadTable.java | 25 +++--- .../SparkClientFunctionalTestHarness.java | 8 +++ 7 files changed, 51 insertions(+), 11 deletions(-)
[GitHub] [hudi] hudi-bot commented on pull request #4507: [HUDI-52] Enabling savepoint and restore for MOR table
hudi-bot commented on pull request #4507: URL: https://github.com/apache/hudi/pull/4507#issuecomment-1006659154 ## CI report: * cb52a5afb8fdccd9aadcb50b541a207b1f543886 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4941) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4507: [HUDI-52] Enabling savepoint and restore for MOR table
hudi-bot removed a comment on pull request #4507: URL: https://github.com/apache/hudi/pull/4507#issuecomment-1006620786 ## CI report: * 2968b1793b9b3e339f3a5267984269e02bdf6c83 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4892) * cb52a5afb8fdccd9aadcb50b541a207b1f543886 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4941) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] parisni opened a new issue #4525: [SUPPORT] Spark metastore schema evolution broken
parisni opened a new issue #4525: URL: https://github.com/apache/hudi/issues/4525 From my experiments, when a given hudi table gets added columns, everything works except Spark reads from the metastore: - hive read from metastore -> new column added - spark read from hudi path -> new column added - spark read from metastore (spark.table("database.hudi_table")) -> new column NOT added I have looked at the hive metastore content, and apparently the columns are stored in two tables: - COLUMNS_V2 (one row per column) - TABLE_PARAMS (a key/value table with a spark json schema in it) After hive-sync, only the first HMS table gets updated with the new column. The spark json schema is not updated with the new column. If I purge the TABLE_PARAMS table, then magically spark now has the new column in the schema. So I think the problem is on the spark or hive metastore (not hudi) side, which stores its columns in an alternative table that doesn't get modified. But as a result, hudi schema evolution is kind of broken on the spark side: people who read the table from the metastore won't see the new columns. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
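A minimal sketch of how the mismatch can be observed (the database/table name and path below are placeholders for the reporter's table):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SchemaComparison {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("schema-check").getOrCreate();
    // Reading directly from the Hudi path reflects the evolved schema.
    Dataset<Row> fromPath = spark.read().format("hudi").load("/path/to/hudi_table");
    fromPath.printSchema();
    // Reading through the metastore can show the stale schema cached in TABLE_PARAMS.
    Dataset<Row> fromMetastore = spark.table("database.hudi_table");
    fromMetastore.printSchema();
  }
}
```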
[jira] [Updated] (HUDI-3185) HoodieConfig getBoolean method returns null instead of default value
[ https://issues.apache.org/jira/browse/HUDI-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-3185: -- Priority: Blocker (was: Major) > HoodieConfig getBoolean method returns null instead of default value > > > Key: HUDI-3185 > URL: https://issues.apache.org/jira/browse/HUDI-3185 > Project: Apache Hudi > Issue Type: Bug >Reporter: Sagar Sumit >Assignee: Sagar Sumit >Priority: Blocker > Fix For: 0.10.1 > > > If a config has a default value, then that should be returned instead of null. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HUDI-3185) HoodieConfig getBoolean method returns null instead of default value
Sagar Sumit created HUDI-3185: - Summary: HoodieConfig getBoolean method returns null instead of default value Key: HUDI-3185 URL: https://issues.apache.org/jira/browse/HUDI-3185 Project: Apache Hudi Issue Type: Bug Reporter: Sagar Sumit Assignee: Sagar Sumit Fix For: 0.10.1 If a config has a default value, then that should be returned instead of null. -- This message was sent by Atlassian Jira (v8.20.1#820001)
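A minimal, self-contained sketch of the contract the ticket asks for; plain java.util.Properties stands in for HoodieConfig here, and all names are illustrative rather than the actual patch:

{code:java}
import java.util.Properties;

public class BooleanConfigContract {
  // Fall back to the configured default instead of returning null.
  public static Boolean getBoolean(Properties props, String key, String defaultValue) {
    String raw = props.getProperty(key);
    if (raw == null) {
      return defaultValue == null ? null : Boolean.parseBoolean(defaultValue);
    }
    return Boolean.parseBoolean(raw);
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    // Key unset: the default ("true") should come back, not null.
    System.out.println(getBoolean(props, "hoodie.example.flag", "true")); // true
  }
}
{code}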
[jira] [Updated] (HUDI-2429) [UMBRELLA] Comprehensive Schema evolution in Hudi
[ https://issues.apache.org/jira/browse/HUDI-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-2429: - Fix Version/s: 0.12.0 (was: 0.11.0) > [UMBRELLA] Comprehensive Schema evolution in Hudi > - > > Key: HUDI-2429 > URL: https://issues.apache.org/jira/browse/HUDI-2429 > Project: Apache Hudi > Issue Type: Epic > Components: Common Core >Reporter: tao meng >Assignee: tao meng >Priority: Major > Labels: hudi-umbrellas, pull-request-available > Fix For: 0.12.0 > > > [https://cwiki.apache.org/confluence/display/HUDI/RFC+-+33++Hudi+supports+more+comprehensive+Schema+Evolution] > > Support comprehensive schema evolution in Hudi > * rename cols > * drop cols > * reorder cols > * re-add cols -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-2429) [UMBRELLA] Comprehensive Schema evolution in Hudi
[ https://issues.apache.org/jira/browse/HUDI-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-2429: - Fix Version/s: 0.11.0 (was: 0.12.0) > [UMBRELLA] Comprehensive Schema evolution in Hudi > - > > Key: HUDI-2429 > URL: https://issues.apache.org/jira/browse/HUDI-2429 > Project: Apache Hudi > Issue Type: Epic > Components: Common Core >Reporter: tao meng >Assignee: tao meng >Priority: Major > Labels: hudi-umbrellas, pull-request-available > Fix For: 0.11.0 > > > [https://cwiki.apache.org/confluence/display/HUDI/RFC+-+33++Hudi+supports+more+comprehensive+Schema+Evolution] > > Support comprehensive schema evolution in Hudi > * rename cols > * drop cols > * reorder cols > * re-add cols -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
[ https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1896: - Fix Version/s: 1.0.0 > [UMBRELLA] Implement DeltaStreamer Source for cloud object stores > - > > Key: HUDI-1896 > URL: https://issues.apache.org/jira/browse/HUDI-1896 > Project: Apache Hudi > Issue Type: Epic > Components: DeltaStreamer >Reporter: Raymond Xu >Assignee: Rajesh Mahindra >Priority: Critical > Labels: hudi-umbrellas, pull-request-available > Fix For: 1.0.0 > > As discussed in HUDI-1723, we need a better implementation for Cloud object > storage like AWS S3 or GCS, leveraging change notifications. > Also consider > [https://docs.databricks.com/spark/latest/structured-streaming/sqs.html] > > We need to look into the current *DFSSource classes and see if we can add a new > `DFSPathSelector` implementation that fetches new files on cloud storage > after a given point in time. The timestamp-based approach used by the existing > path selector largely works, but has corner cases, as mentioned in HUDI-1723 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] codope commented on a change in pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator
codope commented on a change in pull request #4203: URL: https://github.com/apache/hudi/pull/4203#discussion_r779645642 ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/keygen/TestTimestampBasedKeyGenerator.java ## @@ -238,6 +238,40 @@ public void testScalar() throws IOException { assertEquals("2021-04-19", keyGen.getPartitionPath(baseRow)); } + @Test + public void testScalarWithLogicalType() throws IOException { +schema = SchemaTestUtil.getTimestampWithLogicalTypeSchema(); +structType = AvroConversionUtils.convertAvroSchemaToStructType(schema); +baseRecord = SchemaTestUtil.generateAvroRecordFromJson(schema, 1, "001", "f1"); +baseRecord.put("createTime", 163851380600L); + +properties = getBaseKeyConfig("SCALAR", "/MM/dd", "GMT", "MICROSECONDS"); + properties.setProperty(KeyGeneratorOptions.KEYGENERATOR_CONSISTENT_LOGICAL_TIMESTAMP_ENABLED.key(), "true"); +TimestampBasedKeyGenerator keyGen = new TimestampBasedKeyGenerator(properties); +HoodieKey hk1 = keyGen.getKey(baseRecord); +assertEquals("2021/12/03", hk1.getPartitionPath()); + +// test w/ Row +baseRow = genericRecordToRow(baseRecord); +assertEquals("2021/12/03", keyGen.getPartitionPath(baseRow)); +internalRow = KeyGeneratorTestUtilities.getInternalRow(baseRow); +assertEquals("2021/12/03", keyGen.getPartitionPath(internalRow, baseRow.schema())); Review comment: If config is not set then it throws an exception `HoodieKeyGeneratorException: Unable to parse input partition field` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-1046) Support updates during clustering in CoW mode
[ https://issues.apache.org/jira/browse/HUDI-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1046: - Priority: Blocker (was: Major) > Support updates during clustering in CoW mode > - > > Key: HUDI-1046 > URL: https://issues.apache.org/jira/browse/HUDI-1046 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: leesf >Assignee: shenh062326 >Priority: Blocker > Fix For: 0.12.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-1046) Support updates during clustering in CoW mode
[ https://issues.apache.org/jira/browse/HUDI-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1046: - Fix Version/s: 0.12.0 > Support updates during clustering in CoW mode > - > > Key: HUDI-1046 > URL: https://issues.apache.org/jira/browse/HUDI-1046 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: leesf >Assignee: shenh062326 >Priority: Major > Fix For: 0.12.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-1045) Support updates during clustering in MoR mode
[ https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1045: - Priority: Blocker (was: Major) > Support updates during clustering in MoR mode > - > > Key: HUDI-1045 > URL: https://issues.apache.org/jira/browse/HUDI-1045 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: leesf >Assignee: leesf >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-1045) Support updates during clustering in MoR mode
[ https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1045: - Fix Version/s: 0.12.0 > Support updates during clustering in MoR mode > - > > Key: HUDI-1045 > URL: https://issues.apache.org/jira/browse/HUDI-1045 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: leesf >Assignee: leesf >Priority: Blocker > Fix For: 0.12.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] codope merged pull request #4507: [HUDI-52] Enabling savepoint and restore for MOR table
codope merged pull request #4507: URL: https://github.com/apache/hudi/pull/4507 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[hudi] branch master updated: [HUDI-52] Enabling savepoint and restore for MOR table (#4507)
This is an automated email from the ASF dual-hosted git repository. codope pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 2954027 [HUDI-52] Enabling savepoint and restore for MOR table (#4507) 2954027 is described below commit 2954027b92ada82c41d0a72cc0b837564a730a89 Author: Sivabalan Narayanan AuthorDate: Thu Jan 6 10:56:08 2022 -0500 [HUDI-52] Enabling savepoint and restore for MOR table (#4507) * Enabling restore for MOR table * Fixing savepoint for compaction commits in MOR --- .../hudi/cli/commands/SavepointsCommand.java | 12 ++- .../action/savepoint/SavepointActionExecutor.java | 9 +-- .../TestHoodieSparkMergeOnReadTableRollback.java | 94 ++ 3 files changed, 101 insertions(+), 14 deletions(-) diff --git a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/SavepointsCommand.java b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/SavepointsCommand.java index 0ea2fff..d3f8584 100644 --- a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/SavepointsCommand.java +++ b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/SavepointsCommand.java @@ -78,11 +78,9 @@ public class SavepointsCommand implements CommandMarker { throws Exception { HoodieTableMetaClient metaClient = HoodieCLI.getTableMetaClient(); HoodieActiveTimeline activeTimeline = metaClient.getActiveTimeline(); -HoodieTimeline timeline = activeTimeline.getCommitTimeline().filterCompletedInstants(); -HoodieInstant commitInstant = new HoodieInstant(false, HoodieTimeline.COMMIT_ACTION, commitTime); -if (!timeline.containsInstant(commitInstant)) { - return "Commit " + commitTime + " not found in Commits " + timeline; +if (!activeTimeline.getCommitsTimeline().filterCompletedInstants().containsInstant(commitTime)) { + return "Commit " + commitTime + " not found in Commits " + activeTimeline; } SparkLauncher sparkLauncher = SparkUtil.initLauncher(sparkPropertiesPath); @@ -112,10 +110,10 @@ public class SavepointsCommand implements CommandMarker { throw new HoodieException("There are no completed instants to run rollback"); } HoodieActiveTimeline activeTimeline = metaClient.getActiveTimeline(); -HoodieTimeline timeline = activeTimeline.getCommitTimeline().filterCompletedInstants(); -HoodieInstant commitInstant = new HoodieInstant(false, HoodieTimeline.COMMIT_ACTION, instantTime); +HoodieTimeline timeline = activeTimeline.getCommitsTimeline().filterCompletedInstants(); +List instants = timeline.getInstants().filter(instant -> instant.getTimestamp().equals(instantTime)).collect(Collectors.toList()); -if (!timeline.containsInstant(commitInstant)) { +if (instants.isEmpty()) { return "Commit " + instantTime + " not found in Commits " + timeline; } diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/savepoint/SavepointActionExecutor.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/savepoint/SavepointActionExecutor.java index de1d973..134b238 100644 --- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/savepoint/SavepointActionExecutor.java +++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/savepoint/SavepointActionExecutor.java @@ -24,7 +24,6 @@ import org.apache.hudi.common.engine.HoodieEngineContext; import org.apache.hudi.common.fs.FSUtils; import org.apache.hudi.common.model.HoodieBaseFile; import org.apache.hudi.common.model.HoodieRecordPayload; -import 
org.apache.hudi.common.model.HoodieTableType; import org.apache.hudi.common.table.timeline.HoodieInstant; import org.apache.hudi.common.table.timeline.HoodieTimeline; import org.apache.hudi.common.table.timeline.TimelineMetadataUtils; @@ -65,13 +64,9 @@ public class SavepointActionExecutor ext @Override public HoodieSavepointMetadata execute() { -if (table.getMetaClient().getTableType() == HoodieTableType.MERGE_ON_READ) { - throw new UnsupportedOperationException("Savepointing is not supported or MergeOnRead table types"); -} Option cleanInstant = table.getCompletedCleanTimeline().lastInstant(); -HoodieInstant commitInstant = new HoodieInstant(false, HoodieTimeline.COMMIT_ACTION, instantTime); -if (!table.getCompletedCommitsTimeline().containsInstant(commitInstant)) { - throw new HoodieSavepointException("Could not savepoint non-existing commit " + commitInstant); +if (!table.getCompletedCommitsTimeline().containsInstant(instantTime)) { + throw new HoodieSavepointException("Could not savepoint non-existing commit " + instantTime); } try { diff --git a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableRollback.java b/hudi-client/hudi-spar
[jira] [Updated] (HUDI-1456) [UMBRELLA] Concurrency Control for Hudi writers and table services
[ https://issues.apache.org/jira/browse/HUDI-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1456: - Summary: [UMBRELLA] Concurrency Control for Hudi writers and table services (was: [UMBRELLA] Concurrent Writing (multiwriter) to Hudi tables) > [UMBRELLA] Concurrency Control for Hudi writers and table services > -- > > Key: HUDI-1456 > URL: https://issues.apache.org/jira/browse/HUDI-1456 > Project: Apache Hudi > Issue Type: Epic > Components: Writer Core >Affects Versions: 0.9.0 >Reporter: Nishith Agarwal >Assignee: Nishith Agarwal >Priority: Major > Labels: hudi-umbrellas > Attachments: image-2020-12-14-09-48-46-946.png > > > This ticket tracks all the changes needed to support concurrency control for > Hudi tables. This work will be done in multiple phases. > # Support for parallel writing to Hudi tables -> This feature will allow users > to have multiple writers mutate the tables, without concurrent updates > to the same file. > # Concurrency control at file/record level -> This feature will allow users > to have multiple writers mutate the tables with the ability to ensure > serializability at the record level. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] nsivabalan commented on a change in pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator
nsivabalan commented on a change in pull request #4203: URL: https://github.com/apache/hudi/pull/4203#discussion_r779654146 ## File path: hudi-common/src/main/java/org/apache/hudi/keygen/constant/KeyGeneratorOptions.java ## @@ -56,6 +56,13 @@ .withDocumentation("Partition path field. Value to be used at the partitionPath component of HoodieKey. " + "Actual value ontained by invoking .toString()"); + public static final ConfigProperty KEYGENERATOR_CONSISTENT_LOGICAL_TIMESTAMP_ENABLED = ConfigProperty + .key("hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled") + .defaultValue("false") + .withDocumentation("When set to true, consistent value will be generated for a logical timestamp type column, " Review comment: Can we add an example here so that users know what to expect if not for enabling this config. example covering both row writer and non-writer path. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
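As a rough illustration of the kind of example that could accompany the docs (a sketch only; the record key and partition path field names are placeholders), the option from the snippet above can be supplied through the key generator properties:

```java
import org.apache.hudi.common.config.TypedProperties;
import org.apache.hudi.keygen.constant.KeyGeneratorOptions;

public class ConsistentTimestampConfigSketch {
  public static void main(String[] args) {
    TypedProperties props = new TypedProperties();
    // Illustrative key generator inputs.
    props.setProperty("hoodie.datasource.write.recordkey.field", "id");
    props.setProperty("hoodie.datasource.write.partitionpath.field", "createTime");
    // Opt in to consistent handling of Avro logical timestamp columns so the
    // row-writer and Avro code paths derive the same partition path.
    props.setProperty(
        KeyGeneratorOptions.KEYGENERATOR_CONSISTENT_LOGICAL_TIMESTAMP_ENABLED.key(),
        "true");
    System.out.println(props);
  }
}
```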
[GitHub] [hudi] hudi-bot commented on pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator
hudi-bot commented on pull request #4203: URL: https://github.com/apache/hudi/pull/4203#issuecomment-1006705466 ## CI report: * 5b68cadeeec7a6482f9c5a9eeadad1ad816aa962 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4835) * 2c0565c35723d6f5fee071d14299361b321f202e UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator
hudi-bot removed a comment on pull request #4203: URL: https://github.com/apache/hudi/pull/4203#issuecomment-1003406987 ## CI report: * 5b68cadeeec7a6482f9c5a9eeadad1ad816aa962 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4835) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator
hudi-bot commented on pull request #4203: URL: https://github.com/apache/hudi/pull/4203#issuecomment-1006707897 ## CI report: * 5b68cadeeec7a6482f9c5a9eeadad1ad816aa962 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4835) * 2c0565c35723d6f5fee071d14299361b321f202e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4944) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator
hudi-bot removed a comment on pull request #4203: URL: https://github.com/apache/hudi/pull/4203#issuecomment-1006705466 ## CI report: * 5b68cadeeec7a6482f9c5a9eeadad1ad816aa962 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4835) * 2c0565c35723d6f5fee071d14299361b321f202e UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-3184) hudi-flink support timestamp-micros
[ https://issues.apache.org/jira/browse/HUDI-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Well Tang updated HUDI-3184: Remaining Estimate: 120h (was: 5h) Original Estimate: 120h (was: 5h) > hudi-flink support timestamp-micros > --- > > Key: HUDI-3184 > URL: https://issues.apache.org/jira/browse/HUDI-3184 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: Well Tang >Assignee: Well Tang >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Attachments: 1.png, 2.png, 3.png > > Original Estimate: 120h > Remaining Estimate: 120h > > {*}Problem overview{*}: > Steps to reproduce the behavior: > ①The Spark engine is used to write data into the hoodie table (PS: there are > timestamp type columns in the dataset). > ②Use the Flink engine to read the hoodie table written in step 1. > *Actual behavior* > Caused by: java.lang.IllegalArgumentException: Avro does not support > TIMESTAMP type with precision: 6, it only supports precision less than 3. > at > org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:221) > ~... > at > org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:263) > ~... > at > org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:169) > ~... > at > org.apache.hudi.table.HoodieTableFactory.inferAvroSchema(HoodieTableFactory.java:239) > ~... > at > org.apache.hudi.table.HoodieTableFactory.setupConfOptions(HoodieTableFactory.java:155) > ~... > at > org.apache.hudi.table.HoodieTableFactory.createDynamicTableSource(HoodieTableFactory.java:65) > ~... > *Environment Description* > Hudi version : 0.11.0-SNAPSHOT > Spark version : 3.1.2 > Flink version : 1.13.1 > Hive version : None > Hadoop version : 2.9.2 > Storage (HDFS/S3/GCS..) : HDFS > Running on Docker? (yes/no) : None > *Additional context* > We are using hoodie as a data lake to deliver projects to customers. We encountered > the following application scenario: data is written to the hoodie table through the Spark > engine and then read from the hoodie table through the Flink engine. > It should be noted that the above exception is triggered when the dataset > being written contains a timestamp column. > To simplify the description, we reduce the problem > to the following steps: > 【step-1】Mock data: > {code:java} > /home/deploy/spark-3.1.2-bin-hadoop2.7/bin/spark-shell \ > --driver-class-path /home/workflow/apache-hive-2.3.8-bin/conf/ \ > --master spark://2-120:7077 \ > --executor-memory 4g \ > --driver-memory 4g \ > --num-executors 4 \ > --total-executor-cores 4 \ > --name test \ > --jars > /home/deploy/spark-3.1.2-bin-hadoop2.7/jars/hudi-spark3-bundle_2.12-0.11.0-SNAPSHOT.jar,/home/deploy/spark-3.1.2-bin-hadoop2.7/jars/spark-avro_2.12-3.1.2.jar > \ > --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \ > --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \ > --conf spark.sql.hive.convertMetastoreParquet=false {code} > {code:java} > val df = spark.sql("select 1 as id, 'A' as name, current_timestamp as dt") > df.write.format("hudi"). > option("hoodie.datasource.write.recordkey.field", "id"). > option("hoodie.datasource.write.precombine.field", "id"). > option("hoodie.datasource.write.keygenerator.class", > "org.apache.hudi.keygen.NonpartitionedKeyGenerator"). > option("hoodie.upsert.shuffle.parallelism", "2"). > option("hoodie.table.name", "timestamp_table"). > mode("append"). 
> save("/hudi/suite/data_type_timestamp_table") > spark.read.format("hudi").load("/hudi/suite/data_type_timestamp_table").show(false) > {code} > 【step-2】Consume the data through Flink: > {code:java} > bin/sql-client.sh embedded -j lib/hudi-flink-bundle_2.12-0.11.0-SNAPSHOT.jar > {code} > {code:java} > create table data_type_timestamp_table ( > `id` INT, > `name` STRING, > `dt` TIMESTAMP(6) > ) with ( > 'connector' = 'hudi', > 'hoodie.table.name' = 'data_type_timestamp_table', > 'read.streaming.enabled' = 'true', > 'hoodie.datasource.write.recordkey.field' = 'id', > 'path' = '/hudi/suite/data_type_timestamp_table', > 'read.streaming.check-interval' = '10', > 'table.type' = 'COPY_ON_WRITE', > 'write.precombine.field' = 'id' > ); > select * from data_type_timestamp_table; {code} > As shown below: > !1.png! > If we change TIMESTAMP(6) to TIMESTAMP(3), the result is as follows: > !2.png! > The data can now be read, but it is displayed incorrectly! > After checking the files in the Hoodie directory, we found that Spark writes the > timestamp type as timestamp-micros: > !3.png! > However, Flink reads and writes Hoodie data with the > timestamp-millis type!
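To make the mismatch concrete, here is a minimal sketch (not code from the issue) of the two Avro logical timestamp types involved: Spark writes timestamp-micros (Flink's TIMESTAMP(6)), while the Flink-side converter here only handled timestamp-millis (precision 3 or less):

```scala
// Sketch of the two Avro logical timestamp types at the heart of HUDI-3184.
import org.apache.avro.{LogicalTypes, Schema}

// What the Flink integration handled: millisecond precision, i.e. TIMESTAMP(3) or less.
val millis = LogicalTypes.timestampMillis().addToSchema(Schema.create(Schema.Type.LONG))
// What Spark actually writes: microsecond precision, i.e. TIMESTAMP(6).
val micros = LogicalTypes.timestampMicros().addToSchema(Schema.create(Schema.Type.LONG))

println(millis) // {"type":"long","logicalType":"timestamp-millis"}
println(micros) // {"type":"long","logicalType":"timestamp-micros"}
```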
[jira] [Updated] (HUDI-2370) Supports data encryption
[ https://issues.apache.org/jira/browse/HUDI-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-2370: - Fix Version/s: 0.11.0 > Supports data encryption > > > Key: HUDI-2370 > URL: https://issues.apache.org/jira/browse/HUDI-2370 > Project: Apache Hudi > Issue Type: New Feature >Reporter: liujinhui >Assignee: liujinhui >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > > Data security is becoming more and more important; it would be very welcome if hudi could support > encryption: > 1. Specify column-level encryption > 2. Support footer encryption > 3. Custom encryption client interface (provide a memory-based encryption client > by default) > 4. Specify the encryption key > > When querying, the relevant key must be passed, or query permission obtained > through the client's encryption interface; otherwise the result cannot be > returned. > 1. When querying non-encrypted fields without passing the key, the data > is returned normally > 2. When querying encrypted fields without passing the key, the data is not > returned > 3. When querying encrypted fields with the key passed, the data is > returned normally > 4. When querying all fields without the key, no result is > returned; with the key, the data is returned normally > > Start with COW tables first -- This message was sent by Atlassian Jira (v8.20.1#820001)
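For a sense of what the user-facing side of such a feature could look like, here is a purely hypothetical sketch in the style of Parquet modular encryption (the crypto factory and property names exist in parquet-mr 1.12+; the demo keys and the "ssn" column are made up, and whether Hudi exposes these knobs is exactly what this ticket proposes):

```scala
// Hypothetical sketch only: column and footer encryption for a Hudi write,
// configured through Parquet modular encryption properties.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("encryption-demo").getOrCreate()
val hadoopConf = spark.sparkContext.hadoopConfiguration

hadoopConf.set("parquet.crypto.factory.class",
  "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory")
// A memory-based key client, matching the ticket's "memory-based encryption client by default".
hadoopConf.set("parquet.encryption.kms.client.class",
  "org.apache.parquet.crypto.keytools.mocks.InMemoryKMS")
// Made-up demo master keys (base64-encoded 16-byte values).
hadoopConf.set("parquet.encryption.key.list",
  "footerKey:AAECAwQFBgcICQoLDA0ODw==, columnKey:EAAAAAAAAAAAAAAAAAAAAA==")
hadoopConf.set("parquet.encryption.column.keys", "columnKey:ssn") // encrypt only column "ssn"
hadoopConf.set("parquet.encryption.footer.key", "footerKey")      // footer encryption
```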
[jira] [Updated] (HUDI-3173) Introduce new INDEX action type
[ https://issues.apache.org/jira/browse/HUDI-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-3173: -- Status: In Progress (was: Open) > Introduce new INDEX action type > --- > > Key: HUDI-3173 > URL: https://issues.apache.org/jira/browse/HUDI-3173 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Sagar Sumit >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > > Add a top level INDEX action type and supporting methods in HoodieTimeline. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] nsivabalan commented on a change in pull request #4497: [HUDI-3147] Create pushgateway client based on port
nsivabalan commented on a change in pull request #4497: URL: https://github.com/apache/hudi/pull/4497#discussion_r779678339 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/prometheus/PushGatewayReporter.java ## @@ -51,17 +53,30 @@ protected PushGatewayReporter(MetricRegistry registry, TimeUnit rateUnit, TimeUnit durationUnit, String jobName, -String address, +String serverHost, +int serverPort, boolean deleteShutdown) { super(registry, "hudi-push-gateway-reporter", filter, rateUnit, durationUnit); this.jobName = jobName; this.deleteShutdown = deleteShutdown; collectorRegistry = new CollectorRegistry(); metricExports = new DropwizardExports(registry); -pushGateway = new PushGateway(address); +pushGateway = createPushGatewayClient(serverHost, serverPort); metricExports.register(collectorRegistry); } + private PushGateway createPushGatewayClient(String serverHost, int serverPort) { +if (serverPort == 443) { + try { +return new PushGateway(new URL("https://" + serverHost)); Review comment: @t0il3ts0ap : Did you test this patch? Don't we need ``` new URL("https://" + serverHost + ":" + serverPort)); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
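For illustration, a minimal sketch of the port-aware construction the reviewer is asking about (a sketch under the PR's assumption that port 443 implies HTTPS, not the final code from the patch; both constructors used here exist in Prometheus' simpleclient_pushgateway):

```scala
// Sketch of a port-aware PushGateway client, per the review suggestion.
import java.net.URL
import io.prometheus.client.exporter.PushGateway

def createPushGatewayClient(serverHost: String, serverPort: Int): PushGateway =
  if (serverPort == 443) {
    // Include the port explicitly, as the reviewer suggests; dropping it only
    // happens to work because 443 is the HTTPS default.
    new PushGateway(new URL("https://" + serverHost + ":" + serverPort))
  } else {
    // The String constructor takes a plain host:port address (HTTP).
    new PushGateway(serverHost + ":" + serverPort)
  }
```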
[jira] [Commented] (HUDI-1628) [Umbrella] Improve data locality during ingestion
[ https://issues.apache.org/jira/browse/HUDI-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470041#comment-17470041 ] Vinoth Chandar commented on HUDI-1628: -- [~guoyihua] assigning to you to drive this forward. cc [~thirumalai.raj] please let us know if you are still interested in pursuing this. > [Umbrella] Improve data locality during ingestion > - > > Key: HUDI-1628 > URL: https://issues.apache.org/jira/browse/HUDI-1628 > Project: Apache Hudi > Issue Type: Epic > Components: Writer Core >Reporter: satish >Assignee: Ethan Guo >Priority: Major > Labels: hudi-umbrellas > Fix For: 0.11.0 > > > Today the upsert partitioner does the file sizing/bin-packing etc. for > inserts and then sends some inserts over to existing file groups to > maintain file size. > We can abstract all of this into strategies and some kind of pipeline > abstractions and have it also consider "affinity" to an existing file group > based on, say, information stored in the metadata table? > See http://mail-archives.apache.org/mod_mbox/hudi-dev/202102.mbox/browser > for more details -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Assigned] (HUDI-1628) [Umbrella] Improve data locality during ingestion
[ https://issues.apache.org/jira/browse/HUDI-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar reassigned HUDI-1628: Assignee: Ethan Guo (was: Thirumalai Raj R) > [Umbrella] Improve data locality during ingestion > - > > Key: HUDI-1628 > URL: https://issues.apache.org/jira/browse/HUDI-1628 > Project: Apache Hudi > Issue Type: Epic > Components: Writer Core >Reporter: satish >Assignee: Ethan Guo >Priority: Major > Labels: hudi-umbrellas > Fix For: 0.11.0 > > > Today the upsert partitioner does the file sizing/bin-packing etc. for > inserts and then sends some inserts over to existing file groups to > maintain file size. > We can abstract all of this into strategies and some kind of pipeline > abstractions and have it also consider "affinity" to an existing file group > based on, say, information stored in the metadata table? > See http://mail-archives.apache.org/mod_mbox/hudi-dev/202102.mbox/browser > for more details -- This message was sent by Atlassian Jira (v8.20.1#820001)
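As a thought experiment on the abstraction this umbrella issue sketches, here is a hypothetical strategy interface (not Hudi API; all names invented) for placing inserts while weighing both file sizing and affinity to existing file groups:

```scala
// Hypothetical sketch of a pluggable insert-placement strategy.
case class FileGroup(id: String, sizeBytes: Long)
case class InsertBatch(estimatedBytes: Long, affinityKey: String)

trait InsertPlacementStrategy {
  /** Pick a target file group for the batch, or None to start a new file group. */
  def place(batch: InsertBatch, candidates: Seq[FileGroup], maxFileSize: Long): Option[FileGroup]
}

// Naive example: top up the smallest file group that still has room (pure file
// sizing). An affinity-aware variant could first filter candidates by statistics
// kept in the metadata table, e.g. key ranges matching batch.affinityKey.
object SmallestFitStrategy extends InsertPlacementStrategy {
  def place(batch: InsertBatch, candidates: Seq[FileGroup], maxFileSize: Long): Option[FileGroup] =
    candidates
      .filter(fg => fg.sizeBytes + batch.estimatedBytes <= maxFileSize)
      .sortBy(_.sizeBytes)
      .headOption
}
```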
[GitHub] [hudi] hudi-bot commented on pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator
hudi-bot commented on pull request #4203: URL: https://github.com/apache/hudi/pull/4203#issuecomment-1006747034 ## CI report: * 2c0565c35723d6f5fee071d14299361b321f202e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4944) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator
hudi-bot removed a comment on pull request #4203: URL: https://github.com/apache/hudi/pull/4203#issuecomment-1006707897 ## CI report: * 5b68cadeeec7a6482f9c5a9eeadad1ad816aa962 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4835) * 2c0565c35723d6f5fee071d14299361b321f202e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4944) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator
hudi-bot commented on pull request #4203: URL: https://github.com/apache/hudi/pull/4203#issuecomment-1006751886 ## CI report: * 2c0565c35723d6f5fee071d14299361b321f202e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4944) * eecd338f6aa8c22150cc3a3abc28eb5c2535ef1e UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org