date:20220106

[GitHub] [hudi] huangweifeng7 commented on issue #143: Tracking ticket for folks to be added to slack group

2022-01-06 Thread GitBox



huangweifeng7 commented on issue #143:
URL: https://github.com/apache/hudi/issues/143#issuecomment-1006357688


   Please add me to the Slack group.
   Email: huangweifeng_n...@126.com
   Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [hudi] hudi-bot commented on pull request #4471: [HUDI-3125] spark-sql write timestamp directly

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4471:
URL: https://github.com/apache/hudi/pull/4471#issuecomment-1006360315


   
   ## CI report:
   
   * a5dcf171a39b236a74b9a70b0eb0b49e74ebc3b5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4934)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4471: [HUDI-3125] spark-sql write timestamp directly

2022-01-06 Thread GitBox



hudi-bot removed a comment on pull request #4471:
URL: https://github.com/apache/hudi/pull/4471#issuecomment-1006331887


   
   ## CI report:
   
   * 29b1742747a4195db690d09f09de972ab7f409db Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4895)
   * a5dcf171a39b236a74b9a70b0eb0b49e74ebc3b5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4934)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] leesf commented on a change in pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation

2022-01-06 Thread GitBox



leesf commented on a change in pull request #4514:
URL: https://github.com/apache/hudi/pull/4514#discussion_r779368601



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/HoodieSqlCommonUtils.scala
##
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi
+
+import scala.collection.JavaConverters._
+import java.net.URI
+import java.util.{Date, Locale, Properties}
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+
+import org.apache.hudi.{AvroConversionUtils, SparkAdapterSupport}
+import org.apache.hudi.client.common.HoodieSparkEngineContext
+import org.apache.hudi.common.config.DFSPropertiesConfiguration
+import org.apache.hudi.common.config.HoodieMetadataConfig
+import org.apache.hudi.common.fs.FSUtils
+import org.apache.hudi.common.model.HoodieRecord
+import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver}
+import org.apache.hudi.common.table.timeline.{HoodieActiveTimeline, HoodieInstantTimeGenerator}
+import org.apache.spark.SPARK_VERSION
+import org.apache.spark.sql.{Column, DataFrame, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
+import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTableType}
+import org.apache.spark.sql.catalyst.expressions.{And, Attribute, Cast, Expression, Literal}
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SubqueryAlias}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}
+import org.apache.spark.api.java.JavaSparkContext
+import org.apache.spark.sql.types.{DataType, NullType, StringType, StructField, StructType}
+
+import java.text.SimpleDateFormat
+
+import scala.collection.immutable.Map
+
+object HoodieSqlCommonUtils extends SparkAdapterSupport {

Review comment:
   Yes, the code was moved from HoodieSqlUtils to enable more reuse; no new methods were added. I also intend to keep the *Utils naming to stay aligned with HoodieSqlUtils, which also extends SparkAdapterSupport.





[GitHub] [hudi] leesf commented on pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation

2022-01-06 Thread GitBox



leesf commented on pull request #4514:
URL: https://github.com/apache/hudi/pull/4514#issuecomment-1006364350


   > Let me make another pass at all the pom changes. That seems to be the main thing here. In the meantime, could you clarify these comments?
   > 
   > Also, have you tested these changes across the Spark 2.x and 3.1/3.2 bundles?
   
   @vinothchandar Yes, I have manually tested it with Spark SQL on Spark 3.2.0 and Spark 3.1.2, and CI tested it on Spark 2.4.x; it works well.



[GitHub] [hudi] leesf commented on a change in pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation

2022-01-06 Thread GitBox



leesf commented on a change in pull request #4514:
URL: https://github.com/apache/hudi/pull/4514#discussion_r779369728



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/HoodieSqlCommonUtils.scala
##
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi
+
+import scala.collection.JavaConverters._
+import java.net.URI
+import java.util.{Date, Locale, Properties}
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+
+import org.apache.hudi.{AvroConversionUtils, SparkAdapterSupport}
+import org.apache.hudi.client.common.HoodieSparkEngineContext
+import org.apache.hudi.common.config.DFSPropertiesConfiguration
+import org.apache.hudi.common.config.HoodieMetadataConfig
+import org.apache.hudi.common.fs.FSUtils
+import org.apache.hudi.common.model.HoodieRecord
+import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver}
+import org.apache.hudi.common.table.timeline.{HoodieActiveTimeline, HoodieInstantTimeGenerator}
+import org.apache.spark.SPARK_VERSION
+import org.apache.spark.sql.{Column, DataFrame, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
+import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTableType}
+import org.apache.spark.sql.catalyst.expressions.{And, Attribute, Cast, Expression, Literal}
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SubqueryAlias}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}
+import org.apache.spark.api.java.JavaSparkContext
+import org.apache.spark.sql.types.{DataType, NullType, StringType, StructField, StructType}
+
+import java.text.SimpleDateFormat
+
+import scala.collection.immutable.Map
+
+object HoodieSqlCommonUtils extends SparkAdapterSupport {
+  // NOTE: {@code SimpleDateFormat} is NOT thread-safe
+  // TODO replace w/ DateTimeFormatter
+  private val defaultDateFormat =
+  ThreadLocal.withInitial(new java.util.function.Supplier[SimpleDateFormat] {
+    override def get() = new SimpleDateFormat("yyyy-MM-dd")
+  })
+
+  def isHoodieTable(table: CatalogTable): Boolean = {
+    table.provider.map(_.toLowerCase(Locale.ROOT)).orNull == "hudi"
+  }
+
+  def isHoodieTable(tableId: TableIdentifier, spark: SparkSession): Boolean = {
+    val table = spark.sessionState.catalog.getTableMetadata(tableId)
+    isHoodieTable(table)
+  }
+
+  def isHoodieTable(table: LogicalPlan, spark: SparkSession): Boolean = {
+    tripAlias(table) match {
+      case LogicalRelation(_, _, Some(tbl), _) => isHoodieTable(tbl)
+      case relation: UnresolvedRelation =>
+        isHoodieTable(sparkAdapter.toTableIdentifier(relation), spark)
+      case _=> false
+    }
+  }
+
+  def getTableIdentify(table: LogicalPlan): TableIdentifier = {

Review comment:
   done
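   The diff above keeps one `SimpleDateFormat` per thread because that class is not thread-safe. It also leaves a TODO to switch to `DateTimeFormatter`. Below is a minimal Java sketch of both approaches, assuming the `yyyy-MM-dd` pattern. The class and method names are hypothetical and not part of Hudi. `DateTimeFormatter` is immutable and thread-safe, so a single shared constant could replace the per-thread copies.

```java
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Date;

// Hypothetical helper illustrating the TODO; not Hudi code.
final class DefaultDateFormatSketch {

  // Current approach: one SimpleDateFormat per thread, because a shared
  // SimpleDateFormat instance is not safe for concurrent use.
  private static final ThreadLocal<SimpleDateFormat> LEGACY_FORMAT =
      ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

  // Possible replacement: DateTimeFormatter is immutable and thread-safe,
  // so one shared constant is enough.
  private static final DateTimeFormatter FORMATTER =
      DateTimeFormatter.ofPattern("yyyy-MM-dd");

  private DefaultDateFormatSketch() {}

  static String formatLegacy(Date date) {
    return LEGACY_FORMAT.get().format(date);
  }

  static String format(LocalDate date) {
    return FORMATTER.format(date);
  }

  // Bridges java.util.Date to LocalDate in the JVM default time zone,
  // which is also the zone a default SimpleDateFormat uses.
  static String format(Date date) {
    return format(date.toInstant().atZone(ZoneId.systemDefault()).toLocalDate());
  }

  public static void main(String[] args) {
    System.out.println(format(LocalDate.of(2022, 1, 6)));
  }
}
```

   Both paths produce the same string for a given `Date`. The new one avoids the `ThreadLocal` bookkeeping.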





[GitHub] [hudi] hudi-bot commented on pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4514:
URL: https://github.com/apache/hudi/pull/4514#issuecomment-1006367031


   
   ## CI report:
   
   * ddc3af0c32bafef6b10c32c43132df32a5f7d83c UNKNOWN
   * e1ba726105dfa7ae07d802546c71a0cf1ad8b172 UNKNOWN
   * 306e7d462959e0249e230f60c2e9ea6602342e08 UNKNOWN
   * 15122772d9430d91807053555e12afaeda30e688 UNKNOWN
   * ac8d014a0602e3c499771f3313f0f88de57cdda1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4931)
 
   * 28d85528277ec6fbe72cd81dd69667495ec58165 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation

2022-01-06 Thread GitBox



hudi-bot removed a comment on pull request #4514:
URL: https://github.com/apache/hudi/pull/4514#issuecomment-1006323042


   
   ## CI report:
   
   * ddc3af0c32bafef6b10c32c43132df32a5f7d83c UNKNOWN
   * e1ba726105dfa7ae07d802546c71a0cf1ad8b172 UNKNOWN
   * 306e7d462959e0249e230f60c2e9ea6602342e08 UNKNOWN
   * 15122772d9430d91807053555e12afaeda30e688 UNKNOWN
   * ac8d014a0602e3c499771f3313f0f88de57cdda1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4931)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter

2022-01-06 Thread GitBox



hudi-bot removed a comment on pull request #4521:
URL: https://github.com/apache/hudi/pull/4521#issuecomment-1006329173


   
   ## CI report:
   
   * 16d5dc61ae5c7a9962fc3756720d8262bdadf6b9 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4932)
   * d708467de740637a394375335181979a343979bd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4933)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot commented on pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4521:
URL: https://github.com/apache/hudi/pull/4521#issuecomment-1006367102


   
   ## CI report:
   
   * d708467de740637a394375335181979a343979bd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4933)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] codope commented on a change in pull request #2768: [HUDI-485]: corrected the check for incremental sql

2022-01-06 Thread GitBox



codope commented on a change in pull request #2768:
URL: https://github.com/apache/hudi/pull/2768#discussion_r779376837



##
File path: 
hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHiveIncrementalPuller.java
##
@@ -41,4 +67,84 @@ public void testInitHiveIncrementalPuller() {
 
   }
 
+  private HiveIncrementalPuller.Config getHivePullerConfig(String incrementalSql) throws IOException {
+    config.hiveJDBCUrl = hiveSyncConfig.jdbcUrl;
+    config.hiveUsername = hiveSyncConfig.hiveUser;
+    config.hivePassword = hiveSyncConfig.hivePass;
+    config.hoodieTmpDir = Files.createTempDirectory("hivePullerTest").toUri().toString();
+    config.sourceDb = hiveSyncConfig.databaseName;
+    config.sourceTable = hiveSyncConfig.tableName;
+    config.targetDb = "tgtdb";
+    config.targetTable = "test2";
+    config.tmpDb = "tmp_db";
+    config.fromCommitTime = "100";
+    createIncrementalSqlFile(incrementalSql, config);
+    return config;
+  }
+
+  private void createIncrementalSqlFile(String text, HiveIncrementalPuller.Config cfg) throws IOException {
+    java.nio.file.Path path = Paths.get(cfg.hoodieTmpDir + "/incremental_pull.txt");
+    Files.createDirectories(path.getParent());
+    Files.createFile(path);
+    try (FileWriter fr = new FileWriter(new File(path.toUri()))) {
+      fr.write(text);
+    } catch (Exception e) {
+      // no-op
+    }
+    cfg.incrementalSQLFile = path.toString();
+  }
+
+  private void createSourceTable() throws IOException, URISyntaxException {
+    String instantTime = "101";
+    HiveTestUtil.createCOWTable(instantTime, 5, true);
+    hiveSyncConfig.syncMode = "jdbc";
+    HiveTestUtil.hiveSyncConfig.batchSyncNum = 3;
+    HiveSyncTool tool = new HiveSyncTool(hiveSyncConfig, HiveTestUtil.getHiveConf(), fileSystem);
+    tool.syncHoodieTable();
+  }
+
+  private void createTargetTable() throws IOException, URISyntaxException {
+    String instantTime = "100";
+    targetBasePath = Files.createTempDirectory("hivesynctest1" + Instant.now().toEpochMilli()).toUri().toString();
+    HiveTestUtil.createCOWTable(instantTime, 5, true,
+        targetBasePath, "tgtdb", "test2");
+    HiveSyncTool tool = new HiveSyncTool(getTargetHiveSyncConfig(targetBasePath), HiveTestUtil.getHiveConf(), fileSystem);
+    tool.syncHoodieTable();
+  }
+
+  private HiveSyncConfig getTargetHiveSyncConfig(String basePath) {
+    HiveSyncConfig config = HiveSyncConfig.copy(hiveSyncConfig);
+    config.databaseName = "tgtdb";
+    config.tableName = "test2";
+    config.basePath = basePath;
+    config.batchSyncNum = 3;
+    config.syncMode = "jdbc";
+    return config;
+  }
+
+  private void createTables() throws IOException, URISyntaxException {
+    createSourceTable();
+    createTargetTable();
+  }
+
+  @Test
+  public void testPullerWithoutIncrementalClause() throws IOException, URISyntaxException {

Review comment:
   @pratyakshsharma Is the patch ready? If not, could you please update the happy-flow test case, even if it's failing? I can take over and try to fix it.
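   The `createIncrementalSqlFile` helper in the diff catches and ignores every exception from the write. That can hide setup failures in the test. Below is a minimal Java sketch of a variant that lets `IOException` propagate. The class and method names are hypothetical. It assumes the directory arrives as the URI string stored in `hoodieTmpDir`, and it resolves that string with `Paths.get(URI)`, since `Paths.get(String)` does not interpret the `file:` scheme.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical variant of the test helper; not the code under review.
final class IncrementalSqlFileSketch {

  private IncrementalSqlFileSketch() {}

  // Writes the SQL text to <tmpDirUri>/incremental_pull.txt and returns the file path.
  // tmpDirUri is expected to look like Files.createTempDirectory(...).toUri().toString().
  static Path writeIncrementalSql(String tmpDirUri, String sql) throws IOException {
    // Paths.get(URI) understands "file:" URIs; Paths.get(String) treats the text as a plain path.
    Path dir = Paths.get(URI.create(tmpDirUri));
    Files.createDirectories(dir);
    Path file = dir.resolve("incremental_pull.txt");
    // Any IOException propagates, so a failed write fails the test setup instead of being ignored.
    Files.write(file, sql.getBytes(StandardCharsets.UTF_8));
    return file;
  }

  public static void main(String[] args) throws IOException {
    String tmpDirUri = Files.createTempDirectory("hivePullerTest").toUri().toString();
    System.out.println(writeIncrementalSql(tmpDirUri, "SELECT 1"));
  }
}
```

   The caller would then store `file.toString()` in `incrementalSQLFile`, as the original helper does with its own path.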





[GitHub] [hudi] leesf commented on a change in pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation

2022-01-06 Thread GitBox



leesf commented on a change in pull request #4514:
URL: https://github.com/apache/hudi/pull/4514#discussion_r779377180



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/HoodieSqlCommonUtils.scala
##
@@ -0,0 +1,316 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi
+
+import scala.collection.JavaConverters._
+import java.net.URI
+import java.util.{Date, Locale, Properties}
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+
+import org.apache.hudi.{AvroConversionUtils, SparkAdapterSupport}
+import org.apache.hudi.client.common.HoodieSparkEngineContext
+import org.apache.hudi.common.config.DFSPropertiesConfiguration
+import org.apache.hudi.common.config.HoodieMetadataConfig
+import org.apache.hudi.common.fs.FSUtils
+import org.apache.hudi.common.model.HoodieRecord
+import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver}
+import org.apache.hudi.common.table.timeline.{HoodieActiveTimeline, HoodieInstantTimeGenerator}
+import org.apache.spark.SPARK_VERSION
+import org.apache.spark.sql.{Column, DataFrame, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
+import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTableType}
+import org.apache.spark.sql.catalyst.expressions.{And, Attribute, Cast, Expression, Literal}
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, SubqueryAlias}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}
+import org.apache.spark.api.java.JavaSparkContext
+import org.apache.spark.sql.types.{DataType, NullType, StringType, StructField, StructType}
+
+import java.text.SimpleDateFormat
+
+import scala.collection.immutable.Map
+
+object HoodieSqlCommonUtils extends SparkAdapterSupport {
+  // NOTE: {@code SimpleDateFormat} is NOT thread-safe
+  // TODO replace w/ DateTimeFormatter
+  private val defaultDateFormat =
+  ThreadLocal.withInitial(new java.util.function.Supplier[SimpleDateFormat] {
+    override def get() = new SimpleDateFormat("yyyy-MM-dd")
+  })
+
+  def isHoodieTable(table: CatalogTable): Boolean = {
+    table.provider.map(_.toLowerCase(Locale.ROOT)).orNull == "hudi"
+  }
+
+  def isHoodieTable(tableId: TableIdentifier, spark: SparkSession): Boolean = {
+    val table = spark.sessionState.catalog.getTableMetadata(tableId)
+    isHoodieTable(table)
+  }
+
+  def isHoodieTable(table: LogicalPlan, spark: SparkSession): Boolean = {
+    tripAlias(table) match {
+      case LogicalRelation(_, _, Some(tbl), _) => isHoodieTable(tbl)
+      case relation: UnresolvedRelation =>
+        isHoodieTable(sparkAdapter.toTableIdentifier(relation), spark)
+      case _=> false
+    }
+  }
+
+  def getTableIdentify(table: LogicalPlan): TableIdentifier = {

Review comment:
   done
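   `isHoodieTable` lower-cases the table provider with `Locale.ROOT` before comparing it with `"hudi"`. Below is a small Java sketch of why the fixed locale matters. The class is hypothetical, and `Optional<String>` stands in for Scala's `Option[String]`. Under Turkish casing rules, `"HUDI".toLowerCase(...)` yields `"hudı"` (with a dotless i), so a locale-sensitive comparison would wrongly reject the table.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical illustration of the provider check; not Hudi code.
final class ProviderCheckSketch {

  private ProviderCheckSketch() {}

  // Mirrors `table.provider.map(_.toLowerCase(Locale.ROOT)).orNull == "hudi"`:
  // a missing provider maps to null and is therefore not a Hudi table.
  static boolean isHoodieProvider(Optional<String> provider) {
    return "hudi".equals(provider.map(p -> p.toLowerCase(Locale.ROOT)).orElse(null));
  }

  // Same check with an explicit locale, to show the pitfall Locale.ROOT avoids.
  static boolean isHoodieProviderIn(Optional<String> provider, Locale locale) {
    return "hudi".equals(provider.map(p -> p.toLowerCase(locale)).orElse(null));
  }

  public static void main(String[] args) {
    Locale turkish = Locale.forLanguageTag("tr");
    System.out.println(isHoodieProvider(Optional.of("HUDI")));            // true
    System.out.println(isHoodieProviderIn(Optional.of("HUDI"), turkish)); // false
  }
}
```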





[GitHub] [hudi] xushiyan commented on pull request #4515: [HUDI-3158] Reduce warn logs in Spark SQL INSERT OVERWRITE

2022-01-06 Thread GitBox



xushiyan commented on pull request #4515:
URL: https://github.com/apache/hudi/pull/4515#issuecomment-1006374242


   @dongkelun The warn log comes from clustering planning. Can you help clarify how this change would avoid the repeated warn logs?



[GitHub] [hudi] dongkelun commented on pull request #4515: [HUDI-3158] Reduce warn logs in Spark SQL INSERT OVERWRITE

2022-01-06 Thread GitBox



dongkelun commented on pull request #4515:
URL: https://github.com/apache/hudi/pull/4515#issuecomment-1006384615


   > @dongkelun The warn log comes from clustering planning. Can you help clarify how this change would avoid the repeated warn logs?
   
   Hello, the warning is logged because the requested replace-commit instant (`replaceCommitRequestedInstant`) has empty content. The code path that logs it is reached from many places, for example `HoodieSparkTable.create`:
   
   ```java
   if (refreshTimeline) {
     hoodieSparkTable.getHoodieView().sync();
   }
   ```
   `HoodieSparkTable.create` is itself called from many places, so it is hard to reduce the warning logs at each call site; it is better to avoid the warning at its source. The commit action type of `INSERT_OVERWRITE` is `REPLACE_COMMIT_ACTION`, and the `startCommitWithTime` method creates an empty requested replace-commit instant for it. Making that instant non-empty avoids the warning at the source.
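   The mechanism described above can be sketched as a toy model in Java. A reader warns whenever a requested replace-commit instant has empty content, and a writer either leaves that content empty or writes placeholder metadata when the commit starts. Every name here, including the warning text, is hypothetical; this is not Hudi's timeline API. It only illustrates why fixing the content at creation time removes the warning for every caller at once.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a timeline; all names are hypothetical, not Hudi APIs.
final class ReplaceCommitWarningSketch {

  // instant time -> content of the requested replace-commit file
  private final Map<String, byte[]> requestedReplaceCommits = new LinkedHashMap<>();
  final List<String> warnings = new ArrayList<>();

  // Models starting an INSERT_OVERWRITE commit. With writePlaceholder == false the
  // requested instant is created with empty content, as described in the comment.
  void startReplaceCommit(String instantTime, boolean writePlaceholder) {
    byte[] content = writePlaceholder
        ? "{}".getBytes(StandardCharsets.UTF_8) // stand-in for serialized metadata
        : new byte[0];
    requestedReplaceCommits.put(instantTime, content);
  }

  // Models a view sync triggered by each caller: every pass re-reads the
  // pending replace commits and records a (hypothetical) warning for empty content.
  void syncView() {
    for (Map.Entry<String, byte[]> e : requestedReplaceCommits.entrySet()) {
      if (e.getValue().length == 0) {
        warnings.add("No content found in requested replace commit " + e.getKey());
      }
    }
  }

  public static void main(String[] args) {
    ReplaceCommitWarningSketch empty = new ReplaceCommitWarningSketch();
    empty.startReplaceCommit("001", false);
    for (int i = 0; i < 3; i++) {
      empty.syncView(); // three callers, three identical warnings
    }
    System.out.println(empty.warnings.size()); // 3

    ReplaceCommitWarningSketch filled = new ReplaceCommitWarningSketch();
    filled.startReplaceCommit("001", true);
    filled.syncView();
    System.out.println(filled.warnings.size()); // 0
  }
}
```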
   



[GitHub] [hudi] hudi-bot commented on pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4514:
URL: https://github.com/apache/hudi/pull/4514#issuecomment-1006386632


   
   ## CI report:
   
   * ddc3af0c32bafef6b10c32c43132df32a5f7d83c UNKNOWN
   * e1ba726105dfa7ae07d802546c71a0cf1ad8b172 UNKNOWN
   * 306e7d462959e0249e230f60c2e9ea6602342e08 UNKNOWN
   * 15122772d9430d91807053555e12afaeda30e688 UNKNOWN
   * ac8d014a0602e3c499771f3313f0f88de57cdda1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4931)
   * 28d85528277ec6fbe72cd81dd69667495ec58165 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4936)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation

2022-01-06 Thread GitBox



hudi-bot removed a comment on pull request #4514:
URL: https://github.com/apache/hudi/pull/4514#issuecomment-1006367031


   
   ## CI report:
   
   * ddc3af0c32bafef6b10c32c43132df32a5f7d83c UNKNOWN
   * e1ba726105dfa7ae07d802546c71a0cf1ad8b172 UNKNOWN
   * 306e7d462959e0249e230f60c2e9ea6602342e08 UNKNOWN
   * 15122772d9430d91807053555e12afaeda30e688 UNKNOWN
   * ac8d014a0602e3c499771f3313f0f88de57cdda1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4931)
 
   * 28d85528277ec6fbe72cd81dd69667495ec58165 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4352: [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

2022-01-06 Thread GitBox



hudi-bot removed a comment on pull request #4352:
URL: https://github.com/apache/hudi/pull/4352#issuecomment-1006153120


   
   ## CI report:
   
   * 235981abd20a498a3e29e98ce0eda9de35018f99 UNKNOWN
   * 486c6886c5b0bd748e3db1c90c886a1b7f6d52e8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4915)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot commented on pull request #4352: [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4352:
URL: https://github.com/apache/hudi/pull/4352#issuecomment-1006400478


   
   ## CI report:
   
   * 235981abd20a498a3e29e98ce0eda9de35018f99 UNKNOWN
   * 486c6886c5b0bd748e3db1c90c886a1b7f6d52e8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4915)
 
   * ce1b2b4eefdd2e0d46154b2c97dc93abf6982aa0 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4516: [WIP][HUDI-1295] Enabling metadata table based index by default for tests

2022-01-06 Thread GitBox



hudi-bot removed a comment on pull request #4516:
URL: https://github.com/apache/hudi/pull/4516#issuecomment-1006131415


   
   ## CI report:
   
   * 97502fa31dda3b94645631303e134bf0d652c17e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4913)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot commented on pull request #4516: [WIP][HUDI-1295] Enabling metadata table based index by default for tests

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4516:
URL: https://github.com/apache/hudi/pull/4516#issuecomment-1006400740


   
   ## CI report:
   
   * 97502fa31dda3b94645631303e134bf0d652c17e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4913)
 
   * 7e2ec46af829fabeb506d639c54057d32f3c89fa UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4516: [WIP][HUDI-1295] Enabling metadata table based index by default for tests

2022-01-06 Thread GitBox



hudi-bot removed a comment on pull request #4516:
URL: https://github.com/apache/hudi/pull/4516#issuecomment-1006400740


   
   ## CI report:
   
   * 97502fa31dda3b94645631303e134bf0d652c17e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4913)
   * 7e2ec46af829fabeb506d639c54057d32f3c89fa UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot commented on pull request #4516: [WIP][HUDI-1295] Enabling metadata table based index by default for tests

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4516:
URL: https://github.com/apache/hudi/pull/4516#issuecomment-1006402795


   
   ## CI report:
   
   * 97502fa31dda3b94645631303e134bf0d652c17e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4913)
   * 7e2ec46af829fabeb506d639c54057d32f3c89fa UNKNOWN
   * 280360f772b47ffab15655e4679b021e151783d7 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4516: [WIP][HUDI-1295] Enabling metadata table based index by default for tests

2022-01-06 Thread GitBox



hudi-bot removed a comment on pull request #4516:
URL: https://github.com/apache/hudi/pull/4516#issuecomment-1006402795


   
   ## CI report:
   
   * 97502fa31dda3b94645631303e134bf0d652c17e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4913)
   * 7e2ec46af829fabeb506d639c54057d32f3c89fa UNKNOWN
   * 280360f772b47ffab15655e4679b021e151783d7 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot commented on pull request #4516: [WIP][HUDI-1295] Enabling metadata table based index by default for tests

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4516:
URL: https://github.com/apache/hudi/pull/4516#issuecomment-1006404803


   
   ## CI report:
   
   * 97502fa31dda3b94645631303e134bf0d652c17e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4913)
   * 7e2ec46af829fabeb506d639c54057d32f3c89fa UNKNOWN
   * 280360f772b47ffab15655e4679b021e151783d7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4937)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot commented on pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4514:
URL: https://github.com/apache/hudi/pull/4514#issuecomment-1006428061


   
   ## CI report:
   
   * ddc3af0c32bafef6b10c32c43132df32a5f7d83c UNKNOWN
   * e1ba726105dfa7ae07d802546c71a0cf1ad8b172 UNKNOWN
   * 306e7d462959e0249e230f60c2e9ea6602342e08 UNKNOWN
   * 15122772d9430d91807053555e12afaeda30e688 UNKNOWN
   * 28d85528277ec6fbe72cd81dd69667495ec58165 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4936)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4514: [HUDI-3172] Refactor hudi existing modules to make more code reuse in V2 Implementation

2022-01-06 Thread GitBox



hudi-bot removed a comment on pull request #4514:
URL: https://github.com/apache/hudi/pull/4514#issuecomment-1006386632


   
   ## CI report:
   
   * ddc3af0c32bafef6b10c32c43132df32a5f7d83c UNKNOWN
   * e1ba726105dfa7ae07d802546c71a0cf1ad8b172 UNKNOWN
   * 306e7d462959e0249e230f60c2e9ea6602342e08 UNKNOWN
   * 15122772d9430d91807053555e12afaeda30e688 UNKNOWN
   * ac8d014a0602e3c499771f3313f0f88de57cdda1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4931)
   * 28d85528277ec6fbe72cd81dd69667495ec58165 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4936)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot commented on pull request #4352: [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4352:
URL: https://github.com/apache/hudi/pull/4352#issuecomment-1006435512


   
   ## CI report:
   
   * 235981abd20a498a3e29e98ce0eda9de35018f99 UNKNOWN
   * 486c6886c5b0bd748e3db1c90c886a1b7f6d52e8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4915)
   * ce1b2b4eefdd2e0d46154b2c97dc93abf6982aa0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4938)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4352: [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

2022-01-06 Thread GitBox



hudi-bot removed a comment on pull request #4352:
URL: https://github.com/apache/hudi/pull/4352#issuecomment-1006400478


   
   ## CI report:
   
   * 235981abd20a498a3e29e98ce0eda9de35018f99 UNKNOWN
   * 486c6886c5b0bd748e3db1c90c886a1b7f6d52e8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4915)
   * ce1b2b4eefdd2e0d46154b2c97dc93abf6982aa0 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] AirToSupply opened a new issue #4522: [SUPPORT] hudi-flink support timestamp-micros

2022-01-06 Thread GitBox



AirToSupply opened a new issue #4522:
URL: https://github.com/apache/hudi/issues/4522


   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Use the Spark engine to write data into the hoodie table (note: the dataset contains timestamp-type columns).
   2. Use the Flink engine to read the hoodie table written in step 1.
   
   **Expected behavior**
   
   Caused by: java.lang.IllegalArgumentException: Avro does not support TIMESTAMP type with precision: 6, it only supports precision less than 3.
    at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:221) ~...
    at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:263) ~...
    at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:169) ~...
    at org.apache.hudi.table.HoodieTableFactory.inferAvroSchema(HoodieTableFactory.java:239) ~...
    at org.apache.hudi.table.HoodieTableFactory.setupConfOptions(HoodieTableFactory.java:155) ~...
    at org.apache.hudi.table.HoodieTableFactory.createDynamicTableSource(HoodieTableFactory.java:65) ~...
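The failing check in the trace converts a Flink `TIMESTAMP(p)` column type into an Avro schema. Below is a minimal sketch of a precision-to-logical-type mapping that would also accept `TIMESTAMP(6)`, assuming the standard Avro logical types `timestamp-millis` (milliseconds since the Unix epoch) and `timestamp-micros` (microseconds since the Unix epoch). The class and method names are hypothetical; this is not Hudi's `AvroSchemaConverter`.

```java
// Hypothetical sketch, not Hudi's AvroSchemaConverter: picks the Avro logical
// type name that can hold a SQL TIMESTAMP(p) value without dropping digits.
public class TimestampPrecisionMapping {

    static String avroLogicalTypeFor(int precision) {
        if (precision < 0) {
            throw new IllegalArgumentException("Negative TIMESTAMP precision: " + precision);
        }
        if (precision <= 3) {
            // Up to milliseconds: a long counting milliseconds since the Unix epoch.
            return "timestamp-millis";
        }
        if (precision <= 6) {
            // Up to microseconds: a long counting microseconds since the Unix epoch.
            return "timestamp-micros";
        }
        // Finer precisions are deliberately out of scope for this sketch.
        throw new IllegalArgumentException("Unsupported TIMESTAMP precision: " + precision);
    }

    public static void main(String[] args) {
        for (int p : new int[] {0, 3, 6}) {
            System.out.println("TIMESTAMP(" + p + ") -> " + avroLogicalTypeFor(p));
        }
    }
}
```

Mapping precisions 4 to 6 to `timestamp-micros` keeps every digit that a Spark `TimestampType` value (microsecond precision) can carry, instead of rejecting the column or silently truncating it.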
   
   **Environment Description**
   
   * Hudi version : 0.11.0-SNAPSHOT
   
   * Spark version : 3.1.2
   
   * Flink version : 1.13.1
   
   * Hive version : None
   
   * Hadoop version : 2.9.2
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : None
   
   
   **Additional context**
   
   We use hoodie as a data lake in projects we deliver to customers, and we have run into the following scenario: data is written to a hoodie table with the Spark engine and then read from the same table with the Flink engine.
   Note that the exception above is caused by the way the timestamp column in the dataset is written.
   To simplify the description, we reduce the problem to the following steps:
   【step-1】Mock data: 
   ```shell
   /home/deploy/spark-3.1.2-bin-hadoop2.7/bin/spark-shell \
   --driver-class-path /home/workflow/apache-hive-2.3.8-bin/conf/ \
   --master spark://2-120:7077 \
   --executor-memory 4g \
   --driver-memory 4g \
   --num-executors 4 \
   --total-executor-cores 4 \
   --name test \
   --jars /home/deploy/spark-3.1.2-bin-hadoop2.7/jars/hudi-spark3-bundle_2.12-0.11.0-SNAPSHOT.jar,/home/deploy/spark-3.1.2-bin-hadoop2.7/jars/spark-avro_2.12-3.1.2.jar \
   --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
   --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \
   --conf spark.sql.hive.convertMetastoreParquet=false 
   ```
   
   ```scala
   val df = spark.sql("select 1 as id, 'A' as name, current_timestamp as dt")
   
   df.write.format("hudi").
     option("hoodie.datasource.write.recordkey.field", "id").
     option("hoodie.datasource.write.precombine.field", "id").
     option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.NonpartitionedKeyGenerator").
     option("hoodie.upsert.shuffle.parallelism", "2").
     option("hoodie.table.name", "timestamp_table").
     mode("append").
     save("/hudi/suite/data_type_timestamp_table")
   
   spark.read.format("hudi").load("/hudi/suite/data_type_timestamp_table").show(false)
   ```
   
   【step-2】Consume the data through Flink:
   ```shell
   bin/sql-client.sh embedded -j lib/hudi-flink-bundle_2.12-0.11.0-SNAPSHOT.jar
   ```
   
   ```sql
   create table data_type_timestamp_table (
 `id` INT,
 `name` STRING,
 `dt` TIMESTAMP(6)
   ) with (
 'connector' = 'hudi',
 'hoodie.table.name' = 'data_type_timestamp_table',
 'read.streaming.enabled' = 'true',
 'hoodie.datasource.write.recordkey.field' = 'id',
 'path' = '/hudi/suite/data_type_timestamp_table',
 'read.streaming.check-interval' = '10',
 'table.type' = 'COPY_ON_WRITE',
 'write.precombine.field' = 'id'
   );
   
   select * from data_type_timestamp_table;
   ```
   
   As shown below:
   
![lQLPDhrvUNen-1rNAb7NBCOwXsod7xqeE9YBtVZ528ASAA_1059_446](https://user-images.githubusercontent.com/62897740/148364869-dc82d0ef-d766-4f7a-a274-ab04ab59ca78.png)
   
   If we change TIMESTAMP(6) to TIMESTAMP(3), the result is as follows:
   
![lQLPDhrw7wIvugRazQOXsGhFVZXLzk8BAbf9C8FANwA_919_90](https://user-images.githubusercontent.com/62897740/148365337-5e38c559-e3cf-4b7d-b747-1d1b92ca7798.png)
   
   The data can now be queried, but it is displayed incorrectly!
   
   Inspecting the files under the hoodie table directory shows that Spark writes the timestamp column as timestamp-micros:
   
![lQLPDhrw76Mec4_NAwTNBjiwXbL91rUPs-8Bt_4Se8B-AA_1592_772](https://user-images.githubusercontent.com/62897740/148365863-0f6659b1-1e70-4931-848c-9eeb2b41c01b.png)
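A minimal sketch of how a unit mismatch can corrupt the displayed value, offered as an assumption rather than a confirmed diagnosis of this issue: Avro's `timestamp-micros` and `timestamp-millis` logical types both store a `long`, so the same stored number decodes to very different instants depending on which unit the reader assumes, and converting micros to millis correctly still drops the sub-millisecond digits. The class name and the sample value are hypothetical.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Illustration only (hypothetical class): one stored long read with two units.
public class TimestampUnitMismatch {

    /** Decodes a long as microseconds since the Unix epoch (timestamp-micros). */
    static Instant fromMicros(long micros) {
        return Instant.EPOCH.plus(micros, ChronoUnit.MICROS);
    }

    /** Decodes a long as milliseconds since the Unix epoch (timestamp-millis). */
    static Instant fromMillis(long millis) {
        return Instant.ofEpochMilli(millis);
    }

    /** Converts micros to millis, flooring so pre-1970 values round consistently;
     *  the last three digits (the sub-millisecond part) are lost. */
    static long microsToMillis(long micros) {
        return Math.floorDiv(micros, 1000L);
    }

    public static void main(String[] args) {
        // Hypothetical sample: 2022-01-06T08:00:00.123456Z expressed in microseconds.
        long stored = 1_641_456_000_123_456L;
        System.out.println(fromMicros(stored));                 // 2022-01-06T08:00:00.123456Z
        System.out.println(fromMillis(stored));                 // wrong unit: tens of thousands of years ahead
        System.out.println(fromMillis(microsToMillis(stored))); // 2022-01-06T08:00:00.123Z
    }
}
```

Reading with the matching unit is the only lossless option; a millisecond reader either misplaces the value by a factor of 1000 or, after an explicit conversion, truncates `.123456` to `.123`.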
   
   However, Flink reads and writes the timestamp type of hoodie data as timestamp-millis! Therefore, reading and writing timestamp types through both the Spark and Flink engines is problematic for us. We hope that the 
hudi-flink module needs t

[GitHub] [hudi] zhangyue19921010 commented on pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter

2022-01-06 Thread GitBox



zhangyue19921010 commented on pull request #4521:
URL: https://github.com/apache/hudi/pull/4521#issuecomment-1006441474


   @hudi-bot run azure



[jira] [Created] (HUDI-3184) hudi-flink support timestamp-micros

2022-01-06 Thread Well Tang (Jira)

Well Tang created HUDI-3184:
---

 Summary: hudi-flink support timestamp-micros
 Key: HUDI-3184
 URL: https://issues.apache.org/jira/browse/HUDI-3184
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Well Tang






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (HUDI-3184) hudi-flink support timestamp-micros

2022-01-06 Thread Well Tang (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Well Tang updated HUDI-3184:

Fix Version/s: 0.11.0

> hudi-flink support timestamp-micros
> ---
>
> Key: HUDI-3184
> URL: https://issues.apache.org/jira/browse/HUDI-3184
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Well Tang
>Priority: Major
> Fix For: 0.11.0
>
>





[GitHub] [hudi] hudi-bot commented on pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4521:
URL: https://github.com/apache/hudi/pull/4521#issuecomment-1006443490


   
   ## CI report:
   
   * d708467de740637a394375335181979a343979bd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4933) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4939)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter

2022-01-06 Thread GitBox



hudi-bot removed a comment on pull request #4521:
URL: https://github.com/apache/hudi/pull/4521#issuecomment-1006367102


   
   ## CI report:
   
   * d708467de740637a394375335181979a343979bd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4933)
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[jira] [Assigned] (HUDI-2779) Cache BaseDir if HudiTableNotFound Exception thrown

2022-01-06 Thread Hui An (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui An reassigned HUDI-2779:


Assignee: Hui An

> Cache BaseDir if HudiTableNotFound Exception thrown
> ---
>
> Key: HUDI-2779
> URL: https://issues.apache.org/jira/browse/HUDI-2779
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Hui An
>Assignee: Hui An
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0, 0.10.1
>
>





[jira] [Closed] (HUDI-3162) Shade AWS dependencies for bundled packages

2022-01-06 Thread Hui An (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui An closed HUDI-3162.

Resolution: Duplicate

> Shade AWS dependencies for bundled packages
> ---
>
> Key: HUDI-3162
> URL: https://issues.apache.org/jira/browse/HUDI-3162
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Hui An
>Assignee: Hui An
>Priority: Major
>





[jira] [Assigned] (HUDI-3184) hudi-flink support timestamp-micros

2022-01-06 Thread Well Tang (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Well Tang reassigned HUDI-3184:
---

Attachment: 3.png
2.png
1.png
  Assignee: Well Tang
   Description: 
{*}Problem overview{*}:

Steps to reproduce the behavior:

①Use the Spark engine to write data into the hoodie table (note: the dataset contains timestamp-type columns).

②Use the Flink engine to read the hoodie table written in step 1.

 

*Expected behavior*

Caused by: java.lang.IllegalArgumentException: Avro does not support TIMESTAMP type with precision: 6, it only supports precision less than 3.
at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:221) ~...
at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:263) ~...
at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:169) ~...
at org.apache.hudi.table.HoodieTableFactory.inferAvroSchema(HoodieTableFactory.java:239) ~...
at org.apache.hudi.table.HoodieTableFactory.setupConfOptions(HoodieTableFactory.java:155) ~...
at org.apache.hudi.table.HoodieTableFactory.createDynamicTableSource(HoodieTableFactory.java:65) ~...

 

*Environment Description*

  Hudi version : 0.11.0-SNAPSHOT

  Spark version : 3.1.2

  Flink version : 1.13.1

  Hive version : None

  Hadoop version : 2.9.2

  Storage (HDFS/S3/GCS..) : HDFS

  Running on Docker? (yes/no) : None

 

*Additional context*

We use hoodie as a data lake in projects we deliver to customers, and we have run into the following scenario: data is written to a hoodie table with the Spark engine and then read from the same table with the Flink engine.


Note that the exception above is caused by the way the timestamp column in the dataset is written.


To simplify the description, we reduce the problem to the following steps:

【step-1】Mock data:
{code:java}
/home/deploy/spark-3.1.2-bin-hadoop2.7/bin/spark-shell \
--driver-class-path /home/workflow/apache-hive-2.3.8-bin/conf/ \
--master spark://2-120:7077 \
--executor-memory 4g \
--driver-memory 4g \
--num-executors 4 \
--total-executor-cores 4 \
--name test \
--jars /home/deploy/spark-3.1.2-bin-hadoop2.7/jars/hudi-spark3-bundle_2.12-0.11.0-SNAPSHOT.jar,/home/deploy/spark-3.1.2-bin-hadoop2.7/jars/spark-avro_2.12-3.1.2.jar \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \
--conf spark.sql.hive.convertMetastoreParquet=false {code}
 
{code:java}
val df = spark.sql("select 1 as id, 'A' as name, current_timestamp as dt")

df.write.format("hudi").
  option("hoodie.datasource.write.recordkey.field", "id").
  option("hoodie.datasource.write.precombine.field", "id").
  option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.NonpartitionedKeyGenerator").
  option("hoodie.upsert.shuffle.parallelism", "2").
  option("hoodie.table.name", "timestamp_table").
  mode("append").
  save("/hudi/suite/data_type_timestamp_table")

spark.read.format("hudi").load("/hudi/suite/data_type_timestamp_table").show(false)
 {code}
 

 

【step-2】Consume the data through Flink:

 
{code:java}
bin/sql-client.sh embedded -j lib/hudi-flink-bundle_2.12-0.11.0-SNAPSHOT.jar 
{code}
 

 
{code:java}
create table data_type_timestamp_table (
  `id` INT,
  `name` STRING,
  `dt` TIMESTAMP(6)
) with (
  'connector' = 'hudi',
  'hoodie.table.name' = 'data_type_timestamp_table',
  'read.streaming.enabled' = 'true',
  'hoodie.datasource.write.recordkey.field' = 'id',
  'path' = '/hudi/suite/data_type_timestamp_table',
  'read.streaming.check-interval' = '10',
  'table.type' = 'COPY_ON_WRITE',
  'write.precombine.field' = 'id'
);

select * from data_type_timestamp_table; {code}
As shown below:

!1.png!

If we change TIMESTAMP(6) to TIMESTAMP(3), the result is as follows:

!2.png!

The data can now be queried, but it is displayed incorrectly!

Inspecting the files under the hoodie table directory shows that Spark writes the timestamp column as timestamp-micros:

!3.png!

 

However, Flink reads and writes the timestamp type of hoodie data as timestamp-millis! Therefore, reading and writing timestamp types through both the Spark and Flink engines is problematic for us. We hope the hudi-flink module can support timestamp-micros so that no time precision is lost.

 
Labels: pull-request-available  (was: )
Remaining Estimate: 120h
 Original Estimate: 120h

> hudi-flink support timestamp-micros
> ---
>
> Key: HUDI-3184
> URL: https://issues.apache.org/jira/browse/HUDI-3184
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Re

[jira] [Updated] (HUDI-3184) hudi-flink support timestamp-micros

2022-01-06 Thread Well Tang (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Well Tang updated HUDI-3184:

   Description: 
{*}Problem overview{*}:

Steps to reproduce the behavior:

①Use the Spark engine to write data into the hoodie table (note: the dataset contains timestamp-type columns).

②Use the Flink engine to read the hoodie table written in step 1.

*Expected behavior*

Caused by: java.lang.IllegalArgumentException: Avro does not support TIMESTAMP type with precision: 6, it only supports precision less than 3.
at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:221) ~...
at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:263) ~...
at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:169) ~...
at org.apache.hudi.table.HoodieTableFactory.inferAvroSchema(HoodieTableFactory.java:239) ~...
at org.apache.hudi.table.HoodieTableFactory.setupConfOptions(HoodieTableFactory.java:155) ~...
at org.apache.hudi.table.HoodieTableFactory.createDynamicTableSource(HoodieTableFactory.java:65) ~...

*Environment Description*

  Hudi version : 0.11.0-SNAPSHOT

  Spark version : 3.1.2

  Flink version : 1.13.1

  Hive version : None

  Hadoop version : 2.9.2

  Storage (HDFS/S3/GCS..) : HDFS

  Running on Docker? (yes/no) : None

*Additional context*

We use hoodie as a data lake in projects we deliver to customers, and we have run into the following scenario: data is written to a hoodie table with the Spark engine and then read from the same table with the Flink engine.

Note that the exception above is caused by the way the timestamp column in the dataset is written.

To simplify the description, we reduce the problem to the following steps:

【step-1】Mock data:
{code:java}
/home/deploy/spark-3.1.2-bin-hadoop2.7/bin/spark-shell \
--driver-class-path /home/workflow/apache-hive-2.3.8-bin/conf/ \
--master spark://2-120:7077 \
--executor-memory 4g \
--driver-memory 4g \
--num-executors 4 \
--total-executor-cores 4 \
--name test \
--jars /home/deploy/spark-3.1.2-bin-hadoop2.7/jars/hudi-spark3-bundle_2.12-0.11.0-SNAPSHOT.jar,/home/deploy/spark-3.1.2-bin-hadoop2.7/jars/spark-avro_2.12-3.1.2.jar \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \
--conf spark.sql.hive.convertMetastoreParquet=false {code}
{code:java}
val df = spark.sql("select 1 as id, 'A' as name, current_timestamp as dt")

df.write.format("hudi").
  option("hoodie.datasource.write.recordkey.field", "id").
  option("hoodie.datasource.write.precombine.field", "id").
  option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.NonpartitionedKeyGenerator").
  option("hoodie.upsert.shuffle.parallelism", "2").
  option("hoodie.table.name", "timestamp_table").
  mode("append").
  save("/hudi/suite/data_type_timestamp_table")

spark.read.format("hudi").load("/hudi/suite/data_type_timestamp_table").show(false)
 {code}
【step-2】Consume the data through Flink:
{code:java}
bin/sql-client.sh embedded -j lib/hudi-flink-bundle_2.12-0.11.0-SNAPSHOT.jar 
{code}
{code:java}
create table data_type_timestamp_table (
  `id` INT,
  `name` STRING,
  `dt` TIMESTAMP(6)
) with (
  'connector' = 'hudi',
  'hoodie.table.name' = 'data_type_timestamp_table',
  'read.streaming.enabled' = 'true',
  'hoodie.datasource.write.recordkey.field' = 'id',
  'path' = '/hudi/suite/data_type_timestamp_table',
  'read.streaming.check-interval' = '10',
  'table.type' = 'COPY_ON_WRITE',
  'write.precombine.field' = 'id'
);

select * from data_type_timestamp_table; {code}
As shown below:

!1.png!

If we change TIMESTAMP(6) to TIMESTAMP(3), the result is as follows:

!2.png!

The data can now be queried, but it is displayed incorrectly!

Inspecting the files under the hoodie table directory shows that Spark writes the timestamp column as timestamp-micros:

!3.png!

However, Flink reads and writes the timestamp type of hoodie data as timestamp-millis! Therefore, reading and writing timestamp types through both the Spark and Flink engines is problematic for us. We hope the hudi-flink module can support timestamp-micros so that no time precision is lost.

  was:
{*}Problem overview{*}:

Steps to reproduce the behavior:

①Use the Spark engine to write data into the hoodie table (note: the dataset contains timestamp-type columns).

②Use the Flink engine to read the hoodie table written in step 1.

 

*Expected behavior*

Caused by: java.lang.IllegalArgumentException: Avro does not support TIMESTAMP type with precision: 6, it only supports precision less than 3.
at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:221) ~...
at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(Avr

[jira] [Updated] (HUDI-3184) hudi-flink support timestamp-micros

2022-01-06 Thread Well Tang (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Well Tang updated HUDI-3184:

Status: In Progress  (was: Open)

> hudi-flink support timestamp-micros
> ---
>
> Key: HUDI-3184
> URL: https://issues.apache.org/jira/browse/HUDI-3184
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Well Tang
>Assignee: Well Tang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
> Attachments: 1.png, 2.png, 3.png
>
>   Original Estimate: 5h
>  Remaining Estimate: 5h
>
> {*}Problem overview{*}:
> Steps to reproduce the behavior:
> ①Use the Spark engine to write data into the hoodie table (note: the dataset contains timestamp-type columns).
> ②Use the Flink engine to read the hoodie table written in step 1.
> *Expected behavior*
> Caused by: java.lang.IllegalArgumentException: Avro does not support TIMESTAMP type with precision: 6, it only supports precision less than 3.
> at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:221) ~...
> at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:263) ~...
> at org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:169) ~...
> at org.apache.hudi.table.HoodieTableFactory.inferAvroSchema(HoodieTableFactory.java:239) ~...
> at org.apache.hudi.table.HoodieTableFactory.setupConfOptions(HoodieTableFactory.java:155) ~...
> at org.apache.hudi.table.HoodieTableFactory.createDynamicTableSource(HoodieTableFactory.java:65) ~...
> *Environment Description*
>   Hudi version : 0.11.0-SNAPSHOT
>   Spark version : 3.1.2
>   Flink version : 1.13.1
>   Hive version : None
>   Hadoop version : 2.9.2
>   Storage (HDFS/S3/GCS..) : HDFS
>   Running on Docker? (yes/no) : None
> *Additional context*
> We use hoodie as a data lake in projects we deliver to customers, and we have run into the following scenario: data is written to a hoodie table with the Spark engine and then read from the same table with the Flink engine.
> Note that the exception above is caused by the way the timestamp column in the dataset is written.
> To simplify the description, we reduce the problem to the following steps:
> 【step-1】Mock data:
> {code:java}
> /home/deploy/spark-3.1.2-bin-hadoop2.7/bin/spark-shell \
> --driver-class-path /home/workflow/apache-hive-2.3.8-bin/conf/ \
> --master spark://2-120:7077 \
> --executor-memory 4g \
> --driver-memory 4g \
> --num-executors 4 \
> --total-executor-cores 4 \
> --name test \
> --jars /home/deploy/spark-3.1.2-bin-hadoop2.7/jars/hudi-spark3-bundle_2.12-0.11.0-SNAPSHOT.jar,/home/deploy/spark-3.1.2-bin-hadoop2.7/jars/spark-avro_2.12-3.1.2.jar \
> --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
> --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \
> --conf spark.sql.hive.convertMetastoreParquet=false {code}
> {code:java}
> val df = spark.sql("select 1 as id, 'A' as name, current_timestamp as dt")
> df.write.format("hudi").
>   option("hoodie.datasource.write.recordkey.field", "id").
>   option("hoodie.datasource.write.precombine.field", "id").
>   option("hoodie.datasource.write.keygenerator.class", 
> "org.apache.hudi.keygen.NonpartitionedKeyGenerator").
>   option("hoodie.upsert.shuffle.parallelism", "2").
>   option("hoodie.table.name", "timestamp_table").
>   mode("append").
>   save("/hudi/suite/data_type_timestamp_table")
> spark.read.format("hudi").load("/hudi/suite/data_type_timestamp_table").show(false)
>  {code}
> 【step-2】Consume data through Flink:
> {code:java}
> bin/sql-client.sh embedded -j lib/hudi-flink-bundle_2.12-0.11.0-SNAPSHOT.jar 
> {code}
> {code:java}
> create table data_type_timestamp_table (
>   `id` INT,
>   `name` STRING,
>   `dt` TIMESTAMP(6)
> ) with (
>   'connector' = 'hudi',
>   'hoodie.table.name' = 'data_type_timestamp_table',
>   'read.streaming.enabled' = 'true',
>   'hoodie.datasource.write.recordkey.field' = 'id',
>   'path' = '/hudi/suite/data_type_timestamp_table',
>   'read.streaming.check-interval' = '10',
>   'table.type' = 'COPY_ON_WRITE',
>   'write.precombine.field' = 'id'
> );
> select * from data_type_timestamp_table; {code}
> The result is shown below:
> !1.png!
> If we change TIMESTAMP(6) to TIMESTAMP(3), the result is as follows:
> !2.png!
> The data can now be queried, but the timestamp is displayed incorrectly!
> Checking the Hudi table directory shows that Spark writes the timestamp
> column as timestamp-micros:
> !3.png!
> However, Flink reads and writes Hudi timestamp data as
> timestamp-millis! Therefore, it is problematic for us to read and

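The incorrect display reported above is consistent with a unit mismatch: a value that Spark stores as Avro timestamp-micros gets decoded as timestamp-millis. The Java sketch below uses only `java.time` and a made-up stored value to show how far off that reading is; it does not reproduce Flink's actual decoding path, which the report does not show.

```java
import java.time.Instant;
import java.time.ZoneOffset;

public class TimestampUnitDemo {
  // Decode a long as Avro timestamp-micros: microseconds since the Unix epoch.
  static Instant fromMicros(long micros) {
    long seconds = Math.floorDiv(micros, 1_000_000L);
    long nanos = Math.floorMod(micros, 1_000_000L) * 1_000L;
    return Instant.ofEpochSecond(seconds, nanos);
  }

  // Decode the same long as Avro timestamp-millis: milliseconds since the Unix epoch.
  static Instant fromMillis(long millis) {
    return Instant.ofEpochMilli(millis);
  }

  public static void main(String[] args) {
    // Hypothetical stored value: 2022-01-06T00:00:00Z encoded as microseconds.
    long stored = 1_641_427_200_000_000L;
    Instant asMicros = fromMicros(stored);
    Instant asMillis = fromMillis(stored);
    System.out.println("decoded as micros: " + asMicros);
    System.out.println("decoded as millis: year " + asMillis.atZone(ZoneOffset.UTC).getYear());
  }
}
```

The micros reading prints `2022-01-06T00:00:00Z`, while the millis reading lands more than 50,000 years in the future, because every value is inflated by a factor of 1,000.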
[GitHub] [hudi] zhangyue19921010 commented on pull request #1274: [HUDI-571] Add 'commits show archived' command to CLI

2022-01-06 Thread GitBox



zhangyue19921010 commented on pull request #1274:
URL: https://github.com/apache/hudi/pull/1274#issuecomment-1006465368


   Hi all, there seems to be a small problem with the regex pattern `private static final Pattern ARCHIVE_FILE_PATTERN = Pattern.compile("^\\.commits_\\.archive\\.([0-9]*)$");`
   
   I just raised PR https://github.com/apache/hudi/pull/4521 to try to fix it. Would you be interested in helping me review it? Thanks a lot.
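For reference, the quoted pattern can be exercised on its own. The Java sketch below only shows which names the pattern accepts; the file names are hypothetical probes, and the actual defect and fix are in PR #4521, not reproduced here. Two properties are visible: the whole name must end immediately after the digits, and `[0-9]*` also accepts an empty number.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArchivePatternDemo {
  // Pattern copied from the comment above.
  static final Pattern ARCHIVE_FILE_PATTERN =
      Pattern.compile("^\\.commits_\\.archive\\.([0-9]*)$");

  // Returns the captured archive number, or null if the whole name does not match.
  static String archiveNumber(String fileName) {
    Matcher m = ARCHIVE_FILE_PATTERN.matcher(fileName);
    return m.matches() ? m.group(1) : null;
  }

  public static void main(String[] args) {
    // Hypothetical file names, used only to probe the pattern.
    String[] names = {".commits_.archive.12", ".commits_.archive.", ".commits_.archive.1_1-0-1"};
    for (String name : names) {
      System.out.println(name + " -> " + archiveNumber(name));
    }
  }
}
```

Any name with a suffix after the number (the third probe) is rejected outright, so a caller that relies on this pattern silently skips such files.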


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [hudi] zhangyue19921010 closed pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter

2022-01-06 Thread GitBox



zhangyue19921010 closed pull request #4521:
URL: https://github.com/apache/hudi/pull/4521


   



[GitHub] [hudi] AirToSupply commented on issue #4522: [SUPPORT] hudi-flink support timestamp-micros

2022-01-06 Thread GitBox



AirToSupply commented on issue #4522:
URL: https://github.com/apache/hudi/issues/4522#issuecomment-1006466436


   @AirToSupply Thanks, https://issues.apache.org/jira/browse/HUDI-3184 issue 
created here ~



[GitHub] [hudi] hudi-bot commented on pull request #4516: [WIP][HUDI-1295] Enabling metadata table based index by default for tests

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4516:
URL: https://github.com/apache/hudi/pull/4516#issuecomment-1006470119


   
   ## CI report:
   
   * 7e2ec46af829fabeb506d639c54057d32f3c89fa UNKNOWN
   * 280360f772b47ffab15655e4679b021e151783d7 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4937)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4516: [WIP][HUDI-1295] Enabling metadata table based index by default for tests

2022-01-06 Thread GitBox



hudi-bot removed a comment on pull request #4516:
URL: https://github.com/apache/hudi/pull/4516#issuecomment-1006404803


   
   ## CI report:
   
   * 97502fa31dda3b94645631303e134bf0d652c17e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4913)
 
   * 7e2ec46af829fabeb506d639c54057d32f3c89fa UNKNOWN
   * 280360f772b47ffab15655e4679b021e151783d7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4937)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot commented on pull request #4352: [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4352:
URL: https://github.com/apache/hudi/pull/4352#issuecomment-1006484026


   
   ## CI report:
   
   * 235981abd20a498a3e29e98ce0eda9de35018f99 UNKNOWN
   * ce1b2b4eefdd2e0d46154b2c97dc93abf6982aa0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4938)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4352: [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

2022-01-06 Thread GitBox



hudi-bot removed a comment on pull request #4352:
URL: https://github.com/apache/hudi/pull/4352#issuecomment-1006435512


   
   ## CI report:
   
   * 235981abd20a498a3e29e98ce0eda9de35018f99 UNKNOWN
   * 486c6886c5b0bd748e3db1c90c886a1b7f6d52e8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4915)
 
   * ce1b2b4eefdd2e0d46154b2c97dc93abf6982aa0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4938)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter

2022-01-06 Thread GitBox



hudi-bot removed a comment on pull request #4521:
URL: https://github.com/apache/hudi/pull/4521#issuecomment-1006443490


   
   ## CI report:
   
   * d708467de740637a394375335181979a343979bd Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4933)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4939)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot commented on pull request #4521: [HUDI-3183] Wrong result of HoodieArchivedTimeline loadInstants with TimeRangeFilter

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4521:
URL: https://github.com/apache/hudi/pull/4521#issuecomment-1006493903


   
   ## CI report:
   
   * d708467de740637a394375335181979a343979bd Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4933)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4939)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] liujinhui1994 closed issue #4027: [SUPPORT] Structured streaming Async clustering IndexOutOfBoundsException

2022-01-06 Thread GitBox



liujinhui1994 closed issue #4027:
URL: https://github.com/apache/hudi/issues/4027


   



[GitHub] [hudi] codope opened a new pull request #4523: [WIP][HUDI-3173] Add INDEX action type and corresponding commit metadata

2022-01-06 Thread GitBox



codope opened a new pull request #4523:
URL: https://github.com/apache/hudi/pull/4523


   ## What is the purpose of the pull request
   
   - Add top-level INDEX action type.
   - Add supporting methods in HoodieTimeline.
   - Add index commit metadata which contains index plan.
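As a rough illustration of what "a top-level action type plus supporting timeline methods" can mean, here is a hypothetical Java sketch. The action names, the `<instantTime>.<action>[.<state>]` file-naming convention, and all class and method names are assumptions modeled loosely on Hudi's timeline; they are not the API added by this PR.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class TimelineActionSketch {
  // Hypothetical action names; INDEX_ACTION stands in for the proposed new type.
  static final String COMMIT_ACTION = "commit";
  static final String INDEX_ACTION = "index";

  enum State { REQUESTED, INFLIGHT, COMPLETED }

  // Minimal immutable timeline entry: instant time, action, and state.
  static final class TimelineInstant {
    final String timestamp;
    final String action;
    final State state;

    TimelineInstant(String timestamp, String action, State state) {
      this.timestamp = timestamp;
      this.action = action;
      this.state = state;
    }

    // Assumed naming convention: completed instants carry no state suffix.
    String fileName() {
      String base = timestamp + "." + action;
      return state == State.COMPLETED ? base : base + "." + state.name().toLowerCase(Locale.ROOT);
    }
  }

  // Example "supporting method": keep only instants whose action is in the given set.
  static List<TimelineInstant> filterByActions(List<TimelineInstant> timeline, Set<String> actions) {
    return timeline.stream()
        .filter(instant -> actions.contains(instant.action))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<TimelineInstant> timeline = List.of(
        new TimelineInstant("20220106101500", COMMIT_ACTION, State.COMPLETED),
        new TimelineInstant("20220106102000", INDEX_ACTION, State.REQUESTED));
    for (TimelineInstant instant : filterByActions(timeline, Set.of(INDEX_ACTION))) {
      System.out.println(instant.fileName()); // 20220106102000.index.requested
    }
  }
}
```

Keeping the new action as one more string constant means existing filtering helpers can select it without special cases.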
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   



[jira] [Updated] (HUDI-3173) Introduce new INDEX action type

2022-01-06 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-3173:
-
Labels: pull-request-available  (was: )

> Introduce new INDEX action type
> ---
>
> Key: HUDI-3173
> URL: https://issues.apache.org/jira/browse/HUDI-3173
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> Add a top level INDEX action type and supporting methods in HoodieTimeline.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[GitHub] [hudi] hudi-bot commented on pull request #4523: [WIP][HUDI-3173] Add INDEX action type and corresponding commit metadata

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4523:
URL: https://github.com/apache/hudi/pull/4523#issuecomment-1006542672


   
   ## CI report:
   
   * 700a87f4f67a1cac8f5b870882ab7b61628b4020 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4523: [WIP][HUDI-3173] Add INDEX action type and corresponding commit metadata

2022-01-06 Thread GitBox



hudi-bot removed a comment on pull request #4523:
URL: https://github.com/apache/hudi/pull/4523#issuecomment-1006542672


   
   ## CI report:
   
   * 700a87f4f67a1cac8f5b870882ab7b61628b4020 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot commented on pull request #4523: [WIP][HUDI-3173] Add INDEX action type and corresponding commit metadata

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4523:
URL: https://github.com/apache/hudi/pull/4523#issuecomment-1006545249


   
   ## CI report:
   
   * 700a87f4f67a1cac8f5b870882ab7b61628b4020 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4940)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] codope commented on pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

2022-01-06 Thread GitBox



codope commented on pull request #4203:
URL: https://github.com/apache/hudi/pull/4203#issuecomment-1006547783


   > @nsivabalan @codope I have a question about this implementation. In this PR, most of the work is just passing `isConsistentLogicalTimestampEnabled` to the method `HoodieAvroUtils.convertValueForAvroLogicalTypes`. What if another config needs the same treatment in the future?
   
   @YannByron You bring up a good point. Threading another config through this way in the future would be tedious. However, the new config was added to avoid discrepancies in existing pipelines; @nsivabalan has explained this in more detail on the Jira HUDI-2909. I do not expect such changes to be frequent. Nevertheless, I'll try to avoid making incompatible changes to public APIs.



[GitHub] [hudi] nochimow commented on issue #4299: [SUPPORT] Upsert performance decreased after 3 years of data loading

2022-01-06 Thread GitBox



nochimow commented on issue #4299:
URL: https://github.com/apache/hudi/issues/4299#issuecomment-1006558081


   Hi,
   Still waiting for some updates on this case.



[GitHub] [hudi] nsivabalan commented on pull request #4440: [HUDI-3100] Add config for hive conditional sync

2022-01-06 Thread GitBox



nsivabalan commented on pull request #4440:
URL: https://github.com/apache/hudi/pull/4440#issuecomment-1006567579


   I am OK with adding it. I see this as filling a gap we had previously. I understand it is debatable whether to consider it a bug fix, but I feel we can add it.



[jira] [Updated] (HUDI-1850) Read on table fails if the first write to table failed

2022-01-06 Thread sivabalan narayanan (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-1850:
--
Sprint: Hudi-Sprint-Jan-3  (was: Hudi 0.10.1 -  2021/01/03)

> Read on table fails if the first write to table failed
> --
>
> Key: HUDI-1850
> URL: https://issues.apache.org/jira/browse/HUDI-1850
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Spark Integration
>Affects Versions: 0.8.0
>Reporter: Vaibhav Sinha
>Priority: Major
>  Labels: core-flow-ds, pull-request-available, release-blocker, 
> sev:high, spark
> Fix For: 0.11.0, 0.10.1
>
> Attachments: Screenshot 2021-04-24 at 7.53.22 PM.png
>
>
> {code:java}
> java.util.NoSuchElementException: No value present in Option
>   at org.apache.hudi.common.util.Option.get(Option.java:88) 
> ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.common.table.TableSchemaResolver.getTableSchemaFromCommitMetadata(TableSchemaResolver.java:215)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:166)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchema(TableSchemaResolver.java:155)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.hudi.MergeOnReadSnapshotRelation.<init>(MergeOnReadSnapshotRelation.scala:65)
>  ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:99) 
> ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:63) 
> ~[hudi-spark3-bundle_2.12-0.8.0.jar:0.8.0]
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354)
>  ~[spark-sql_2.12-3.1.1.jar:3.1.1]
>   at 
> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:326) 
> ~[spark-sql_2.12-3.1.1.jar:3.1.1]
>   at 
> org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:308)
>  ~[spark-sql_2.12-3.1.1.jar:3.1.1]
>   at scala.Option.getOrElse(Option.scala:189) 
> ~[scala-library-2.12.10.jar:?]
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:308) 
> ~[spark-sql_2.12-3.1.1.jar:3.1.1]
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:240) 
> ~[spark-sql_2.12-3.1.1.jar:3.1.1]
> {code}
> The screenshot shows the files that got created before the write had failed.
>  
> !Screenshot 2021-04-24 at 7.53.22 PM.png!
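The trace shows `Option.get` being called on an empty value while resolving the table schema from commit metadata, which is what happens when the table has no completed commit yet. A defensive lookup could look like the hypothetical Java sketch below, where `java.util.Optional` stands in for Hudi's `Option` and the timeline is modeled as a plain list. It is not the fix adopted for HUDI-1850.

```java
import java.util.List;
import java.util.Optional;

public class SchemaLookupSketch {
  // Hypothetical completed-commit entry: instant time plus the schema recorded in its metadata.
  static final class CompletedCommit {
    final String instantTime;
    final String schema;

    CompletedCommit(String instantTime, String schema) {
      this.instantTime = instantTime;
      this.schema = schema;
    }
  }

  // Returns the schema of the latest completed commit (list assumed sorted by instant time),
  // or Optional.empty() when nothing has completed, e.g. because the first write failed.
  static Optional<String> latestSchema(List<CompletedCommit> completedCommits) {
    if (completedCommits.isEmpty()) {
      return Optional.empty();
    }
    return Optional.ofNullable(completedCommits.get(completedCommits.size() - 1).schema);
  }

  public static void main(String[] args) {
    // No completed commits: calling get() on this result would throw NoSuchElementException.
    Optional<String> schema = latestSchema(List.of());
    System.out.println(schema.orElse("<no completed commit yet>"));
  }
}
```

Returning an empty result lets the reader report "no data yet" instead of failing with the exception shown in the trace.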




[GitHub] [hudi] hudi-bot removed a comment on pull request #4523: [WIP][HUDI-3173] Add INDEX action type and corresponding commit metadata

2022-01-06 Thread GitBox



hudi-bot removed a comment on pull request #4523:
URL: https://github.com/apache/hudi/pull/4523#issuecomment-1006545249


   
   ## CI report:
   
   * 700a87f4f67a1cac8f5b870882ab7b61628b4020 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4940)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot commented on pull request #4523: [WIP][HUDI-3173] Add INDEX action type and corresponding commit metadata

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4523:
URL: https://github.com/apache/hudi/pull/4523#issuecomment-1006610017


   
   ## CI report:
   
   * 700a87f4f67a1cac8f5b870882ab7b61628b4020 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4940)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] garystafford opened a new issue #4524: [SUPPORT] Kafka Connect Sink for Hudi README has Incorrect Command

2022-01-06 Thread GitBox



garystafford opened a new issue #4524:
URL: https://github.com/apache/hudi/issues/4524


   **Describe the problem you faced**
   
   In the current instructions for the [Kafka Connect Sink for Hudi](https://github.com/apache/hudi/blob/master/hudi-kafka-connect/README.md), the command `cp confluentinc-kafka-connect-hdfs-10.1.0/* /usr/local/share/kafka/plugins/` is incorrect. Per Ethan Guo, I believe it should be `cp confluentinc-kafka-connect-hdfs-10.1.0/lib/* /usr/local/share/kafka/plugins/lib`. Because `cp` without `-r` skips directories, the current command does not copy the `lib` directory and results in the following error:
   
   ```
   cp: omitting directory 'confluentinc-kafka-connect-hdfs-10.1.0/assets'
   cp: omitting directory 'confluentinc-kafka-connect-hdfs-10.1.0/doc'
   cp: omitting directory 'confluentinc-kafka-connect-hdfs-10.1.0/etc'
   cp: omitting directory 'confluentinc-kafka-connect-hdfs-10.1.0/lib'
   ```
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Enter the command in the README: `cp confluentinc-kafka-connect-hdfs-10.1.0/* /usr/local/share/kafka/plugins/`
   
   **Expected behavior**
   
   The command works and JARs in the `lib` directory are copied to the 
appropriate directory.
   
   **Environment Description**
   
   * Hudi version : N/A
   
   * Spark version : N/A
   
   * Hive version : N/A
   
   * Hadoop version : N/A
   
   * Storage (HDFS/S3/GCS..) : N/A
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   None.
   
   **Stacktrace**
   
   ```
   cp: omitting directory 'confluentinc-kafka-connect-hdfs-10.1.0/assets'
   cp: omitting directory 'confluentinc-kafka-connect-hdfs-10.1.0/doc'
   cp: omitting directory 'confluentinc-kafka-connect-hdfs-10.1.0/etc'
   cp: omitting directory 'confluentinc-kafka-connect-hdfs-10.1.0/lib'
   ```
   
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4507: [HUDI-52] Enabling savepoint and restore for MOR table

2022-01-06 Thread GitBox



hudi-bot removed a comment on pull request #4507:
URL: https://github.com/apache/hudi/pull/4507#issuecomment-1005353560


   
   ## CI report:
   
   * 2968b1793b9b3e339f3a5267984269e02bdf6c83 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4892)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot commented on pull request #4507: [HUDI-52] Enabling savepoint and restore for MOR table

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4507:
URL: https://github.com/apache/hudi/pull/4507#issuecomment-1006618506


   
   ## CI report:
   
   * 2968b1793b9b3e339f3a5267984269e02bdf6c83 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4892)
 
   * cb52a5afb8fdccd9aadcb50b541a207b1f543886 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4507: [HUDI-52] Enabling savepoint and restore for MOR table

2022-01-06 Thread GitBox



hudi-bot removed a comment on pull request #4507:
URL: https://github.com/apache/hudi/pull/4507#issuecomment-1006618506


   
   ## CI report:
   
   * 2968b1793b9b3e339f3a5267984269e02bdf6c83 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4892)
 
   * cb52a5afb8fdccd9aadcb50b541a207b1f543886 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot commented on pull request #4507: [HUDI-52] Enabling savepoint and restore for MOR table

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4507:
URL: https://github.com/apache/hudi/pull/4507#issuecomment-1006620786


   
   ## CI report:
   
   * 2968b1793b9b3e339f3a5267984269e02bdf6c83 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4892)
 
   * cb52a5afb8fdccd9aadcb50b541a207b1f543886 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4941)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] codope commented on a change in pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

2022-01-06 Thread GitBox



codope commented on a change in pull request #4203:
URL: https://github.com/apache/hudi/pull/4203#discussion_r779577663



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/TimestampBasedAvroKeyGenerator.java
##
@@ -125,7 +126,7 @@ public TimestampBasedAvroKeyGenerator(TypedProperties config) throws IOException
 
   @Override
   public String getPartitionPath(GenericRecord record) {
-Object partitionVal = HoodieAvroUtils.getNestedFieldVal(record, getPartitionPathFields().get(0), true);
+Object partitionVal = HoodieAvroUtils.getNestedFieldVal(record, getPartitionPathFields().get(0), true, isConsistentLogicalTimestampEnabled());

Review comment:
   Not changing the keygen API here. I am using the config and class 
hierarchy itself. `isConsistentLogicalTimestampEnabled()` is defined in 
`BaseKeyGenerator` superclass for reusability.
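The approach described here, a config-backed helper defined once in a shared superclass and reused by subclasses instead of widening the key generator API, might look roughly like the hypothetical Java sketch below. The class names, the `convert` helper, and the property key are illustrative assumptions, not Hudi's actual `BaseKeyGenerator` code; `java.util.Properties` stands in for Hudi's `TypedProperties`.

```java
import java.util.Properties;

public class KeyGenHierarchySketch {
  // Hypothetical property key, used only for this illustration.
  static final String CONSISTENT_TS_KEY = "example.keygen.consistent.logical.timestamp.enabled";

  // Shared superclass: reads the flag from config and exposes it to every subclass.
  abstract static class BaseKeyGen {
    protected final Properties config;

    BaseKeyGen(Properties config) {
      this.config = config;
    }

    protected boolean isConsistentLogicalTimestampEnabled() {
      return Boolean.parseBoolean(config.getProperty(CONSISTENT_TS_KEY, "false"));
    }

    abstract String getPartitionPath(String fieldValue);
  }

  // Subclass: the public signature is unchanged; the flag comes from the inherited helper.
  static class TimestampKeyGen extends BaseKeyGen {
    TimestampKeyGen(Properties config) {
      super(config);
    }

    @Override
    String getPartitionPath(String fieldValue) {
      return convert(fieldValue, isConsistentLogicalTimestampEnabled());
    }

    // Hypothetical stand-in for a utility that takes the flag as an extra argument.
    private static String convert(String value, boolean consistent) {
      return value + (consistent ? " (consistent)" : " (legacy)");
    }
  }

  public static void main(String[] args) {
    Properties defaults = new Properties();
    Properties enabled = new Properties();
    enabled.setProperty(CONSISTENT_TS_KEY, "true");
    System.out.println(new TimestampKeyGen(defaults).getPartitionPath("2022-01-06")); // 2022-01-06 (legacy)
    System.out.println(new TimestampKeyGen(enabled).getPartitionPath("2022-01-06"));  // 2022-01-06 (consistent)
  }
}
```

Only the internal utility call gains a parameter; callers of `getPartitionPath` see no API change, and the flag defaults to the old behavior.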





[GitHub] [hudi] nsivabalan commented on pull request #4440: [HUDI-3100] Add config for hive conditional sync

2022-01-06 Thread GitBox



nsivabalan commented on pull request #4440:
URL: https://github.com/apache/hudi/pull/4440#issuecomment-1006625969


   Once you rebase and CI succeeds, I can land this.



[GitHub] [hudi] nsivabalan commented on pull request #4428: [HUDI-44] Adding support to preserve commit metadata for compaction

2022-01-06 Thread GitBox



nsivabalan commented on pull request #4428:
URL: https://github.com/apache/hudi/pull/4428#issuecomment-1006642838


   We can probably skip adding it to the plan. Here is the use case:
   Let's say a compaction was triggered with preserve commit metadata enabled, and midway the user decides they do not want preserve commit metadata enabled. So they cancel the ongoing compaction, change the write config to disable preserve commit metadata, and restart.
   But since we serialized the value to the plan, we will re-execute it from scratch with preserve commit metadata still enabled, right? I guess there is not much we can do at that point.
   So it is better not to serialize the value to the plan and to always honor the current write configs.
   Let me know what you think.
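The trade-off above can be sketched in a few lines of hypothetical Java. If the flag is serialized into the compaction plan, re-executing the plan after the user changes the write config still uses the old value; reading it from the current write config honors the change. `Properties` and the map-based plan are stand-ins, and the key name is made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class PreserveMetadataSketch {
  // Hypothetical key, not a real Hudi config name.
  static final String KEY = "example.compaction.preserve.commit.metadata";

  // Option A: value frozen into the plan when the plan is created.
  static boolean fromPlan(Map<String, String> plan) {
    return Boolean.parseBoolean(plan.getOrDefault(KEY, "false"));
  }

  // Option B: value read from whatever write config is active at execution time.
  static boolean fromWriteConfig(Properties writeConfig) {
    return Boolean.parseBoolean(writeConfig.getProperty(KEY, "false"));
  }

  public static void main(String[] args) {
    Properties writeConfig = new Properties();
    writeConfig.setProperty(KEY, "true");

    // Compaction is scheduled while the flag is enabled; the value is serialized into the plan.
    Map<String, String> plan = new HashMap<>();
    plan.put(KEY, writeConfig.getProperty(KEY));

    // The user cancels the compaction, disables the flag, and restarts.
    writeConfig.setProperty(KEY, "false");

    System.out.println("re-executed from plan:   " + fromPlan(plan));               // true (stale)
    System.out.println("re-executed from config: " + fromWriteConfig(writeConfig)); // false
  }
}
```

Only Option B reflects the user's restart with the flag disabled, which is the behavior argued for in the comment.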



[GitHub] [hudi] codope merged pull request #4428: [HUDI-44] Adding support to preserve commit metadata for compaction

2022-01-06 Thread GitBox



codope merged pull request #4428:
URL: https://github.com/apache/hudi/pull/4428


   



[hudi] branch master updated (50fa5a6 -> b6891d2)

2022-01-06 Thread codope

This is an automated email from the ASF dual-hosted git repository.

codope pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 50fa5a6  Update HiveIncrementalPuller to configure filesystem (#4431)
 add b6891d2  [HUDI-44] Adding support to preserve commit metadata for 
compaction (#4428)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/config/HoodieCompactionConfig.java | 11 ++
 .../org/apache/hudi/config/HoodieWriteConfig.java  |  6 +-
 .../java/org/apache/hudi/io/HoodieMergeHandle.java |  8 ++-
 .../PartitionAwareClusteringPlanStrategy.java  |  2 +-
 .../TestHoodieClientOnCopyOnWriteStorage.java  |  2 +-
 .../hudi/table/TestHoodieMergeOnReadTable.java | 25 +++---
 .../SparkClientFunctionalTestHarness.java  |  8 +++
 7 files changed, 51 insertions(+), 11 deletions(-)

[GitHub] [hudi] hudi-bot commented on pull request #4507: [HUDI-52] Enabling savepoint and restore for MOR table

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4507:
URL: https://github.com/apache/hudi/pull/4507#issuecomment-1006659154


   
   ## CI report:
   
   * cb52a5afb8fdccd9aadcb50b541a207b1f543886 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4941)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4507: [HUDI-52] Enabling savepoint and restore for MOR table

2022-01-06 Thread GitBox



hudi-bot removed a comment on pull request #4507:
URL: https://github.com/apache/hudi/pull/4507#issuecomment-1006620786


   
   ## CI report:
   
   * 2968b1793b9b3e339f3a5267984269e02bdf6c83 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4892)
 
   * cb52a5afb8fdccd9aadcb50b541a207b1f543886 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4941)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] parisni opened a new issue #4525: [SUPPORT] Spark metastore schema evolution broken

2022-01-06 Thread GitBox



parisni opened a new issue #4525:
URL: https://github.com/apache/hudi/issues/4525


   From my experiments, when columns are added to a given Hudi table, everything 
works except Spark reads from the metastore:
   
   - Hive read from metastore -> new column added
   - Spark read from Hudi path -> new column added
   - Spark read from metastore (spark.table("database.hudi_table")) -> new 
column not added
   
   I have looked at the Hive metastore content, and apparently the columns are 
stored in two tables:
   - COLUMNS_V2 (one row per column)
   - TABLE_PARAMS (a key/value table with a Spark JSON schema in it)
   
   After Hive sync, only the first HMS table gets updated with the new column. 
The Spark JSON is not updated with the new column.
   If I purge the TABLE_PARAMS table, then Spark magically has the new 
column in the schema.
   
   So I think the problem is on the Spark or Hive metastore side (not Hudi), 
which stores its columns in an alternative table that doesn't get modified.
   
   But as a result, Hudi schema evolution is effectively broken on the Spark side: 
people who read the table from the metastore won't see the new columns.
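One way to confirm the drift described above is to compare the Hive columns against the Spark schema JSON stored in TABLE_PARAMS. The sketch below assumes that Spark's Hive catalog stores that JSON in one of two ways: under `spark.sql.sources.schema`, or split across `spark.sql.sources.schema.numParts` and `spark.sql.sources.schema.part.N`. Verify the key names against your Spark version. The class is hypothetical, and the JSON handling is a naive regex, not a real parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical drift check between COLUMNS_V2 and the Spark schema in TABLE_PARAMS.
final class SparkSchemaDriftCheck {
  private static final String SCHEMA_KEY = "spark.sql.sources.schema";
  private static final Pattern FIELD_NAME = Pattern.compile("\"name\"\\s*:\\s*\"([^\"]+)\"");

  private SparkSchemaDriftCheck() {}

  // Rebuilds the Spark schema JSON from TABLE_PARAMS-style entries. Assumed
  // layouts: one "spark.sql.sources.schema" entry, or a ".numParts" count
  // plus ".part.0", ".part.1", ... fragments.
  static String sparkSchemaJson(Map<String, String> tableParams) {
    String whole = tableParams.get(SCHEMA_KEY);
    if (whole != null) {
      return whole;
    }
    String numParts = tableParams.get(SCHEMA_KEY + ".numParts");
    if (numParts == null) {
      return "";
    }
    StringBuilder json = new StringBuilder();
    for (int i = 0; i < Integer.parseInt(numParts); i++) {
      json.append(tableParams.getOrDefault(SCHEMA_KEY + ".part." + i, ""));
    }
    return json.toString();
  }

  // Naive extraction of every "name":"..." value (nested fields included),
  // lower-cased because Hive stores column names in lower case.
  static Set<String> fieldNames(String schemaJson) {
    Set<String> names = new LinkedHashSet<>();
    Matcher m = FIELD_NAME.matcher(schemaJson);
    while (m.find()) {
      names.add(m.group(1).toLowerCase(Locale.ROOT));
    }
    return names;
  }

  // Columns present in COLUMNS_V2 (after Hive sync) but absent from Spark's schema.
  static List<String> missingInSparkSchema(List<String> hiveColumns, Map<String, String> tableParams) {
    Set<String> sparkFields = fieldNames(sparkSchemaJson(tableParams));
    List<String> missing = new ArrayList<>();
    for (String col : hiveColumns) {
      if (!sparkFields.contains(col.toLowerCase(Locale.ROOT))) {
        missing.add(col);
      }
    }
    return missing;
  }
}
```

A non-empty result for a table that Hive reads correctly matches the symptom in this issue: the Hive columns were updated, but the Spark schema entry was not.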



[jira] [Updated] (HUDI-3185) HoodieConfig getBoolean method returns null instead of default value

2022-01-06 Thread Sagar Sumit (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-3185:
--
Priority: Blocker  (was: Major)

> HoodieConfig getBoolean method returns null instead of default value
> 
>
> Key: HUDI-3185
> URL: https://issues.apache.org/jira/browse/HUDI-3185
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Blocker
> Fix For: 0.10.1
>
>
> If a config has a default value, then that value should be returned instead of null.
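Here is a minimal sketch of the intended behavior, using hypothetical stand-in classes rather than Hudi's `HoodieConfig`/`ConfigProperty`. An explicit value wins. Otherwise the declared default is used. `null` is returned only when neither exists.

```java
import java.util.Properties;

// Hypothetical stand-in for a config property with an optional default.
final class SketchConfigProperty {
  final String key;
  final String defaultValue; // null when the property declares no default

  SketchConfigProperty(String key, String defaultValue) {
    this.key = key;
    this.defaultValue = defaultValue;
  }
}

// Hypothetical stand-in for a config holder; NOT Hudi's HoodieConfig.
final class SketchConfig {
  private final Properties props = new Properties();

  void set(SketchConfigProperty p, String value) {
    props.setProperty(p.key, value);
  }

  // Explicit value first, then the declared default, and null only if neither exists.
  Boolean getBoolean(SketchConfigProperty p) {
    String raw = props.getProperty(p.key);
    if (raw == null) {
      raw = p.defaultValue;
    }
    return raw == null ? null : Boolean.parseBoolean(raw);
  }
}
```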



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Created] (HUDI-3185) HoodieConfig getBoolean method returns null instead of default value

2022-01-06 Thread Sagar Sumit (Jira)

Sagar Sumit created HUDI-3185:
-

 Summary: HoodieConfig getBoolean method returns null instead of 
default value
 Key: HUDI-3185
 URL: https://issues.apache.org/jira/browse/HUDI-3185
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Sagar Sumit
Assignee: Sagar Sumit
 Fix For: 0.10.1


If a config has a default value, then that value should be returned instead of null.




[jira] [Updated] (HUDI-2429) [UMBRELLA] Comprehensive Schema evolution in Hudi

2022-01-06 Thread Vinoth Chandar (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-2429:
-
Fix Version/s: 0.12.0
   (was: 0.11.0)

> [UMBRELLA] Comprehensive Schema evolution in Hudi
> -
>
> Key: HUDI-2429
> URL: https://issues.apache.org/jira/browse/HUDI-2429
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Common Core
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: hudi-umbrellas, pull-request-available
> Fix For: 0.12.0
>
>
> [https://cwiki.apache.org/confluence/display/HUDI/RFC+-+33++Hudi+supports+more+comprehensive+Schema+Evolution]
>  
> Support comprehensive schema evolution in Hudi
>  * rename cols
>  * drop cols
>  * reorder cols
>  * re-add cols
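One common way to support rename, drop, reorder and re-add together is to track columns by a stable ID instead of by name. With this approach, a re-added column never picks up data written for a dropped column of the same name. This is a generic, hypothetical illustration and not necessarily the design in RFC-33.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical ID-based schema; NOT Hudi's internal schema classes.
final class SketchIdSchema {
  private final Map<Integer, String> columns = new LinkedHashMap<>(); // id -> name, display order
  private int nextId = 0;

  // Every add (including a re-add of an old name) gets a fresh id.
  int add(String name) {
    int id = nextId++;
    columns.put(id, name);
    return id;
  }

  // Rename changes only the name; data stays bound to the id.
  void rename(int id, String newName) {
    columns.replace(id, newName);
  }

  void drop(int id) {
    columns.remove(id);
  }

  // Reorder: move column `id` to the end of the display order.
  void moveToEnd(int id) {
    String name = columns.remove(id);
    if (name != null) {
      columns.put(id, name);
    }
  }

  String displayOrder() {
    return String.join(",", columns.values());
  }
}
```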




[jira] [Updated] (HUDI-2429) [UMBRELLA] Comprehensive Schema evolution in Hudi

2022-01-06 Thread Vinoth Chandar (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-2429:
-
Fix Version/s: 0.11.0
   (was: 0.12.0)

> [UMBRELLA] Comprehensive Schema evolution in Hudi
> -
>
> Key: HUDI-2429
> URL: https://issues.apache.org/jira/browse/HUDI-2429
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Common Core
>Reporter: tao meng
>Assignee: tao meng
>Priority: Major
>  Labels: hudi-umbrellas, pull-request-available
> Fix For: 0.11.0
>
>
> [https://cwiki.apache.org/confluence/display/HUDI/RFC+-+33++Hudi+supports+more+comprehensive+Schema+Evolution]
>  
> Support comprehensive schema evolution in Hudi
>  * rename cols
>  * drop cols
>  * reorder cols
>  * re-add cols




[jira] [Updated] (HUDI-1896) [UMBRELLA] Implement DeltaStreamer Source for cloud object stores

2022-01-06 Thread Vinoth Chandar (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1896:
-
Fix Version/s: 1.0.0

> [UMBRELLA] Implement DeltaStreamer Source for cloud object stores
> -
>
> Key: HUDI-1896
> URL: https://issues.apache.org/jira/browse/HUDI-1896
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: DeltaStreamer
>Reporter: Raymond Xu
>Assignee: Rajesh Mahindra
>Priority: Critical
>  Labels: hudi-umbrellas, pull-request-available
> Fix For: 1.0.0
>
>
> As discussed in HUDI-1723, we need a better implementation for cloud object 
> storage like AWS S3 or GCS, leveraging change notifications.
> Also consider 
> [https://docs.databricks.com/spark/latest/structured-streaming/sqs.html]
>  
> We need to look into the current *DFSSource classes and see if we can add a new 
> `DFSPathSelector` implementation that fetches new files on cloud storage 
> after a given point in time. The timestamp-based approach used by the existing 
> path selector largely works, but has corner cases as mentioned in HUDI-1723.
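Below is a rough sketch of a timestamp-based selector, along with one corner case of this kind. A file can become visible after the listing while carrying a modification time equal to the checkpoint. Such a file is skipped on every later run. `SketchFile` and the selector are hypothetical. This corner case is illustrative and is not necessarily the one described in HUDI-1723.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of one listed file (path + modification time).
final class SketchFile {
  final String path;
  final long modTimeMillis;

  SketchFile(String path, long modTimeMillis) {
    this.path = path;
    this.modTimeMillis = modTimeMillis;
  }
}

final class TimestampPathSelectorSketch {
  private TimestampPathSelectorSketch() {}

  // Files strictly newer than the checkpoint, oldest first.
  static List<SketchFile> selectNewFiles(List<SketchFile> listing, long checkpointMillis) {
    List<SketchFile> selected = new ArrayList<>();
    for (SketchFile f : listing) {
      if (f.modTimeMillis > checkpointMillis) {
        selected.add(f);
      }
    }
    selected.sort(Comparator.comparingLong((SketchFile f) -> f.modTimeMillis));
    return selected;
  }

  // Next checkpoint = newest modification time consumed so far.
  static long nextCheckpoint(List<SketchFile> selected, long checkpointMillis) {
    long max = checkpointMillis;
    for (SketchFile f : selected) {
      max = Math.max(max, f.modTimeMillis);
    }
    return max;
  }
}
```

The strict `>` avoids re-reading files at the checkpoint, but it is exactly what drops the late arrival. Switching to `>=` trades that problem for duplicates. This tradeoff is one motivation for the notification-based approach the ticket proposes.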




[GitHub] [hudi] codope commented on a change in pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

2022-01-06 Thread GitBox



codope commented on a change in pull request #4203:
URL: https://github.com/apache/hudi/pull/4203#discussion_r779645642



##
File path: 
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/keygen/TestTimestampBasedKeyGenerator.java
##
@@ -238,6 +238,40 @@ public void testScalar() throws IOException {
 assertEquals("2021-04-19", keyGen.getPartitionPath(baseRow));
   }
 
+  @Test
+  public void testScalarWithLogicalType() throws IOException {
+schema = SchemaTestUtil.getTimestampWithLogicalTypeSchema();
+structType = AvroConversionUtils.convertAvroSchemaToStructType(schema);
+baseRecord = SchemaTestUtil.generateAvroRecordFromJson(schema, 1, "001", 
"f1");
+baseRecord.put("createTime", 163851380600L);
+
+properties = getBaseKeyConfig("SCALAR", "/MM/dd", "GMT", 
"MICROSECONDS");
+
properties.setProperty(KeyGeneratorOptions.KEYGENERATOR_CONSISTENT_LOGICAL_TIMESTAMP_ENABLED.key(),
 "true");
+TimestampBasedKeyGenerator keyGen = new 
TimestampBasedKeyGenerator(properties);
+HoodieKey hk1 = keyGen.getKey(baseRecord);
+assertEquals("2021/12/03", hk1.getPartitionPath());
+
+// test w/ Row
+baseRow = genericRecordToRow(baseRecord);
+assertEquals("2021/12/03", keyGen.getPartitionPath(baseRow));
+internalRow = KeyGeneratorTestUtilities.getInternalRow(baseRow);
+assertEquals("2021/12/03", keyGen.getPartitionPath(internalRow, 
baseRow.schema()));

Review comment:
   If the config is not set, it throws an exception:
   `HoodieKeyGeneratorException: Unable to parse input partition field`
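For reference, the expected `2021/12/03` follows from reading the field as epoch microseconds and formatting it in GMT. This minimal `java.time` sketch assumes a `yyyy/MM/dd` output format and an illustrative value of `1638513806000000L` microseconds (2021-12-03 06:43:26 UTC). The class name is hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: formats a timestamp-micros value as a partition path.
final class MicrosPartitionPathSketch {
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneId.of("GMT"));

  private MicrosPartitionPathSketch() {}

  // Splits epoch microseconds into seconds + nanos (floor semantics keep
  // pre-1970 values correct) and formats the instant in GMT.
  static String partitionPath(long epochMicros) {
    long seconds = Math.floorDiv(epochMicros, 1_000_000L);
    long nanos = Math.floorMod(epochMicros, 1_000_000L) * 1_000L;
    return FORMAT.format(Instant.ofEpochSecond(seconds, nanos));
  }
}
```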





[jira] [Updated] (HUDI-1046) Support updates during clustering in CoW mode

2022-01-06 Thread Vinoth Chandar (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1046:
-
Priority: Blocker  (was: Major)

> Support updates during clustering in CoW mode
> -
>
> Key: HUDI-1046
> URL: https://issues.apache.org/jira/browse/HUDI-1046
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: leesf
>Assignee: shenh062326
>Priority: Blocker
> Fix For: 0.12.0
>
>





[jira] [Updated] (HUDI-1046) Support updates during clustering in CoW mode

2022-01-06 Thread Vinoth Chandar (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1046:
-
Fix Version/s: 0.12.0

> Support updates during clustering in CoW mode
> -
>
> Key: HUDI-1046
> URL: https://issues.apache.org/jira/browse/HUDI-1046
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: leesf
>Assignee: shenh062326
>Priority: Major
> Fix For: 0.12.0
>
>





[jira] [Updated] (HUDI-1045) Support updates during clustering in MoR mode

2022-01-06 Thread Vinoth Chandar (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1045:
-
Priority: Blocker  (was: Major)

> Support updates during clustering in MoR mode
> -
>
> Key: HUDI-1045
> URL: https://issues.apache.org/jira/browse/HUDI-1045
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: leesf
>Assignee: leesf
>Priority: Blocker
>





[jira] [Updated] (HUDI-1045) Support updates during clustering in MoR mode

2022-01-06 Thread Vinoth Chandar (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1045:
-
Fix Version/s: 0.12.0

> Support updates during clustering in MoR mode
> -
>
> Key: HUDI-1045
> URL: https://issues.apache.org/jira/browse/HUDI-1045
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: leesf
>Assignee: leesf
>Priority: Blocker
> Fix For: 0.12.0
>
>





[GitHub] [hudi] codope merged pull request #4507: [HUDI-52] Enabling savepoint and restore for MOR table

2022-01-06 Thread GitBox



codope merged pull request #4507:
URL: https://github.com/apache/hudi/pull/4507


   



[hudi] branch master updated: [HUDI-52] Enabling savepoint and restore for MOR table (#4507)

2022-01-06 Thread codope

This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 2954027  [HUDI-52] Enabling savepoint and restore for MOR table (#4507)
2954027 is described below

commit 2954027b92ada82c41d0a72cc0b837564a730a89
Author: Sivabalan Narayanan 
AuthorDate: Thu Jan 6 10:56:08 2022 -0500

[HUDI-52] Enabling savepoint and restore for MOR table (#4507)

* Enabling restore for MOR table

* Fixing savepoint for compaction commits in MOR
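On MOR tables, completed writes are mostly `deltacommit` instants. A savepoint check that looks only for a `commit` action with the given time can therefore miss them. The sketch below contrasts the two lookups. It assumes, as the diff suggests, that the new check matches the instant time against a timeline of commit-like actions. The classes are hypothetical, not Hudi's `HoodieInstant`/`HoodieTimeline`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal instant model; NOT Hudi's HoodieInstant/HoodieTimeline.
final class SketchInstant {
  final String action;    // e.g. "commit", "deltacommit", "replacecommit"
  final String timestamp; // instant time

  SketchInstant(String action, String timestamp) {
    this.action = action;
    this.timestamp = timestamp;
  }
}

final class SavepointLookupSketch {
  private static final List<String> COMMIT_LIKE =
      Arrays.asList("commit", "deltacommit", "replacecommit");

  private SavepointLookupSketch() {}

  // Old-style check: requires an exact ("commit", timestamp) match.
  static boolean containsCommitAction(List<SketchInstant> completed, String ts) {
    for (SketchInstant i : completed) {
      if (i.action.equals("commit") && i.timestamp.equals(ts)) {
        return true;
      }
    }
    return false;
  }

  // New-style check: matches the timestamp against any commit-like action.
  static boolean containsTimestamp(List<SketchInstant> completed, String ts) {
    for (SketchInstant i : completed) {
      if (COMMIT_LIKE.contains(i.action) && i.timestamp.equals(ts)) {
        return true;
      }
    }
    return false;
  }
}
```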
---
 .../hudi/cli/commands/SavepointsCommand.java   | 12 ++-
 .../action/savepoint/SavepointActionExecutor.java  |  9 +--
 .../TestHoodieSparkMergeOnReadTableRollback.java   | 94 ++
 3 files changed, 101 insertions(+), 14 deletions(-)

diff --git 
a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/SavepointsCommand.java 
b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/SavepointsCommand.java
index 0ea2fff..d3f8584 100644
--- a/hudi-cli/src/main/java/org/apache/hudi/cli/commands/SavepointsCommand.java
+++ b/hudi-cli/src/main/java/org/apache/hudi/cli/commands/SavepointsCommand.java
@@ -78,11 +78,9 @@ public class SavepointsCommand implements CommandMarker {
   throws Exception {
 HoodieTableMetaClient metaClient = HoodieCLI.getTableMetaClient();
 HoodieActiveTimeline activeTimeline = metaClient.getActiveTimeline();
-HoodieTimeline timeline = 
activeTimeline.getCommitTimeline().filterCompletedInstants();
-HoodieInstant commitInstant = new HoodieInstant(false, 
HoodieTimeline.COMMIT_ACTION, commitTime);
 
-if (!timeline.containsInstant(commitInstant)) {
-  return "Commit " + commitTime + " not found in Commits " + timeline;
+if 
(!activeTimeline.getCommitsTimeline().filterCompletedInstants().containsInstant(commitTime))
 {
+  return "Commit " + commitTime + " not found in Commits " + 
activeTimeline;
 }
 
 SparkLauncher sparkLauncher = SparkUtil.initLauncher(sparkPropertiesPath);
@@ -112,10 +110,10 @@ public class SavepointsCommand implements CommandMarker {
   throw new HoodieException("There are no completed instants to run 
rollback");
 }
 HoodieActiveTimeline activeTimeline = metaClient.getActiveTimeline();
-HoodieTimeline timeline = 
activeTimeline.getCommitTimeline().filterCompletedInstants();
-HoodieInstant commitInstant = new HoodieInstant(false, 
HoodieTimeline.COMMIT_ACTION, instantTime);
+HoodieTimeline timeline = 
activeTimeline.getCommitsTimeline().filterCompletedInstants();
+List instants = timeline.getInstants().filter(instant -> 
instant.getTimestamp().equals(instantTime)).collect(Collectors.toList());
 
-if (!timeline.containsInstant(commitInstant)) {
+if (instants.isEmpty()) {
   return "Commit " + instantTime + " not found in Commits " + timeline;
 }
 
diff --git 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/savepoint/SavepointActionExecutor.java
 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/savepoint/SavepointActionExecutor.java
index de1d973..134b238 100644
--- 
a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/savepoint/SavepointActionExecutor.java
+++ 
b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/savepoint/SavepointActionExecutor.java
@@ -24,7 +24,6 @@ import org.apache.hudi.common.engine.HoodieEngineContext;
 import org.apache.hudi.common.fs.FSUtils;
 import org.apache.hudi.common.model.HoodieBaseFile;
 import org.apache.hudi.common.model.HoodieRecordPayload;
-import org.apache.hudi.common.model.HoodieTableType;
 import org.apache.hudi.common.table.timeline.HoodieInstant;
 import org.apache.hudi.common.table.timeline.HoodieTimeline;
 import org.apache.hudi.common.table.timeline.TimelineMetadataUtils;
@@ -65,13 +64,9 @@ public class SavepointActionExecutor ext
 
   @Override
   public HoodieSavepointMetadata execute() {
-if (table.getMetaClient().getTableType() == HoodieTableType.MERGE_ON_READ) 
{
-  throw new UnsupportedOperationException("Savepointing is not supported 
or MergeOnRead table types");
-}
 Option cleanInstant = 
table.getCompletedCleanTimeline().lastInstant();
-HoodieInstant commitInstant = new HoodieInstant(false, 
HoodieTimeline.COMMIT_ACTION, instantTime);
-if (!table.getCompletedCommitsTimeline().containsInstant(commitInstant)) {
-  throw new HoodieSavepointException("Could not savepoint non-existing 
commit " + commitInstant);
+if (!table.getCompletedCommitsTimeline().containsInstant(instantTime)) {
+  throw new HoodieSavepointException("Could not savepoint non-existing 
commit " + instantTime);
 }
 
 try {
diff --git 
a/hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableRollback.java
 
b/hudi-client/hudi-spar

[jira] [Updated] (HUDI-1456) [UMBRELLA] Concurrency Control for Hudi writers and table services

2022-01-06 Thread Vinoth Chandar (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-1456:
-
Summary: [UMBRELLA] Concurrency Control for Hudi writers and table services 
 (was: [UMBRELLA] Concurrent Writing (multiwriter) to Hudi tables)

> [UMBRELLA] Concurrency Control for Hudi writers and table services
> --
>
> Key: HUDI-1456
> URL: https://issues.apache.org/jira/browse/HUDI-1456
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Writer Core
>Affects Versions: 0.9.0
>Reporter: Nishith Agarwal
>Assignee: Nishith Agarwal
>Priority: Major
>  Labels: hudi-umbrellas
> Attachments: image-2020-12-14-09-48-46-946.png
>
>
> This ticket tracks all the changes needed to support concurrency control for 
> Hudi tables. This work will be done in multiple phases. 
>  # Support for parallel writing to Hudi tables -> This feature will allow users 
> to have multiple writers mutate the tables without the ability to perform 
> concurrent updates to the same file. 
>  # Concurrency control at file/record level -> This feature will allow users 
> to have multiple writers mutate the tables with the ability to ensure 
> serializability at record level.
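The first phase can be pictured as file-level conflict detection, where two concurrent writers conflict only if they touched a common file group. Below is a hypothetical sketch of that check. It is not Hudi's actual conflict resolution strategy.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of file-level conflict detection between two writers.
final class FileLevelConflictSketch {
  private FileLevelConflictSketch() {}

  // File group IDs written by both commits.
  static Set<String> overlappingFileIds(Set<String> writtenByA, Set<String> writtenByB) {
    Set<String> overlap = new HashSet<>(writtenByA);
    overlap.retainAll(writtenByB);
    return overlap;
  }

  // Writers may both commit only if they touched disjoint file groups.
  static boolean conflicts(Set<String> writtenByA, Set<String> writtenByB) {
    return !overlappingFileIds(writtenByA, writtenByB).isEmpty();
  }
}
```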




[GitHub] [hudi] nsivabalan commented on a change in pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

2022-01-06 Thread GitBox



nsivabalan commented on a change in pull request #4203:
URL: https://github.com/apache/hudi/pull/4203#discussion_r779654146



##
File path: 
hudi-common/src/main/java/org/apache/hudi/keygen/constant/KeyGeneratorOptions.java
##
@@ -56,6 +56,13 @@
   .withDocumentation("Partition path field. Value to be used at the 
partitionPath component of HoodieKey. "
   + "Actual value ontained by invoking .toString()");
 
+  public static final ConfigProperty 
KEYGENERATOR_CONSISTENT_LOGICAL_TIMESTAMP_ENABLED = ConfigProperty
+  
.key("hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled")
+  .defaultValue("false")
+  .withDocumentation("When set to true, consistent value will be generated 
for a logical timestamp type column, "

Review comment:
   Can we add an example here so that users know what to expect if they do not 
enable this config? An example covering both the row-writer and non-row-writer paths would help.





[GitHub] [hudi] hudi-bot commented on pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4203:
URL: https://github.com/apache/hudi/pull/4203#issuecomment-1006705466


   
   ## CI report:
   
   * 5b68cadeeec7a6482f9c5a9eeadad1ad816aa962 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4835)
 
   * 2c0565c35723d6f5fee071d14299361b321f202e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

2022-01-06 Thread GitBox



hudi-bot removed a comment on pull request #4203:
URL: https://github.com/apache/hudi/pull/4203#issuecomment-1003406987


   
   ## CI report:
   
   * 5b68cadeeec7a6482f9c5a9eeadad1ad816aa962 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4835)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot commented on pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4203:
URL: https://github.com/apache/hudi/pull/4203#issuecomment-1006707897


   
   ## CI report:
   
   * 5b68cadeeec7a6482f9c5a9eeadad1ad816aa962 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4835)
 
   * 2c0565c35723d6f5fee071d14299361b321f202e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4944)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

2022-01-06 Thread GitBox



hudi-bot removed a comment on pull request #4203:
URL: https://github.com/apache/hudi/pull/4203#issuecomment-1006705466


   
   ## CI report:
   
   * 5b68cadeeec7a6482f9c5a9eeadad1ad816aa962 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4835)
 
   * 2c0565c35723d6f5fee071d14299361b321f202e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[jira] [Updated] (HUDI-3184) hudi-flink support timestamp-micros

2022-01-06 Thread Well Tang (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Well Tang updated HUDI-3184:

Remaining Estimate: 120h  (was: 5h)
 Original Estimate: 120h  (was: 5h)

> hudi-flink support timestamp-micros
> ---
>
> Key: HUDI-3184
> URL: https://issues.apache.org/jira/browse/HUDI-3184
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Well Tang
>Assignee: Well Tang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
> Attachments: 1.png, 2.png, 3.png
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> {*}Problem overview{*}:
> Steps to reproduce the behavior:
> ① Use the Spark engine to write data into the hoodie table (note: the dataset 
> contains timestamp-type columns).
> ② Use the Flink engine to read the hoodie table written in step 1.
> *Expected behavior*
> Caused by: java.lang.IllegalArgumentException: Avro does not support 
> TIMESTAMP type with precision: 6, it only supports precision less than 3.
> at 
> org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:221)
>  ~...
> at 
> org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:263)
>  ~...
> at 
> org.apache.hudi.util.AvroSchemaConverter.convertToSchema(AvroSchemaConverter.java:169)
>  ~...
> at 
> org.apache.hudi.table.HoodieTableFactory.inferAvroSchema(HoodieTableFactory.java:239)
>  ~...
> at 
> org.apache.hudi.table.HoodieTableFactory.setupConfOptions(HoodieTableFactory.java:155)
>  ~...
> at 
> org.apache.hudi.table.HoodieTableFactory.createDynamicTableSource(HoodieTableFactory.java:65)
>  ~...
> *Environment Description*
>   Hudi version : 0.11.0-SNAPSHOT
>   Spark version : 3.1.2
>   Flink version : 1.13.1
>   Hive version : None
>   Hadoop version : 2.9.2
>   Storage (HDFS/S3/GCS..) : HDFS
>   Running on Docker? (yes/no) : None
> *Additional context*
> We use hoodie as a data lake to deliver projects to customers, and we hit the 
> following scenario: data is written to the hoodie table through the Spark 
> engine and then read from the hoodie table through the Flink engine.
> Note that the above exception is triggered by the way the timestamp column in 
> the dataset is written.
> To simplify the description of the problem, we summarize it in the 
> following steps:
> 【step-1】Mock data:
> {code:java}
> /home/deploy/spark-3.1.2-bin-hadoop2.7/bin/spark-shell \
> --driver-class-path /home/workflow/apache-hive-2.3.8-bin/conf/ \
> --master spark://2-120:7077 \
> --executor-memory 4g \
> --driver-memory 4g \
> --num-executors 4 \
> --total-executor-cores 4 \
> --name test \
> --jars 
> /home/deploy/spark-3.1.2-bin-hadoop2.7/jars/hudi-spark3-bundle_2.12-0.11.0-SNAPSHOT.jar,/home/deploy/spark-3.1.2-bin-hadoop2.7/jars/spark-avro_2.12-3.1.2.jar
>  \
> --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
> --conf spark.sql.legacy.parquet.datetimeRebaseModeInRead=CORRECTED \
> --conf spark.sql.hive.convertMetastoreParquet=false {code}
> {code:java}
> val df = spark.sql("select 1 as id, 'A' as name, current_timestamp as dt")
> df.write.format("hudi").
>   option("hoodie.datasource.write.recordkey.field", "id").
>   option("hoodie.datasource.write.precombine.field", "id").
>   option("hoodie.datasource.write.keygenerator.class", 
> "org.apache.hudi.keygen.NonpartitionedKeyGenerator").
>   option("hoodie.upsert.shuffle.parallelism", "2").
>   option("hoodie.table.name", "timestamp_table").
>   mode("append").
>   save("/hudi/suite/data_type_timestamp_table")
> spark.read.format("hudi").load("/hudi/suite/data_type_timestamp_table").show(false)
>  {code}
> 【step-2】Consume the data through Flink:
> {code:java}
> bin/sql-client.sh embedded -j lib/hudi-flink-bundle_2.12-0.11.0-SNAPSHOT.jar 
> {code}
> {code:java}
> create table data_type_timestamp_table (
>   `id` INT,
>   `name` STRING,
>   `dt` TIMESTAMP(6)
> ) with (
>   'connector' = 'hudi',
>   'hoodie.table.name' = 'data_type_timestamp_table',
>   'read.streaming.enabled' = 'true',
>   'hoodie.datasource.write.recordkey.field' = 'id',
>   'path' = '/hudi/suite/data_type_timestamp_table',
>   'read.streaming.check-interval' = '10',
>   'table.type' = 'COPY_ON_WRITE',
>   'write.precombine.field' = 'id'
> );
> select * from data_type_timestamp_table; {code}
> As shown below:
> !1.png!
> If we change TIMESTAMP(6) to TIMESTAMP(3), the result is as follows:
> !2.png!
> The data can now be found, but it is displayed incorrectly!
> Checking the Hoodie directory shows that Spark writes the timestamp type as 
> timestamp-micros:
> !3.png!
> However, Flink reads and writes Hoodie timestamp data as 
> timestamp-millis!
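The mismatch above can surface in the following way, assuming a reader interprets a timestamp-micros long as timestamp-millis. The date lands tens of thousands of years in the future. Converting to millisecond precision also drops the microseconds. This is a small `java.time` sketch, and the class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical helpers contrasting timestamp-micros and timestamp-millis decoding.
final class TimestampPrecisionSketch {
  private TimestampPrecisionSketch() {}

  // Correct decoding of a long written as timestamp-micros (UTC).
  static LocalDateTime fromMicros(long micros) {
    return LocalDateTime.ofEpochSecond(Math.floorDiv(micros, 1_000_000L),
        (int) (Math.floorMod(micros, 1_000_000L) * 1_000L), ZoneOffset.UTC);
  }

  // The same long decoded as if it were timestamp-millis.
  static LocalDateTime misreadAsMillis(long micros) {
    return LocalDateTime.ofEpochSecond(Math.floorDiv(micros, 1_000L),
        (int) (Math.floorMod(micros, 1_000L) * 1_000_000L), ZoneOffset.UTC);
  }

  // Narrowing micros to millis (TIMESTAMP(3)) drops the sub-millisecond digits.
  static long toMillis(long micros) {
    return Math.floorDiv(micros, 1_000L);
  }
}
```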

[jira] [Updated] (HUDI-2370) Supports data encryption

2022-01-06 Thread Vinoth Chandar (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-2370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-2370:
-
Fix Version/s: 0.11.0

> Supports data encryption
> 
>
> Key: HUDI-2370
> URL: https://issues.apache.org/jira/browse/HUDI-2370
> Project: Apache Hudi
>  Issue Type: New Feature
>Reporter: liujinhui
>Assignee: liujinhui
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> Data security is becoming increasingly important, so encryption support in 
> Hudi would be very welcome:
> 1. Specify which columns to encrypt
> 2. Support footer encryption
> 3. Provide a custom encryption client interface (with a memory-based 
> encryption client by default)
> 4. Specify the encryption key
>  
> When querying, the user must pass the relevant key, or obtain query permission 
> through the client's encryption interface. If this fails, no result is 
> returned.
> 1. When querying non-encrypted fields without passing the key, the data is 
> returned normally.
> 2. When querying encrypted fields without passing the key, the data is not 
> returned.
> 3. When querying encrypted fields with the key, the data is returned 
> normally.
> 4. When querying all fields without the key, no result is returned; with the 
> key, the data is returned normally.
>  
> Start with COW (copy-on-write) tables first.
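The four query rules above can be read as a single predicate. The sketch below is a hypothetical model for discussion only: `EncryptedQueryRules` and `canReturn` are invented names, it assumes any key that is passed is valid, and it does not use Hudi's or Parquet's actual encryption APIs.

```java
import java.util.Set;

public class EncryptedQueryRules {
    /**
     * Hypothetical check modelling the four rules: a query may return rows
     * unless it touches an encrypted column and no key was supplied.
     */
    static boolean canReturn(Set<String> requestedColumns,
                             Set<String> encryptedColumns,
                             boolean keyPassed) {
        if (keyPassed) {
            return true; // rules 3 and 4 (key passed): data returned normally
        }
        for (String column : requestedColumns) {
            if (encryptedColumns.contains(column)) {
                return false; // rules 2 and 4 (no key): nothing is returned
            }
        }
        return true; // rule 1: only non-encrypted columns requested
    }

    public static void main(String[] args) {
        Set<String> encrypted = Set.of("ssn");
        Set<String> all = Set.of("id", "name", "ssn");
        System.out.println(canReturn(Set.of("id", "name"), encrypted, false)); // rule 1: true
        System.out.println(canReturn(Set.of("ssn"), encrypted, false));        // rule 2: false
        System.out.println(canReturn(Set.of("ssn"), encrypted, true));         // rule 3: true
        System.out.println(canReturn(all, encrypted, false));                  // rule 4: false
    }
}
```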



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (HUDI-3173) Introduce new INDEX action type

2022-01-06 Thread Sagar Sumit (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit updated HUDI-3173:
--
Status: In Progress  (was: Open)

> Introduce new INDEX action type
> ---
>
> Key: HUDI-3173
> URL: https://issues.apache.org/jira/browse/HUDI-3173
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Sagar Sumit
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> Add a top-level INDEX action type and supporting methods in HoodieTimeline.




[GitHub] [hudi] nsivabalan commented on a change in pull request #4497: [HUDI-3147] Create pushgateway client based on port

2022-01-06 Thread GitBox



nsivabalan commented on a change in pull request #4497:
URL: https://github.com/apache/hudi/pull/4497#discussion_r779678339



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/prometheus/PushGatewayReporter.java
##
@@ -51,17 +53,30 @@ protected PushGatewayReporter(MetricRegistry registry,
 TimeUnit rateUnit,
 TimeUnit durationUnit,
 String jobName,
-String address,
+String serverHost,
+int serverPort,
 boolean deleteShutdown) {
 super(registry, "hudi-push-gateway-reporter", filter, rateUnit, 
durationUnit);
 this.jobName = jobName;
 this.deleteShutdown = deleteShutdown;
 collectorRegistry = new CollectorRegistry();
 metricExports = new DropwizardExports(registry);
-pushGateway = new PushGateway(address);
+pushGateway = createPushGatewayClient(serverHost, serverPort);
 metricExports.register(collectorRegistry);
   }
 
+  private PushGateway createPushGatewayClient(String serverHost, int 
serverPort) {
+if (serverPort == 443) {
+  try {
+return new PushGateway(new URL("https://" + serverHost));

Review comment:
   @t0il3ts0ap: Did you test this patch? 
   Don't we need
   ```
   new PushGateway(new URL("https://" + serverHost + ":" + serverPort));
   ```
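A minimal sketch of building the Pushgateway base URL with the port kept explicit, along the lines the reviewer suggests. `PushGatewayUrls` and `baseUrl` are hypothetical names; the "port 443 means HTTPS" rule mirrors the patch under review rather than any documented Pushgateway convention. The resulting `java.net.URL` is the kind of value the patch passes to `new PushGateway(URL)`.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class PushGatewayUrls {
    // Hypothetical helper: build the Pushgateway base URL, always including the port.
    static URL baseUrl(String serverHost, int serverPort) {
        // Assumption mirrored from the patch: port 443 implies HTTPS, anything else HTTP.
        String scheme = serverPort == 443 ? "https" : "http";
        try {
            return new URL(scheme, serverHost, serverPort, "");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(
                "Invalid Pushgateway address: " + serverHost + ":" + serverPort, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(baseUrl("pushgateway.example.com", 443)); // https://pushgateway.example.com:443
        System.out.println(baseUrl("localhost", 9091));              // http://localhost:9091
    }
}
```

Because 443 is the default HTTPS port, `https://host` and `https://host:443` reach the same endpoint; spelling the port out mainly matters if the HTTPS branch is ever widened to other ports.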




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Commented] (HUDI-1628) [Umbrella] Improve data locality during ingestion

2022-01-06 Thread Vinoth Chandar (Jira)



[ 
https://issues.apache.org/jira/browse/HUDI-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470041#comment-17470041
 ] 

Vinoth Chandar commented on HUDI-1628:
--

[~guoyihua] assigning to you to drive this forward. 

cc [~thirumalai.raj] please let us know if you are still interested in pursuing 
this.

> [Umbrella] Improve data locality during ingestion
> -
>
> Key: HUDI-1628
> URL: https://issues.apache.org/jira/browse/HUDI-1628
> Project: Apache Hudi
>  Issue Type: Epic
>  Components: Writer Core
>Reporter: satish
>Assignee: Ethan Guo
>Priority: Major
>  Labels: hudi-umbrellas
> Fix For: 0.11.0
>
>
> Today the upsert partitioner does the file sizing/bin-packing etc. for
> inserts, and then sends some inserts over to existing file groups to
> maintain file sizes.
> Could we abstract all of this into strategies and pipeline abstractions,
> and have it also consider "affinity" to an existing file group based on,
> say, information stored in the metadata table?
> See http://mail-archives.apache.org/mod_mbox/hudi-dev/202102.mbox/browser
> for more details.
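The greedy small-file handling described above can be sketched as: top up existing file groups with new inserts until each reaches a target size, then route the remainder to new file groups. This is a simplified, hypothetical model (`SmallFileBinPacking`, `FileGroup`, `assignInserts` and the `"new"` key are invented names), not Hudi's upsert partitioner, and it assumes a fixed average record size.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SmallFileBinPacking {

    /** Hypothetical view of an existing file group: id plus current size in bytes. */
    static final class FileGroup {
        final String id;
        final long sizeBytes;

        FileGroup(String id, long sizeBytes) {
            this.id = id;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Fills file groups in list order until each reaches maxFileBytes;
     * inserts that do not fit are reported under the key "new".
     */
    static Map<String, Long> assignInserts(List<FileGroup> groups, long inserts,
                                           long avgRecordBytes, long maxFileBytes) {
        Map<String, Long> plan = new LinkedHashMap<>();
        long remaining = inserts;
        for (FileGroup g : groups) {
            if (remaining == 0) {
                break;
            }
            // How many average-sized records still fit before the size cap.
            long capacity = Math.max(0, (maxFileBytes - g.sizeBytes) / avgRecordBytes);
            long assigned = Math.min(capacity, remaining);
            if (assigned > 0) {
                plan.put(g.id, assigned);
                remaining -= assigned;
            }
        }
        if (remaining > 0) {
            plan.put("new", remaining);
        }
        return plan;
    }

    public static void main(String[] args) {
        List<FileGroup> groups = List.of(new FileGroup("fg-1", 100), new FileGroup("fg-2", 40));
        System.out.println(assignInserts(groups, 20, 10, 120)); // {fg-1=2, fg-2=8, new=10}
    }
}
```

An "affinity"-aware strategy, as the ticket proposes, would change which groups are considered and in what order; this loop simply fills them in list order.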




[jira] [Assigned] (HUDI-1628) [Umbrella] Improve data locality during ingestion

2022-01-06 Thread Vinoth Chandar (Jira)



 [ 
https://issues.apache.org/jira/browse/HUDI-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar reassigned HUDI-1628:


Assignee: Ethan Guo  (was: Thirumalai Raj R)





[GitHub] [hudi] hudi-bot commented on pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4203:
URL: https://github.com/apache/hudi/pull/4203#issuecomment-1006747034


   
   ## CI report:
   
   * 2c0565c35723d6f5fee071d14299361b321f202e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4944)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot removed a comment on pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

2022-01-06 Thread GitBox



hudi-bot removed a comment on pull request #4203:
URL: https://github.com/apache/hudi/pull/4203#issuecomment-1006707897


   
   ## CI report:
   
   * 5b68cadeeec7a6482f9c5a9eeadad1ad816aa962 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4835)
 
   * 2c0565c35723d6f5fee071d14299361b321f202e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4944)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   



[GitHub] [hudi] hudi-bot commented on pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

2022-01-06 Thread GitBox



hudi-bot commented on pull request #4203:
URL: https://github.com/apache/hudi/pull/4203#issuecomment-1006751886


   
   ## CI report:
   
   * 2c0565c35723d6f5fee071d14299361b321f202e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4944)
 
   * eecd338f6aa8c22150cc3a3abc28eb5c2535ef1e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


