[jira] [Commented] (HUDI-3261) Query rt table by hive cli throw NoSuchMethodError

2022-01-18 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477640#comment-17477640
 ] 

Danny Chen commented on HUDI-3261:
--

Thanks for the contribution, added.

> Query rt table by hive cli throw NoSuchMethodError
> --
>
> Key: HUDI-3261
> URL: https://issues.apache.org/jira/browse/HUDI-3261
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hive
>Reporter: Echo Lee
>Assignee: Echo Lee
>Priority: Major
>  Labels: pull-request-available, sev:normal
>
> When querying the MOR table synchronized from hudi to hive, the following 
> exception is thrown:
>  
>  
> {code:java}
> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.apache.parquet.avro.AvroSchemaConverter.convert(Lorg/apache/parquet/schema/MessageType;)Lorg/apache/hudi/org/apache/avro/Schema;
>         at 
> org.apache.hudi.common.util.ParquetUtils.readAvroSchema(ParquetUtils.java:217)
>         at 
> org.apache.hudi.io.storage.HoodieParquetReader.getSchema(HoodieParquetReader.java:71)
>         at 
> org.apache.hudi.hadoop.utils.HoodieRealtimeRecordReaderUtils.readSchema(HoodieRealtimeRecordReaderUtils.java:72)
>         at 
> org.apache.hudi.hadoop.InputSplitUtils.getBaseFileSchema(InputSplitUtils.java:70)
>         at 
> org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.init(AbstractRealtimeRecordReader.java:87)
>         at 
> org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.<init>(AbstractRealtimeRecordReader.java:67)
>         at 
> org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.<init>(RealtimeCompactedRecordReader.java:62)
>         at 
> org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:70)
>         at 
> org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.<init>(HoodieRealtimeRecordReader.java:47)
>         at 
> org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:317)
>         at 
> org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695)
>         at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
>         at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
>         at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
>         at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
>         at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2227)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:409)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:836)
>         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:772)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:699)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:313)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:227){code}
>  
>  
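
A note for readers hitting the same error: the method descriptor in the exception expects the shaded Avro class (org.apache.hudi.org.apache.avro.Schema) as its return type, so the parquet-avro that Hive resolves at runtime is not the one the Hudi bundle was built against. The probe below is only an illustrative sketch (the class name AvroConverterProbe is invented for the example; the converter class name comes from the stack trace above, and parquet-avro must be on the classpath) that prints which convert(...) overloads are actually visible:

{code:java}
import java.lang.reflect.Method;
import java.util.Arrays;

public class AvroConverterProbe {
  public static void main(String[] args) throws Exception {
    // Load whichever AvroSchemaConverter the current classpath resolves.
    Class<?> converter = Class.forName("org.apache.parquet.avro.AvroSchemaConverter");
    for (Method m : converter.getMethods()) {
      if ("convert".equals(m.getName())) {
        // Print return and parameter types so a shaded vs. unshaded
        // Avro Schema return type is easy to spot.
        System.out.println(m.getReturnType().getName()
            + " convert" + Arrays.toString(m.getParameterTypes()));
      }
    }
  }
}
{code}

If you run it with the same classpath as the Hive CLI and no convert(MessageType) overload returns the shaded Schema type, the hadoop-mr bundle and the parquet-avro jar on the classpath do not match, which is what the NoSuchMethodError reports.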



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HUDI-3261) Query rt table by hive cli throw NoSuchMethodError

2022-01-18 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen reassigned HUDI-3261:


Assignee: Echo Lee

> Query rt table by hive cli throw NoSuchMethodError
> --
>
> Key: HUDI-3261
> URL: https://issues.apache.org/jira/browse/HUDI-3261
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hive
>Reporter: Echo Lee
>Assignee: Echo Lee
>Priority: Major
>  Labels: pull-request-available, sev:normal
>
> When querying the MOR table synchronized from hudi to hive, the following 
> exception is thrown:
>  
>  
> {code:java}
> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.apache.parquet.avro.AvroSchemaConverter.convert(Lorg/apache/parquet/schema/MessageType;)Lorg/apache/hudi/org/apache/avro/Schema;
>         at 
> org.apache.hudi.common.util.ParquetUtils.readAvroSchema(ParquetUtils.java:217)
>         at 
> org.apache.hudi.io.storage.HoodieParquetReader.getSchema(HoodieParquetReader.java:71)
>         at 
> org.apache.hudi.hadoop.utils.HoodieRealtimeRecordReaderUtils.readSchema(HoodieRealtimeRecordReaderUtils.java:72)
>         at 
> org.apache.hudi.hadoop.InputSplitUtils.getBaseFileSchema(InputSplitUtils.java:70)
>         at 
> org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.init(AbstractRealtimeRecordReader.java:87)
>         at 
> org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.<init>(AbstractRealtimeRecordReader.java:87)
>         at 
> org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.<init>(RealtimeCompactedRecordReader.java:62)
>         at 
> org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:70)
>         at 
> org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.<init>(HoodieRealtimeRecordReader.java:47)
>         at 
> org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:317)
>         at 
> org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695)
>         at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
>         at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
>         at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
>         at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
>         at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2227)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:409)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:836)
>         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:772)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:699)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:313)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:227){code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] ChangbingChen commented on issue #4618: [SUPPORT] When querying a hudi table in hive, there have duplicated records.

2022-01-18 Thread GitBox


ChangbingChen commented on issue #4618:
URL: https://github.com/apache/hudi/issues/4618#issuecomment-1015160211


   > @ChangbingChen I know hudi has a bug for this. If possible, could you please 
   > modify the hudi code and package a new hudi jar with the following 
   > HoodieParquetRealtimeInputFormat.isSplitable:
   > 
   > @Override
   > protected boolean isSplitable(FileSystem fs, Path filename) {
   >   if (filename instanceof PathWithLogFilePath) {
   >     return ((PathWithLogFilePath) filename).splitable();
   >   }
   >   // return super.isSplitable(fs, filename);
   >   return false;
   > }
   
   @xiarixiaoyao , sorry, it doesn't work either.  I query the xxx_ro table, so 
the inputformat should be org.apache.hudi.hadoop.HoodieParquetInputFormat?
   
   By the way, there are four or five parquet files, and for each compaction 
operation a new parquet file will be generated and the oldest parquet file 
will be deleted.
   So in a hive query, will it scan all those parquet files? Perhaps the newest 
one contains all records?
   
   ```
   [yarn@x.x.x.x ~]$ hadoop fs -ls /hudi/mysql_table_sink_new/20220118
   Found 9 items
   -rw-r--r--   3 yarn supergroup   22309728 2022-01-18 15:22 
/hudi/mysql_table_sink_new/20220118/.77dc5111-0ed0-400c-9df3-84b254650ab5_20220118152035.log.1_0-1-0
   -rw-r--r--   3 yarn supergroup   26237250 2022-01-18 15:24 
/hudi/mysql_table_sink_new/20220118/.77dc5111-0ed0-400c-9df3-84b254650ab5_20220118152235.log.1_0-1-0
   -rw-r--r--   3 yarn supergroup   25088875 2022-01-18 15:26 
/hudi/mysql_table_sink_new/20220118/.77dc5111-0ed0-400c-9df3-84b254650ab5_20220118152436.log.1_0-1-0
   -rw-r--r--   3 yarn supergroup   22962237 2022-01-18 15:28 
/hudi/mysql_table_sink_new/20220118/.77dc5111-0ed0-400c-9df3-84b254650ab5_20220118152636.log.1_0-1-0
   -rw-r--r--   3 yarn supergroup 93 2022-01-18 15:15 
/hudi/mysql_table_sink_new/20220118/.hoodie_partition_metadata
   -rw-r--r--   3 yarn supergroup    8456473 2022-01-18 15:21 
/hudi/mysql_table_sink_new/20220118/77dc5111-0ed0-400c-9df3-84b254650ab5_0-1-0_20220118152035.parquet
   -rw-r--r--   3 yarn supergroup   10952244 2022-01-18 15:23 
/hudi/mysql_table_sink_new/20220118/77dc5111-0ed0-400c-9df3-84b254650ab5_0-1-0_20220118152235.parquet
   -rw-r--r--   3 yarn supergroup   13875797 2022-01-18 15:25 
/hudi/mysql_table_sink_new/20220118/77dc5111-0ed0-400c-9df3-84b254650ab5_0-1-0_20220118152436.parquet
   -rw-r--r--   3 yarn supergroup   16555809 2022-01-18 15:27 
/hudi/mysql_table_sink_new/20220118/77dc5111-0ed0-400c-9df3-84b254650ab5_0-1-0_20220118152636.parquet
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] ChangbingChen removed a comment on issue #4618: [SUPPORT] When querying a hudi table in hive, there have duplicated records.

2022-01-18 Thread GitBox


ChangbingChen removed a comment on issue #4618:
URL: https://github.com/apache/hudi/issues/4618#issuecomment-1015144555


   
   
   
   
   Sorry, it doesn't work either.  I query the xxx_ro table, so the inputformat 
should be org.apache.hudi.hadoop.HoodieParquetInputFormat?
   
   By the way, there are four or five parquet files, and for each compaction 
operation a new parquet file will be generated and the oldest parquet file 
will be deleted.
   So in a hive query, will it scan all those parquet files? Perhaps the newest 
one contains all records?
   
   ```
   [yarn@x.x.x.x ~]$ hadoop fs -ls /hudi/mysql_table_sink_new/20220118
   Found 9 items
   -rw-r--r--   3 yarn supergroup   22309728 2022-01-18 15:22 
/hudi/mysql_table_sink_new/20220118/.77dc5111-0ed0-400c-9df3-84b254650ab5_20220118152035.log.1_0-1-0
   -rw-r--r--   3 yarn supergroup   26237250 2022-01-18 15:24 
/hudi/mysql_table_sink_new/20220118/.77dc5111-0ed0-400c-9df3-84b254650ab5_20220118152235.log.1_0-1-0
   -rw-r--r--   3 yarn supergroup   25088875 2022-01-18 15:26 
/hudi/mysql_table_sink_new/20220118/.77dc5111-0ed0-400c-9df3-84b254650ab5_20220118152436.log.1_0-1-0
   -rw-r--r--   3 yarn supergroup   22962237 2022-01-18 15:28 
/hudi/mysql_table_sink_new/20220118/.77dc5111-0ed0-400c-9df3-84b254650ab5_20220118152636.log.1_0-1-0
   -rw-r--r--   3 yarn supergroup 93 2022-01-18 15:15 
/hudi/mysql_table_sink_new/20220118/.hoodie_partition_metadata
   -rw-r--r--   3 yarn supergroup    8456473 2022-01-18 15:21 
/hudi/mysql_table_sink_new/20220118/77dc5111-0ed0-400c-9df3-84b254650ab5_0-1-0_20220118152035.parquet
   -rw-r--r--   3 yarn supergroup   10952244 2022-01-18 15:23 
/hudi/mysql_table_sink_new/20220118/77dc5111-0ed0-400c-9df3-84b254650ab5_0-1-0_20220118152235.parquet
   -rw-r--r--   3 yarn supergroup   13875797 2022-01-18 15:25 
/hudi/mysql_table_sink_new/20220118/77dc5111-0ed0-400c-9df3-84b254650ab5_0-1-0_20220118152436.parquet
   -rw-r--r--   3 yarn supergroup   16555809 2022-01-18 15:27 
/hudi/mysql_table_sink_new/20220118/77dc5111-0ed0-400c-9df3-84b254650ab5_0-1-0_20220118152636.parquet
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] ChangbingChen edited a comment on issue #4618: [SUPPORT] When querying a hudi table in hive, there have duplicated records.

2022-01-18 Thread GitBox


ChangbingChen edited a comment on issue #4618:
URL: https://github.com/apache/hudi/issues/4618#issuecomment-1015040351


   > @ChangbingChen do parquet files exist in your table? If a parquet file 
   > exists, please set mapreduce.input.fileinputformat.split.maxsize >= (max 
   > size of the parquet file) to keep hive from splitting the parquet file.
   
   Thanks for the reply. It doesn't work; the default value is 256M.
   ```
   hive> set mapreduce.input.fileinputformat.split.maxsize;
   mapreduce.input.fileinputformat.split.maxsize=25600
   ```
   
   and the max size of the parquet files is less than 128M (a small helper 
sketch for checking this follows the listing below).
   ```
   [yarn@x.x.x ~]$ hadoop fs -ls /hudi/mysql_table_sink_new/20220118
   Found 10 items
   -rw-r--r--   3 yarn supergroup    7157103 2022-01-18 11:17 
/hudi/mysql_table_sink_new/20220118/.82f164fd-f97d-4691-b9c6-21bea2769be0_20220118111603.log.1_0-1-0
   -rw-r--r--   3 yarn supergroup    7209495 2022-01-18 11:19 
/hudi/mysql_table_sink_new/20220118/.82f164fd-f97d-4691-b9c6-21bea2769be0_20220118111759.log.1_0-1-0
   -rw-r--r--   3 yarn supergroup   10402799 2022-01-18 11:21 
/hudi/mysql_table_sink_new/20220118/.82f164fd-f97d-4691-b9c6-21bea2769be0_20220118111959.log.1_0-1-0
   -rw-r--r--   3 yarn supergroup    7853954 2022-01-18 11:23 
/hudi/mysql_table_sink_new/20220118/.82f164fd-f97d-4691-b9c6-21bea2769be0_20220118112159.log.1_0-1-0
   -rw-r--r--   3 yarn supergroup    4666049 2022-01-18 11:24 
/hudi/mysql_table_sink_new/20220118/.82f164fd-f97d-4691-b9c6-21bea2769be0_20220118112359.log.1_0-1-0
   -rw-r--r--   3 yarn supergroup 93 2022-01-18 11:16 
/hudi/mysql_table_sink_new/20220118/.hoodie_partition_metadata
   -rw-r--r--   3 yarn supergroup    1541035 2022-01-18 11:19 
/hudi/mysql_table_sink_new/20220118/82f164fd-f97d-4691-b9c6-21bea2769be0_0-1-0_20220118111759.parquet
   -rw-r--r--   3 yarn supergroup    2741308 2022-01-18 11:21 
/hudi/mysql_table_sink_new/20220118/82f164fd-f97d-4691-b9c6-21bea2769be0_0-1-0_20220118111959.parquet
   -rw-r--r--   3 yarn supergroup    4318101 2022-01-18 11:23 
/hudi/mysql_table_sink_new/20220118/82f164fd-f97d-4691-b9c6-21bea2769be0_0-1-0_20220118112159.parquet
   -rw-r--r--   3 yarn supergroup    5585232 2022-01-18 11:25 
/hudi/mysql_table_sink_new/20220118/82f164fd-f97d-4691-b9c6-21bea2769be0_0-1-0_20220118112359.parquet
   ```
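   
   A minimal sketch of the check mentioned above (my own illustration, not from 
the thread): it walks the partition shown in the listing, finds the largest base 
parquet file, and prints a `mapreduce.input.fileinputformat.split.maxsize` value 
at least that large. The class name is invented for the example and 
hadoop-common is assumed to be on the classpath.
   
   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.FileStatus;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;
   
   public class MaxParquetSize {
     public static void main(String[] args) throws Exception {
       // Partition path taken from the listing above.
       Path partition = new Path("/hudi/mysql_table_sink_new/20220118");
       FileSystem fs = partition.getFileSystem(new Configuration());
       long max = 0L;
       for (FileStatus status : fs.listStatus(partition)) {
         // Only base parquet files matter for the split size; log files are skipped.
         if (status.getPath().getName().endsWith(".parquet")) {
           max = Math.max(max, status.getLen());
         }
       }
       System.out.println("largest parquet file: " + max + " bytes");
       System.out.println("set mapreduce.input.fileinputformat.split.maxsize=" + max + ";");
     }
   }
   ```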
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4625: [HUDI-3263] Do not nullify members in HoodieTableFileSystemView#reset…

2022-01-18 Thread GitBox


hudi-bot removed a comment on pull request #4625:
URL: https://github.com/apache/hudi/pull/4625#issuecomment-1015135059


   
   ## CI report:
   
   * 31a00a1d995612cc616eab9df6c03b5fff87f098 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5312)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4625: [HUDI-3263] Do not nullify members in HoodieTableFileSystemView#reset…

2022-01-18 Thread GitBox


hudi-bot commented on pull request #4625:
URL: https://github.com/apache/hudi/pull/4625#issuecomment-1015163939


   
   ## CI report:
   
   * 31a00a1d995612cc616eab9df6c03b5fff87f098 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5312)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.

2022-01-18 Thread GitBox


hudi-bot commented on pull request #4078:
URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015168193


   
   ## CI report:
   
   * 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN
   * 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN
   * c36aac530d3350857fb01df858d0f26c123e5766 UNKNOWN
   * 74721ec8c6c318d34e8bde9344b982e9e2390d76 UNKNOWN
   * ffb67da807ddafbaf18feff01643d02aa5631568 UNKNOWN
   * 7ed3f014fa8ec85033ab6a9475279d651944c93c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5099)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5102)
 
   * 0329837213279896a15384781ae2048ecdb0fc13 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5313)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.

2022-01-18 Thread GitBox


hudi-bot removed a comment on pull request #4078:
URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015132977


   
   ## CI report:
   
   * 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN
   * 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN
   * c36aac530d3350857fb01df858d0f26c123e5766 UNKNOWN
   * 74721ec8c6c318d34e8bde9344b982e9e2390d76 UNKNOWN
   * ffb67da807ddafbaf18feff01643d02aa5631568 UNKNOWN
   * 7ed3f014fa8ec85033ab6a9475279d651944c93c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5099)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5102)
 
   * 0329837213279896a15384781ae2048ecdb0fc13 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] Guanpx commented on issue #4510: [SUPPORT] Impala query error

2022-01-18 Thread GitBox


Guanpx commented on issue #4510:
URL: https://github.com/apache/hudi/issues/4510#issuecomment-1015177881


   > @Guanpx : I don't have exp w/ impala. But was MOR querying working from 
impala for older versions of hudi and failing with 0.10.0 ?
   
   I think MOR did not work in any older version either; the hudi version used 
by Impala is 0.5.0-incubating, and this is the commit: 
https://github.com/apache/impala/commit/ea0e1def6160d596082b01365fcbbb6e24afb21d
 , cc @garyli1019 
   and this is the version pinned in impala: 
https://github.com/apache/impala/blob/master/bin/impala-config.sh#L204
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 commented on pull request #4625: [HUDI-3263] Do not nullify members in HoodieTableFileSystemView#reset…

2022-01-18 Thread GitBox


danny0405 commented on pull request #4625:
URL: https://github.com/apache/hudi/pull/4625#issuecomment-1015183371


   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] zhangyue19921010 commented on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.

2022-01-18 Thread GitBox


zhangyue19921010 commented on pull request #4078:
URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015183981


   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.

2022-01-18 Thread GitBox


hudi-bot commented on pull request #4078:
URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015184441


   
   ## CI report:
   
   * 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN
   * 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN
   * c36aac530d3350857fb01df858d0f26c123e5766 UNKNOWN
   * 74721ec8c6c318d34e8bde9344b982e9e2390d76 UNKNOWN
   * ffb67da807ddafbaf18feff01643d02aa5631568 UNKNOWN
   * 7ed3f014fa8ec85033ab6a9475279d651944c93c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5099)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5102)
 
   * 0329837213279896a15384781ae2048ecdb0fc13 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5313)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5314)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.

2022-01-18 Thread GitBox


hudi-bot removed a comment on pull request #4078:
URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015168193


   
   ## CI report:
   
   * 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN
   * 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN
   * c36aac530d3350857fb01df858d0f26c123e5766 UNKNOWN
   * 74721ec8c6c318d34e8bde9344b982e9e2390d76 UNKNOWN
   * ffb67da807ddafbaf18feff01643d02aa5631568 UNKNOWN
   * 7ed3f014fa8ec85033ab6a9475279d651944c93c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5099)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5102)
 
   * 0329837213279896a15384781ae2048ecdb0fc13 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5313)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4625: [HUDI-3263] Do not nullify members in HoodieTableFileSystemView#reset…

2022-01-18 Thread GitBox


hudi-bot commented on pull request #4625:
URL: https://github.com/apache/hudi/pull/4625#issuecomment-1015185058


   
   ## CI report:
   
   * 31a00a1d995612cc616eab9df6c03b5fff87f098 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5312)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5315)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4625: [HUDI-3263] Do not nullify members in HoodieTableFileSystemView#reset…

2022-01-18 Thread GitBox


hudi-bot removed a comment on pull request #4625:
URL: https://github.com/apache/hudi/pull/4625#issuecomment-1015163939


   
   ## CI report:
   
   * 31a00a1d995612cc616eab9df6c03b5fff87f098 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5312)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.

2022-01-18 Thread GitBox


hudi-bot removed a comment on pull request #4078:
URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015184441


   
   ## CI report:
   
   * 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN
   * 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN
   * c36aac530d3350857fb01df858d0f26c123e5766 UNKNOWN
   * 74721ec8c6c318d34e8bde9344b982e9e2390d76 UNKNOWN
   * ffb67da807ddafbaf18feff01643d02aa5631568 UNKNOWN
   * 7ed3f014fa8ec85033ab6a9475279d651944c93c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5099)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5102)
 
   * 0329837213279896a15384781ae2048ecdb0fc13 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5313)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5314)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.

2022-01-18 Thread GitBox


hudi-bot commented on pull request #4078:
URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015186808


   
   ## CI report:
   
   * 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN
   * 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN
   * c36aac530d3350857fb01df858d0f26c123e5766 UNKNOWN
   * 74721ec8c6c318d34e8bde9344b982e9e2390d76 UNKNOWN
   * ffb67da807ddafbaf18feff01643d02aa5631568 UNKNOWN
   * 0329837213279896a15384781ae2048ecdb0fc13 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5314)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5313)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] peanut-chenzhong opened a new pull request #4626: [HUDI-1977] Fix Hudi CLI tempview query LOG issue

2022-01-18 Thread GitBox


peanut-chenzhong opened a new pull request #4626:
URL: https://github.com/apache/hudi/pull/4626


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-1977) Fix Hudi-CLI show table spark-sql

2022-01-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1977:
-
Labels: pull-request-available  (was: )

> Fix Hudi-CLI show table spark-sql 
> --
>
> Key: HUDI-1977
> URL: https://issues.apache.org/jira/browse/HUDI-1977
> Project: Apache Hudi
>  Issue Type: Task
>  Components: cli, spark
>Reporter: Nishith Agarwal
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> https://github.com/apache/hudi/issues/2955



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] hudi-bot commented on pull request #4626: [HUDI-1977] Fix Hudi CLI tempview query LOG issue

2022-01-18 Thread GitBox


hudi-bot commented on pull request #4626:
URL: https://github.com/apache/hudi/pull/4626#issuecomment-1015189822


   
   ## CI report:
   
   * 3860b11c8eb0823ffb1c8bcf23869b8c17c91df6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] gunjdesai commented on issue #4437: [QUESTION] Example for CREATE TABLE on TRINO using HUDI

2022-01-18 Thread GitBox


gunjdesai commented on issue #4437:
URL: https://github.com/apache/hudi/issues/4437#issuecomment-1015192029


   Hi Team, 
   
   any luck with this? I've tried asking this question on the Slack channel for 
Trino as well, but haven't had any luck there. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4626: [HUDI-1977] Fix Hudi CLI tempview query LOG issue

2022-01-18 Thread GitBox


hudi-bot removed a comment on pull request #4626:
URL: https://github.com/apache/hudi/pull/4626#issuecomment-1015189822


   
   ## CI report:
   
   * 3860b11c8eb0823ffb1c8bcf23869b8c17c91df6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4626: [HUDI-1977] Fix Hudi CLI tempview query LOG issue

2022-01-18 Thread GitBox


hudi-bot commented on pull request #4626:
URL: https://github.com/apache/hudi/pull/4626#issuecomment-1015192443


   
   ## CI report:
   
   * 3860b11c8eb0823ffb1c8bcf23869b8c17c91df6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5316)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xushiyan commented on issue #4585: Target Schema cannot be set in MultiTableDeltaStreamer

2022-01-18 Thread GitBox


xushiyan commented on issue #4585:
URL: https://github.com/apache/hudi/issues/4585#issuecomment-1015195895


   @chrischnweiss So it makes sense to make the registry url more configurable. 
I would recommend you propose an idea to improve this based on your use case. You 
can elaborate your idea here or file a JIRA directly describing how exactly 
the configs should look. Anyone from the community may pick it up for 
implementation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xiarixiaoyao commented on issue #4618: [SUPPORT] When querying a hudi table in hive, there have duplicated records.

2022-01-18 Thread GitBox


xiarixiaoyao commented on issue #4618:
URL: https://github.com/apache/hudi/issues/4618#issuecomment-1015196785


   @ChangbingChen  sorry, I forgot one thing: before you use hive
to query the hoodie table, did you set the input format? e.g. `set 
hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat` or 
`set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat`
   
   If you have WeChat, we can communicate directly through WeChat. 
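   
   For reference, the same setting shown programmatically (purely an illustrative 
sketch, not from the thread; the class name is invented and hive-common is 
assumed on the classpath):
   
   ```java
   import org.apache.hadoop.hive.conf.HiveConf;
   
   public class InputFormatSetting {
     public static void main(String[] args) {
       // Equivalent of "set hive.input.format=..." in the Hive CLI.
       HiveConf conf = new HiveConf();
       conf.set("hive.input.format",
           "org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat");
       // or: "org.apache.hadoop.hive.ql.io.HiveInputFormat"
       System.out.println(conf.get("hive.input.format"));
     }
   }
   ```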


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Comment Edited] (HUDI-3222) On-call team to triage GH issues, PRs, and JIRAs

2022-01-18 Thread Raymond Xu (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477396#comment-17477396
 ] 

Raymond Xu edited comment on HUDI-3222 at 1/18/22, 8:56 AM:


h4. triaged GH issues

# https://github.com/apache/hudi/issues/4622
# https://github.com/apache/hudi/issues/4550
# https://github.com/apache/hudi/issues/4411
# https://github.com/apache/hudi/issues/4597
# https://github.com/apache/hudi/issues/4623
# https://github.com/apache/hudi/issues/4552
# https://github.com/apache/hudi/issues/4585


was (Author: xushiyan):
h4. triaged GH issues

# https://github.com/apache/hudi/issues/4622
# https://github.com/apache/hudi/issues/4550
# https://github.com/apache/hudi/issues/4411
# https://github.com/apache/hudi/issues/4597
# https://github.com/apache/hudi/issues/4623
# https://github.com/apache/hudi/issues/4552

> On-call team to triage GH issues, PRs, and JIRAs
> 
>
> Key: HUDI-3222
> URL: https://issues.apache.org/jira/browse/HUDI-3222
> Project: Apache Hudi
>  Issue Type: Task
>  Components: dev-experience
>Reporter: Raymond Xu
>Priority: Major
>   Original Estimate: 12h
>  Time Spent: 6h
>  Remaining Estimate: 6h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] danny0405 merged pull request #4624: [HUDI-3261] Query rt table by hive cli throw NoSuchMethodError

2022-01-18 Thread GitBox


danny0405 merged pull request #4624:
URL: https://github.com/apache/hudi/pull/4624


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (HUDI-3261) Query rt table by hive cli throw NoSuchMethodError

2022-01-18 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-3261.
--

> Query rt table by hive cli throw NoSuchMethodError
> --
>
> Key: HUDI-3261
> URL: https://issues.apache.org/jira/browse/HUDI-3261
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hive
>Reporter: Echo Lee
>Assignee: Echo Lee
>Priority: Major
>  Labels: pull-request-available, sev:normal
>
> When querying the MOR table synchronized from hudi to hive, the following 
> exception is thrown:
>  
>  
> {code:java}
> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.apache.parquet.avro.AvroSchemaConverter.convert(Lorg/apache/parquet/schema/MessageType;)Lorg/apache/hudi/org/apache/avro/Schema;
>         at 
> org.apache.hudi.common.util.ParquetUtils.readAvroSchema(ParquetUtils.java:217)
>         at 
> org.apache.hudi.io.storage.HoodieParquetReader.getSchema(HoodieParquetReader.java:71)
>         at 
> org.apache.hudi.hadoop.utils.HoodieRealtimeRecordReaderUtils.readSchema(HoodieRealtimeRecordReaderUtils.java:72)
>         at 
> org.apache.hudi.hadoop.InputSplitUtils.getBaseFileSchema(InputSplitUtils.java:70)
>         at 
> org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.init(AbstractRealtimeRecordReader.java:87)
>         at 
> org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.<init>(AbstractRealtimeRecordReader.java:67)
>         at 
> org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.<init>(RealtimeCompactedRecordReader.java:62)
>         at 
> org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:70)
>         at 
> org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.<init>(HoodieRealtimeRecordReader.java:47)
>         at 
> org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:317)
>         at 
> org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695)
>         at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
>         at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
>         at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
>         at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
>         at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2227)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:409)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:836)
>         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:772)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:699)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:313)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:227){code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HUDI-3261) Query rt table by hive cli throw NoSuchMethodError

2022-01-18 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477682#comment-17477682
 ] 

Danny Chen commented on HUDI-3261:
--

Fixed via master branch: 3b56320bd8f189786985fd44fcd47e7abd09efb0

> Query rt table by hive cli throw NoSuchMethodError
> --
>
> Key: HUDI-3261
> URL: https://issues.apache.org/jira/browse/HUDI-3261
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hive
>Reporter: Echo Lee
>Assignee: Echo Lee
>Priority: Major
>  Labels: pull-request-available, sev:normal
>
> When querying the MOR table synchronized from hudi to hive, the following 
> exception is thrown:
>  
>  
> {code:java}
> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.apache.parquet.avro.AvroSchemaConverter.convert(Lorg/apache/parquet/schema/MessageType;)Lorg/apache/hudi/org/apache/avro/Schema;
>         at 
> org.apache.hudi.common.util.ParquetUtils.readAvroSchema(ParquetUtils.java:217)
>         at 
> org.apache.hudi.io.storage.HoodieParquetReader.getSchema(HoodieParquetReader.java:71)
>         at 
> org.apache.hudi.hadoop.utils.HoodieRealtimeRecordReaderUtils.readSchema(HoodieRealtimeRecordReaderUtils.java:72)
>         at 
> org.apache.hudi.hadoop.InputSplitUtils.getBaseFileSchema(InputSplitUtils.java:70)
>         at 
> org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.init(AbstractRealtimeRecordReader.java:87)
>         at 
> org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.<init>(AbstractRealtimeRecordReader.java:67)
>         at 
> org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.<init>(RealtimeCompactedRecordReader.java:62)
>         at 
> org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:70)
>         at 
> org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.<init>(HoodieRealtimeRecordReader.java:47)
>         at 
> org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:317)
>         at 
> org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695)
>         at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
>         at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
>         at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
>         at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
>         at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2227)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:409)
>         at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:836)
>         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:772)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:699)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:313)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:227){code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] manojpec commented on a change in pull request #4523: [WIP][HUDI-3173] Add INDEX action type and corresponding commit metadata

2022-01-18 Thread GitBox


manojpec commented on a change in pull request #4523:
URL: https://github.com/apache/hudi/pull/4523#discussion_r786532331



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java
##
@@ -77,7 +77,9 @@ public void setInstants(List instants) {
*
* @deprecated
*/
-  public HoodieDefaultTimeline() {}
+  public HoodieDefaultTimeline() {
+

Review comment:
   same here

##
File path: hudi-common/src/main/avro/HoodieIndexPartitionInfo.avsc
##
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+{
+  "namespace": "org.apache.hudi.avro.model",
+  "type": "record",
+  "name": "HoodieIndexPartitionInfo",
+  "fields": [
+{
+  "name": "version",
+  "type": [
+"int",
+"null"
+  ],
+  "default": 1
+},
+{
+  "name": "metadataPartitionPath",
+  "type": [
+"null",
+"string"
+  ],
+  "default": null
+},
+{
+  "name": "dataPartitionPath",

Review comment:
   Where is this data partition path going to be used? Will all index 
partitions have a metadata + data partition combo?

##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieArchivedTimeline.java
##
@@ -95,7 +95,9 @@ public HoodieArchivedTimeline(HoodieTableMetaClient 
metaClient) {
*
* @deprecated
*/
-  public HoodieArchivedTimeline() {}
+  public HoodieArchivedTimeline() {
+

Review comment:
   this added newline can be reverted.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[hudi] branch master updated (3d93e85 -> 3b56320)

2022-01-18 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 3d93e85  [MINOR] Minor improvement in JsonkafkaSource (#4620)
 add 3b56320  [HUDI-3261] Read rt table by hive cli throw NoSuchMethodError 
(#4624)

No new revisions were added by this update.

Summary of changes:
 packaging/hudi-hadoop-mr-bundle/pom.xml | 4 
 1 file changed, 4 insertions(+)


[GitHub] [hudi] pratyakshsharma commented on issue #4585: [Feature] Make schema registry configs more flexible with MultiTableDeltaStreamer

2022-01-18 Thread GitBox


pratyakshsharma commented on issue #4585:
URL: https://github.com/apache/hudi/issues/4585#issuecomment-1015203741


   > unfortunately our Kafka topic naming schema makes it impossible for us to 
use it this way.
   
   @chrischnweiss Are you trying to say you guys are using a subject naming 
strategy other than `TopicNameStrategy` for your schema registry? MTDS was 
originally designed to cater to use cases with `TopicNameStrategy` as the 
subject naming strategy which is the default provided by Confluent. 
   
   As mentioned by Raymond, please feel free to elaborate your use case and 
contribute the fix back. :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xiarixiaoyao commented on issue #4600: [SUPPORT]When hive queries Hudi data, the query path is wrong

2022-01-18 Thread GitBox


xiarixiaoyao commented on issue #4600:
URL: https://github.com/apache/hudi/issues/4600#issuecomment-1015211429


   @gubinjie  if you do not want to modify the hive code, could you please trigger 
compaction for your table? Once a compaction is done, a parquet file will be created, 
and the above problem should not happen.
   
   problem 2:  currently flink only writes log files for the mor table
   problem 3:  I discussed this problem with my company's hive experts; we have 
no way to solve this problem in hudi, since this check happens before hive 
calls the hudi code. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] xushiyan commented on a change in pull request #3745: [HUDI-2514] Add default hiveTableSerdeProperties for Spark SQL when sync Hive

2022-01-18 Thread GitBox


xushiyan commented on a change in pull request #3745:
URL: https://github.com/apache/hudi/pull/3745#discussion_r786538718



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -366,50 +368,50 @@ object HoodieSparkSqlWriter {
 }
 
 
-// Handle various save modes
 if (mode == SaveMode.Ignore && tableExists) {
   log.warn(s"hoodie table at $basePath already exists. Ignoring & not 
performing actual writes.")
   false
 } else {
+  // Handle various save modes
   handleSaveModes(sqlContext.sparkSession, mode, basePath, tableConfig, 
tableName, WriteOperationType.BOOTSTRAP, fs)
-}

Review comment:
   this comment is redundant; it just repeats the method name. we should 
just remove it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-2514) Add default hiveTableSerdeProperties for Spark SQL when sync Hive

2022-01-18 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2514:
-
Status: In Progress  (was: Open)

> Add default hiveTableSerdeProperties for Spark SQL when sync Hive
> -
>
> Key: HUDI-2514
> URL: https://issues.apache.org/jira/browse/HUDI-2514
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: hive-sync, spark-sql
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Critical
>  Labels: hudi-on-call, pull-request-available
> Fix For: 0.11.0
>
>
> If the default hiveTableSerdeProperties are not added, Spark SQL will not work 
> properly.
> For example, update:
>  
> {code:java}
> update hudi.test_hudi_table set price=333 where id=111;
> {code}
>  
> It will throw an Exception:
> {code:java}
> 21/10/03 17:41:15 ERROR SparkSQLDriver: Failed in [update 
> hudi.test_hudi_table set price=333 where id=111]
> java.lang.AssertionError: assertion failed: There are no primary key in table 
> `hudi`.`test_hudi_table`, cannot execute update operator
> at scala.Predef$.assert(Predef.scala:170)
> at 
> org.apache.spark.sql.hudi.command.UpdateHoodieTableCommand.buildHoodieConfig(UpdateHoodieTableCommand.scala:91)
> at 
> org.apache.spark.sql.hudi.command.UpdateHoodieTableCommand.run(UpdateHoodieTableCommand.scala:73)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
> at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
> at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
> at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370)
> at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
> at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369)
> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
> at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
> at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:62)
> at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:371)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
> at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:274)
> at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
> at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
> at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> java.lang.AssertionError: assertion failed: There are no primary key in table 
> `hudi`.`test_hudi_table`, cannot execute update operator
> at scala.Predef$.assert(Predef.scala:170)
> at 
> org.apache.spark.sql.hudi.command.UpdateHoodieTableCommand.buildHoodieConfig(UpdateHoodieTableCommand.scala:91)
> at 
> org.apache.spark.sql.hudi.command.UpdateHoodieTableCommand.run(UpdateHoodieTableCommand.scala:73)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(co
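
For illustration, a minimal sketch of the table properties Spark SQL relies on to resolve the record key when the table is created through Spark SQL directly. The {{primaryKey}}/{{preCombineField}} names, the schema and the session configs below are assumptions for the sketch, not the exact serde properties added by this patch:

{code:java}
// Sketch only: property names, schema and configs are assumptions for illustration.
import org.apache.spark.sql.SparkSession;

public class HudiSparkSqlTablePropsExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("hudi-spark-sql-table-props")
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
        .enableHiveSupport()
        .getOrCreate();

    // A table created through Spark SQL carries the key properties itself,
    // so the UPDATE below can resolve a primary key and the assertion is not hit.
    spark.sql("CREATE TABLE IF NOT EXISTS hudi.test_hudi_table ("
        + "  id INT, price DOUBLE, ts BIGINT"
        + ") USING hudi "
        + "TBLPROPERTIES (primaryKey = 'id', preCombineField = 'ts')");

    spark.sql("UPDATE hudi.test_hudi_table SET price = 333 WHERE id = 111");

    spark.stop();
  }
}
{code}

A table synced to Hive without equivalent properties has nothing for Spark SQL to resolve the key from, which appears to be what the assertion above reports.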

[jira] [Updated] (HUDI-2514) Add default hiveTableSerdeProperties for Spark SQL when sync Hive

2022-01-18 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2514:
-
Status: Patch Available  (was: In Progress)

> Add default hiveTableSerdeProperties for Spark SQL when sync Hive
> -
>
> Key: HUDI-2514
> URL: https://issues.apache.org/jira/browse/HUDI-2514
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: hive-sync, spark-sql
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Critical
>  Labels: hudi-on-call, pull-request-available
> Fix For: 0.11.0
>
>
> If the default hiveTableSerdeProperties are not added, Spark SQL will not work
> properly.
> For example, update:
>  
> {code:java}
> update hudi.test_hudi_table set price=333 where id=111;
> {code}
>  
> It will throw an Exception:
> {code:java}
> 21/10/03 17:41:15 ERROR SparkSQLDriver: Failed in [update 
> hudi.test_hudi_table set price=333 where id=111]
> java.lang.AssertionError: assertion failed: There are no primary key in table 
> `hudi`.`test_hudi_table`, cannot execute update operator
> at scala.Predef$.assert(Predef.scala:170)
> at 
> org.apache.spark.sql.hudi.command.UpdateHoodieTableCommand.buildHoodieConfig(UpdateHoodieTableCommand.scala:91)
> at 
> org.apache.spark.sql.hudi.command.UpdateHoodieTableCommand.run(UpdateHoodieTableCommand.scala:73)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
> at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
> at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
> at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370)
> at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
> at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369)
> at org.apache.spark.sql.Dataset.(Dataset.scala:194)
> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
> at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
> at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:62)
> at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:371)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
> at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:274)
> at 
> org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
> at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
> at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
> at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> java.lang.AssertionError: assertion failed: There are no primary key in table 
> `hudi`.`test_hudi_table`, cannot execute update operator
> at scala.Predef$.assert(Predef.scala:170)
> at 
> org.apache.spark.sql.hudi.command.UpdateHoodieTableCommand.buildHoodieConfig(UpdateHoodieTableCommand.scala:91)
> at 
> org.apache.spark.sql.hudi.command.UpdateHoodieTableCommand.run(UpdateHoodieTableCommand.scala:73)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffe

[GitHub] [hudi] hudi-bot commented on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.

2022-01-18 Thread GitBox


hudi-bot commented on pull request #4078:
URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015236515


   
   ## CI report:
   
   * 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN
   * 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN
   * c36aac530d3350857fb01df858d0f26c123e5766 UNKNOWN
   * 74721ec8c6c318d34e8bde9344b982e9e2390d76 UNKNOWN
   * ffb67da807ddafbaf18feff01643d02aa5631568 UNKNOWN
   * 0329837213279896a15384781ae2048ecdb0fc13 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5314)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5313)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.

2022-01-18 Thread GitBox


hudi-bot removed a comment on pull request #4078:
URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015186808


   
   ## CI report:
   
   * 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN
   * 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN
   * c36aac530d3350857fb01df858d0f26c123e5766 UNKNOWN
   * 74721ec8c6c318d34e8bde9344b982e9e2390d76 UNKNOWN
   * ffb67da807ddafbaf18feff01643d02aa5631568 UNKNOWN
   * 0329837213279896a15384781ae2048ecdb0fc13 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5314)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5313)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4625: [HUDI-3263] Do not nullify members in HoodieTableFileSystemView#reset…

2022-01-18 Thread GitBox


hudi-bot commented on pull request #4625:
URL: https://github.com/apache/hudi/pull/4625#issuecomment-1015237181


   
   ## CI report:
   
   * 31a00a1d995612cc616eab9df6c03b5fff87f098 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5312)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5315)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4625: [HUDI-3263] Do not nullify members in HoodieTableFileSystemView#reset…

2022-01-18 Thread GitBox


hudi-bot removed a comment on pull request #4625:
URL: https://github.com/apache/hudi/pull/4625#issuecomment-1015185058


   
   ## CI report:
   
   * 31a00a1d995612cc616eab9df6c03b5fff87f098 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5312)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5315)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] danny0405 merged pull request #4625: [HUDI-3263] Do not nullify members in HoodieTableFileSystemView#reset…

2022-01-18 Thread GitBox


danny0405 merged pull request #4625:
URL: https://github.com/apache/hudi/pull/4625


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (HUDI-3263) Do not nullify members in HoodieTableFileSystemView#resetViewState to avoid NPE

2022-01-18 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-3263.
--

> Do not nullify members in HoodieTableFileSystemView#resetViewState to avoid 
> NPE
> ---
>
> Key: HUDI-3263
> URL: https://issues.apache.org/jira/browse/HUDI-3263
> Project: Apache Hudi
>  Issue Type: Task
>  Components: core
>Reporter: Danny Chen
>Priority: Major
>  Labels: pull-request-available, sev:normal
> Fix For: 0.10.1, 0.11.0
>
> Attachments: 1.png, 2.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[hudi] branch master updated (3b56320 -> 45f054f)

2022-01-18 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from 3b56320  [HUDI-3261] Read rt table by hive cli throw NoSuchMethodError 
(#4624)
 add 45f054f  [HUDI-3263] Do not nullify members in 
HoodieTableFileSystemView#resetViewState to avoid NPE (#4625)

No new revisions were added by this update.

Summary of changes:
 .../table/view/HoodieTableFileSystemView.java  | 30 +++---
 1 file changed, 21 insertions(+), 9 deletions(-)


[jira] [Commented] (HUDI-3263) Do not nullify members in HoodieTableFileSystemView#resetViewState to avoid NPE

2022-01-18 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477717#comment-17477717
 ] 

Danny Chen commented on HUDI-3263:
--

Fixed via master branch: 45f054ffdef568e066a53c63c6e6f8d2b1ee67ea

> Do not nullify members in HoodieTableFileSystemView#resetViewState to avoid 
> NPE
> ---
>
> Key: HUDI-3263
> URL: https://issues.apache.org/jira/browse/HUDI-3263
> Project: Apache Hudi
>  Issue Type: Task
>  Components: core
>Reporter: Danny Chen
>Priority: Major
>  Labels: pull-request-available, sev:normal
> Fix For: 0.10.1, 0.11.0
>
> Attachments: 1.png, 2.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] ChangbingChen commented on issue #4618: [SUPPORT] When querying a hudi table in hive, there have duplicated records.

2022-01-18 Thread GitBox


ChangbingChen commented on issue #4618:
URL: https://github.com/apache/hudi/issues/4618#issuecomment-1015244832


   > @ChangbingChen sorry i forget one things, before you use hive to query 
hoodie table, do you have set inputformat, eg: set 
hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat / or 
set hive.input.format= org.apache.hadoop.hive.ql.io.HiveInputFormat
   > 
   > if you have wechat?we can communicate directly through wechat
   
   great! thanks~~
   
   It works when I set 
hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat.
   However, when I set 
hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat, the 
query throws an exception. It seems to be a compatibility problem with the Hive 
version: the Hive version I use is 1.1.0-cdh5.13.3, which has no 
HiveInputFormat.pushProjectionsAndFilters method with these parameter types, 
while Hive 2.3.1 does.
   
   ```
   2022-01-18 17:14:04,796 FATAL [main] org.apache.hadoop.mapred.YarnChild: 
Error running child : java.lang.NoSuchMethodError: 
org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat.pushProjectionsAndFilters(Lorg/apache/hadoop/mapred/JobConf;Ljava/lang/Class;Lorg/apache/hadoop/fs/Path;)V
at 
org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat.getRecordReader(HoodieCombineHiveInputFormat.java:551)
at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:169)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:438)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
   ```
   
   wx: 13488806793.
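   
   For reference, a minimal sketch of applying the same session setting over Hive JDBC; the connection URL, credentials and table name are placeholders, and this does not work around the pushProjectionsAndFilters mismatch on Hive 1.1.0:
   
   ```
   // Sketch only: connection URL, credentials and table name are placeholders.
   import java.sql.Connection;
   import java.sql.DriverManager;
   import java.sql.ResultSet;
   import java.sql.Statement;

   public class HiveInputFormatSessionExample {
     public static void main(String[] args) throws Exception {
       try (Connection conn = DriverManager.getConnection(
               "jdbc:hive2://hiveserver2-host:10000/default", "hive", "");
            Statement stmt = conn.createStatement()) {

         // The setting that works on this setup, per the comment above.
         stmt.execute("set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat");

         // Switching to the combine input format would need a Hive version whose
         // HiveInputFormat.pushProjectionsAndFilters signature matches (e.g. 2.3.x):
         // stmt.execute("set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat");

         try (ResultSet rs = stmt.executeQuery("select count(*) from example_db.example_table_rt")) {
           while (rs.next()) {
             System.out.println(rs.getLong(1));
           }
         }
       }
     }
   }
   ```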


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] Guanpx opened a new issue #4510: [SUPPORT] Impala query error

2022-01-18 Thread GitBox


Guanpx opened a new issue #4510:
URL: https://github.com/apache/hudi/issues/4510


   
   **Describe the problem you faced**
   
   A clear and concise description of the problem.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. hudi sync hive
   2. CREATE EXTERNAL IMPALA TABLE 
(https://hudi.apache.org/docs/querying_data/#impala-34-or-later)  
   3. SELECT from the Impala table, or REFRESH the table
   4. Impala reports an error and the query returns no data
   
   **Expected behavior**
   The Impala table cannot be queried.
   
   **Environment Description**
   
   * Hudi version : 0.10.0, MOR
   
   * Hive version : 2.1
   
   * Hadoop version : 3.0
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   * Impala version : 3.4.0
   
   **Stacktrace**
   
   ```
   I0104 18:06:19.961302 1557231 HoodieTableMetaClient.java:93] Loading 
HoodieTableMetaClient from hdfs://pre-cdh01:8020/hudi/rd/app_columns
   I0104 18:06:19.964633 1557231 FSUtils.java:100] Hadoop Configuration: 
fs.defaultFS: [hdfs://pre-cdh01:8020], Config:[Configuration: core-default.xml, 
core-site.xml, hdfs-default.xml, hdfs-site.xml, mapred-default.xml, 
mapred-site.xml, yarn-default.xml, yarn-site.xml], FileSystem: 
[DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-533850282_1, ugi=impala 
(auth:SIMPLE)]]]
   I0104 18:06:19.969547 1557231 HoodieTableConfig.java:68] Loading dataset 
properties from 
hdfs://pre-cdh01:8020/hudi/rd/app_columns/.hoodie/hoodie.properties
   I0104 18:06:19.974251 1557231 HoodieTableMetaClient.java:104] Finished 
Loading Table of type MERGE_ON_READ from 
hdfs://pre-cdh01:8020/hudi/rd/app_columns
   I0104 18:06:19.978808 1557231 HoodieActiveTimeline.java:82] Loaded instants 
java.util.stream.ReferencePipeline$Head@5d12f34a
   E0104 18:06:20.005887 1557231 HoodieROTablePathFilter.java:176] Error 
checking path 
:hdfs://pre-cdh01:8020/hudi/rd/app_columns/.1adb0953-af23-48d6-9bf2-acb72716060b_20220104164400776.log.1_0-2-0,
 under folder: hdfs://pre-cdh01:8020/hudi/rd/app_columns
   Java exception follows:
   java.lang.IllegalStateException: Hudi File Id 
(HoodieFileGroupId{partitionPath='', 
fileId='1adb0953-af23-48d6-9bf2-acb72716060b'}) has more than 1 pending 
compactions. Instants: (20220104170836577,{"baseInstantTime": 
"20220104165637271", "deltaFilePaths": 
[".1adb0953-af23-48d6-9bf2-acb72716060b_20220104165637271.log.1_0-2-0"], 
"dataFilePath": 
"1adb0953-af23-48d6-9bf2-acb72716060b_1-2-0_20220104165637271.parquet", 
"fileId": "1adb0953-af23-48d6-9bf2-acb72716060b", "partitionPath": "", 
"metrics": {"TOTAL_LOG_FILES": 1.0, "TOTAL_IO_READ_MB": 1.0, 
"TOTAL_LOG_FILES_SIZE": 729214.0, "TOTAL_IO_WRITE_MB": 0.0, "TOTAL_IO_MB": 
1.0}}), (20220104165637271,{"baseInstantTime": "20220104164400776", 
"deltaFilePaths": 
[".1adb0953-af23-48d6-9bf2-acb72716060b_20220104164400776.log.1_0-2-0"], 
"dataFilePath": null, "fileId": "1adb0953-af23-48d6-9bf2-acb72716060b", 
"partitionPath": "", "metrics": {"TOTAL_LOG_FILES": 1.0, "TOTAL_IO_READ_MB": 
0.0, "TOTAL_LOG_FILES_SIZE": 8143.0, "TOTAL_IO_WRITE_MB": 120.0
 , "TOTAL_IO_MB": 120.0}})
at 
org.apache.hudi.common.util.CompactionUtils.lambda$getAllPendingCompactionOperations$5(CompactionUtils.java:161)
at 
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at 
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at java.util.Iterator.forEachRemaining(Iterator.java:116)
at 
java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at 
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at 
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at 
java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at 
java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:270)
at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at 
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at 
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at 
java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
at 
org.apache.hudi.common.util.CompactionUtils.getAllPendingCompactionOperations(CompactionUtils.jav

[GitHub] [hudi] scxwhite commented on issue #4311: Duplicate Records in Merge on Read [SUPPORT]

2022-01-18 Thread GitBox


scxwhite commented on issue #4311:
URL: https://github.com/apache/hudi/issues/4311#issuecomment-1015246379


   I reproduced this problem with the following code. It repeatedly updates 1 
piece of data, but if I execute it more than 5 times, the program reports an error.
   The problem may occur in clustering: when merging small files, the previously 
committed files are obtained 
(SparkSizeBasedClusteringPlanStrategy#getFileSlicesEligibleForClustering).
   @xushiyan @nsivabalan @vinothchandar @yihua 
   
   ```
   @Test
   public void write() {
       List<String> data = new ArrayList<>();
       List<String> dtList = new ArrayList<>();
       dtList.add("197001");
       Random random = new Random();
       for (int i = 0; i < 10; i++) {
           String dt = dtList.get(random.nextInt(dtList.size()));
           data.add("{\"dt\":\"" + dt + "\",\"id\":\"" + random.nextInt(1) + "\",\"gmt_modified\":" + System.currentTimeMillis() + "}");
       }

       Dataset<String> dataset = sparkSession.createDataset(data, Encoders.STRING());
       Dataset<Row> json = sparkSession.read().json(dataset);
       json.printSchema();
       int dataKeepTime = 5;
       json.selectExpr("dt", "id", "gmt_modified", "'' as name").toDF()
           .write()
           .format("org.apache.hudi")
           .option(HoodieTableConfig.TYPE.key(), HoodieTableType.MERGE_ON_READ.name())
           .option(DataSourceWriteOptions.OPERATION().key(), WriteOperationType.UPSERT.value())
           .option(DataSourceWriteOptions.TABLE_TYPE().key(), HoodieTableType.MERGE_ON_READ.name())
           .option(KeyGeneratorOptions.RECORDKEY_FIELD_NAME.key(), "id")
           .option(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key(), Constants.DT)
           .option(KeyGeneratorOptions.HIVE_STYLE_PARTITIONING_ENABLE.key(), true)
           .option(HoodieWriteConfig.PRECOMBINE_FIELD_NAME.key(), Constants.UPDATE_TIME)
           .option(HoodieWriteConfig.COMBINE_BEFORE_UPSERT.key(), true)
           .option(HoodieWriteConfig.UPSERT_PARALLELISM_VALUE.key(), 200)
           .option(HoodieWriteConfig.FINALIZE_WRITE_PARALLELISM_VALUE.key(), 200)
           .option(HoodieWriteConfig.WRITE_PAYLOAD_CLASS_NAME.key(), DefaultHoodieRecordPayload.class.getName())
           .option(HoodieWriteConfig.AVRO_EXTERNAL_SCHEMA_TRANSFORMATION_ENABLE.key(), false)
           .option(HoodieWriteConfig.MARKERS_TYPE.key(), MarkerType.DIRECT.toString())
           .option(HoodieCompactionConfig.PAYLOAD_CLASS_NAME.key(), DefaultHoodieRecordPayload.class.getName())
           .option(HoodieCompactionConfig.CLEANER_FILE_VERSIONS_RETAINED.key(), dataKeepTime)
           .option(HoodieCompactionConfig.AUTO_CLEAN.key(), false)
           .option(HoodieCompactionConfig.CLEANER_INCREMENTAL_MODE_ENABLE.key(), false)
           .option(HoodieCompactionConfig.CLEANER_COMMITS_RETAINED.key(), dataKeepTime)
           .option(HoodieCompactionConfig.MIN_COMMITS_TO_KEEP.key(), dataKeepTime + 1)
           .option(HoodieCompactionConfig.MAX_COMMITS_TO_KEEP.key(), dataKeepTime + 2)
           .option(HoodieCompactionConfig.TARGET_IO_PER_COMPACTION_IN_MB.key(), 500 * 1024)
           .option(HoodieCompactionConfig.CLEANER_POLICY.key(), HoodieCleaningPolicy.KEEP_LATEST_FILE_VERSIONS.name())
           .option(HoodieCompactionConfig.PARQUET_SMALL_FILE_LIMIT.key(), 128 * 1024 * 1024)
           .option(HoodieCompactionConfig.FAILED_WRITES_CLEANER_POLICY.key(), HoodieFailedWritesCleaningPolicy.EAGER.name())
           .option(HoodieStorageConfig.PARQUET_MAX_FILE_SIZE.key(), 256 * 1024 * 1024)
           .option(HoodieCompactionConfig.INLINE_COMPACT.key(), true)
           .option(HoodieCompactionConfig.INLINE_COMPACT_NUM_DELTA_COMMITS.key(), 1)
           .option(HoodieClusteringConfig.INLINE_CLUSTERING.key(), true)
           .option(HoodieClusteringConfig.INLINE_CLUSTERING_MAX_COMMITS.key(), 1)
           .option(HoodieClusteringConfig.PLAN_STRATEGY_MAX_BYTES_PER_OUTPUT_FILEGROUP.key(), 256 * 1024 * 1024L)
           .option(HoodieClusteringConfig.PLAN_STRATEGY_TARGET_FILE_MAX_BYTES.key(), 256 * 1024 * 1024L)
           .option(HoodieClusteringConfig.PLAN_STRATEGY_SMALL_FILE_LIMIT.key(), 128 * 1024 * 1024L)
           .option(HoodieClusteringConfig.UPDATES_STRATEGY.key(), SparkRejectUpdateStrategy.class.getName())
           .option(HoodieMetadataConfig.ENABLE.key(), true)
           .option(HoodieMetadataConfig.MIN_COMMITS_TO_KEEP.key(), dataKeepTime + 1)
           .option(HoodieMetadataConfig.MAX_COMMITS_TO_KEEP.key(), dataKeepTime + 2)
           .option(HoodieMe

[GitHub] [hudi] hudi-bot commented on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink

2022-01-18 Thread GitBox


hudi-bot commented on pull request #4287:
URL: https://github.com/apache/hudi/pull/4287#issuecomment-1015247003


   
   ## CI report:
   
   * 5b7a535559d80359a3febc2d1a80bf9a8ac20cf9 UNKNOWN
   * 7231f68987a5f317f7d71a6485a4c2ea9f917a01 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5295)
 
   * 952a154b1c656cd8e3c9c0df9fee313d3890d938 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink

2022-01-18 Thread GitBox


hudi-bot removed a comment on pull request #4287:
URL: https://github.com/apache/hudi/pull/4287#issuecomment-1014450356


   
   ## CI report:
   
   * 5b7a535559d80359a3febc2d1a80bf9a8ac20cf9 UNKNOWN
   * 7231f68987a5f317f7d71a6485a4c2ea9f917a01 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5295)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] scxwhite commented on issue #4311: Duplicate Records in Merge on Read [SUPPORT]

2022-01-18 Thread GitBox


scxwhite commented on issue #4311:
URL: https://github.com/apache/hudi/issues/4311#issuecomment-1015248403


   In addition, my Hudi version is 0.9.0 and spark version is 3.0.0


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] melin opened a new issue #4627: [SUPPORT] Dremio integration

2022-01-18 Thread GitBox


melin opened a new issue #4627:
URL: https://github.com/apache/hudi/issues/4627


   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   A clear and concise description of the problem.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1.
   2.
   3.
   4.
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version :
   
   * Spark version :
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) :
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   ```Add the stacktrace of the error.```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4626: [HUDI-1977] Fix Hudi CLI tempview query LOG issue

2022-01-18 Thread GitBox


hudi-bot removed a comment on pull request #4626:
URL: https://github.com/apache/hudi/pull/4626#issuecomment-1015192443


   
   ## CI report:
   
   * 3860b11c8eb0823ffb1c8bcf23869b8c17c91df6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5316)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4626: [HUDI-1977] Fix Hudi CLI tempview query LOG issue

2022-01-18 Thread GitBox


hudi-bot commented on pull request #4626:
URL: https://github.com/apache/hudi/pull/4626#issuecomment-1015272593


   
   ## CI report:
   
   * 3860b11c8eb0823ffb1c8bcf23869b8c17c91df6 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5316)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (HUDI-2873) Support optimize data layout by sql and make the build more fast

2022-01-18 Thread Tao Meng (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477743#comment-17477743
 ] 

Tao Meng commented on HUDI-2873:


[~shibei] do you have WeChat? Pls add me: 1037817390

> Support optimize data layout by sql and make the build more fast
> 
>
> Key: HUDI-2873
> URL: https://issues.apache.org/jira/browse/HUDI-2873
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Performance, spark
>Reporter: tao meng
>Assignee: shibei
>Priority: Critical
>  Labels: sev:high
> Fix For: 0.11.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] hudi-bot commented on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink

2022-01-18 Thread GitBox


hudi-bot commented on pull request #4287:
URL: https://github.com/apache/hudi/pull/4287#issuecomment-1015274778


   
   ## CI report:
   
   * 5b7a535559d80359a3febc2d1a80bf9a8ac20cf9 UNKNOWN
   * 7231f68987a5f317f7d71a6485a4c2ea9f917a01 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5295)
 
   * 952a154b1c656cd8e3c9c0df9fee313d3890d938 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5319)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink

2022-01-18 Thread GitBox


hudi-bot removed a comment on pull request #4287:
URL: https://github.com/apache/hudi/pull/4287#issuecomment-1015247003


   
   ## CI report:
   
   * 5b7a535559d80359a3febc2d1a80bf9a8ac20cf9 UNKNOWN
   * 7231f68987a5f317f7d71a6485a4c2ea9f917a01 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5295)
 
   * 952a154b1c656cd8e3c9c0df9fee313d3890d938 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] gubinjie commented on issue #4600: [SUPPORT]When hive queries Hudi data, the query path is wrong

2022-01-18 Thread GitBox


gubinjie commented on issue #4600:
URL: https://github.com/apache/hudi/issues/4600#issuecomment-1015286535


   @xiarixiaoyao 
   Thank you for your reply.
   When I add a Kafka connector and then execute insert into 'hudi' select * from 
'kafka' ('hudi' and 'kafka' are tables of connector type, respectively), there is 
no problem this time and data does appear.
   But I have a question:
   if data is not inserted through the Kafka connector, do these parameters 
have no effect: compaction.trigger.strategy, compaction.delta_commits, 
compaction.delta_seconds?
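   
   For reference, a minimal sketch of where these options are declared on a Flink SQL Hudi sink; the schema, path and option values below are placeholders, not taken from this thread:
   
   ```
   // Sketch only: schema, path and option values are placeholders.
   import org.apache.flink.table.api.EnvironmentSettings;
   import org.apache.flink.table.api.TableEnvironment;

   public class FlinkHudiCompactionOptionsExample {
     public static void main(String[] args) {
       TableEnvironment tEnv = TableEnvironment.create(
           EnvironmentSettings.newInstance().inStreamingMode().build());

       // The compaction.* options are part of the Hudi sink table definition,
       // independent of which source table the INSERT INTO reads from.
       tEnv.executeSql(
           "CREATE TABLE hudi_sink ("
         + "  id STRING, price DOUBLE, ts TIMESTAMP(3),"
         + "  PRIMARY KEY (id) NOT ENFORCED"
         + ") WITH ("
         + "  'connector' = 'hudi',"
         + "  'path' = 'hdfs:///tmp/hudi_sink',"
         + "  'table.type' = 'MERGE_ON_READ',"
         + "  'compaction.trigger.strategy' = 'num_or_time',"
         + "  'compaction.delta_commits' = '5',"
         + "  'compaction.delta_seconds' = '3600'"
         + ")");
     }
   }
   ```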


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] zhangyue19921010 removed a comment on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.

2022-01-18 Thread GitBox


zhangyue19921010 removed a comment on pull request #4078:
URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015183981


   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] zhangyue19921010 commented on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.

2022-01-18 Thread GitBox


zhangyue19921010 commented on pull request #4078:
URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015294358


   @hudi-bot run azure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.

2022-01-18 Thread GitBox


hudi-bot commented on pull request #4078:
URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015296719


   
   ## CI report:
   
   * 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN
   * 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN
   * c36aac530d3350857fb01df858d0f26c123e5766 UNKNOWN
   * 74721ec8c6c318d34e8bde9344b982e9e2390d76 UNKNOWN
   * ffb67da807ddafbaf18feff01643d02aa5631568 UNKNOWN
   * 0329837213279896a15384781ae2048ecdb0fc13 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5314)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5313)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5320)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.

2022-01-18 Thread GitBox


hudi-bot removed a comment on pull request #4078:
URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015236515


   
   ## CI report:
   
   * 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN
   * 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN
   * c36aac530d3350857fb01df858d0f26c123e5766 UNKNOWN
   * 74721ec8c6c318d34e8bde9344b982e9e2390d76 UNKNOWN
   * ffb67da807ddafbaf18feff01643d02aa5631568 UNKNOWN
   * 0329837213279896a15384781ae2048ecdb0fc13 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5314)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5313)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot removed a comment on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink

2022-01-18 Thread GitBox


hudi-bot removed a comment on pull request #4287:
URL: https://github.com/apache/hudi/pull/4287#issuecomment-1015274778


   
   ## CI report:
   
   * 5b7a535559d80359a3febc2d1a80bf9a8ac20cf9 UNKNOWN
   * 7231f68987a5f317f7d71a6485a4c2ea9f917a01 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5295)
 
   * 952a154b1c656cd8e3c9c0df9fee313d3890d938 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5319)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink

2022-01-18 Thread GitBox


hudi-bot commented on pull request #4287:
URL: https://github.com/apache/hudi/pull/4287#issuecomment-1015311668


   
   ## CI report:
   
   * 5b7a535559d80359a3febc2d1a80bf9a8ac20cf9 UNKNOWN
   * 952a154b1c656cd8e3c9c0df9fee313d3890d938 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5319)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] codope commented on a change in pull request #4352: [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups

2022-01-18 Thread GitBox


codope commented on a change in pull request #4352:
URL: https://github.com/apache/hudi/pull/4352#discussion_r786556671



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java
##
@@ -133,30 +144,89 @@ public HoodieBloomIndex(HoodieWriteConfig config, 
BaseHoodieBloomIndexHelper blo
   /**
* Load all involved files as  pair List.
*/
-  List> loadInvolvedFiles(
+  List> loadColumnRangesFromFiles(
   List partitions, final HoodieEngineContext context, final 
HoodieTable hoodieTable) {
 // Obtain the latest data files from all the partitions.
 List> partitionPathFileIDList = 
getLatestBaseFilesForAllPartitions(partitions, context, hoodieTable).stream()
 .map(pair -> Pair.of(pair.getKey(), pair.getValue().getFileId()))
 .collect(toList());
 
-if (config.getBloomIndexPruneByRanges()) {
-  // also obtain file ranges, if range pruning is enabled
-  context.setJobStatus(this.getClass().getName(), "Obtain key ranges for 
file slices (range pruning=on)");
-  return context.map(partitionPathFileIDList, pf -> {
-try {
-  HoodieRangeInfoHandle rangeInfoHandle = new 
HoodieRangeInfoHandle(config, hoodieTable, pf);
-  String[] minMaxKeys = rangeInfoHandle.getMinMaxKeys();
-  return Pair.of(pf.getKey(), new BloomIndexFileInfo(pf.getValue(), 
minMaxKeys[0], minMaxKeys[1]));
-} catch (MetadataNotFoundException me) {
-  LOG.warn("Unable to find range metadata in file :" + pf);
-  return Pair.of(pf.getKey(), new BloomIndexFileInfo(pf.getValue()));
+context.setJobStatus(this.getClass().getName(), "Obtain key ranges for 
file slices (range pruning=on)");
+return context.map(partitionPathFileIDList, pf -> {
+  try {
+HoodieRangeInfoHandle rangeInfoHandle = new 
HoodieRangeInfoHandle(config, hoodieTable, pf);
+String[] minMaxKeys = rangeInfoHandle.getMinMaxKeys();
+return Pair.of(pf.getKey(), new BloomIndexFileInfo(pf.getValue(), 
minMaxKeys[0], minMaxKeys[1]));
+  } catch (MetadataNotFoundException me) {
+LOG.warn("Unable to find range metadata in file :" + pf);
+return Pair.of(pf.getKey(), new BloomIndexFileInfo(pf.getValue()));
+  }
+}, Math.max(partitionPathFileIDList.size(), 1));
+  }
+
+  /**
+   * Get the latest base files for the requested partitions.
+   *
+   * @param partitions  - List of partitions to get the base files for
+   * @param context - Engine context
+   * @param hoodieTable - Hoodie Table
+   * @return List of partition and file column range info pairs
+   */
+  List> getLatestBaseFilesForPartitions(
+  List partitions, final HoodieEngineContext context, final 
HoodieTable hoodieTable) {
+List> partitionPathFileIDList = 
getLatestBaseFilesForAllPartitions(partitions, context,
+hoodieTable).stream()
+.map(pair -> Pair.of(pair.getKey(), pair.getValue().getFileId()))
+.collect(toList());
+return partitionPathFileIDList.stream()
+.map(pf -> Pair.of(pf.getKey(), new 
BloomIndexFileInfo(pf.getValue(.collect(toList());
+  }
+
+  /**
+   * Load the column stats index as BloomIndexFileInfo for all the involved 
files in the partition.
+   *
+   * @param partitions  - List of partitions for which column stats need to be 
loaded
+   * @param context - Engine context
+   * @param hoodieTable - Hoodie table
+   * @return List of partition and file column range info pairs
+   */
+  List> loadColumnRangesFromMetaIndex(

Review comment:
   Can this method be private?

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieKeyLookupHandle.java
##
@@ -46,52 +50,54 @@
 
   private static final Logger LOG = 
LogManager.getLogger(HoodieKeyLookupHandle.class);
 
-  private final HoodieTableType tableType;
-
   private final BloomFilter bloomFilter;
-
   private final List candidateRecordKeys;
-
+  private final boolean useMetadataTableIndex;
+  private Option fileName = Option.empty();
   private long totalKeysChecked;
 
   public HoodieKeyLookupHandle(HoodieWriteConfig config, HoodieTable hoodieTable,
-   Pair partitionPathFilePair) {
-super(config, null, hoodieTable, partitionPathFilePair);
-this.tableType = hoodieTable.getMetaClient().getTableType();
+   Pair partitionPathFileIDPair) {
+this(config, hoodieTable, partitionPathFileIDPair, Option.empty(), false);
+  }
+
+  public HoodieKeyLookupHandle(HoodieWriteConfig config, HoodieTable hoodieTable,
+   Pair partitionPathFileIDPair, 
Option fileName,
+   boolean useMetadataTableIndex) {
+super(config, null, hoodieTable, partitionPathFileIDPair);

Review comment:
   I know this is not due to your change but can we take up replacing 
`null` by Option.empty() in this PR? If not, then at least we should t
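   
   A tiny sketch of the suggested pattern with Hudi's Option type; the names below are illustrative, not the actual HoodieKeyLookupHandle signature:
   
   ```
   // Sketch only: illustrative names, not the actual HoodieKeyLookupHandle signature.
   import org.apache.hudi.common.util.Option;

   public class OptionInsteadOfNullExample {
     static void open(Option<String> fileName) {
       // Absence is explicit, so callees do not have to null-check.
       if (fileName.isPresent()) {
         System.out.println("opening " + fileName.get());
       } else {
         System.out.println("no file name supplied");
       }
     }

     public static void main(String[] args) {
       open(Option.empty());                    // instead of open(null)
       open(Option.of("part-0001.parquet"));
     }
   }
   ```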

[GitHub] [hudi] dongkelun commented on a change in pull request #3745: [HUDI-2514] Add default hiveTableSerdeProperties for Spark SQL when sync Hive

2022-01-18 Thread GitBox


dongkelun commented on a change in pull request #3745:
URL: https://github.com/apache/hudi/pull/3745#discussion_r786654476



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##
@@ -366,50 +368,50 @@ object HoodieSparkSqlWriter {
 }
 
 
-// Handle various save modes
 if (mode == SaveMode.Ignore && tableExists) {
   log.warn(s"hoodie table at $basePath already exists. Ignoring & not 
performing actual writes.")
   false
 } else {
+  // Handle various save modes
   handleSaveModes(sqlContext.sparkSession, mode, basePath, tableConfig, 
tableName, WriteOperationType.BOOTSTRAP, fs)
-}

Review comment:
   OK, I'll submit it later together with others that need to be modified




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (HUDI-3222) On-call team to triage GH issues, PRs, and JIRAs

2022-01-18 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-3222:
-
Reviewers: Raymond Xu, sivabalan narayanan

> On-call team to triage GH issues, PRs, and JIRAs
> 
>
> Key: HUDI-3222
> URL: https://issues.apache.org/jira/browse/HUDI-3222
> Project: Apache Hudi
>  Issue Type: Task
>  Components: dev-experience
>Reporter: Raymond Xu
>Priority: Major
>   Original Estimate: 12h
>  Time Spent: 6h
>  Remaining Estimate: 6h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] nsivabalan commented on issue #4552: [BUG] Data corrupted in the timestamp field to 1970-01-01 19:45:30.000 after subsequent upsert run

2022-01-18 Thread GitBox


nsivabalan commented on issue #4552:
URL: https://github.com/apache/hudi/issues/4552#issuecomment-1015320225


   Closing this one out since we know the root cause and have a solution. Feel 
free to re-open if you have more questions; we would be happy to help. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan closed issue #4552: [BUG] Data corrupted in the timestamp field to 1970-01-01 19:45:30.000 after subsequent upsert run

2022-01-18 Thread GitBox


nsivabalan closed issue #4552:
URL: https://github.com/apache/hudi/issues/4552


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (HUDI-3264) Make schema registry configs more flexible with MultiTableDeltaStreamer

2022-01-18 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-3264:
-

 Summary: Make schema registry configs more flexible with 
MultiTableDeltaStreamer
 Key: HUDI-3264
 URL: https://issues.apache.org/jira/browse/HUDI-3264
 Project: Apache Hudi
  Issue Type: Task
  Components: deltastreamer
Reporter: sivabalan narayanan


Ref issue: [https://github.com/apache/hudi/issues/4585]

Hi guys,

we ran into a problem setting the target schema of our Hudi table using the 
MultiTableDeltaStreamer.

Using a normal DeltaStreamer, we are able to set our source and target schemas 
using the properties:
 * hoodie.deltastreamer.schemaprovider.registry.url
 * hoodie.deltastreamer.schemaprovider.registry.targetUrl

We found that we are not able to set these properties on a table basis using 
the MultiTableDeltaStreamer, since the MTDS builds SchemaRegistry URLs for 
target and source schema using the properties:
 * hoodie.deltastreamer.schemaprovider.registry.baseUrl
 * hoodie.deltastreamer.schemaprovider.registry.sourceUrlSuffix
 * hoodie.deltastreamer.schemaprovider.registry.targetUrlSuffix

Later the MultiTableDeltaStreamer also derives the name of the target schema from 
the source Kafka topic name:

 
[hudi/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java|https://github.com/apache/hudi/blob/9fe28e56b49c7bf68ae2d83bfe89755314aa793b/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java#L167]

Line 167 in 
[9fe28e5|https://github.com/apache/hudi/commit/9fe28e56b49c7bf68ae2d83bfe89755314aa793b]:
{code:java}
typedProperties.setProperty(Constants.TARGET_SCHEMA_REGISTRY_URL_PROP, schemaRegistryBaseUrl + typedProperties.getString(Constants.KAFKA_TOPIC_PROP) + targetSchemaRegistrySuffix);
{code}

 

We think that schema names should be more configurable, the way the original 
DeltaStreamer handles them. As it stands, the names of the schemas used for 
reading and writing the data are tightly coupled to the name of the Kafka topic 
the data is loaded from.
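
For illustration, a sketch contrasting the two configuration styles using the property keys quoted above; the URLs and topic names are made up:

{code:java}
// Sketch only: URLs and topic names are made up; the keys are those quoted above.
import org.apache.hudi.common.config.TypedProperties;

public class SchemaRegistryPropsExample {
  public static void main(String[] args) {
    // Plain DeltaStreamer: source and target schema URLs are set independently.
    TypedProperties single = new TypedProperties();
    single.setProperty("hoodie.deltastreamer.schemaprovider.registry.url",
        "http://registry:8081/subjects/input-topic-value/versions/latest");
    single.setProperty("hoodie.deltastreamer.schemaprovider.registry.targetUrl",
        "http://registry:8081/subjects/my-target-schema/versions/latest");

    // MultiTableDeltaStreamer: URLs are derived from a base URL, the Kafka topic
    // name and fixed suffixes, so the target schema name follows the topic name.
    TypedProperties multi = new TypedProperties();
    multi.setProperty("hoodie.deltastreamer.schemaprovider.registry.baseUrl",
        "http://registry:8081/subjects/");
    multi.setProperty("hoodie.deltastreamer.schemaprovider.registry.sourceUrlSuffix",
        "-value/versions/latest");
    multi.setProperty("hoodie.deltastreamer.schemaprovider.registry.targetUrlSuffix",
        "-value/versions/latest");
  }
}
{code}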

 

 

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] nsivabalan commented on issue #4585: [Feature] Make schema registry configs more flexible with MultiTableDeltaStreamer

2022-01-18 Thread GitBox


nsivabalan commented on issue #4585:
URL: https://github.com/apache/hudi/issues/4585#issuecomment-1015323676


   I have filed a [jira](https://issues.apache.org/jira/browse/HUDI-3264) for 
this. @chrischnweiss : feel free to update the jira w/ your suggestions. 
Even if you can't find cycles to contribute, one of us from the community can 
try to find time to work towards it. 
   
   Closing the GitHub issue; we can continue the conversation in the jira. 
   Thanks for reporting! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan closed issue #4585: [Feature] Make schema registry configs more flexible with MultiTableDeltaStreamer

2022-01-18 Thread GitBox


nsivabalan closed issue #4585:
URL: https://github.com/apache/hudi/issues/4585


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on issue #4510: [SUPPORT] Impala query error

2022-01-18 Thread GitBox


nsivabalan commented on issue #4510:
URL: https://github.com/apache/hudi/issues/4510#issuecomment-1015327025


   We already have a tracking jira to support the MOR table type in Impala. If you 
are interested in working on it, feel free to grab the jira and we can 
help with reviews if need be. Closing the GitHub issue for now. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan closed issue #4510: [SUPPORT] Impala query error

2022-01-18 Thread GitBox


nsivabalan closed issue #4510:
URL: https://github.com/apache/hudi/issues/4510


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on issue #4457: [SUPPORT] Hudi archive stopped working

2022-01-18 Thread GitBox


nsivabalan commented on issue #4457:
URL: https://github.com/apache/hudi/issues/4457#issuecomment-1015328411


   @zuyanton : Do you have any updates for us? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on issue #4456: [SUPPORT] MultiWriter w/ DynamoDB - Unable to acquire lock, lock object null

2022-01-18 Thread GitBox


nsivabalan commented on issue #4456:
URL: https://github.com/apache/hudi/issues/4456#issuecomment-1015328893


   @zhedoubushishi : When you get a chance, can you please follow up? 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan closed issue #4439: [BUG] ROLLBACK meet Cannot use marker based rollback strategy on completed error

2022-01-18 Thread GitBox


nsivabalan closed issue #4439:
URL: https://github.com/apache/hudi/issues/4439


   






[GitHub] [hudi] nsivabalan commented on issue #4439: [BUG] ROLLBACK meet Cannot use marker based rollback strategy on completed error

2022-01-18 Thread GitBox


nsivabalan commented on issue #4439:
URL: https://github.com/apache/hudi/issues/4439#issuecomment-1015329290


   Feel free to re-open if you are looking for more assistance. 






[GitHub] [hudi] nsivabalan commented on issue #4434: [SUPPORT]why are there many files under the Hoodie file?

2022-01-18 Thread GitBox


nsivabalan commented on issue #4434:
URL: https://github.com/apache/hudi/issues/4434#issuecomment-1015329998


   @tieke1121 : let us know if you have more questions or need clarifications. 
If not, we will close out the GitHub issue.






[GitHub] [hudi] nsivabalan commented on issue #4318: [SUPPORT] Duplicate records in COW table within same partition path

2022-01-18 Thread GitBox


nsivabalan commented on issue #4318:
URL: https://github.com/apache/hudi/issues/4318#issuecomment-1015333164


   I have updated the instructions for accessing S3 via hudi-cli 
[here](https://hudi.apache.org/docs/next/cli#using-hudi-cli-in-s3).






[GitHub] [hudi] nsivabalan commented on issue #4318: [SUPPORT] Duplicate records in COW table within same partition path

2022-01-18 Thread GitBox


nsivabalan commented on issue #4318:
URL: https://github.com/apache/hudi/issues/4318#issuecomment-1015334963


   Regarding duplicates: in general, the pair of partition path and record key 
is unique within a Hudi table. If you want record keys to be unique globally, 
you need to use a global index or a non-partitioned dataset. 
   In addition, the preCombine configs have to be set appropriately. 
   
   Feel free to close out the issue if it was a misconfiguration on your end. 
If not, we can keep the issue open and discuss further. 
   
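   For illustration, here is a minimal spark-shell sketch of those settings. The 
table path, column names and sample rows below are made up; the config keys are 
the standard Hudi datasource write options.
   
   ```scala
   // Minimal sketch: record key + partition path define uniqueness, the precombine
   // field picks the winner among duplicates, and a GLOBAL index enforces key
   // uniqueness across partitions. Assumes a spark-shell with the hudi-spark bundle.
   import org.apache.spark.sql.SaveMode
   import spark.implicits._
   
   // two versions of the same key in one partition; the row with the larger `ts` should win
   val inputDf = Seq(
     ("id-1", "2021/12/01", 1L, 9.0),
     ("id-1", "2021/12/01", 2L, 10.5)
   ).toDF("uuid", "partitionpath", "ts", "fare")
   
   inputDf.write.format("hudi").
     option("hoodie.table.name", "trips_cow").
     option("hoodie.datasource.write.recordkey.field", "uuid").
     option("hoodie.datasource.write.partitionpath.field", "partitionpath").
     option("hoodie.datasource.write.precombine.field", "ts").
     option("hoodie.datasource.write.operation", "upsert").
     option("hoodie.index.type", "GLOBAL_BLOOM").   // only needed for global key uniqueness
     mode(SaveMode.Append).
     save("/tmp/hudi_trips_cow")
   ```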






[GitHub] [hudi] stym06 commented on issue #4318: [SUPPORT] Duplicate records in COW table within same partition path

2022-01-18 Thread GitBox


stym06 commented on issue #4318:
URL: https://github.com/apache/hudi/issues/4318#issuecomment-1015337273


   @nsivabalan #3222 worked for me. Thanks for the help. We can close it out: 
the operation mode was INSERT and duplicate records were also coming in on the 
Kafka topic, which led to this.
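   For reference, a hedged sketch of the two write settings involved here. The 
config keys are standard Hudi write configs; the `ts` precombine column is a 
placeholder. Either switch the operation to upsert so re-delivered Kafka records 
overwrite instead of duplicating, or keep INSERT and de-duplicate each batch 
before writing.
   
   ```scala
   // Illustrative only; merge these into the writer / DeltaStreamer props you already use.
   // Option A: upsert, where later records with the same key replace earlier ones.
   val upsertOpts = Map(
     "hoodie.datasource.write.operation" -> "upsert",
     "hoodie.datasource.write.precombine.field" -> "ts"
   )
   
   // Option B: stay on INSERT but drop in-batch duplicates using the precombine field.
   val dedupedInsertOpts = Map(
     "hoodie.datasource.write.operation" -> "insert",
     "hoodie.combine.before.insert" -> "true",
     "hoodie.datasource.write.precombine.field" -> "ts"
   )
   ```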






[GitHub] [hudi] stym06 closed issue #4318: [SUPPORT] Duplicate records in COW table within same partition path

2022-01-18 Thread GitBox


stym06 closed issue #4318:
URL: https://github.com/apache/hudi/issues/4318


   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4607: [HUDI-3161][RFC-46] Add Call Produce Command for Spark SQL

2022-01-18 Thread GitBox


hudi-bot removed a comment on pull request #4607:
URL: https://github.com/apache/hudi/pull/4607#issuecomment-1013628091


   
   ## CI report:
   
   * 9ddbc330d21f82188865a3a76af2b79a98101d3b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5266)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4607: [HUDI-3161][RFC-46] Add Call Produce Command for Spark SQL

2022-01-18 Thread GitBox


hudi-bot commented on pull request #4607:
URL: https://github.com/apache/hudi/pull/4607#issuecomment-1015338194


   
   ## CI report:
   
   * 9ddbc330d21f82188865a3a76af2b79a98101d3b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5266)
 
   * df9e59120041b4de676733449caa99115d26996d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4607: [HUDI-3161][RFC-46] Add Call Produce Command for Spark SQL

2022-01-18 Thread GitBox


hudi-bot removed a comment on pull request #4607:
URL: https://github.com/apache/hudi/pull/4607#issuecomment-1015338194


   
   ## CI report:
   
   * 9ddbc330d21f82188865a3a76af2b79a98101d3b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5266)
 
   * df9e59120041b4de676733449caa99115d26996d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4607: [HUDI-3161][RFC-46] Add Call Produce Command for Spark SQL

2022-01-18 Thread GitBox


hudi-bot commented on pull request #4607:
URL: https://github.com/apache/hudi/pull/4607#issuecomment-1015340335


   
   ## CI report:
   
   * 9ddbc330d21f82188865a3a76af2b79a98101d3b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5266)
 
   * df9e59120041b4de676733449caa99115d26996d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5321)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] nsivabalan commented on issue #4241: [SUPPORT] Disaster Recovery (DR) Setup? Questions.

2022-01-18 Thread GitBox


nsivabalan commented on issue #4241:
URL: https://github.com/apache/hudi/issues/4241#issuecomment-1015341274


   We don't have any documentation for this as such. You need to either use the 
writeClient directly or go via hudi-cli. 
   
   But here is how you can do a savepoint and restore using hudi-cli:
   
   ```
   connect --path /tmp/hudi_trips_cow
   commits show
   set --conf SPARK_HOME=[SPARK_HOME_DIR]
   savepoint create --commit 20220105222853592 --sparkMaster local[2]
   
   
   // restore
   
   refresh
   savepoint rollback --savepoint 20220106085108487 --sparkMaster local[2]
   ```






[GitHub] [hudi] nsivabalan edited a comment on issue #4241: [SUPPORT] Disaster Recovery (DR) Setup? Questions.

2022-01-18 Thread GitBox


nsivabalan edited a comment on issue #4241:
URL: https://github.com/apache/hudi/issues/4241#issuecomment-1015341274


   We don't have any documentation for this as such. You need to either use the 
writeClient directly or go via hudi-cli; hudi-cli is the recommended way. 
   
   But here is how you can do a savepoint and restore using hudi-cli:
   
   ```
   connect --path /tmp/hudi_trips_cow
   commits show
   set --conf SPARK_HOME=[SPARK_HOME_DIR]
   savepoint create --commit 20220105222853592 --sparkMaster local[2]
   
   
   // restore
   
   refresh
   savepoint rollback --savepoint 20220106085108487 --sparkMaster local[2]
   ```
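   
   For completeness, here is a rough sketch of the writeClient route mentioned 
above. Treat it as an approximation: the class and method names 
(`SparkRDDWriteClient`, `savepoint`, `restoreToSavepoint`) are taken from the 
0.10.x Spark client API and may differ between releases; the table path and 
name are placeholders.
   
   ```scala
   // Rough sketch, assuming a spark-shell and Hudi 0.10.x client APIs; verify names
   // against your Hudi version before relying on this.
   import org.apache.hudi.client.SparkRDDWriteClient
   import org.apache.hudi.client.common.HoodieSparkEngineContext
   import org.apache.hudi.config.HoodieWriteConfig
   import org.apache.spark.api.java.JavaSparkContext
   
   val jsc = new JavaSparkContext(spark.sparkContext)
   val writeConfig = HoodieWriteConfig.newBuilder()
     .withPath("/tmp/hudi_trips_cow")
     .forTable("hudi_trips_cow")
     .build()
   val client = new SparkRDDWriteClient(new HoodieSparkEngineContext(jsc), writeConfig)
   
   client.savepoint("admin", "savepoint before risky change")   // savepoints the latest commit
   client.restoreToSavepoint("20220106085108487")               // restores the table to that savepoint
   client.close()
   ```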






[GitHub] [hudi] codope commented on issue #4541: [SUPPORT] NullPointerException while writing Bulk ingest table

2022-01-18 Thread GitBox


codope commented on issue #4541:
URL: https://github.com/apache/hudi/issues/4541#issuecomment-1015341819


   @nsivabalan Looks like AVRO_SCHEMA is not getting set in bulk insert mode. I 
couldn't find [similar 
logic](https://github.com/apache/hudi/blob/45f054ffdef568e066a53c63c6e6f8d2b1ee67ea/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L442-L443) 
in the 0.7.0 bulk insert path.
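   
   For context, this is roughly the kind of logic the linked lines add: convert 
the incoming DataFrame's schema to Avro and make it available to the write 
config. The sketch below is illustrative only (the `df` value, table name and 
path are placeholders), and passing the schema explicitly as an option is a 
hypothetical, untested mitigation rather than a confirmed workaround.
   
   ```scala
   // Illustrative sketch: derive the Avro schema from the Spark schema using
   // spark-avro's SchemaConverters and pass it along via "hoodie.avro.schema".
   // Hypothetical mitigation only; not verified to avoid the NPE described here.
   import org.apache.spark.sql.avro.SchemaConverters
   
   val avroSchema = SchemaConverters.toAvroType(df.schema, false, "record", "hoodie.source")
   df.write.format("hudi").
     option("hoodie.datasource.write.operation", "bulk_insert").
     option("hoodie.avro.schema", avroSchema.toString).
     option("hoodie.table.name", "my_table").
     mode("append").
     save("/tmp/hudi_bulk_insert_table")
   ```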






[jira] [Created] (HUDI-3265) Implement a custom serializer for the WriteStatus

2022-01-18 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-3265:
-

 Summary: Implement a custom serializer for the WriteStatus
 Key: HUDI-3265
 URL: https://issues.apache.org/jira/browse/HUDI-3265
 Project: Apache Hudi
  Issue Type: Task
  Components: flink
Reporter: sivabalan narayanan


When the structure of WriteStatus changes and we restart the Flink job with the 
new version, the job fails to recover.

*To Reproduce*

Steps to reproduce the behavior:
 # Start a Flink job.
 # Change the WriteStatus structure and restart.
 # The job can't recover.

We need to implement a custom serializer for the WriteStatus.

 

Ref issue: [https://github.com/apache/hudi/issues/4032]
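
As a rough illustration of the general idea (not the actual fix; the field set 
below is hypothetical and much smaller than the real WriteStatus), a serializer 
that writes fields explicitly with a version tag tolerates structural changes, 
unlike default Java serialization:

{code:scala}
// Illustrative only: explicit, versioned field-by-field serialization survives
// adding fields across releases, whereas default Java serialization of the old
// WriteStatus payload does not.
import java.io.{DataInputStream, DataOutputStream}

case class WriteStatusLike(fileId: String, partitionPath: String, totalRecords: Long)

object WriteStatusSerde {
  val CurrentVersion = 1

  def write(out: DataOutputStream, s: WriteStatusLike): Unit = {
    out.writeInt(CurrentVersion)      // lets newer readers branch on older payloads
    out.writeUTF(s.fileId)
    out.writeUTF(s.partitionPath)
    out.writeLong(s.totalRecords)
  }

  def read(in: DataInputStream): WriteStatusLike = {
    val version = in.readInt()
    val fileId = in.readUTF()
    val partitionPath = in.readUTF()
    val totalRecords = in.readLong()
    // fields added in later versions would be read here, guarded by `version`
    WriteStatusLike(fileId, partitionPath, totalRecords)
  }
}
{code}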

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Assigned] (HUDI-3265) Implement a custom serializer for the WriteStatus

2022-01-18 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-3265:
-

Assignee: Gary Li

> Implement a custom serializer for the WriteStatus
> -
>
> Key: HUDI-3265
> URL: https://issues.apache.org/jira/browse/HUDI-3265
> Project: Apache Hudi
>  Issue Type: Task
>  Components: flink
>Reporter: sivabalan narayanan
>Assignee: Gary Li
>Priority: Major
>  Labels: sev:normal
>
> When the structure of WriteStatus changes and we restart the Flink job with 
> the new version, the job fails to recover.
> *To Reproduce*
> Steps to reproduce the behavior:
>  # Start a Flink job.
>  # Change the WriteStatus structure and restart.
>  # The job can't recover.
> We need to implement a custom serializer for the WriteStatus.
>  
> Ref issue: [https://github.com/apache/hudi/issues/4032]
>  





[GitHub] [hudi] nsivabalan closed issue #4032: [SUPPORT] StreamWriteFunction WriteMetadataEvent serialization failed when WriteStatus structure changed

2022-01-18 Thread GitBox


nsivabalan closed issue #4032:
URL: https://github.com/apache/hudi/issues/4032


   






[GitHub] [hudi] nsivabalan commented on issue #4032: [SUPPORT] StreamWriteFunction WriteMetadataEvent serialization failed when WriteStatus structure changed

2022-01-18 Thread GitBox


nsivabalan commented on issue #4032:
URL: https://github.com/apache/hudi/issues/4032#issuecomment-1015342387


   I have filed a tracking 
[jira](https://issues.apache.org/jira/browse/HUDI-3265) and will close the 
GitHub issue.






[GitHub] [hudi] nsivabalan commented on issue #3870: [SUPPORT] Hudi v0.8.0 Savepoint rollback failure

2022-01-18 Thread GitBox


nsivabalan commented on issue #3870:
URL: https://github.com/apache/hudi/issues/3870#issuecomment-1015342619


   @atharvai : hey, do you have any updates for us?
   






[GitHub] [hudi] hudi-bot commented on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.

2022-01-18 Thread GitBox


hudi-bot commented on pull request #4078:
URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015344119


   
   ## CI report:
   
   * 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN
   * 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN
   * c36aac530d3350857fb01df858d0f26c123e5766 UNKNOWN
   * 74721ec8c6c318d34e8bde9344b982e9e2390d76 UNKNOWN
   * ffb67da807ddafbaf18feff01643d02aa5631568 UNKNOWN
   * 0329837213279896a15384781ae2048ecdb0fc13 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5314)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5313)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5320)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.

2022-01-18 Thread GitBox


hudi-bot removed a comment on pull request #4078:
URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015296719


   
   ## CI report:
   
   * 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN
   * 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN
   * c36aac530d3350857fb01df858d0f26c123e5766 UNKNOWN
   * 74721ec8c6c318d34e8bde9344b982e9e2390d76 UNKNOWN
   * ffb67da807ddafbaf18feff01643d02aa5631568 UNKNOWN
   * 0329837213279896a15384781ae2048ecdb0fc13 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5314)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5313)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5320)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] liujinhui1994 commented on issue #4311: Duplicate Records in Merge on Read [SUPPORT]

2022-01-18 Thread GitBox


liujinhui1994 commented on issue #4311:
URL: https://github.com/apache/hudi/issues/4311#issuecomment-1015349498


   Clustering does not currently support updates; that is likely the cause of 
your problem. @scxwhite cc @nsivabalan






[jira] [Commented] (HUDI-1615) GH Issue 2515/ Failure to archive commits on row writer/delete paths

2022-01-18 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477801#comment-17477801
 ] 

sivabalan narayanan commented on HUDI-1615:
---

[https://github.com/apache/hudi/pull/2653]

 

> GH Issue 2515/ Failure to archive commits on row writer/delete paths
> 
>
> Key: HUDI-1615
> URL: https://issues.apache.org/jira/browse/HUDI-1615
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark, writer-core
>Reporter: Vinoth Chandar
>Assignee: Vinoth Chandar
>Priority: Major
>  Labels: pull-request-available, sev:critical
> Fix For: 0.8.0
>
>
> https://github.com/apache/hudi/issues/2515





[GitHub] [hudi] nsivabalan commented on issue #4604: [SUPPORT] Archive functionality fails

2022-01-18 Thread GitBox


nsivabalan commented on issue #4604:
URL: https://github.com/apache/hudi/issues/4604#issuecomment-1015351403


   We have a related [issue](https://github.com/apache/hudi/pull/2653) reported 
earlier. It might help @XuQianJin-Stars triage this one.





