[jira] [Commented] (HUDI-3261) Query rt table by hive cli throw NoSuchMethodError
[ https://issues.apache.org/jira/browse/HUDI-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477640#comment-17477640 ] Danny Chen commented on HUDI-3261:

Thanks for the contribution, added.

> Query rt table by hive cli throw NoSuchMethodError
> --
>
> Key: HUDI-3261
> URL: https://issues.apache.org/jira/browse/HUDI-3261
> Project: Apache Hudi
> Issue Type: Bug
> Components: hive
> Reporter: Echo Lee
> Assignee: Echo Lee
> Priority: Major
> Labels: pull-request-available, sev:normal
>
> When querying the MOR table synchronized from hudi to hive, the following exception is thrown:
>
> {code:java}
> Exception in thread "main" java.lang.NoSuchMethodError: org.apache.parquet.avro.AvroSchemaConverter.convert(Lorg/apache/parquet/schema/MessageType;)Lorg/apache/hudi/org/apache/avro/Schema;
>     at org.apache.hudi.common.util.ParquetUtils.readAvroSchema(ParquetUtils.java:217)
>     at org.apache.hudi.io.storage.HoodieParquetReader.getSchema(HoodieParquetReader.java:71)
>     at org.apache.hudi.hadoop.utils.HoodieRealtimeRecordReaderUtils.readSchema(HoodieRealtimeRecordReaderUtils.java:72)
>     at org.apache.hudi.hadoop.InputSplitUtils.getBaseFileSchema(InputSplitUtils.java:70)
>     at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.init(AbstractRealtimeRecordReader.java:87)
>     at org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.<init>(AbstractRealtimeRecordReader.java:67)
>     at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.<init>(RealtimeCompactedRecordReader.java:62)
>     at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:70)
>     at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.<init>(HoodieRealtimeRecordReader.java:47)
>     at org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:317)
>     at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695)
>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333)
>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
>     at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
>     at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
>     at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2227)
>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:409)
>     at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:836)
>     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:772)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:699)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.util.RunJar.run(RunJar.java:313)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:227)
> {code}

-- This message was sent by Atlassian Jira (v8.20.1#820001)
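The NoSuchMethodError above points at a shading mismatch: the calling code was compiled against Hudi's relocated parquet-avro, whose `convert` returns the shaded `org.apache.hudi.org.apache.avro.Schema`, while the `AvroSchemaConverter` actually loaded on the Hive classpath returns the unshaded `org.apache.avro.Schema`. A small throwaway helper (illustrative only, not part of Hudi; it handles just object types, which is enough for this descriptor) makes the relocated return type visible:

```python
def decode_descriptor(desc: str):
    """Split a JVM method descriptor into (param class names, return class name)."""
    # '(<params>)<return>': drop the leading '(' and split at ')'
    params_part, ret_part = desc[1:].split(")")

    def object_types(s: str):
        # 'Lpkg/Name;' tokens are object types; strip 'L' and convert '/' to '.'
        return [t[1:].replace("/", ".") for t in s.split(";") if t.startswith("L")]

    return object_types(params_part), object_types(ret_part)[0]

# Descriptor copied from the NoSuchMethodError in the stack trace above
desc = "(Lorg/apache/parquet/schema/MessageType;)Lorg/apache/hudi/org/apache/avro/Schema;"
params, ret = decode_descriptor(desc)
print(params)  # ['org.apache.parquet.schema.MessageType']
print(ret)     # org.apache.hudi.org.apache.avro.Schema -- note the relocated package prefix
```

Since the JVM resolves a method by its exact descriptor, an unshaded parquet-avro jar on the classpath can never satisfy this call, which is consistent with a bundle/classpath mismatch rather than a missing dependency.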
[jira] [Assigned] (HUDI-3261) Query rt table by hive cli throw NoSuchMethodError
[ https://issues.apache.org/jira/browse/HUDI-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen reassigned HUDI-3261:

Assignee: Echo Lee

> Query rt table by hive cli throw NoSuchMethodError
> Key: HUDI-3261
> URL: https://issues.apache.org/jira/browse/HUDI-3261
[GitHub] [hudi] ChangbingChen commented on issue #4618: [SUPPORT] When querying a hudi table in hive, there have duplicated records.
ChangbingChen commented on issue #4618: URL: https://github.com/apache/hudi/issues/4618#issuecomment-1015160211

> @ChangbingChen i know hudi has a bug for this. If possible, could you please modify the hudi code and package a new hudi jar with this change to HoodieParquetRealtimeInputFormat.isSplitable:
>
> ```java
> @Override
> protected boolean isSplitable(FileSystem fs, Path filename) {
>   if (filename instanceof PathWithLogFilePath) {
>     return ((PathWithLogFilePath) filename).splitable();
>   }
>   // return super.isSplitable(fs, filename);
>   return false;
> }
> ```

@xiarixiaoyao , sorry, it doesn't work either. I query the xxx_ro table, so the inputformat should be org.apache.hudi.hadoop.HoodieParquetInputFormat? By the way, there are four or five parquet files, and for each compaction operation a new parquet file will be generated and the oldest parquet file will be deleted. So will a hive query scan all of those parquet files? Perhaps the newest one contains all records?

```
[yarn@x.x.x.x ~]$ hadoop fs -ls /hudi/mysql_table_sink_new/20220118
Found 9 items
-rw-r--r--   3 yarn supergroup  22309728 2022-01-18 15:22 /hudi/mysql_table_sink_new/20220118/.77dc5111-0ed0-400c-9df3-84b254650ab5_20220118152035.log.1_0-1-0
-rw-r--r--   3 yarn supergroup  26237250 2022-01-18 15:24 /hudi/mysql_table_sink_new/20220118/.77dc5111-0ed0-400c-9df3-84b254650ab5_20220118152235.log.1_0-1-0
-rw-r--r--   3 yarn supergroup  25088875 2022-01-18 15:26 /hudi/mysql_table_sink_new/20220118/.77dc5111-0ed0-400c-9df3-84b254650ab5_20220118152436.log.1_0-1-0
-rw-r--r--   3 yarn supergroup  22962237 2022-01-18 15:28 /hudi/mysql_table_sink_new/20220118/.77dc5111-0ed0-400c-9df3-84b254650ab5_20220118152636.log.1_0-1-0
-rw-r--r--   3 yarn supergroup        93 2022-01-18 15:15 /hudi/mysql_table_sink_new/20220118/.hoodie_partition_metadata
-rw-r--r--   3 yarn supergroup   8456473 2022-01-18 15:21 /hudi/mysql_table_sink_new/20220118/77dc5111-0ed0-400c-9df3-84b254650ab5_0-1-0_20220118152035.parquet
-rw-r--r--   3 yarn supergroup  10952244 2022-01-18 15:23 /hudi/mysql_table_sink_new/20220118/77dc5111-0ed0-400c-9df3-84b254650ab5_0-1-0_20220118152235.parquet
-rw-r--r--   3 yarn supergroup  13875797 2022-01-18 15:25 /hudi/mysql_table_sink_new/20220118/77dc5111-0ed0-400c-9df3-84b254650ab5_0-1-0_20220118152436.parquet
-rw-r--r--   3 yarn supergroup  16555809 2022-01-18 15:27 /hudi/mysql_table_sink_new/20220118/77dc5111-0ed0-400c-9df3-84b254650ab5_0-1-0_20220118152636.parquet
```

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
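A note on the listing above: Hudi base files are named `<fileId>_<writeToken>_<instantTime>.parquet`, and all four parquet files share the same fileId, so they are successive slices of one file group; a snapshot read should pick only the latest slice. A quick illustrative parse (plain Python, not Hudi code; the naming convention is the only assumption):

```python
import re

# The four base-file names from the directory listing above
files = [
    "77dc5111-0ed0-400c-9df3-84b254650ab5_0-1-0_20220118152035.parquet",
    "77dc5111-0ed0-400c-9df3-84b254650ab5_0-1-0_20220118152235.parquet",
    "77dc5111-0ed0-400c-9df3-84b254650ab5_0-1-0_20220118152436.parquet",
    "77dc5111-0ed0-400c-9df3-84b254650ab5_0-1-0_20220118152636.parquet",
]

# <fileId (uuid)>_<writeToken>_<instantTime>.parquet
pattern = re.compile(r"(?P<file_id>[0-9a-f\-]{36})_(?P<write_token>[\d\-]+)_(?P<instant>\d+)\.parquet")
slices = [pattern.fullmatch(f).groupdict() for f in files]

# All base files belong to a single file group (one distinct fileId)
assert len({s["file_id"] for s in slices}) == 1

# The newest instant identifies the latest file slice, the one a snapshot query should read
latest = max(slices, key=lambda s: s["instant"])
print(latest["instant"])  # 20220118152636
```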
[GitHub] [hudi] ChangbingChen removed a comment on issue #4618: [SUPPORT] When querying a hudi table in hive, there have duplicated records.
ChangbingChen removed a comment on issue #4618: URL: https://github.com/apache/hudi/issues/4618#issuecomment-1015144555
[GitHub] [hudi] ChangbingChen edited a comment on issue #4618: [SUPPORT] When querying a hudi table in hive, there have duplicated records.
ChangbingChen edited a comment on issue #4618: URL: https://github.com/apache/hudi/issues/4618#issuecomment-1015040351

> @ChangbingChen do parquet files exist in your table? If parquet files exist, please set mapreduce.input.fileinputformat.split.maxsize >= (max size of parquet file) to forbid hive from splitting the parquet file.

Thanks for the reply. It doesn't work. The default value is 256M.

```
hive> set mapreduce.input.fileinputformat.split.maxsize;
mapreduce.input.fileinputformat.split.maxsize=25600
```

and the max size of the parquet files is less than 128M.

```
[yarn@x.x.x ~]$ hadoop fs -ls /hudi/mysql_table_sink_new/20220118
Found 10 items
-rw-r--r--   3 yarn supergroup   7157103 2022-01-18 11:17 /hudi/mysql_table_sink_new/20220118/.82f164fd-f97d-4691-b9c6-21bea2769be0_20220118111603.log.1_0-1-0
-rw-r--r--   3 yarn supergroup   7209495 2022-01-18 11:19 /hudi/mysql_table_sink_new/20220118/.82f164fd-f97d-4691-b9c6-21bea2769be0_20220118111759.log.1_0-1-0
-rw-r--r--   3 yarn supergroup  10402799 2022-01-18 11:21 /hudi/mysql_table_sink_new/20220118/.82f164fd-f97d-4691-b9c6-21bea2769be0_20220118111959.log.1_0-1-0
-rw-r--r--   3 yarn supergroup   7853954 2022-01-18 11:23 /hudi/mysql_table_sink_new/20220118/.82f164fd-f97d-4691-b9c6-21bea2769be0_20220118112159.log.1_0-1-0
-rw-r--r--   3 yarn supergroup   4666049 2022-01-18 11:24 /hudi/mysql_table_sink_new/20220118/.82f164fd-f97d-4691-b9c6-21bea2769be0_20220118112359.log.1_0-1-0
-rw-r--r--   3 yarn supergroup        93 2022-01-18 11:16 /hudi/mysql_table_sink_new/20220118/.hoodie_partition_metadata
-rw-r--r--   3 yarn supergroup   1541035 2022-01-18 11:19 /hudi/mysql_table_sink_new/20220118/82f164fd-f97d-4691-b9c6-21bea2769be0_0-1-0_20220118111759.parquet
-rw-r--r--   3 yarn supergroup   2741308 2022-01-18 11:21 /hudi/mysql_table_sink_new/20220118/82f164fd-f97d-4691-b9c6-21bea2769be0_0-1-0_20220118111959.parquet
-rw-r--r--   3 yarn supergroup   4318101 2022-01-18 11:23 /hudi/mysql_table_sink_new/20220118/82f164fd-f97d-4691-b9c6-21bea2769be0_0-1-0_20220118112159.parquet
-rw-r--r--   3 yarn supergroup   5585232 2022-01-18 11:25 /hudi/mysql_table_sink_new/20220118/82f164fd-f97d-4691-b9c6-21bea2769be0_0-1-0_20220118112359.parquet
```
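One detail worth double-checking in the session above (my observation, not stated in the thread): `mapreduce.input.fileinputformat.split.maxsize` is expressed in bytes, and the value shown is 25600, i.e. 25 KB rather than 256 MB, which is smaller than every base file in the listing:

```python
# Sanity-check the units of the quoted Hive setting (an aside, not from the thread):
# split.maxsize is in bytes, so 25600 is only 25 KB, not the assumed 256M default.
shown_maxsize = 25_600               # value printed by `set ...split.maxsize;` above
mb_256 = 256 * 1024 * 1024           # what "256M" would be in bytes
largest_parquet = 5_585_232          # biggest base file in the listing, in bytes

print(mb_256)                        # 268435456
print(shown_maxsize < largest_parquet)  # True: every base file exceeds the split size
```

If the intent was to stop Hive from splitting base files, the setting would need to be at least the largest base file size, which 25600 is not.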
[GitHub] [hudi] hudi-bot removed a comment on pull request #4625: [HUDI-3263] Do not nullify members in HoodieTableFileSystemView#reset…
hudi-bot removed a comment on pull request #4625: URL: https://github.com/apache/hudi/pull/4625#issuecomment-1015135059

## CI report:

* 31a00a1d995612cc616eab9df6c03b5fff87f098 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5312)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4625: [HUDI-3263] Do not nullify members in HoodieTableFileSystemView#reset…
hudi-bot commented on pull request #4625: URL: https://github.com/apache/hudi/pull/4625#issuecomment-1015163939

## CI report:

* 31a00a1d995612cc616eab9df6c03b5fff87f098 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5312)
[GitHub] [hudi] hudi-bot commented on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.
hudi-bot commented on pull request #4078: URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015168193

## CI report:

* 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN
* 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN
* c36aac530d3350857fb01df858d0f26c123e5766 UNKNOWN
* 74721ec8c6c318d34e8bde9344b982e9e2390d76 UNKNOWN
* ffb67da807ddafbaf18feff01643d02aa5631568 UNKNOWN
* 7ed3f014fa8ec85033ab6a9475279d651944c93c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5099) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5102)
* 0329837213279896a15384781ae2048ecdb0fc13 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5313)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.
hudi-bot removed a comment on pull request #4078: URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015132977

## CI report:

* 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN
* 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN
* c36aac530d3350857fb01df858d0f26c123e5766 UNKNOWN
* 74721ec8c6c318d34e8bde9344b982e9e2390d76 UNKNOWN
* ffb67da807ddafbaf18feff01643d02aa5631568 UNKNOWN
* 7ed3f014fa8ec85033ab6a9475279d651944c93c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5099) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5102)
* 0329837213279896a15384781ae2048ecdb0fc13 UNKNOWN
[GitHub] [hudi] Guanpx commented on issue #4510: [SUPPORT] Impala query error
Guanpx commented on issue #4510: URL: https://github.com/apache/hudi/issues/4510#issuecomment-1015177881

> @Guanpx : I don't have exp w/ impala. But was MOR querying working from impala for older versions of hudi and failing with 0.10.0?

I think MOR does not work in any older version: the hudi version bundled in Impala is 0.5.0-incubating. This is the commit: https://github.com/apache/impala/commit/ea0e1def6160d596082b01365fcbbb6e24afb21d , cc @garyli1019 , and this is the version pinned in impala: https://github.com/apache/impala/blob/master/bin/impala-config.sh#L204
[GitHub] [hudi] danny0405 commented on pull request #4625: [HUDI-3263] Do not nullify members in HoodieTableFileSystemView#reset…
danny0405 commented on pull request #4625: URL: https://github.com/apache/hudi/pull/4625#issuecomment-1015183371

@hudi-bot run azure
[GitHub] [hudi] zhangyue19921010 commented on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.
zhangyue19921010 commented on pull request #4078: URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015183981

@hudi-bot run azure
[GitHub] [hudi] hudi-bot commented on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.
hudi-bot commented on pull request #4078: URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015184441

## CI report:

* 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN
* 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN
* c36aac530d3350857fb01df858d0f26c123e5766 UNKNOWN
* 74721ec8c6c318d34e8bde9344b982e9e2390d76 UNKNOWN
* ffb67da807ddafbaf18feff01643d02aa5631568 UNKNOWN
* 7ed3f014fa8ec85033ab6a9475279d651944c93c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5099) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5102)
* 0329837213279896a15384781ae2048ecdb0fc13 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5313) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5314)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.
hudi-bot removed a comment on pull request #4078: URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015168193
[GitHub] [hudi] hudi-bot commented on pull request #4625: [HUDI-3263] Do not nullify members in HoodieTableFileSystemView#reset…
hudi-bot commented on pull request #4625: URL: https://github.com/apache/hudi/pull/4625#issuecomment-1015185058

## CI report:

* 31a00a1d995612cc616eab9df6c03b5fff87f098 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5312) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5315)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4625: [HUDI-3263] Do not nullify members in HoodieTableFileSystemView#reset…
hudi-bot removed a comment on pull request #4625: URL: https://github.com/apache/hudi/pull/4625#issuecomment-1015163939
[GitHub] [hudi] hudi-bot removed a comment on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.
hudi-bot removed a comment on pull request #4078: URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015184441
[GitHub] [hudi] hudi-bot commented on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.
hudi-bot commented on pull request #4078: URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015186808

## CI report:

* 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN
* 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN
* c36aac530d3350857fb01df858d0f26c123e5766 UNKNOWN
* 74721ec8c6c318d34e8bde9344b982e9e2390d76 UNKNOWN
* ffb67da807ddafbaf18feff01643d02aa5631568 UNKNOWN
* 0329837213279896a15384781ae2048ecdb0fc13 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5314) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5313)
[GitHub] [hudi] peanut-chenzhong opened a new pull request #4626: [HUDI-1977] Fix Hudi CLI tempview query LOG issue
peanut-chenzhong opened a new pull request #4626: URL: https://github.com/apache/hudi/pull/4626

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*

## What is the purpose of the pull request

*(For example: This pull request adds quick-start document.)*

## Brief change log

*(for example:)*
- *Modify AnnotationLocation checkstyle rule in checkstyle.xml*

## Verify this pull request

*(Please pick either of the following options)*

This pull request is a trivial rework / code cleanup without any test coverage.

*(or)*

This pull request is already covered by existing tests, such as *(please describe tests)*.

(or)

This change added tests and can be verified as follows:

*(example:)*
- *Added integration tests for end-to-end.*
- *Added HoodieClientWriteTest to verify the change.*
- *Manually verified the change by running a job locally.*

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[jira] [Updated] (HUDI-1977) Fix Hudi-CLI show table spark-sql
[ https://issues.apache.org/jira/browse/HUDI-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1977:

Labels: pull-request-available (was: )

> Fix Hudi-CLI show table spark-sql
> --
>
> Key: HUDI-1977
> URL: https://issues.apache.org/jira/browse/HUDI-1977
> Project: Apache Hudi
> Issue Type: Task
> Components: cli, spark
> Reporter: Nishith Agarwal
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.11.0
>
> https://github.com/apache/hudi/issues/2955
[GitHub] [hudi] hudi-bot commented on pull request #4626: [HUDI-1977] Fix Hudi CLI tempview query LOG issue
hudi-bot commented on pull request #4626: URL: https://github.com/apache/hudi/pull/4626#issuecomment-1015189822

## CI report:

* 3860b11c8eb0823ffb1c8bcf23869b8c17c91df6 UNKNOWN
[GitHub] [hudi] gunjdesai commented on issue #4437: [QUESTION] Example for CREATE TABLE on TRINO using HUDI
gunjdesai commented on issue #4437: URL: https://github.com/apache/hudi/issues/4437#issuecomment-1015192029

Hi team, any luck with this? I've tried asking this question on the Trino Slack channel as well, but haven't had any luck there either.
[GitHub] [hudi] hudi-bot removed a comment on pull request #4626: [HUDI-1977] Fix Hudi CLI tempview query LOG issue
hudi-bot removed a comment on pull request #4626: URL: https://github.com/apache/hudi/pull/4626#issuecomment-1015189822 ## CI report: * 3860b11c8eb0823ffb1c8bcf23869b8c17c91df6 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4626: [HUDI-1977] Fix Hudi CLI tempview query LOG issue
hudi-bot commented on pull request #4626: URL: https://github.com/apache/hudi/pull/4626#issuecomment-1015192443 ## CI report: * 3860b11c8eb0823ffb1c8bcf23869b8c17c91df6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5316)
[GitHub] [hudi] xushiyan commented on issue #4585: Target Schema cannot be set in MultiTableDeltaStreamer
xushiyan commented on issue #4585: URL: https://github.com/apache/hudi/issues/4585#issuecomment-1015195895 @chrischnweiss So it makes sense to make the registry URL more configurable. I would recommend you propose an improvement based on your use case: elaborate your idea here, or file a JIRA directly describing exactly how the configs should look. Anyone from the community may pick it up for implementation.
[GitHub] [hudi] xiarixiaoyao commented on issue #4618: [SUPPORT] When querying a hudi table in hive, there have duplicated records.
xiarixiaoyao commented on issue #4618: URL: https://github.com/apache/hudi/issues/4618#issuecomment-1015196785 @ChangbingChen Sorry, I forgot one thing: before using Hive to query a Hudi table, did you set the input format? e.g. `set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat` or `set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat`. If you have WeChat, we can communicate directly there.
[jira] [Comment Edited] (HUDI-3222) On-call team to triage GH issues, PRs, and JIRAs
[ https://issues.apache.org/jira/browse/HUDI-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477396#comment-17477396 ] Raymond Xu edited comment on HUDI-3222 at 1/18/22, 8:56 AM: h4. triaged GH issues # https://github.com/apache/hudi/issues/4622 # https://github.com/apache/hudi/issues/4550 # https://github.com/apache/hudi/issues/4411 # https://github.com/apache/hudi/issues/4597 # https://github.com/apache/hudi/issues/4623 # https://github.com/apache/hudi/issues/4552 # https://github.com/apache/hudi/issues/4585 was (Author: xushiyan): h4. triaged GH issues # https://github.com/apache/hudi/issues/4622 # https://github.com/apache/hudi/issues/4550 # https://github.com/apache/hudi/issues/4411 # https://github.com/apache/hudi/issues/4597 # https://github.com/apache/hudi/issues/4623 # https://github.com/apache/hudi/issues/4552 > On-call team to triage GH issues, PRs, and JIRAs > > > Key: HUDI-3222 > URL: https://issues.apache.org/jira/browse/HUDI-3222 > Project: Apache Hudi > Issue Type: Task > Components: dev-experience >Reporter: Raymond Xu >Priority: Major > Original Estimate: 12h > Time Spent: 6h > Remaining Estimate: 6h >
[GitHub] [hudi] danny0405 merged pull request #4624: [HUDI-3261] Query rt table by hive cli throw NoSuchMethodError
danny0405 merged pull request #4624: URL: https://github.com/apache/hudi/pull/4624
[jira] [Resolved] (HUDI-3261) Query rt table by hive cli throw NoSuchMethodError
[ https://issues.apache.org/jira/browse/HUDI-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen resolved HUDI-3261. -- > Query rt table by hive cli throw NoSuchMethodError > -- > > Key: HUDI-3261 > URL: https://issues.apache.org/jira/browse/HUDI-3261 > Project: Apache Hudi > Issue Type: Bug > Components: hive >Reporter: Echo Lee >Assignee: Echo Lee >Priority: Major > Labels: pull-request-available, sev:normal > > When query the MOR table synchronized from hudi to hive, the following > exception is thrown: > > > {code:java} > Exception in thread "main" java.lang.NoSuchMethodError: > org.apache.parquet.avro.AvroSchemaConverter.convert(Lorg/apache/parquet/schema/MessageType;)Lorg/apache/hudi/org/apache/avro/Schema; > at > org.apache.hudi.common.util.ParquetUtils.readAvroSchema(ParquetUtils.java:217) > at > org.apache.hudi.io.storage.HoodieParquetReader.getSchema(HoodieParquetReader.java:71) > at > org.apache.hudi.hadoop.utils.HoodieRealtimeRecordReaderUtils.readSchema(HoodieRealtimeRecordReaderUtils.java:72) > at > org.apache.hudi.hadoop.InputSplitUtils.getBaseFileSchema(InputSplitUtils.java:70) > at > org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.init(AbstractRealtimeRecordReader.java:87) > at > org.apache.hudi.hadoop.realtime.AbstractRealtimeRecordReader.(AbstractRealtimeRecordReader.java:67) > at > org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.(RealtimeCompactedRecordReader.java:62) > at > org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:70) > at > org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.(HoodieRealtimeRecordReader.java:47) > at > org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:317) > at > org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:695) > at > 
org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:333) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428) > at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) > at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2227) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:187) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:409) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:836) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:772) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:699) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.util.RunJar.run(RunJar.java:313) > at org.apache.hadoop.util.RunJar.main(RunJar.java:227){code} > >
[jira] [Commented] (HUDI-3261) Query rt table by hive cli throw NoSuchMethodError
[ https://issues.apache.org/jira/browse/HUDI-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477682#comment-17477682 ] Danny Chen commented on HUDI-3261: -- Fixed via master branch: 3b56320bd8f189786985fd44fcd47e7abd09efb0 > Query rt table by hive cli throw NoSuchMethodError > -- > > Key: HUDI-3261 > URL: https://issues.apache.org/jira/browse/HUDI-3261 > Project: Apache Hudi > Issue Type: Bug > Components: hive > Reporter: Echo Lee > Assignee: Echo Lee > Priority: Major > Labels: pull-request-available, sev:normal >
[GitHub] [hudi] manojpec commented on a change in pull request #4523: [WIP][HUDI-3173] Add INDEX action type and corresponding commit metadata
manojpec commented on a change in pull request #4523: URL: https://github.com/apache/hudi/pull/4523#discussion_r786532331 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java ## @@ -77,7 +77,9 @@ public void setInstants(List instants) { * * @deprecated */ - public HoodieDefaultTimeline() {} + public HoodieDefaultTimeline() { + Review comment: same here ## File path: hudi-common/src/main/avro/HoodieIndexPartitionInfo.avsc ## @@ -0,0 +1,48 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +{ + "namespace": "org.apache.hudi.avro.model", + "type": "record", + "name": "HoodieIndexPartitionInfo", + "fields": [ +{ + "name": "version", + "type": [ +"int", +"null" + ], + "default": 1 +}, +{ + "name": "metadataPartitionPath", + "type": [ +"null", +"string" + ], + "default": null +}, +{ + "name": "dataPartitionPath", Review comment: Where is this data partition path going to be used? Will all index partition have metadata + data partition combo? 
## File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieArchivedTimeline.java ## @@ -95,7 +95,9 @@ public HoodieArchivedTimeline(HoodieTableMetaClient metaClient) { * * @deprecated */ - public HoodieArchivedTimeline() {} + public HoodieArchivedTimeline() { + Review comment: This newly added blank line can be reverted.
[hudi] branch master updated (3d93e85 -> 3b56320)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from 3d93e85 [MINOR] Minor improvement in JsonkafkaSource (#4620) add 3b56320 [HUDI-3261] Read rt table by hive cli throw NoSuchMethodError (#4624) No new revisions were added by this update. Summary of changes: packaging/hudi-hadoop-mr-bundle/pom.xml | 4 1 file changed, 4 insertions(+)
[GitHub] [hudi] pratyakshsharma commented on issue #4585: [Feature] Make schema registry configs more flexible with MultiTableDeltaStreamer
pratyakshsharma commented on issue #4585: URL: https://github.com/apache/hudi/issues/4585#issuecomment-1015203741 > unfortunately our Kafka topic naming schema makes it impossible for us to use it this way. @chrischnweiss Are you saying you are using a subject naming strategy other than `TopicNameStrategy` for your schema registry? MTDS was originally designed to cater to use cases with `TopicNameStrategy` as the subject naming strategy, which is the default provided by Confluent. As Raymond mentioned, please feel free to elaborate on your use case and contribute the fix back. :)
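For context on the strategy discussed above: with Confluent's default `TopicNameStrategy`, the schema registry subject is derived from the topic name alone, which is why a topic-based registry URL works for MTDS but breaks under record-based strategies. A minimal illustrative sketch (not Hudi or Confluent code; the function name is hypothetical):

```python
def subject_for(topic: str, is_key: bool = False) -> str:
    """TopicNameStrategy: the subject is "<topic>-key" or "<topic>-value"."""
    suffix = "key" if is_key else "value"
    return f"{topic}-{suffix}"

# Under RecordNameStrategy or TopicRecordNameStrategy the subject also depends
# on the record schema, so it cannot be derived from the topic name alone.
```

For a topic `orders`, this derivation yields the subjects `orders-value` and `orders-key`.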
[GitHub] [hudi] xiarixiaoyao commented on issue #4600: [SUPPORT]When hive queries Hudi data, the query path is wrong
xiarixiaoyao commented on issue #4600: URL: https://github.com/apache/hudi/issues/4600#issuecomment-1015211429 @gubinjie If you do not want to modify Hive code, could you please trigger compaction for your table? Once a compaction is done, a parquet file will be created and the above problem should not happen. Problem 2: currently Flink only writes log files for MOR tables. Problem 3: I discussed this with my company's Hive experts; we have no way to solve it in Hudi, since the check happens before Hive calls into Hudi code.
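The compaction workaround above can be expressed as Flink SQL table options. This is a sketch under the assumption that the table uses the Flink Hudi connector's async compaction options (`compaction.async.enabled`, `compaction.delta_commits`); verify the option names against your Hudi release, and note the path and schema are hypothetical:

```sql
CREATE TABLE hudi_mor_demo (
  id INT,
  name STRING,
  ts TIMESTAMP(3),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs:///tmp/hudi_mor_demo',   -- hypothetical table path
  'table.type' = 'MERGE_ON_READ',
  'compaction.async.enabled' = 'true',    -- compact log files into parquet base files
  'compaction.delta_commits' = '5'        -- trigger compaction every 5 delta commits
);
```

Once a compaction completes, Hive sees parquet base files instead of only log files, which is the effect described in the comment above.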
[GitHub] [hudi] xushiyan commented on a change in pull request #3745: [HUDI-2514] Add default hiveTableSerdeProperties for Spark SQL when sync Hive
xushiyan commented on a change in pull request #3745: URL: https://github.com/apache/hudi/pull/3745#discussion_r786538718 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala ## @@ -366,50 +368,50 @@ object HoodieSparkSqlWriter { } -// Handle various save modes if (mode == SaveMode.Ignore && tableExists) { log.warn(s"hoodie table at $basePath already exists. Ignoring & not performing actual writes.") false } else { + // Handle various save modes handleSaveModes(sqlContext.sparkSession, mode, basePath, tableConfig, tableName, WriteOperationType.BOOTSTRAP, fs) -} Review comment: This comment is redundant; it just repeats the method name. We should just remove it.
[jira] [Updated] (HUDI-2514) Add default hiveTableSerdeProperties for Spark SQL when sync Hive
[ https://issues.apache.org/jira/browse/HUDI-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2514: - Status: In Progress (was: Open) > Add default hiveTableSerdeProperties for Spark SQL when sync Hive > - > > Key: HUDI-2514 > URL: https://issues.apache.org/jira/browse/HUDI-2514 > Project: Apache Hudi > Issue Type: Improvement > Components: hive-sync, spark-sql >Reporter: 董可伦 >Assignee: 董可伦 >Priority: Critical > Labels: hudi-on-call, pull-request-available > Fix For: 0.11.0 > > > If we do not add the default hiveTableSerdeProperties, Spark SQL will not > work properly. > For example, update: > > {code:java} > update hudi.test_hudi_table set price=333 where id=111; > {code} > > It will throw an Exception: > {code:java} > 21/10/03 17:41:15 ERROR SparkSQLDriver: Failed in [update > hudi.test_hudi_table set price=333 where id=111] > java.lang.AssertionError: assertion failed: There are no primary key in table > `hudi`.`test_hudi_table`, cannot execute update operator > at scala.Predef$.assert(Predef.scala:170) > at > org.apache.spark.sql.hudi.command.UpdateHoodieTableCommand.buildHoodieConfig(UpdateHoodieTableCommand.scala:91) > at > org.apache.spark.sql.hudi.command.UpdateHoodieTableCommand.run(UpdateHoodieTableCommand.scala:73) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194) > at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194) > at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370) > at > org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80) > at > 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369) > at org.apache.spark.sql.Dataset.(Dataset.scala:194) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642) > at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:62) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:371) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:274) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > java.lang.AssertionError: assertion failed: There are no primary key in table > `hudi`.`test_hudi_table`, 
cannot execute update operator > at scala.Predef$.assert(Predef.scala:170) > at > org.apache.spark.sql.hudi.command.UpdateHoodieTableCommand.buildHoodieConfig(UpdateHoodieTableCommand.scala:91) > at > org.apache.spark.sql.hudi.command.UpdateHoodieTableCommand.run(UpdateHoodieTableCommand.scala:73) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(co
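Until the default serde/table properties are synced automatically, one possible manual workaround (an assumption on my part, not from the issue; the key and precombine column names are hypothetical) is to declare the Hudi key properties on the synced table so Spark SQL can resolve the primary key:

```sql
-- Hypothetical workaround: set the table properties Spark SQL looks up.
ALTER TABLE hudi.test_hudi_table SET TBLPROPERTIES (
  'primaryKey' = 'id',        -- assumed record key column
  'preCombineField' = 'ts'    -- assumed precombine column
);
```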
[jira] [Updated] (HUDI-2514) Add default hiveTableSerdeProperties for Spark SQL when sync Hive
[ https://issues.apache.org/jira/browse/HUDI-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2514: - Status: Patch Available (was: In Progress) > Add default hiveTableSerdeProperties for Spark SQL when sync Hive > - > > Key: HUDI-2514 > URL: https://issues.apache.org/jira/browse/HUDI-2514 > Project: Apache Hudi > Issue Type: Improvement > Components: hive-sync, spark-sql > Reporter: 董可伦 > Assignee: 董可伦 > Priority: Critical > Labels: hudi-on-call, pull-request-available > Fix For: 0.11.0 >
[GitHub] [hudi] hudi-bot commented on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.
hudi-bot commented on pull request #4078: URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015236515 ## CI report: * 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN * 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN * c36aac530d3350857fb01df858d0f26c123e5766 UNKNOWN * 74721ec8c6c318d34e8bde9344b982e9e2390d76 UNKNOWN * ffb67da807ddafbaf18feff01643d02aa5631568 UNKNOWN * 0329837213279896a15384781ae2048ecdb0fc13 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5314) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5313)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.
hudi-bot removed a comment on pull request #4078: URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015186808 ## CI report: * 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN * 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN * c36aac530d3350857fb01df858d0f26c123e5766 UNKNOWN * 74721ec8c6c318d34e8bde9344b982e9e2390d76 UNKNOWN * ffb67da807ddafbaf18feff01643d02aa5631568 UNKNOWN * 0329837213279896a15384781ae2048ecdb0fc13 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5314) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5313)
[GitHub] [hudi] hudi-bot commented on pull request #4625: [HUDI-3263] Do not nullify members in HoodieTableFileSystemView#reset…
hudi-bot commented on pull request #4625: URL: https://github.com/apache/hudi/pull/4625#issuecomment-1015237181 ## CI report: * 31a00a1d995612cc616eab9df6c03b5fff87f098 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5312) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5315)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4625: [HUDI-3263] Do not nullify members in HoodieTableFileSystemView#reset…
hudi-bot removed a comment on pull request #4625: URL: https://github.com/apache/hudi/pull/4625#issuecomment-1015185058 ## CI report: * 31a00a1d995612cc616eab9df6c03b5fff87f098 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5312) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5315)
[GitHub] [hudi] danny0405 merged pull request #4625: [HUDI-3263] Do not nullify members in HoodieTableFileSystemView#reset…
danny0405 merged pull request #4625: URL: https://github.com/apache/hudi/pull/4625
[jira] [Resolved] (HUDI-3263) Do not nullify members in HoodieTableFileSystemView#resetViewState to avoid NPE
[ https://issues.apache.org/jira/browse/HUDI-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen resolved HUDI-3263.

> Do not nullify members in HoodieTableFileSystemView#resetViewState to avoid NPE
> Key: HUDI-3263
> URL: https://issues.apache.org/jira/browse/HUDI-3263
> Project: Apache Hudi
> Issue Type: Task
> Components: core
> Reporter: Danny Chen
> Priority: Major
> Labels: pull-request-available, sev:normal
> Fix For: 0.10.1, 0.11.0
> Attachments: 1.png, 2.png

-- This message was sent by Atlassian Jira (v8.20.1#820001)
[hudi] branch master updated (3b56320 -> 45f054f)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git.

    from 3b56320  [HUDI-3261] Read rt table by hive cli throw NoSuchMethodError (#4624)
     add 45f054f  [HUDI-3263] Do not nullify members in HoodieTableFileSystemView#resetViewState to avoid NPE (#4625)

No new revisions were added by this update.

Summary of changes:
 .../table/view/HoodieTableFileSystemView.java | 30 +++---
 1 file changed, 21 insertions(+), 9 deletions(-)
[jira] [Commented] (HUDI-3263) Do not nullify members in HoodieTableFileSystemView#resetViewState to avoid NPE
[ https://issues.apache.org/jira/browse/HUDI-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477717#comment-17477717 ] Danny Chen commented on HUDI-3263:

Fixed via master branch: 45f054ffdef568e066a53c63c6e6f8d2b1ee67ea
[GitHub] [hudi] ChangbingChen commented on issue #4618: [SUPPORT] When querying a hudi table in hive, there have duplicated records.
ChangbingChen commented on issue #4618: URL: https://github.com/apache/hudi/issues/4618#issuecomment-1015244832

> @ChangbingChen sorry, I forgot one thing: before you use Hive to query the Hudi table, did you set the input format, e.g. `set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat` or `set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat`?
> If you have WeChat, we can communicate directly through WeChat.

Great, thanks! The query works when `hive.input.format` is set to `org.apache.hadoop.hive.ql.io.HiveInputFormat`. However, with `set hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat`, the query throws an exception. It seems to be a Hive version compatibility problem: the Hive version I use is 1.1.0-cdh5.13.3, which has no `HiveInputFormat.pushProjectionsAndFilters` overload with the same parameter types, while Hive 2.3.1 does have it.

```
2022-01-18 17:14:04,796 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat.pushProjectionsAndFilters(Lorg/apache/hadoop/mapred/JobConf;Ljava/lang/Class;Lorg/apache/hadoop/fs/Path;)V
	at org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat.getRecordReader(HoodieCombineHiveInputFormat.java:551)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:169)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:438)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
```

wx: 13488806793.
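Both `NoSuchMethodError` reports in this thread come down to the same situation: the class on the runtime classpath exposes different overloads than the one the caller was compiled against (here, a Hive 1.x vs. 2.x signature difference). A small stdlib-only diagnostic can list which overloads are actually available at runtime; the class probed in `main` is just a JDK illustration — in the reported issue one would probe `HiveInputFormat` for `pushProjectionsAndFilters` instead:

```java
import java.lang.reflect.Method;
import java.util.Arrays;

public class MethodProbe {
    // List every public overload of the given method name, so the available
    // parameter lists can be compared against the signature named in the
    // NoSuchMethodError message.
    static String[] overloads(Class<?> cls, String methodName) {
        return Arrays.stream(cls.getMethods())
                .filter(m -> m.getName().equals(methodName))
                .map(Method::toString)
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // JDK example; substitute the class/method from the error at hand.
        for (String sig : overloads(String.class, "valueOf")) {
            System.out.println(sig);
        }
    }
}
```

Running this against the deployed Hive jars (not the ones Hudi was built against) shows immediately whether the expected overload exists.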
[GitHub] [hudi] Guanpx opened a new issue #4510: [SUPPORT] Impala query error
Guanpx opened a new issue #4510: URL: https://github.com/apache/hudi/issues/4510

**To Reproduce**

Steps to reproduce the behavior:
1. hudi sync hive
2. CREATE EXTERNAL IMPALA TABLE (https://hudi.apache.org/docs/querying_data/#impala-34-or-later)
3. select from impala table or REFRESH table
4. impala error and query without data

**Expected behavior**

can not query impala table

**Environment Description**

* Hudi version : 0.10.0, MOR
* Hive version : 2.1
* Hadoop version : 3.0
* Storage (HDFS/S3/GCS..) : HDFS
* Running on Docker? (yes/no) : no

**Additional context**

* Impala version : 3.4.0

**Stacktrace**

```
I0104 18:06:19.961302 1557231 HoodieTableMetaClient.java:93] Loading HoodieTableMetaClient from hdfs://pre-cdh01:8020/hudi/rd/app_columns
I0104 18:06:19.964633 1557231 FSUtils.java:100] Hadoop Configuration: fs.defaultFS: [hdfs://pre-cdh01:8020], Config:[Configuration: core-default.xml, core-site.xml, hdfs-default.xml, hdfs-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-533850282_1, ugi=impala (auth:SIMPLE)]]]
I0104 18:06:19.969547 1557231 HoodieTableConfig.java:68] Loading dataset properties from hdfs://pre-cdh01:8020/hudi/rd/app_columns/.hoodie/hoodie.properties
I0104 18:06:19.974251 1557231 HoodieTableMetaClient.java:104] Finished Loading Table of type MERGE_ON_READ from hdfs://pre-cdh01:8020/hudi/rd/app_columns
I0104 18:06:19.978808 1557231 HoodieActiveTimeline.java:82] Loaded instants java.util.stream.ReferencePipeline$Head@5d12f34a
E0104 18:06:20.005887 1557231 HoodieROTablePathFilter.java:176] Error checking path :hdfs://pre-cdh01:8020/hudi/rd/app_columns/.1adb0953-af23-48d6-9bf2-acb72716060b_20220104164400776.log.1_0-2-0, under folder: hdfs://pre-cdh01:8020/hudi/rd/app_columns
Java exception follows:
java.lang.IllegalStateException: Hudi File Id (HoodieFileGroupId{partitionPath='', fileId='1adb0953-af23-48d6-9bf2-acb72716060b'}) has more than 1 pending compactions. Instants: (20220104170836577,{"baseInstantTime": "20220104165637271", "deltaFilePaths": [".1adb0953-af23-48d6-9bf2-acb72716060b_20220104165637271.log.1_0-2-0"], "dataFilePath": "1adb0953-af23-48d6-9bf2-acb72716060b_1-2-0_20220104165637271.parquet", "fileId": "1adb0953-af23-48d6-9bf2-acb72716060b", "partitionPath": "", "metrics": {"TOTAL_LOG_FILES": 1.0, "TOTAL_IO_READ_MB": 1.0, "TOTAL_LOG_FILES_SIZE": 729214.0, "TOTAL_IO_WRITE_MB": 0.0, "TOTAL_IO_MB": 1.0}}), (20220104165637271,{"baseInstantTime": "20220104164400776", "deltaFilePaths": [".1adb0953-af23-48d6-9bf2-acb72716060b_20220104164400776.log.1_0-2-0"], "dataFilePath": null, "fileId": "1adb0953-af23-48d6-9bf2-acb72716060b", "partitionPath": "", "metrics": {"TOTAL_LOG_FILES": 1.0, "TOTAL_IO_READ_MB": 0.0, "TOTAL_LOG_FILES_SIZE": 8143.0, "TOTAL_IO_WRITE_MB": 120.0, "TOTAL_IO_MB": 120.0}})
	at org.apache.hudi.common.util.CompactionUtils.lambda$getAllPendingCompactionOperations$5(CompactionUtils.java:161)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.Iterator.forEachRemaining(Iterator.java:116)
	at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
	at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:270)
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
	at org.apache.hudi.common.util.CompactionUtils.getAllPendingCompactionOperations(CompactionUtils.jav
```
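The `IllegalStateException` above is a sanity check tripping: each file group is expected to appear in at most one pending compaction plan, and here the same file id shows up in two. A stdlib-only sketch of that invariant — the names and data shapes are illustrative, not the actual `CompactionUtils` code:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PendingCompactionCheck {
    // Index each file-group id by the instant of its pending compaction.
    // Each entry in `ops` is a pair: {compactionInstant, fileGroupId}.
    // A second pending compaction for the same file-group id violates the
    // invariant and is reported, mirroring the error seen in the log above.
    static Map<String, String> indexPendingCompactions(List<String[]> ops) {
        Map<String, String> byFileId = new HashMap<>();
        for (String[] op : ops) {
            String instant = op[0];
            String fileId = op[1];
            String previous = byFileId.put(fileId, instant);
            if (previous != null) {
                throw new IllegalStateException(
                    "Hudi File Id (" + fileId + ") has more than 1 pending compactions. "
                        + "Instants: " + previous + ", " + instant);
            }
        }
        return byFileId;
    }
}
```

In the reported case the fix is usually on the timeline side (a stale or duplicated compaction plan), not in the reader performing this check.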
[GitHub] [hudi] scxwhite commented on issue #4311: Duplicate Records in Merge on Read [SUPPORT]
scxwhite commented on issue #4311: URL: https://github.com/apache/hudi/issues/4311#issuecomment-1015246379

I reproduced this problem using the following code, which repeatedly updates a single record. If I execute it more than 5 times, the program reports an error. The problem may occur in clustering: when merging small files, the old committed files are obtained (SparkSizeBasedClusteringPlanStrategy#getFileSlicesEligibleForClustering). @xushiyan @nsivabalan @vinothchandar @yihua

```java
@Test
public void write() {
    List<String> data = new ArrayList<>();
    List<String> dtList = new ArrayList<>();
    dtList.add("197001");
    Random random = new Random();
    for (int i = 0; i < 10; i++) {
        String dt = dtList.get(random.nextInt(dtList.size()));
        data.add("{\"dt\":\"" + dt + "\",\"id\":\"" + random.nextInt(1)
            + "\",\"gmt_modified\":" + System.currentTimeMillis() + "}");
    }
    Dataset<String> dataset = sparkSession.createDataset(data, Encoders.STRING());
    Dataset<Row> json = sparkSession.read().json(dataset);
    json.printSchema();
    int dataKeepTime = 5;
    json.selectExpr("dt", "id", "gmt_modified", "'' as name").toDF()
        .write()
        .format("org.apache.hudi")
        .option(HoodieTableConfig.TYPE.key(), HoodieTableType.MERGE_ON_READ.name())
        .option(DataSourceWriteOptions.OPERATION().key(), WriteOperationType.UPSERT.value())
        .option(DataSourceWriteOptions.TABLE_TYPE().key(), HoodieTableType.MERGE_ON_READ.name())
        .option(KeyGeneratorOptions.RECORDKEY_FIELD_NAME.key(), "id")
        .option(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key(), Constants.DT)
        .option(KeyGeneratorOptions.HIVE_STYLE_PARTITIONING_ENABLE.key(), true)
        .option(HoodieWriteConfig.PRECOMBINE_FIELD_NAME.key(), Constants.UPDATE_TIME)
        .option(HoodieWriteConfig.COMBINE_BEFORE_UPSERT.key(), true)
        .option(HoodieWriteConfig.UPSERT_PARALLELISM_VALUE.key(), 200)
        .option(HoodieWriteConfig.FINALIZE_WRITE_PARALLELISM_VALUE.key(), 200)
        .option(HoodieWriteConfig.WRITE_PAYLOAD_CLASS_NAME.key(), DefaultHoodieRecordPayload.class.getName())
        .option(HoodieWriteConfig.AVRO_EXTERNAL_SCHEMA_TRANSFORMATION_ENABLE.key(), false)
        .option(HoodieWriteConfig.MARKERS_TYPE.key(), MarkerType.DIRECT.toString())
        .option(HoodieCompactionConfig.PAYLOAD_CLASS_NAME.key(), DefaultHoodieRecordPayload.class.getName())
        .option(HoodieCompactionConfig.CLEANER_FILE_VERSIONS_RETAINED.key(), dataKeepTime)
        .option(HoodieCompactionConfig.AUTO_CLEAN.key(), false)
        .option(HoodieCompactionConfig.CLEANER_INCREMENTAL_MODE_ENABLE.key(), false)
        .option(HoodieCompactionConfig.CLEANER_COMMITS_RETAINED.key(), dataKeepTime)
        .option(HoodieCompactionConfig.MIN_COMMITS_TO_KEEP.key(), dataKeepTime + 1)
        .option(HoodieCompactionConfig.MAX_COMMITS_TO_KEEP.key(), dataKeepTime + 2)
        .option(HoodieCompactionConfig.TARGET_IO_PER_COMPACTION_IN_MB.key(), 500 * 1024)
        .option(HoodieCompactionConfig.CLEANER_POLICY.key(), HoodieCleaningPolicy.KEEP_LATEST_FILE_VERSIONS.name())
        .option(HoodieCompactionConfig.PARQUET_SMALL_FILE_LIMIT.key(), 128 * 1024 * 1024)
        .option(HoodieCompactionConfig.FAILED_WRITES_CLEANER_POLICY.key(), HoodieFailedWritesCleaningPolicy.EAGER.name())
        .option(HoodieStorageConfig.PARQUET_MAX_FILE_SIZE.key(), 256 * 1024 * 1024)
        .option(HoodieCompactionConfig.INLINE_COMPACT.key(), true)
        .option(HoodieCompactionConfig.INLINE_COMPACT_NUM_DELTA_COMMITS.key(), 1)
        .option(HoodieClusteringConfig.INLINE_CLUSTERING.key(), true)
        .option(HoodieClusteringConfig.INLINE_CLUSTERING_MAX_COMMITS.key(), 1)
        .option(HoodieClusteringConfig.PLAN_STRATEGY_MAX_BYTES_PER_OUTPUT_FILEGROUP.key(), 256 * 1024 * 1024L)
        .option(HoodieClusteringConfig.PLAN_STRATEGY_TARGET_FILE_MAX_BYTES.key(), 256 * 1024 * 1024L)
        .option(HoodieClusteringConfig.PLAN_STRATEGY_SMALL_FILE_LIMIT.key(), 128 * 1024 * 1024L)
        .option(HoodieClusteringConfig.UPDATES_STRATEGY.key(), SparkRejectUpdateStrategy.class.getName())
        .option(HoodieMetadataConfig.ENABLE.key(), true)
        .option(HoodieMetadataConfig.MIN_COMMITS_TO_KEEP.key(), dataKeepTime + 1)
        .option(HoodieMetadataConfig.MAX_COMMITS_TO_KEEP.key(), dataKeepTime + 2)
        .option(HoodieMe
```
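For context on why duplicates are surprising in the repro above: the precombine field (`gmt_modified` here) is meant to reduce records sharing a record key to the one with the largest ordering value during an upsert. A stdlib-only sketch of that reduction — illustrative of the semantics, not the `DefaultHoodieRecordPayload` implementation:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Precombine {
    // Keep, for each record key, only the record with the largest ordering
    // value (e.g. gmt_modified). Each record is a pair {key, orderingValue};
    // on a tie the later-seen record wins, matching typical upsert behavior.
    static Map<String, long[]> precombine(List<long[]> records) {
        Map<String, long[]> latest = new HashMap<>();
        for (long[] r : records) {
            String key = Long.toString(r[0]);
            long[] previous = latest.get(key);
            if (previous == null || r[1] >= previous[1]) {
                latest.put(key, r);
            }
        }
        return latest;
    }
}
```

If a MOR read still returns multiple rows per key after this kind of reduction was supposed to happen on write, the suspicion in this thread — clustering picking up already-replaced file slices — is a plausible culprit.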
[GitHub] [hudi] hudi-bot commented on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink
hudi-bot commented on pull request #4287: URL: https://github.com/apache/hudi/pull/4287#issuecomment-1015247003

## CI report:

* 5b7a535559d80359a3febc2d1a80bf9a8ac20cf9 UNKNOWN
* 7231f68987a5f317f7d71a6485a4c2ea9f917a01 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5295)
* 952a154b1c656cd8e3c9c0df9fee313d3890d938 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink
hudi-bot removed a comment on pull request #4287: URL: https://github.com/apache/hudi/pull/4287#issuecomment-1014450356

## CI report:

* 5b7a535559d80359a3febc2d1a80bf9a8ac20cf9 UNKNOWN
* 7231f68987a5f317f7d71a6485a4c2ea9f917a01 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5295)
[GitHub] [hudi] scxwhite commented on issue #4311: Duplicate Records in Merge on Read [SUPPORT]
scxwhite commented on issue #4311: URL: https://github.com/apache/hudi/issues/4311#issuecomment-1015248403

In addition, my Hudi version is 0.9.0 and Spark version is 3.0.0.
[GitHub] [hudi] melin opened a new issue #4627: [SUPPORT] Dremio integration
melin opened a new issue #4627: URL: https://github.com/apache/hudi/issues/4627
[GitHub] [hudi] hudi-bot removed a comment on pull request #4626: [HUDI-1977] Fix Hudi CLI tempview query LOG issue
hudi-bot removed a comment on pull request #4626: URL: https://github.com/apache/hudi/pull/4626#issuecomment-1015192443

## CI report:

* 3860b11c8eb0823ffb1c8bcf23869b8c17c91df6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5316)
[GitHub] [hudi] hudi-bot commented on pull request #4626: [HUDI-1977] Fix Hudi CLI tempview query LOG issue
hudi-bot commented on pull request #4626: URL: https://github.com/apache/hudi/pull/4626#issuecomment-1015272593

## CI report:

* 3860b11c8eb0823ffb1c8bcf23869b8c17c91df6 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5316)
[jira] [Commented] (HUDI-2873) Support optimize data layout by sql and make the build more fast
[ https://issues.apache.org/jira/browse/HUDI-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477743#comment-17477743 ] Tao Meng commented on HUDI-2873:

[~shibei] do you have WeChat? Please add me: 1037817390

> Support optimize data layout by sql and make the build more fast
> Key: HUDI-2873
> URL: https://issues.apache.org/jira/browse/HUDI-2873
> Project: Apache Hudi
> Issue Type: Task
> Components: Performance, spark
> Reporter: tao meng
> Assignee: shibei
> Priority: Critical
> Labels: sev:high
> Fix For: 0.11.0
[GitHub] [hudi] hudi-bot commented on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink
hudi-bot commented on pull request #4287: URL: https://github.com/apache/hudi/pull/4287#issuecomment-1015274778

## CI report:

* 5b7a535559d80359a3febc2d1a80bf9a8ac20cf9 UNKNOWN
* 7231f68987a5f317f7d71a6485a4c2ea9f917a01 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5295)
* 952a154b1c656cd8e3c9c0df9fee313d3890d938 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5319)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink
hudi-bot removed a comment on pull request #4287: URL: https://github.com/apache/hudi/pull/4287#issuecomment-1015247003

## CI report:

* 5b7a535559d80359a3febc2d1a80bf9a8ac20cf9 UNKNOWN
* 7231f68987a5f317f7d71a6485a4c2ea9f917a01 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5295)
* 952a154b1c656cd8e3c9c0df9fee313d3890d938 UNKNOWN
[GitHub] [hudi] gubinjie commented on issue #4600: [SUPPORT]When hive queries Hudi data, the query path is wrong
gubinjie commented on issue #4600: URL: https://github.com/apache/hudi/issues/4600#issuecomment-1015286535

@xiarixiaoyao Thank you for your reply. When I add a Kafka connector and then execute `insert into hudi select * from kafka` (where `hudi` and `kafka` are connector-type tables), there is no problem this time and data appears. But I have a question: if data is not inserted through the Kafka connector, do parameters such as `compaction.trigger.strategy`, `compaction.delta_commits`, and `compaction.delta_seconds` have no effect?
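On the compaction question above: trigger parameters describe when a compaction is scheduled for a MOR table, regardless of which connector produced the delta commits. A stdlib-only sketch of how such trigger strategies are commonly evaluated — the strategy names follow the Flink options mentioned in the comment, but the logic is illustrative, not the actual Hudi scheduler code:

```java
public class CompactionTrigger {
    // Decide whether a compaction should be scheduled, given the number of
    // delta commits and the seconds elapsed since the last compaction,
    // against the configured thresholds (compaction.delta_commits /
    // compaction.delta_seconds in the Flink options discussed above).
    static boolean shouldCompact(String strategy,
                                 int deltaCommits, int commitsThreshold,
                                 long secondsElapsed, long secondsThreshold) {
        switch (strategy) {
            case "num_commits":
                return deltaCommits >= commitsThreshold;
            case "time_elapsed":
                return secondsElapsed >= secondsThreshold;
            case "num_or_time":
                return deltaCommits >= commitsThreshold || secondsElapsed >= secondsThreshold;
            case "num_and_time":
                return deltaCommits >= commitsThreshold && secondsElapsed >= secondsThreshold;
            default:
                throw new IllegalArgumentException("unknown strategy: " + strategy);
        }
    }
}
```

The key point for the question asked: the inputs here are commits on the timeline, not the source of the data, so the parameters still apply when records arrive by other means.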
[GitHub] [hudi] zhangyue19921010 removed a comment on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.
zhangyue19921010 removed a comment on pull request #4078: URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015183981

@hudi-bot run azure
[GitHub] [hudi] zhangyue19921010 commented on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.
zhangyue19921010 commented on pull request #4078: URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015294358

@hudi-bot run azure
[GitHub] [hudi] hudi-bot commented on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.
hudi-bot commented on pull request #4078: URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015296719

## CI report:

* 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN
* 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN
* c36aac530d3350857fb01df858d0f26c123e5766 UNKNOWN
* 74721ec8c6c318d34e8bde9344b982e9e2390d76 UNKNOWN
* ffb67da807ddafbaf18feff01643d02aa5631568 UNKNOWN
* 0329837213279896a15384781ae2048ecdb0fc13 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5314) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5313) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5320)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.
hudi-bot removed a comment on pull request #4078: URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015236515

## CI report:

* 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN
* 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN
* c36aac530d3350857fb01df858d0f26c123e5766 UNKNOWN
* 74721ec8c6c318d34e8bde9344b982e9e2390d76 UNKNOWN
* ffb67da807ddafbaf18feff01643d02aa5631568 UNKNOWN
* 0329837213279896a15384781ae2048ecdb0fc13 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5314) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5313)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink
hudi-bot removed a comment on pull request #4287: URL: https://github.com/apache/hudi/pull/4287#issuecomment-1015274778

## CI report:

* 5b7a535559d80359a3febc2d1a80bf9a8ac20cf9 UNKNOWN
* 7231f68987a5f317f7d71a6485a4c2ea9f917a01 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5295)
* 952a154b1c656cd8e3c9c0df9fee313d3890d938 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5319)
[GitHub] [hudi] hudi-bot commented on pull request #4287: [DO NOT MERGE] 0.10.0 release patch for flink
hudi-bot commented on pull request #4287: URL: https://github.com/apache/hudi/pull/4287#issuecomment-1015311668

## CI report:

* 5b7a535559d80359a3febc2d1a80bf9a8ac20cf9 UNKNOWN
* 952a154b1c656cd8e3c9c0df9fee313d3890d938 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5319)
[GitHub] [hudi] codope commented on a change in pull request #4352: [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups
codope commented on a change in pull request #4352: URL: https://github.com/apache/hudi/pull/4352#discussion_r786556671

## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java

```java
@@ -133,30 +144,89 @@ public HoodieBloomIndex(HoodieWriteConfig config, BaseHoodieBloomIndexHelper blo
   /**
    * Load all involved files as pair List.
    */
-  List<Pair<String, BloomIndexFileInfo>> loadInvolvedFiles(
+  List<Pair<String, BloomIndexFileInfo>> loadColumnRangesFromFiles(
       List<String> partitions, final HoodieEngineContext context, final HoodieTable hoodieTable) {
     // Obtain the latest data files from all the partitions.
     List<Pair<String, String>> partitionPathFileIDList = getLatestBaseFilesForAllPartitions(partitions, context, hoodieTable).stream()
         .map(pair -> Pair.of(pair.getKey(), pair.getValue().getFileId()))
         .collect(toList());
-    if (config.getBloomIndexPruneByRanges()) {
-      // also obtain file ranges, if range pruning is enabled
-      context.setJobStatus(this.getClass().getName(), "Obtain key ranges for file slices (range pruning=on)");
-      return context.map(partitionPathFileIDList, pf -> {
-        try {
-          HoodieRangeInfoHandle rangeInfoHandle = new HoodieRangeInfoHandle(config, hoodieTable, pf);
-          String[] minMaxKeys = rangeInfoHandle.getMinMaxKeys();
-          return Pair.of(pf.getKey(), new BloomIndexFileInfo(pf.getValue(), minMaxKeys[0], minMaxKeys[1]));
-        } catch (MetadataNotFoundException me) {
-          LOG.warn("Unable to find range metadata in file :" + pf);
-          return Pair.of(pf.getKey(), new BloomIndexFileInfo(pf.getValue()));
+    context.setJobStatus(this.getClass().getName(), "Obtain key ranges for file slices (range pruning=on)");
+    return context.map(partitionPathFileIDList, pf -> {
+      try {
+        HoodieRangeInfoHandle rangeInfoHandle = new HoodieRangeInfoHandle(config, hoodieTable, pf);
+        String[] minMaxKeys = rangeInfoHandle.getMinMaxKeys();
+        return Pair.of(pf.getKey(), new BloomIndexFileInfo(pf.getValue(), minMaxKeys[0], minMaxKeys[1]));
+      } catch (MetadataNotFoundException me) {
+        LOG.warn("Unable to find range metadata in file :" + pf);
+        return Pair.of(pf.getKey(), new BloomIndexFileInfo(pf.getValue()));
+      }
+    }, Math.max(partitionPathFileIDList.size(), 1));
+  }
+
+  /**
+   * Get the latest base files for the requested partitions.
+   *
+   * @param partitions  - List of partitions to get the base files for
+   * @param context     - Engine context
+   * @param hoodieTable - Hoodie Table
+   * @return List of partition and file column range info pairs
+   */
+  List<Pair<String, BloomIndexFileInfo>> getLatestBaseFilesForPartitions(
+      List<String> partitions, final HoodieEngineContext context, final HoodieTable hoodieTable) {
+    List<Pair<String, String>> partitionPathFileIDList = getLatestBaseFilesForAllPartitions(partitions, context,
+        hoodieTable).stream()
+        .map(pair -> Pair.of(pair.getKey(), pair.getValue().getFileId()))
+        .collect(toList());
+    return partitionPathFileIDList.stream()
+        .map(pf -> Pair.of(pf.getKey(), new BloomIndexFileInfo(pf.getValue()))).collect(toList());
+  }
+
+  /**
+   * Load the column stats index as BloomIndexFileInfo for all the involved files in the partition.
+   *
+   * @param partitions  - List of partitions for which column stats need to be loaded
+   * @param context     - Engine context
+   * @param hoodieTable - Hoodie table
+   * @return List of partition and file column range info pairs
+   */
+  List<Pair<String, BloomIndexFileInfo>> loadColumnRangesFromMetaIndex(
```

Review comment: Can this method be private?

## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieKeyLookupHandle.java

```java
@@ -46,52 +50,54 @@
   private static final Logger LOG = LogManager.getLogger(HoodieKeyLookupHandle.class);

-  private final HoodieTableType tableType;
-  private final BloomFilter bloomFilter;
-  private final List<String> candidateRecordKeys;
-
+  private final boolean useMetadataTableIndex;
+  private Option<String> fileName = Option.empty();
   private long totalKeysChecked;

   public HoodieKeyLookupHandle(HoodieWriteConfig config, HoodieTable hoodieTable,
-                               Pair<String, String> partitionPathFilePair) {
-    super(config, null, hoodieTable, partitionPathFilePair);
-    this.tableType = hoodieTable.getMetaClient().getTableType();
+                               Pair<String, String> partitionPathFileIDPair) {
+    this(config, hoodieTable, partitionPathFileIDPair, Option.empty(), false);
+  }
+
+  public HoodieKeyLookupHandle(HoodieWriteConfig config, HoodieTable hoodieTable,
+                               Pair<String, String> partitionPathFileIDPair, Option<String> fileName,
+                               boolean useMetadataTableIndex) {
+    super(config, null, hoodieTable, partitionPathFileIDPair);
```

Review comment: I know this is not due to your change but can we take up replacing `null` by Option.empty() in this PR? If not, then at least we should t
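The range-pruning path under review comes down to loading a (minKey, maxKey) pair per base file, then probing only files whose range could contain the record key. A stdlib-only sketch of that pruning step — the names and data shapes are illustrative, not the Hudi API:

```java
import java.util.ArrayList;
import java.util.List;

public class RangePrune {
    // Each file entry is {fileId, minKey, maxKey}, as would be loaded from
    // parquet footers or the metadata table's column stats index. Only files
    // whose [minKey, maxKey] range could contain recordKey remain candidates,
    // so only their bloom filters need to be consulted afterwards.
    static List<String> candidateFiles(List<String[]> files, String recordKey) {
        List<String> out = new ArrayList<>();
        for (String[] f : files) {
            String fileId = f[0];
            String minKey = f[1];
            String maxKey = f[2];
            if (minKey.compareTo(recordKey) <= 0 && maxKey.compareTo(recordKey) >= 0) {
                out.add(fileId);
            }
        }
        return out;
    }
}
```

The payoff is that files whose key range excludes the record key are skipped entirely, which is why the PR adds a metadata-index path for fetching these ranges without opening every file.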
[GitHub] [hudi] dongkelun commented on a change in pull request #3745: [HUDI-2514] Add default hiveTableSerdeProperties for Spark SQL when sync Hive
dongkelun commented on a change in pull request #3745: URL: https://github.com/apache/hudi/pull/3745#discussion_r786654476

## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
## @@ -366,50 +368,50 @@
     }

-    // Handle various save modes
     if (mode == SaveMode.Ignore && tableExists) {
       log.warn(s"hoodie table at $basePath already exists. Ignoring & not performing actual writes.")
       false
     } else {
+      // Handle various save modes
       handleSaveModes(sqlContext.sparkSession, mode, basePath, tableConfig, tableName, WriteOperationType.BOOTSTRAP, fs)
-    }

Review comment: OK, I'll submit it later together with the other changes that need to be made

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (HUDI-3222) On-call team to triage GH issues, PRs, and JIRAs
[ https://issues.apache.org/jira/browse/HUDI-3222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3222: - Reviewers: Raymond Xu, sivabalan narayanan > On-call team to triage GH issues, PRs, and JIRAs > > > Key: HUDI-3222 > URL: https://issues.apache.org/jira/browse/HUDI-3222 > Project: Apache Hudi > Issue Type: Task > Components: dev-experience >Reporter: Raymond Xu >Priority: Major > Original Estimate: 12h > Time Spent: 6h > Remaining Estimate: 6h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] nsivabalan commented on issue #4552: [BUG] Data corrupted in the timestamp field to 1970-01-01 19:45:30.000 after subsequent upsert run
nsivabalan commented on issue #4552: URL: https://github.com/apache/hudi/issues/4552#issuecomment-1015320225 Closing this one out since we know the root cause and have a solution. Feel free to re-open if you have more questions. would be happy to help. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan closed issue #4552: [BUG] Data corrupted in the timestamp field to 1970-01-01 19:45:30.000 after subsequent upsert run
nsivabalan closed issue #4552: URL: https://github.com/apache/hudi/issues/4552 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-3264) Make schema registry configs more flexible with MultiTableDeltaStreamer
sivabalan narayanan created HUDI-3264: - Summary: Make schema registry configs more flexible with MultiTableDeltaStreamer Key: HUDI-3264 URL: https://issues.apache.org/jira/browse/HUDI-3264 Project: Apache Hudi Issue Type: Task Components: deltastreamer Reporter: sivabalan narayanan Ref issue: [https://github.com/apache/hudi/issues/4585] Hi guys, we ran into a problem setting the target schema of our Hudi table using the MultiTableDeltaStreamer. Using a normal DeltaStreamer, we are able to set our source and target schemas using the properties: * hoodie.deltastreamer.schemaprovider.registry.url * hoodie.deltastreamer.schemaprovider.registry.targetUrl We found that we are not able to set these properties on a per-table basis using the MultiTableDeltaStreamer, since the MTDS builds the schema registry URLs for the target and source schemas from the properties: * hoodie.deltastreamer.schemaprovider.registry.baseUrl * hoodie.deltastreamer.schemaprovider.registry.sourceUrlSuffix * hoodie.deltastreamer.schemaprovider.registry.targetUrlSuffix The MultiTableDeltaStreamer then also uses the source Kafka topic name to set the name of the target schema: [hudi/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java|https://github.com/apache/hudi/blob/9fe28e56b49c7bf68ae2d83bfe89755314aa793b/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieMultiTableDeltaStreamer.java#L167] Line 167 in [9fe28e5|https://github.com/apache/hudi/commit/9fe28e56b49c7bf68ae2d83bfe89755314aa793b]: typedProperties.setProperty(Constants.TARGET_SCHEMA_REGISTRY_URL_PROP, schemaRegistryBaseUrl + typedProperties.getString(Constants.KAFKA_TOPIC_PROP) + targetSchemaRegistrySuffix); We think that schema names should be more configurable, the way the original DeltaStreamer handles them. Currently, the names of the schemas used for reading or writing the data are very tightly coupled to the name of the Kafka topic the data is loaded from.
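The URL construction quoted from HoodieMultiTableDeltaStreamer can be sketched in isolation — it is a plain concatenation of base URL, Kafka topic name, and suffix, which is exactly why the target schema name is tied to the topic name. The argument values in `main` are made up for illustration:

```java
public class SchemaUrlSketch {
    // Mirrors the concatenation quoted above:
    // target URL = baseUrl + kafka topic + targetUrlSuffix.
    // Because the topic name sits in the middle, the target schema subject
    // cannot be chosen independently of the topic.
    static String targetSchemaUrl(String baseUrl, String kafkaTopic, String targetSuffix) {
        return baseUrl + kafkaTopic + targetSuffix;
    }

    public static void main(String[] args) {
        // Hypothetical registry host and topic, for illustration only.
        System.out.println(targetSchemaUrl(
            "http://schema-registry:8081/subjects/", "orders", "-value/versions/latest"));
        // prints http://schema-registry:8081/subjects/orders-value/versions/latest
    }
}
```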
-- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] nsivabalan commented on issue #4585: [Feature] Make schema registry configs more flexible with MultiTableDeltaStreamer
nsivabalan commented on issue #4585: URL: https://github.com/apache/hudi/issues/4585#issuecomment-1015323676 I have filed a [jira](https://issues.apache.org/jira/browse/HUDI-3264) on this end. @chrischnweiss : Feel free to update the jira w/ your suggestions. Even if you can't find cycles to contribute, one of us from the community can try to find time to work towards it. closing the github issue. we can continue the conversation in jira. thanks for reporting! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan closed issue #4585: [Feature] Make schema registry configs more flexible with MultiTableDeltaStreamer
nsivabalan closed issue #4585: URL: https://github.com/apache/hudi/issues/4585 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #4510: [SUPPORT] Impala query error
nsivabalan commented on issue #4510: URL: https://github.com/apache/hudi/issues/4510#issuecomment-1015327025 We already have a tracking jira to support the MOR table type in Impala. If you are interested in working on it, feel free to grab the jira and we can help with reviews if need be. Closing the github issue for now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan closed issue #4510: [SUPPORT] Impala query error
nsivabalan closed issue #4510: URL: https://github.com/apache/hudi/issues/4510 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #4457: [SUPPORT] Hudi archive stopped working
nsivabalan commented on issue #4457: URL: https://github.com/apache/hudi/issues/4457#issuecomment-1015328411 @zuyanton : Do you have any updates for us? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #4456: [SUPPORT] MultiWriter w/ DynamoDB - Unable to acquire lock, lock object null
nsivabalan commented on issue #4456: URL: https://github.com/apache/hudi/issues/4456#issuecomment-1015328893 @zhedoubushishi : When you get a chance, can you please follow up? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan closed issue #4439: [BUG] ROLLBACK meet Cannot use marker based rollback strategy on completed error
nsivabalan closed issue #4439: URL: https://github.com/apache/hudi/issues/4439 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #4439: [BUG] ROLLBACK meet Cannot use marker based rollback strategy on completed error
nsivabalan commented on issue #4439: URL: https://github.com/apache/hudi/issues/4439#issuecomment-1015329290 Feel free to re-open if you are looking for more assistance. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #4434: [SUPPORT]why are there many files under the Hoodie file?
nsivabalan commented on issue #4434: URL: https://github.com/apache/hudi/issues/4434#issuecomment-1015329998 @tieke1121 : Let us know if you have more questions or need clarifications. If not, we will close out the github issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #4318: [SUPPORT] Duplicate records in COW table within same partition path
nsivabalan commented on issue #4318: URL: https://github.com/apache/hudi/issues/4318#issuecomment-1015333164 Have updated instructions to access S3 via hudi-cli [here](https://hudi.apache.org/docs/next/cli#using-hudi-cli-in-s3). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #4318: [SUPPORT] Duplicate records in COW table within same partition path
nsivabalan commented on issue #4318: URL: https://github.com/apache/hudi/issues/4318#issuecomment-1015334963 Regarding duplicates: in general, the pair of partition path and record key is unique in hudi. If you wish to have globally unique record keys, you need to use a global index or a non-partitioned dataset. In addition, the preCombine configs have to be set appropriately. Feel free to close out the issue if it was a mis-configuration on your end. If not, we can keep the issue open and discuss further. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
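The uniqueness and preCombine semantics described in the comment above can be sketched as a small standalone example — one record survives per (partitionPath, recordKey), with the larger preCombine value winning. The `Record` class and `dedup` method here are hypothetical simplifications; Hudi's real machinery (HoodieRecord, payload classes, indexes) is far richer:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical minimal record, standing in for Hudi's HoodieRecord + payload.
class Record {
    final String partitionPath, recordKey, payload;
    final long precombine;
    Record(String partitionPath, String recordKey, long precombine, String payload) {
        this.partitionPath = partitionPath;
        this.recordKey = recordKey;
        this.precombine = precombine;
        this.payload = payload;
    }
}

public class DedupSketch {
    // Keep one record per (partitionPath, recordKey); among duplicates, the record
    // with the larger precombine value wins — the behavior preCombine configures.
    static Collection<Record> dedup(List<Record> incoming) {
        return incoming.stream()
            .collect(Collectors.toMap(
                r -> r.partitionPath + "|" + r.recordKey,   // uniqueness is per partition + key
                r -> r,
                (a, b) -> a.precombine >= b.precombine ? a : b))
            .values();
    }

    public static void main(String[] args) {
        List<Record> in = Arrays.asList(
            new Record("2021/12/01", "id-1", 1L, "v1"),
            new Record("2021/12/01", "id-1", 2L, "v2"),   // same partition + key: v2 wins
            new Record("2021/12/02", "id-1", 1L, "v1"));  // different partition: kept separately
        System.out.println(dedup(in).size()); // prints 2
    }
}
```

A global index changes the key function to the record key alone, which is why it is needed for global uniqueness.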
[GitHub] [hudi] stym06 commented on issue #4318: [SUPPORT] Duplicate records in COW table within same partition path
stym06 commented on issue #4318: URL: https://github.com/apache/hudi/issues/4318#issuecomment-1015337273 @nsivabalan #3222 worked for me. Thanks for the help. We can close it out as the operation mode was INSERT and there were duplicate records coming in the Kafka topic as well, leading to this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] stym06 closed issue #4318: [SUPPORT] Duplicate records in COW table within same partition path
stym06 closed issue #4318: URL: https://github.com/apache/hudi/issues/4318 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4607: [HUDI-3161][RFC-46] Add Call Produce Command for Spark SQL
hudi-bot removed a comment on pull request #4607: URL: https://github.com/apache/hudi/pull/4607#issuecomment-1013628091 ## CI report: * 9ddbc330d21f82188865a3a76af2b79a98101d3b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5266) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4607: [HUDI-3161][RFC-46] Add Call Produce Command for Spark SQL
hudi-bot commented on pull request #4607: URL: https://github.com/apache/hudi/pull/4607#issuecomment-1015338194 ## CI report: * 9ddbc330d21f82188865a3a76af2b79a98101d3b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5266) * df9e59120041b4de676733449caa99115d26996d UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4607: [HUDI-3161][RFC-46] Add Call Produce Command for Spark SQL
hudi-bot commented on pull request #4607: URL: https://github.com/apache/hudi/pull/4607#issuecomment-1015340335 ## CI report: * 9ddbc330d21f82188865a3a76af2b79a98101d3b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5266) * df9e59120041b4de676733449caa99115d26996d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5321) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #4241: [SUPPORT] Disaster Recovery (DR) Setup? Questions.
nsivabalan commented on issue #4241: URL: https://github.com/apache/hudi/issues/4241#issuecomment-1015341274 We don't have any documentation as such. You need to directly use writeClient or go via hudi-cli. But here is how you can do savepoint and restore using hudi-cli:

```
connect --path /tmp/hudi_trips_cow
commits show
set --conf SPARK_HOME=[SPARK_HOME_DIR]
savepoint create --commit 20220105222853592 --sparkMaster local[2]

// restore
refresh
savepoint rollback --savepoint 20220106085108487 --sparkMaster local[2]
```

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan edited a comment on issue #4241: [SUPPORT] Disaster Recovery (DR) Setup? Questions.
nsivabalan edited a comment on issue #4241: URL: https://github.com/apache/hudi/issues/4241#issuecomment-1015341274 We don't have any documentation as such. You need to directly use writeClient or go via hudi-cli. Hudi-cli is the recommended way. But here is how you can do savepoint and restore using hudi-cli:

```
connect --path /tmp/hudi_trips_cow
commits show
set --conf SPARK_HOME=[SPARK_HOME_DIR]
savepoint create --commit 20220105222853592 --sparkMaster local[2]

// restore
refresh
savepoint rollback --savepoint 20220106085108487 --sparkMaster local[2]
```

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] codope commented on issue #4541: [SUPPORT] NullPointerException while writing Bulk ingest table
codope commented on issue #4541: URL: https://github.com/apache/hudi/issues/4541#issuecomment-1015341819 @nsivabalan Looks like AVRO_SCHEMA is not getting set in bulk insert mode. I couldn't find [similar logic](https://github.com/apache/hudi/blob/45f054ffdef568e066a53c63c6e6f8d2b1ee67ea/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L442-L443) in 0.7.0 bulk insert path. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-3265) Implement a custom serializer for the WriteStatus
sivabalan narayanan created HUDI-3265: - Summary: Implement a custom serializer for the WriteStatus Key: HUDI-3265 URL: https://issues.apache.org/jira/browse/HUDI-3265 Project: Apache Hudi Issue Type: Task Components: flink Reporter: sivabalan narayanan When the structure of WriteStatus changes and the Flink job is restarted with the new version, the job fails to recover. *To Reproduce* Steps to reproduce the behavior: # Start a flink job. # Change the WriteStatus and restart. # The job can't recover. We need to implement a custom serializer for the WriteStatus. Ref issue: [https://github.com/apache/hudi/issues/4032] -- This message was sent by Atlassian Jira (v8.20.1#820001)
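The core idea behind such a custom serializer — surviving class evolution by writing an explicit version tag ahead of the payload instead of relying on default Java serialization of the evolving class — can be sketched as below. All names here are hypothetical; a real Flink implementation would plug into Flink's serializer machinery rather than raw streams:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hedged sketch: a version byte precedes the payload so a newer reader can
// still decode state written by an older job version.
public class VersionedCodec {
    static final int CURRENT_VERSION = 2;

    static byte[] encode(String payload) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeInt(CURRENT_VERSION);  // version tag written first
        out.writeUTF(payload);
        return bos.toByteArray();
    }

    static String decode(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        int version = in.readInt();
        switch (version) {
            case 1:  // old layout: read shared fields, default any new ones
            case 2:
                return in.readUTF();
            default:
                throw new IOException("Unknown state version: " + version);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(decode(encode("partition=p1,ok=true"))); // round-trips the payload
    }
}
```

Default Java serialization fails here because a structural change to WriteStatus alters its implicit serialVersionUID, making old checkpointed state undeserializable; an explicit version tag decouples the wire format from the class layout.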
[jira] [Assigned] (HUDI-3265) Implement a custom serializer for the WriteStatus
[ https://issues.apache.org/jira/browse/HUDI-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-3265: - Assignee: Gary Li > Implement a custom serializer for the WriteStatus > - > > Key: HUDI-3265 > URL: https://issues.apache.org/jira/browse/HUDI-3265 > Project: Apache Hudi > Issue Type: Task > Components: flink >Reporter: sivabalan narayanan >Assignee: Gary Li >Priority: Major > Labels: sev:normal > > When the structure of WriteStatus changed, and when we restart the Flink job > with the new version, the job will fail to recover. > *To Reproduce* > Steps to reproduce the behavior: > # Start a flink job. > # Changed the WriteStatus and restart > # The job can't recover. > We need to implement a custom serializer for the WriteStatus. > > Ref issue: [https://github.com/apache/hudi/issues/4032] > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] nsivabalan closed issue #4032: [SUPPORT] StreamWriteFunction WriteMetadataEvent serialization failed when WriteStatus structure changed
nsivabalan closed issue #4032: URL: https://github.com/apache/hudi/issues/4032 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #4032: [SUPPORT] StreamWriteFunction WriteMetadataEvent serialization failed when WriteStatus structure changed
nsivabalan commented on issue #4032: URL: https://github.com/apache/hudi/issues/4032#issuecomment-1015342387 Have filed a tracking [jira](https://issues.apache.org/jira/browse/HUDI-3265). will close the github issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #3870: [SUPPORT] Hudi v0.8.0 Savepoint rollback failure
nsivabalan commented on issue #3870: URL: https://github.com/apache/hudi/issues/3870#issuecomment-1015342619 @atharvai : hey do you have any updates for us. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.
hudi-bot commented on pull request #4078: URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015344119 ## CI report: * 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN * 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN * c36aac530d3350857fb01df858d0f26c123e5766 UNKNOWN * 74721ec8c6c318d34e8bde9344b982e9e2390d76 UNKNOWN * ffb67da807ddafbaf18feff01643d02aa5631568 UNKNOWN * 0329837213279896a15384781ae2048ecdb0fc13 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5314) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5313) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5320) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4078: [HUDI-2833][Design] Merge small archive files instead of expanding indefinitely.
hudi-bot removed a comment on pull request #4078: URL: https://github.com/apache/hudi/pull/4078#issuecomment-1015296719 ## CI report: * 8f8ae385baf21dacd4b9fedd3670133160001dc0 UNKNOWN * 019e161bb908731244e13cdf36d12781956f0114 UNKNOWN * c36aac530d3350857fb01df858d0f26c123e5766 UNKNOWN * 74721ec8c6c318d34e8bde9344b982e9e2390d76 UNKNOWN * ffb67da807ddafbaf18feff01643d02aa5631568 UNKNOWN * 0329837213279896a15384781ae2048ecdb0fc13 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5314) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5313) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5320) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] liujinhui1994 commented on issue #4311: Duplicate Records in Merge on Read [SUPPORT]
liujinhui1994 commented on issue #4311: URL: https://github.com/apache/hudi/issues/4311#issuecomment-1015349498 Clustering does not currently support updates; this is likely the cause of your problem. @scxwhite cc @nsivabalan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HUDI-1615) GH Issue 2515/ Failure to archive commits on row writer/delete paths
[ https://issues.apache.org/jira/browse/HUDI-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17477801#comment-17477801 ] sivabalan narayanan commented on HUDI-1615: --- [https://github.com/apache/hudi/pull/2653] > GH Issue 2515/ Failure to archive commits on row writer/delete paths > > > Key: HUDI-1615 > URL: https://issues.apache.org/jira/browse/HUDI-1615 > Project: Apache Hudi > Issue Type: Bug > Components: spark, writer-core >Reporter: Vinoth Chandar >Assignee: Vinoth Chandar >Priority: Major > Labels: pull-request-available, sev:critical > Fix For: 0.8.0 > > > https://github.com/apache/hudi/issues/2515 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[GitHub] [hudi] nsivabalan commented on issue #4604: [SUPPORT] Archive functionality fails
nsivabalan commented on issue #4604: URL: https://github.com/apache/hudi/issues/4604#issuecomment-1015351403 We have a related [issue](https://github.com/apache/hudi/pull/2653) reported earlier. It might help @XuQianJin-Stars triage it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org