[GitHub] [hudi] hudi-bot commented on pull request #5357: [HUDI-3912] Fix lose data when rollback in flink async compact

2022-04-19 Thread GitBox
hudi-bot commented on PR #5357: URL: https://github.com/apache/hudi/pull/5357#issuecomment-1102171990 ## CI report: * d9c0a047103cb4d72ba5d46ee45d5b2c10319458 UNKNOWN * 7de9f669139bdd6b812ff443daab08b940f5319c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-

[jira] [Created] (HUDI-3916) New Key Generator Option: NanoidKeyGenerator

2022-04-19 Thread Yue Zhang (Jira)
Yue Zhang created HUDI-3916: --- Summary: New Key Generator Option: NanoidKeyGenerator Key: HUDI-3916 URL: https://issues.apache.org/jira/browse/HUDI-3916 Project: Apache Hudi Issue Type: New Feature

[GitHub] [hudi] hudi-bot commented on pull request #5328: [WIP][HUDI-3883] Fix Bulk Insert to repartition the dataset based on Partition Path

2022-04-19 Thread GitBox
hudi-bot commented on PR #5328: URL: https://github.com/apache/hudi/pull/5328#issuecomment-1102194382 ## CI report: * 0f0fae82a029d42fa9db7ea8d2df4ba1787fded6 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8129

[GitHub] [hudi] zhangyue19921010 opened a new pull request, #5359: [HUDI-3961] New Key Generator Option: NanoidKeyGenerator

2022-04-19 Thread GitBox
zhangyue19921010 opened a new pull request, #5359: URL: https://github.com/apache/hudi/pull/5359 https://issues.apache.org/jira/browse/HUDI-3916 This NanoidKeyGenerator is useful when users ingest aggregated data into Hudi without a unique key. Compared with UuidKeyGenerator, na

[GitHub] [hudi] hudi-bot commented on pull request #4958: [HUDI-3558] [Stacked 3123/3085] Consistent bucket index: bucket resizing (split&merge) & concurrent write during resizing

2022-04-19 Thread GitBox
hudi-bot commented on PR #4958: URL: https://github.com/apache/hudi/pull/4958#issuecomment-1102198585 ## CI report: * eb1c6e290d676158af9385ef7922e372163113e7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8127

[GitHub] [hudi] hudi-bot commented on pull request #5359: [HUDI-3961] New Key Generator Option: NanoidKeyGenerator

2022-04-19 Thread GitBox
hudi-bot commented on PR #5359: URL: https://github.com/apache/hudi/pull/5359#issuecomment-1102199521 ## CI report: * 287a959ff3e828c539422a7450da09e023028486 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] hudi-bot commented on pull request #4958: [HUDI-3558] [Stacked 3123/3085] Consistent bucket index: bucket resizing (split&merge) & concurrent write during resizing

2022-04-19 Thread GitBox
hudi-bot commented on PR #4958: URL: https://github.com/apache/hudi/pull/4958#issuecomment-1102203634 ## CI report: * eb1c6e290d676158af9385ef7922e372163113e7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8127

[GitHub] [hudi] hudi-bot commented on pull request #5359: [HUDI-3961] New Key Generator Option: NanoidKeyGenerator

2022-04-19 Thread GitBox
hudi-bot commented on PR #5359: URL: https://github.com/apache/hudi/pull/5359#issuecomment-1102204495 ## CI report: * 287a959ff3e828c539422a7450da09e023028486 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8136

[GitHub] [hudi] zhangyue19921010 commented on pull request #5359: [HUDI-3961] New Key Generator Option: NanoidKeyGenerator

2022-04-19 Thread GitBox
zhangyue19921010 commented on PR #5359: URL: https://github.com/apache/hudi/pull/5359#issuecomment-1102258674 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

[GitHub] [hudi] hudi-bot commented on pull request #5359: [HUDI-3961] New Key Generator Option: NanoidKeyGenerator

2022-04-19 Thread GitBox
hudi-bot commented on PR #5359: URL: https://github.com/apache/hudi/pull/5359#issuecomment-1102260252 ## CI report: * 287a959ff3e828c539422a7450da09e023028486 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=813

[GitHub] [hudi] hudi-bot commented on pull request #5357: [HUDI-3912] Fix lose data when rollback in flink async compact

2022-04-19 Thread GitBox
hudi-bot commented on PR #5357: URL: https://github.com/apache/hudi/pull/5357#issuecomment-1102250616 ## CI report: * d9c0a047103cb4d72ba5d46ee45d5b2c10319458 UNKNOWN * 611fd1129e76bcfe2332356d4148e31dc49d85c7 UNKNOWN * b9bfb0af2a66a00a1fcc50f58996c85077e37615 UNKNOWN * d7

[GitHub] [hudi] hudi-bot commented on pull request #4958: [HUDI-3558] [Stacked 3123/3085] Consistent bucket index: bucket resizing (split&merge) & concurrent write during resizing

2022-04-19 Thread GitBox
hudi-bot commented on PR #4958: URL: https://github.com/apache/hudi/pull/4958#issuecomment-1102254560 ## CI report: * de60394e7eb2553adf6e5224bc3ffc739b9b1001 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=813

[GitHub] [hudi] hudi-bot commented on pull request #5359: [HUDI-3961] New Key Generator Option: NanoidKeyGenerator

2022-04-19 Thread GitBox
hudi-bot commented on PR #5359: URL: https://github.com/apache/hudi/pull/5359#issuecomment-1102250676 ## CI report: * 287a959ff3e828c539422a7450da09e023028486 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=813

[GitHub] [hudi] hudi-bot commented on pull request #4309: [HUDI-3016][RFC-43] Proposal to implement Compaction/Clustering Servi…

2022-04-19 Thread GitBox
hudi-bot commented on PR #4309: URL: https://github.com/apache/hudi/pull/4309#issuecomment-1102272766 ## CI report: * fbe27691b5d9de58128cc58158047a4df2b53750 UNKNOWN * 9f70edff7df1d467b06c00147c5cc128f4ce4c9d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] sharathkola commented on issue #5223: [SUPPORT] - HUDI clustering - read issues

2022-04-19 Thread GitBox
sharathkola commented on issue #5223: URL: https://github.com/apache/hudi/issues/5223#issuecomment-1102288392 @suryaprasanna Yes, duplicate rows are returned. I have attached both the files you requested. [commit_files.zip](https://github.com/apache/hudi/files/8510790/commit_files.

[GitHub] [hudi] zhilinli123 commented on issue #4881: Full incremental Enable index loading to discover duplicate data(index.bootstrap.enabled)

2022-04-19 Thread GitBox
zhilinli123 commented on issue #4881: URL: https://github.com/apache/hudi/issues/4881#issuecomment-1102329893 In the previous tests, data duplication occurred after the checkpoint was completed for the first time. No new data was written before the checkpoint was completed -- This is

[jira] [Created] (HUDI-3917) Flink write task hangs if last checkpoint has no data input

2022-04-19 Thread Danny Chen (Jira)
Danny Chen created HUDI-3917: Summary: Flink write task hangs if last checkpoint has no data input Key: HUDI-3917 URL: https://issues.apache.org/jira/browse/HUDI-3917 Project: Apache Hudi Issue

[GitHub] [hudi] hudi-bot commented on pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

2022-04-19 Thread GitBox
hudi-bot commented on PR #5352: URL: https://github.com/apache/hudi/pull/5352#issuecomment-1102379779 ## CI report: * 57f622f643f7c623129636f8e5000ffe014b0c0b UNKNOWN * b975d3295cd534ac04c7ee58bb7961fd9971597d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #5359: [HUDI-3961] New Key Generator Option: NanoidKeyGenerator

2022-04-19 Thread GitBox
hudi-bot commented on PR #5359: URL: https://github.com/apache/hudi/pull/5359#issuecomment-1102393502 ## CI report: * 287a959ff3e828c539422a7450da09e023028486 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=813

[GitHub] [hudi] zhangyue19921010 commented on pull request #5359: [HUDI-3961] New Key Generator Option: NanoidKeyGenerator

2022-04-19 Thread GitBox
zhangyue19921010 commented on PR #5359: URL: https://github.com/apache/hudi/pull/5359#issuecomment-1102395288 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

[GitHub] [hudi] hudi-bot commented on pull request #5359: [HUDI-3961] New Key Generator Option: NanoidKeyGenerator

2022-04-19 Thread GitBox
hudi-bot commented on PR #5359: URL: https://github.com/apache/hudi/pull/5359#issuecomment-1102398100 ## CI report: * 287a959ff3e828c539422a7450da09e023028486 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=813

[GitHub] [hudi] harsh1231 commented on issue #5189: [SUPPORT] Multiple chaining of hudi tables via incremental source results in duplicate partition meta column

2022-04-19 Thread GitBox
harsh1231 commented on issue #5189: URL: https://github.com/apache/hudi/issues/5189#issuecomment-1102418701 @nsivabalan @lowmmrfeeder It looks like the load from the previous table itself is failing: `at com.navi.sources.HoodieIncrSource.fetchNextBatch(HoodieIncrSource.java:122)` this looks lik

[GitHub] [hudi] xushiyan commented on pull request #5087: [HUDI-3614] [DO_NOT_MERGE]Replace List with HoodieData in HoodieFlink/JavaTable and commit executors

2022-04-19 Thread GitBox
xushiyan commented on PR #5087: URL: https://github.com/apache/hudi/pull/5087#issuecomment-1102436171 @danny0405 as @liujinhui1994 mentioned, reducing duplicate code is the intention and the direction, but yes, implementation-wise we should also avoid perf impact. I see there have to more c

[hudi] 02/11: [HUDI-3835] Add UT for delete in java client (#5270)

2022-04-19 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch release-0.11.0 in repository https://gitbox.apache.org/repos/asf/hudi.git commit 7bc940c7e4fb26dfa850689161dec754005b406b Author: 董可伦 AuthorDate: Sat Apr 16 03:03:48 2022 +0800 [HUDI-3835]

[hudi] 09/11: [HUDI-3903] Fix NoClassDefFoundError with Kafka Connect bundle (#5353)

2022-04-19 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch release-0.11.0 in repository https://gitbox.apache.org/repos/asf/hudi.git commit 54753a5a3089bffb9922f1182ee59bd6b68db632 Author: Y Ethan Guo AuthorDate: Mon Apr 18 18:17:53 2022 -0700 [HU

[hudi] branch release-0.11.0 updated (fbc6595b34 -> c0cadbca65)

2022-04-19 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a change to branch release-0.11.0 in repository https://gitbox.apache.org/repos/asf/hudi.git from fbc6595b34 [HOTFIX] add missing license (#5322) new d886e9eb8d [MINOR] Removing invalid code to close par

[hudi] 05/11: [HUDI-3886] Adding default null for some of the fields in col stats in MDT schema (#5329)

2022-04-19 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch release-0.11.0 in repository https://gitbox.apache.org/repos/asf/hudi.git commit 06adf27051da79fafe132acff004a365692a760a Author: Sivabalan Narayanan AuthorDate: Mon Apr 18 10:37:03 2022 -0400

[hudi] 10/11: [HUDI-3899] Drop index to delete pending index instants from timeline if applicable (#5342)

2022-04-19 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch release-0.11.0 in repository https://gitbox.apache.org/repos/asf/hudi.git commit 46deea05c402e986395875b35f8bdab6cd8bbda5 Author: Sagar Sumit AuthorDate: Tue Apr 19 07:58:46 2022 +0530 [HU

[hudi] 11/11: [HUDI-3894] Fix gcp bundle to include HBase dependencies and shading (#5349)

2022-04-19 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch release-0.11.0 in repository https://gitbox.apache.org/repos/asf/hudi.git commit c0cadbca6560ace603acc056af5597befa6be6e7 Author: Raymond Xu <2701446+xushi...@users.noreply.github.com> AuthorDat

[hudi] 08/11: [HUDI-3894] Fix datahub to include HBase dependencies and shading (#5338)

2022-04-19 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch release-0.11.0 in repository https://gitbox.apache.org/repos/asf/hudi.git commit 52cb67bf48411b4c8e7da8df12ed00b72d4e186c Author: Y Ethan Guo AuthorDate: Mon Apr 18 16:20:50 2022 -0700 [HU

[hudi] 04/11: Fixing async clustering job test in TestHoodieDeltaStreamer (#5317)

2022-04-19 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch release-0.11.0 in repository https://gitbox.apache.org/repos/asf/hudi.git commit 23b0dfc337737e2d1e5b792f999c311df62f9498 Author: Sivabalan Narayanan AuthorDate: Mon Apr 18 08:08:33 2022 -0400

[hudi] 07/11: [HUDI-3895] Fixing file-partitioning seq for base-file only views to make sure we bucket the files efficiently (#5337)

2022-04-19 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch release-0.11.0 in repository https://gitbox.apache.org/repos/asf/hudi.git commit bdc0f9336a1fcd86d8fd216124ae4f7f7a6ea64b Author: Alexey Kudinkin AuthorDate: Mon Apr 18 13:06:52 2022 -0700

[hudi] 03/11: [MINOR] Fix typos in log4j-surefire.properties (#5212)

2022-04-19 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch release-0.11.0 in repository https://gitbox.apache.org/repos/asf/hudi.git commit 4a4407288a10feac3b8e781c03347424be84eba5 Author: 董可伦 AuthorDate: Sat Apr 16 04:33:37 2022 +0800 [MINOR] Fix

[hudi] 01/11: [MINOR] Removing invalid code to close parquet reader iterator (#5182)

2022-04-19 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch release-0.11.0 in repository https://gitbox.apache.org/repos/asf/hudi.git commit d886e9eb8df9d74d2b8a00a98e851704af183364 Author: Sivabalan Narayanan AuthorDate: Fri Apr 15 14:50:07 2022 -0400

[hudi] 06/11: [HUDI-3707] Fix target schema handling in HoodieSparkUtils while creating RDD (#5347)

2022-04-19 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch release-0.11.0 in repository https://gitbox.apache.org/repos/asf/hudi.git commit d979feee3483debe7c8aa0f3d25ebed18e41bcfe Author: Sagar Sumit AuthorDate: Mon Apr 18 23:04:04 2022 +0530 [HU

[GitHub] [hudi] danny0405 opened a new pull request, #5360: [HUDI-3917] Flink write task hangs if last checkpoint has no data input

2022-04-19 Thread GitBox
danny0405 opened a new pull request, #5360: URL: https://github.com/apache/hudi/pull/5360 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.* ## What is the purpo

[jira] [Updated] (HUDI-3917) Flink write task hangs if last checkpoint has no data input

2022-04-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-3917: - Labels: pull-request-available (was: ) > Flink write task hangs if last checkpoint has no data in

[GitHub] [hudi] hudi-bot commented on pull request #5360: [HUDI-3917] Flink write task hangs if last checkpoint has no data input

2022-04-19 Thread GitBox
hudi-bot commented on PR #5360: URL: https://github.com/apache/hudi/pull/5360#issuecomment-1102486049 ## CI report: * 3b69640fb8bbd66767d32cccf67ba42345db2f36 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] lowmmrfeeder commented on issue #5189: [SUPPORT] Multiple chaining of hudi tables via incremental source results in duplicate partition meta column

2022-04-19 Thread GitBox
lowmmrfeeder commented on issue #5189: URL: https://github.com/apache/hudi/issues/5189#issuecomment-1102487837 @harsh1231 I am not sure about this; we aren't using the column name `_hoodie_partition_path` anywhere explicitly. My guess is that deltastreamer adds this column `_hoodie_partition_path` in

[GitHub] [hudi] hudi-bot commented on pull request #5360: [HUDI-3917] Flink write task hangs if last checkpoint has no data input

2022-04-19 Thread GitBox
hudi-bot commented on PR #5360: URL: https://github.com/apache/hudi/pull/5360#issuecomment-1102488922 ## CI report: * 3b69640fb8bbd66767d32cccf67ba42345db2f36 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8140

[hudi] branch spark-perf-patch-for-0.11rc3 created (now 60e703564d)

2022-04-19 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a change to branch spark-perf-patch-for-0.11rc3 in repository https://gitbox.apache.org/repos/asf/hudi.git at 60e703564d apply spark perf patch This branch includes the following new commits: new 60e

[hudi] 01/01: apply spark perf patch

2022-04-19 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch spark-perf-patch-for-0.11rc3 in repository https://gitbox.apache.org/repos/asf/hudi.git commit 60e703564daf2407cba5bab9cc22fa4977e31a46 Author: Raymond Xu AuthorDate: Tue Apr 19 18:48:34 2022 +

[GitHub] [hudi] hudi-bot commented on pull request #5359: [HUDI-3961] New Key Generator Option: NanoidKeyGenerator

2022-04-19 Thread GitBox
hudi-bot commented on PR #5359: URL: https://github.com/apache/hudi/pull/5359#issuecomment-1102509627 ## CI report: * 287a959ff3e828c539422a7450da09e023028486 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=813

[GitHub] [hudi] codope commented on issue #5358: [SUPPORT] read hudi cow table with spark, throw exception: File does not exist

2022-04-19 Thread GitBox
codope commented on issue #5358: URL: https://github.com/apache/hudi/issues/5358#issuecomment-1102561993 > read data with spark when clean was happened throw exception This means Hudi is attempting to read a version of the parquet file which no longer exists. https://github.com/apa

[GitHub] [hudi] hudi-bot commented on pull request #5360: [HUDI-3917] Flink write task hangs if last checkpoint has no data input

2022-04-19 Thread GitBox
hudi-bot commented on PR #5360: URL: https://github.com/apache/hudi/pull/5360#issuecomment-1102570581 ## CI report: * 3b69640fb8bbd66767d32cccf67ba42345db2f36 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8140

[GitHub] [hudi] ksrihari93 opened a new issue, #5361: [SUPPORT] Schema Integration between Stream and batch pipelines using Deltastreamer JDBC

2022-04-19 Thread GitBox
ksrihari93 opened a new issue, #5361: URL: https://github.com/apache/hudi/issues/5361 Hi Team, We are running CDC pipelines with MySQL, Kafka Connect, Debezium, and Hudi (Deltastreamer), with the schema in a schema registry. For streaming pipelines, the Hudi job is running fine with proper schema i

[GitHub] [hudi] nsivabalan commented on issue #5195: [SUPPORT] JDBCServer hudi merge into cmd fail

2022-04-19 Thread GitBox
nsivabalan commented on issue #5195: URL: https://github.com/apache/hudi/issues/5195#issuecomment-1102593032 @watermelon12138: Gentle ping. Can you respond when you can? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [hudi] hudi-bot commented on pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

2022-04-19 Thread GitBox
hudi-bot commented on PR #5352: URL: https://github.com/apache/hudi/pull/5352#issuecomment-1102599527 ## CI report: * 57f622f643f7c623129636f8e5000ffe014b0c0b UNKNOWN * b975d3295cd534ac04c7ee58bb7961fd9971597d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] nsivabalan closed issue #5313: [SUPPORT] Do we have plan to support java reader for Hudi?

2022-04-19 Thread GitBox
nsivabalan closed issue #5313: [SUPPORT] Do we have plan to support java reader for Hudi? URL: https://github.com/apache/hudi/issues/5313 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [hudi] nsivabalan commented on issue #5313: [SUPPORT] Do we have plan to support java reader for Hudi?

2022-04-19 Thread GitBox
nsivabalan commented on issue #5313: URL: https://github.com/apache/hudi/issues/5313#issuecomment-1102602829 thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To uns

[GitHub] [hudi] nsivabalan commented on issue #5298: [SUPPORT] File is deleted during inline compaction on MOR table causing subsequent FileNotFoundException on a reader

2022-04-19 Thread GitBox
nsivabalan commented on issue #5298: URL: https://github.com/apache/hudi/issues/5298#issuecomment-1102609068 It's not related to the metadata table as such. Essentially, the actual data files written as part of the compaction commit could be different from what is found in the compaction commit metadata. So

[GitHub] [hudi] hudi-bot commented on pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

2022-04-19 Thread GitBox
hudi-bot commented on PR #5352: URL: https://github.com/apache/hudi/pull/5352#issuecomment-1102654401 ## CI report: * 57f622f643f7c623129636f8e5000ffe014b0c0b UNKNOWN * b975d3295cd534ac04c7ee58bb7961fd9971597d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[jira] [Created] (HUDI-3918) Improve flink bulk_insert performace for partitioned table

2022-04-19 Thread konwu (Jira)
konwu created HUDI-3918: --- Summary: Improve flink bulk_insert performace for partitioned table Key: HUDI-3918 URL: https://issues.apache.org/jira/browse/HUDI-3918 Project: Apache Hudi Issue Type: Improv

[jira] [Updated] (HUDI-3079) Docs for Flink 0.10.0 new features

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3079: - Sprint: Hudi-Sprint-Mar-14, Hudi-Sprint-Mar-21, Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05, Hudi-Sprint-Apr-12

[jira] [Updated] (HUDI-3207) Hudi Trino connector PR review

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3207: - Sprint: Hudi-Sprint-Jan-10, Hudi-Sprint-Jan-18, Hudi-Sprint-Jan-24, Hudi-Sprint-Jan-31, Hudi-Sprint-Feb-7,

[jira] [Updated] (HUDI-3902) Fallback to HadoopFsRelation for non-sophisticated COW use-cases

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3902: - Sprint: Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19 (was: Hudi-Sprint-Apr-12) > Fallback to HadoopFsRelation f

[jira] [Updated] (HUDI-3806) Improve HoodieBloomIndex using bloom_filter and col_stats in MDT

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3806: - Sprint: Hudi-Sprint-Apr-05, Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19 (was: Hudi-Sprint-Apr-05, Hudi-Sprint-

[jira] [Updated] (HUDI-3013) Docs for Presto and Hudi

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3013: - Sprint: Hudi-Sprint-Mar-14, Hudi-Sprint-Mar-21, Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05, Hudi-Sprint-Apr-12

[jira] [Updated] (HUDI-3074) Docs for Z-order

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3074: - Sprint: Hudi-Sprint-Jan-3, Hudi-Sprint-Jan-10, Hudi-Sprint-Jan-18, Hudi-Sprint-Jan-24, Hudi-Sprint-Jan-31,

[jira] [Updated] (HUDI-3873) 0.11 release blog

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3873: - Sprint: Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19 (was: Hudi-Sprint-Apr-12) > 0.11 release blog > --

[jira] [Updated] (HUDI-3752) Update website content based on 0.11 new features

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3752: - Sprint: Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19 (was: Hudi-Sprint-Apr-12) > Update website content based o

[jira] [Updated] (HUDI-3896) Support Spark optimizations for `HadoopFsRelation`

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3896: - Sprint: Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19 (was: Hudi-Sprint-Apr-12) > Support Spark optimizations fo

[jira] [Updated] (HUDI-2695) Documentation

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2695: - Sprint: Hudi-Sprint-Jan-3, Hudi-Sprint-Jan-10, Hudi-Sprint-Jan-18, Hudi-Sprint-Jan-24, Hudi-Sprint-Jan-31,

[jira] [Updated] (HUDI-3819) upgrade spring cve-2022-22965

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3819: - Sprint: Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19 (was: Hudi-Sprint-Apr-12) > upgrade spring cve-2022-22965

[jira] [Updated] (HUDI-1605) Add more documentation around archival process and configs

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1605: - Sprint: Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19 (was: Hudi-Sprint-Apr-12) > Add more documentation around

[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1602: - Sprint: Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19 (was: Hudi-Sprint-Apr-12) > Corrupted Avro schema extracte

[jira] [Updated] (HUDI-3075) Docs for Debezium source

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3075: - Sprint: Hudi-Sprint-Jan-3, Hudi-Sprint-Jan-10, Hudi-Sprint-Jan-18, Hudi-Sprint-Jan-24, Hudi-Sprint-Jan-31,

[jira] [Updated] (HUDI-3036) Enhance Cleaner Docs

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3036: - Sprint: Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19 (was: Hudi-Sprint-Apr-12) > Enhance Cleaner Docs > ---

[jira] [Updated] (HUDI-3906) Prepare RC3 and run basic tests

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3906: - Sprint: Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19 (was: Hudi-Sprint-Apr-12) > Prepare RC3 and run basic test

[jira] [Updated] (HUDI-3884) Inspect why archival stops at first savepoint. Add support if possible

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3884: - Sprint: Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19 (was: Hudi-Sprint-Apr-12) > Inspect why archival stops at

[jira] [Updated] (HUDI-3905) Update Hudi Sink quick start with S3 setup

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3905: - Sprint: Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19 (was: Hudi-Sprint-Apr-12) > Update Hudi Sink quick start w

[jira] [Updated] (HUDI-3749) Try out 0.11 hudi w/ EMR spark

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3749: - Sprint: Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19 (was: Hudi-Sprint-Mar-22, Hudi-Sprint-

[jira] [Updated] (HUDI-3911) Async indexer blog/doc for 0.11 release

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3911: - Sprint: Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19 (was: Hudi-Sprint-Apr-12) > Async indexer blog/doc for 0.1

[jira] [Updated] (HUDI-2459) Support async compaction for metadata table

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2459: - Sprint: (was: Hudi-Sprint-Apr-19) > Support async compaction for metadata table > --

[GitHub] [hudi] wxplovecc opened a new pull request, #5362: [HUDI-3918] Improve flink bulk_insert perform for partitioned table

2022-04-19 Thread GitBox
wxplovecc opened a new pull request, #5362: URL: https://github.com/apache/hudi/pull/5362 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.* ## What is the purpo

[jira] [Updated] (HUDI-3918) Improve flink bulk_insert performace for partitioned table

2022-04-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-3918: - Labels: pull-request-available (was: ) > Improve flink bulk_insert performace for partitioned tab

[GitHub] [hudi] hudi-bot commented on pull request #5362: [HUDI-3918] Improve flink bulk_insert perform for partitioned table

2022-04-19 Thread GitBox
hudi-bot commented on PR #5362: URL: https://github.com/apache/hudi/pull/5362#issuecomment-1102693876 ## CI report: * fabdbf1c87824b1d39aa211f3eaea867ff96a2cc UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] hudi-bot commented on pull request #5362: [HUDI-3918] Improve flink bulk_insert perform for partitioned table

2022-04-19 Thread GitBox
hudi-bot commented on PR #5362: URL: https://github.com/apache/hudi/pull/5362#issuecomment-1102697821 ## CI report: * fabdbf1c87824b1d39aa211f3eaea867ff96a2cc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=8142

[jira] [Updated] (HUDI-3819) upgrade spring cve-2022-22965

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3819: - Fix Version/s: 0.12.0 (was: 0.11.0) > upgrade spring cve-2022-22965 > -

[jira] [Updated] (HUDI-3819) upgrade spring cve-2022-22965

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3819: - Fix Version/s: 0.11.1 (was: 0.12.0) > upgrade spring cve-2022-22965 > -

[jira] [Updated] (HUDI-3819) upgrade spring cve-2022-22965

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3819: - Sprint: Hudi-Sprint-Apr-12 (was: Hudi-Sprint-Apr-12, Hudi-Sprint-Apr-19) > upgrade spring cve-2022-22965

[jira] [Updated] (HUDI-2460) Async cleaning with metadata table

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2460: - Sprint: (was: Hudi-Sprint-Apr-19) > Async cleaning with metadata table > ---

[jira] [Updated] (HUDI-3917) Flink write task hangs if last checkpoint has no data input

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3917: - Sprint: Hudi-Sprint-Apr-19 > Flink write task hangs if last checkpoint has no data input > ---

[jira] [Closed] (HUDI-3842) Add non partitioned tests to integ tests

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu closed HUDI-3842. Resolution: Fixed > Add non partitioned tests to integ tests > > >

[jira] [Updated] (HUDI-3842) Add non partitioned tests to integ tests

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3842: - Fix Version/s: 0.11.0 (was: 0.12.0) > Add non partitioned tests to integ tests > --

[jira] [Updated] (HUDI-2736) Redundant metadata table initialization by the metadata writer

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2736: - Sprint: (was: Hudi-Sprint-Apr-19) > Redundant metadata table initialization by the metadata writer > ---

[jira] [Updated] (HUDI-3317) Partition specific pointed lookup/reading strategy for metadata table

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3317: - Sprint: (was: Hudi-Sprint-Apr-19) > Partition specific pointed lookup/reading strategy for metadata tabl

[jira] [Updated] (HUDI-3288) Partition specific compaction strategy for the metadata table

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3288: - Sprint: (was: Hudi-Sprint-Apr-19) > Partition specific compaction strategy for the metadata table >

[GitHub] [hudi] hudi-bot commented on pull request #5352: [HUDI-3902] Fallback to `HadoopFsRelation` in cases non-involving Schema Evolution

2022-04-19 Thread GitBox
hudi-bot commented on PR #5352: URL: https://github.com/apache/hudi/pull/5352#issuecomment-1102743626 ## CI report: * 57f622f643f7c623129636f8e5000ffe014b0c0b UNKNOWN * cabf2cf679534f15b22f3b5daa77a75987667fa5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[jira] [Updated] (HUDI-1976) Upgrade hive, jackson, log4j, hadoop to remove vulnerability

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1976: - Sprint: Hudi-Sprint-Apr-19 > Upgrade hive, jackson, log4j, hadoop to remove vulnerability > --

[jira] [Updated] (HUDI-2955) Upgrade Hadoop to 3.3.x

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2955: - Sprint: Hudi-Sprint-Feb-14, Hudi-Sprint-Mar-14, Hudi-Sprint-Mar-21, Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05

[jira] [Updated] (HUDI-3453) Metadata table throws NPE when scheduling compaction plan

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3453: - Sprint: Hudi-Sprint-Mar-07, Hudi-Sprint-Mar-14, Hudi-Sprint-Mar-21, Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05

[jira] [Closed] (HUDI-3886) Fix col stats filename to have default null value

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu closed HUDI-3886. Resolution: Fixed > Fix col stats filename to have default null value >

[jira] [Updated] (HUDI-3783) Fix HoodieTestTable harness to also properly validate Column Stats

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3783: - Sprint: Hudi-Sprint-Apr-19 > Fix HoodieTestTable harness to also properly validate Column Stats >

[jira] [Updated] (HUDI-3886) Fix col stats filename to have default null value

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3886: - Fix Version/s: 0.11.0 (was: 0.12.0) > Fix col stats filename to have default null v

[jira] [Updated] (HUDI-2473) Fix compaction action type in commit metadata

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2473: - Sprint: Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-19 (was: Hudi-Sprint-Mar-22) > Fix compaction action type in

[jira] [Updated] (HUDI-3054) Fix flaky TestHoodieClientMultiWriter.testHoodieClientBasicMultiWriter

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3054: - Sprint: Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-19 (was: Hudi-Sprint-Mar-22) > Fix flaky TestHoodieClientMult

[jira] [Updated] (HUDI-3668) Fix failing unit tests in hudi-integ-test

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3668: - Sprint: Hudi-Sprint-Apr-19 > Fix failing unit tests in hudi-integ-test > -

[jira] [Closed] (HUDI-3876) Fix fetching partitions from glue sync to make N requests until token exhausts

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu closed HUDI-3876. Resolution: Fixed > Fix fetching partitions from glue sync to make N requests until token exhausts > ---

[jira] [Updated] (HUDI-3735) TestHoodieSparkMergeOnReadTableRollback is flaky

2022-04-19 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3735: - Sprint: Hudi-Sprint-Apr-19 > TestHoodieSparkMergeOnReadTableRollback is flaky > -
