[GitHub] [hudi] leesf commented on a change in pull request #4222: [HUDI-2849] improve SparkUI job description for write path
leesf commented on a change in pull request #4222:
URL: https://github.com/apache/hudi/pull/4222#discussion_r764625760

## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
## @@ -441,9 +441,10 @@ public void refreshTimeline() throws IOException {
       return null;
     }
 
+    jssc.setJobGroup(this.getClass().getSimpleName(), "Checking if input is empty");
     if ((!avroRDDOptional.isPresent()) || (avroRDDOptional.get().isEmpty())) {
       LOG.info("No new data, perform empty commit.");
-      return Pair.of(schemaProvider, Pair.of(checkpointStr, jssc.emptyRDD()));
+      return Pair.of(schemaProvider, Pair.of(checkpointStr, null));

Review comment:
Why do we need to modify this?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] leesf closed pull request #4193: [HUDI-2915] Fix field not found error for sparksql
leesf closed pull request #4193: URL: https://github.com/apache/hudi/pull/4193
[GitHub] [hudi] hudi-bot removed a comment on pull request #4246: [MINOR] Update DOAP with 0.10.0 Release
hudi-bot removed a comment on pull request #4246: URL: https://github.com/apache/hudi/pull/4246#issuecomment-988561713 ## CI report: * b40dde1704cd0c69ec1981bfd411e64bd46831a4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4087) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4246: [MINOR] Update DOAP with 0.10.0 Release
hudi-bot commented on pull request #4246: URL: https://github.com/apache/hudi/pull/4246#issuecomment-988584778 ## CI report: * b40dde1704cd0c69ec1981bfd411e64bd46831a4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4087)
[GitHub] [hudi] YuweiXiao commented on a change in pull request #4222: [HUDI-2849] improve SparkUI job description for write path
YuweiXiao commented on a change in pull request #4222:
URL: https://github.com/apache/hudi/pull/4222#discussion_r764627374

## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
## @@ -441,9 +441,10 @@ public void refreshTimeline() throws IOException {
       return null;
     }
 
+    jssc.setJobGroup(this.getClass().getSimpleName(), "Checking if input is empty");
     if ((!avroRDDOptional.isPresent()) || (avroRDDOptional.get().isEmpty())) {
       LOG.info("No new data, perform empty commit.");
-      return Pair.of(schemaProvider, Pair.of(checkpointStr, jssc.emptyRDD()));
+      return Pair.of(schemaProvider, Pair.of(checkpointStr, null));

Review comment:
It saves one call to isEmpty() in `DeltaSync::writeToSink`, which could in turn eliminate one Spark job.
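The point of the reply above, checking emptiness once and handing the verdict to the writer, can be illustrated without Spark. This is only a minimal sketch under stated assumptions: `CountingDataset`, `EmptyInputSketch`, `readBatch` and this `writeToSink` are hypothetical names, not Hudi code; each `isEmpty()` call stands in for an RDD action that would launch a job; and `Optional` is swapped in for the `null` used in the patch.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for an RDD: every isEmpty() call counts as one "job".
final class CountingDataset {
    private final List<String> rows;
    private int checks = 0;

    CountingDataset(List<String> rows) {
        this.rows = rows;
    }

    boolean isEmpty() {
        checks++; // an RDD action here would trigger a Spark job
        return rows.isEmpty();
    }

    int emptinessChecks() {
        return checks;
    }
}

final class EmptyInputSketch {
    // Reader side: check emptiness once and encode the verdict in the return value.
    static Optional<CountingDataset> readBatch(CountingDataset input) {
        return input.isEmpty() ? Optional.empty() : Optional.of(input);
    }

    // Writer side (hypothetical): trusts the reader's verdict instead of
    // calling isEmpty() a second time.
    static String writeToSink(Optional<CountingDataset> batch) {
        return batch.isPresent() ? "commit with data" : "empty commit";
    }
}
```

With this shape, the emptiness check runs exactly once per batch regardless of whether the input has rows.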
[GitHub] [hudi] kywe665 opened a new pull request #4250: [HUDI-2956] - Updating write docs for deletes and full write path description
kywe665 opened a new pull request #4250:
URL: https://github.com/apache/hudi/pull/4250

## What is the purpose of the pull request
Added more details on deletes and a high-level description of the full write path, following this deep dive: https://www.youtube.com/watch?v=N2eDfU_rQ_U

## Verify this pull request
Docs change only

## Committer checklist
- [X] Has a corresponding JIRA in PR title & commit
- [X] Commit message is descriptive of the change
- [X] CI is green
- [X] Necessary doc changes done or have another open PR
- [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[jira] [Updated] (HUDI-2956) Improve Write docs
[ https://issues.apache.org/jira/browse/HUDI-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-2956: - Labels: pull-request-available (was: ) > Improve Write docs > -- > > Key: HUDI-2956 > URL: https://issues.apache.org/jira/browse/HUDI-2956 > Project: Apache Hudi > Issue Type: Sub-task >Reporter: Kyle Weller >Assignee: Kyle Weller >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > > add each substep of writing from > https://docs.google.com/presentation/d/1GpJ27IVtefqLbcGMvVDKDfoNMHv2-CFNc2-0BoAh3Ik/edit#slide=id.g8d35d881f3_0_58 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-2956) Improve Write docs
[ https://issues.apache.org/jira/browse/HUDI-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weller updated HUDI-2956: Status: Patch Available (was: In Progress)
[GitHub] [hudi] guanlisheng commented on issue #4055: [SUPPORT] Hudi with SqlQueryBasedTransformer fails-> spark error exit 134 or exit 143 in "isEmpty at DeltaSync.java:344" : Container from a bad n
guanlisheng commented on issue #4055: URL: https://github.com/apache/hudi/issues/4055#issuecomment-988592379 Hi there, I have an identical issue when enabling my customized transformer class in Hudi 7.0 on EMR. The transformer class performs a `mapPartitions` operation.
[GitHub] [hudi] hudi-bot removed a comment on pull request #4245: [MINOR] remove unuse construction method
hudi-bot removed a comment on pull request #4245: URL: https://github.com/apache/hudi/pull/4245#issuecomment-988581448 ## CI report: * 6f4e9f5fd7387cc3ec4dfa8d7f7a83a3abcbd0c0 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4088) * 2b1b0fcc6c35bd83846bf0914babc2199165f5c5 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4245: [MINOR] remove unuse construction method
hudi-bot commented on pull request #4245: URL: https://github.com/apache/hudi/pull/4245#issuecomment-988610403 ## CI report: * 6f4e9f5fd7387cc3ec4dfa8d7f7a83a3abcbd0c0 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4088) * 2b1b0fcc6c35bd83846bf0914babc2199165f5c5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4089)
[jira] [Created] (HUDI-2957) Shade kryo jar for flink bundle jar
Danny Chen created HUDI-2957: Summary: Shade kryo jar for flink bundle jar Key: HUDI-2957 URL: https://issues.apache.org/jira/browse/HUDI-2957 Project: Apache Hudi Issue Type: Improvement Components: Flink Integration Reporter: Danny Chen Assignee: Danny Chen Fix For: 0.11.0
[GitHub] [hudi] hudi-bot removed a comment on pull request #4245: [MINOR] remove unuse construction method
hudi-bot removed a comment on pull request #4245: URL: https://github.com/apache/hudi/pull/4245#issuecomment-988610403 ## CI report: * 6f4e9f5fd7387cc3ec4dfa8d7f7a83a3abcbd0c0 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4088) * 2b1b0fcc6c35bd83846bf0914babc2199165f5c5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4089)
[GitHub] [hudi] hudi-bot commented on pull request #4245: [MINOR] remove unuse construction method
hudi-bot commented on pull request #4245: URL: https://github.com/apache/hudi/pull/4245#issuecomment-988641881 ## CI report: * 2b1b0fcc6c35bd83846bf0914babc2199165f5c5 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4089)
[GitHub] [hudi] Limess commented on issue #4146: [SUPPORT] Deltastreamer commits with a custom checkpoint configuration are skipped if the generated checkpoint matches the previous commit checkpoint
Limess commented on issue #4146:
URL: https://github.com/apache/hudi/issues/4146#issuecomment-988642069

> @Limess : So, is the expectation that, if you set checkpoint = 0, deltastreamer should start from scratch as though we are starting deltastreamer for the first time ?

Yes, that's the expectation. This is also what currently happens, but where it doesn't match my expectations is that the subsequent commit is skipped, so nothing is actually written. Intuitively, I'd expect that if the checkpoint < the commit timestamp, Hudi should always commit.

> if you wish to override the checkpoint, probably you need to set --initial-checkpoint-provider.
>
> https://github.com/apache/hudi/blob/e8473b9a2b5bf0ad9370377899f6a7ea4d1ceba1/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java#L357

I did look into this and found it hard to understand and use; neither of the existing implementations matches this use case.
[GitHub] [hudi] danny0405 opened a new pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar
danny0405 opened a new pull request #4251: URL: https://github.com/apache/hudi/pull/4251
[jira] [Updated] (HUDI-2957) Shade kryo jar for flink bundle jar
[ https://issues.apache.org/jira/browse/HUDI-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-2957: - Labels: pull-request-available (was: ) > Shade kryo jar for flink bundle jar > --- > > Key: HUDI-2957 > URL: https://issues.apache.org/jira/browse/HUDI-2957 > Project: Apache Hudi > Issue Type: Improvement > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0
[GitHub] [hudi] hudi-bot commented on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar
hudi-bot commented on pull request #4251: URL: https://github.com/apache/hudi/pull/4251#issuecomment-988657584 ## CI report: * 2a84f3eeef355177e32aebd62a8eb4ed2712a647 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar
hudi-bot removed a comment on pull request #4251: URL: https://github.com/apache/hudi/pull/4251#issuecomment-988657584 ## CI report: * 2a84f3eeef355177e32aebd62a8eb4ed2712a647 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar
hudi-bot commented on pull request #4251: URL: https://github.com/apache/hudi/pull/4251#issuecomment-988659621 ## CI report: * 2a84f3eeef355177e32aebd62a8eb4ed2712a647 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4090)
[GitHub] [hudi] danny0405 commented on pull request #4246: [MINOR] Update DOAP with 0.10.0 Release
danny0405 commented on pull request #4246: URL: https://github.com/apache/hudi/pull/4246#issuecomment-988661990 The test failure should not be caused by this patch, so I will just merge it.
[GitHub] [hudi] danny0405 merged pull request #4246: [MINOR] Update DOAP with 0.10.0 Release
danny0405 merged pull request #4246: URL: https://github.com/apache/hudi/pull/4246
[hudi] branch master updated (c9e18d1 -> c56d93e)
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from c9e18d1 [HUDI-2942] add error message log in HoodieCombineHiveInputFormat (#4224) add c56d93e [MINOR] Update DOAP with 0.10.0 Release (#4246) No new revisions were added by this update. Summary of changes: doap_HUDI.rdf | 5 + 1 file changed, 5 insertions(+)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar
hudi-bot removed a comment on pull request #4251: URL: https://github.com/apache/hudi/pull/4251#issuecomment-988659621 ## CI report: * 2a84f3eeef355177e32aebd62a8eb4ed2712a647 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4090)
[GitHub] [hudi] hudi-bot commented on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar
hudi-bot commented on pull request #4251: URL: https://github.com/apache/hudi/pull/4251#issuecomment-988694462 ## CI report: * 2a84f3eeef355177e32aebd62a8eb4ed2712a647 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4090)
[jira] [Updated] (HUDI-2958) Automatically set spark.sql.parquet.writelegacyformat; When using bulkinsert to insert data which contains decimal Type.
[ https://issues.apache.org/jira/browse/HUDI-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

tao meng updated HUDI-2958:
Summary: Automatically set spark.sql.parquet.writelegacyformat; When using bulkinsert to insert data which contains decimal Type. (was: Automatically set spark.sql.parquet.writelegacyformat. When using bulkinsert to insert data will contains decimal Type.)

> Automatically set spark.sql.parquet.writelegacyformat; When using bulkinsert to insert data which contains decimal Type.
>
> Key: HUDI-2958
> URL: https://issues.apache.org/jira/browse/HUDI-2958
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Spark Integration
> Reporter: tao meng
> Priority: Minor
> Fix For: 0.11.0
>
> By default, ParquetWriteSupport writes DecimalType to Parquet as int32/int64 when the scale of the DecimalType < Decimal.MAX_LONG_DIGITS(), but AvroParquetReader, which is used by HoodieParquetReader, cannot read int32/int64 as DecimalType. This leads to the following error:
> Caused by: java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
> at org.apache.parquet.column.Dictionary.decodeToBinary(Dictionary.java:41)
> at org.apache.parquet.avro.AvroConverters$BinaryConverter.setDictionary(AvroConverters.java:75)
> ..
[jira] [Created] (HUDI-2958) Automatically set spark.sql.parquet.writelegacyformat. When using bulkinsert to insert data will contains decimal Type.
tao meng created HUDI-2958:
Summary: Automatically set spark.sql.parquet.writelegacyformat. When using bulkinsert to insert data will contains decimal Type.
Key: HUDI-2958
URL: https://issues.apache.org/jira/browse/HUDI-2958
Project: Apache Hudi
Issue Type: Improvement
Components: Spark Integration
Reporter: tao meng
Fix For: 0.11.0

By default, ParquetWriteSupport writes DecimalType to Parquet as int32/int64 when the scale of the DecimalType < Decimal.MAX_LONG_DIGITS(), but AvroParquetReader, which is used by HoodieParquetReader, cannot read int32/int64 as DecimalType. This leads to the following error:
Caused by: java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
at org.apache.parquet.column.Dictionary.decodeToBinary(Dictionary.java:41)
at org.apache.parquet.avro.AvroConverters$BinaryConverter.setDictionary(AvroConverters.java:75)
..
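The encoding choice described in this issue can be sketched as a small decision function. This is an illustration under assumptions, not Spark or Hudi code: `DecimalParquetEncoding` and `physicalType` are hypothetical names, the thresholds 9 and 18 reflect my understanding of Spark's `Decimal.MAX_INT_DIGITS` and `Decimal.MAX_LONG_DIGITS`, and the sketch keys on the decimal's precision, which is how I understand Spark's writer to decide (the report words it in terms of scale). The boolean models the `spark.sql.parquet.writeLegacyFormat` setting.

```java
// Sketch: which Parquet physical type a decimal column would be written as.
final class DecimalParquetEncoding {
    // Assumed to mirror Spark's Decimal.MAX_INT_DIGITS and Decimal.MAX_LONG_DIGITS.
    static final int MAX_INT_DIGITS = 9;
    static final int MAX_LONG_DIGITS = 18;

    static String physicalType(int precision, boolean writeLegacyFormat) {
        if (precision < 1) {
            throw new IllegalArgumentException("precision must be positive: " + precision);
        }
        if (writeLegacyFormat) {
            // Legacy layout: every decimal is a fixed-length byte array,
            // which binary-based Avro converters can decode.
            return "FIXED_LEN_BYTE_ARRAY";
        }
        if (precision <= MAX_INT_DIGITS) {
            return "INT32"; // small decimals are packed into 32-bit integers
        }
        if (precision <= MAX_LONG_DIGITS) {
            return "INT64"; // medium decimals are packed into 64-bit integers
        }
        return "FIXED_LEN_BYTE_ARRAY";
    }
}
```

Under these assumptions, a `decimal(10, 2)` column is written as INT64 by default and as a fixed-length byte array once the legacy format is enabled, which is the case the issue proposes to switch on automatically.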
[jira] [Created] (HUDI-2959) Fix the thread leak of cleaning service
Danny Chen created HUDI-2959: Summary: Fix the thread leak of cleaning service Key: HUDI-2959 URL: https://issues.apache.org/jira/browse/HUDI-2959 Project: Apache Hudi Issue Type: Bug Components: Flink Integration Reporter: Danny Chen Assignee: Danny Chen Fix For: 0.11.0
[GitHub] [hudi] danny0405 opened a new pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service
danny0405 opened a new pull request #4252: URL: https://github.com/apache/hudi/pull/4252
[GitHub] [hudi] danny0405 commented on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service
danny0405 commented on pull request #4252: URL: https://github.com/apache/hudi/pull/4252#issuecomment-988774462 @vinothchandar, can you take a look? Thanks so much.
[jira] [Updated] (HUDI-2959) Fix the thread leak of cleaning service
[ https://issues.apache.org/jira/browse/HUDI-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-2959: - Labels: pull-request-available (was: ) > Fix the thread leak of cleaning service > --- > > Key: HUDI-2959 > URL: https://issues.apache.org/jira/browse/HUDI-2959 > Project: Apache Hudi > Issue Type: Bug > Components: Flink Integration >Reporter: Danny Chen >Assignee: Danny Chen >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0
[GitHub] [hudi] danny0405 commented on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar
danny0405 commented on pull request #4251: URL: https://github.com/apache/hudi/pull/4251#issuecomment-988775610 @hudi-bot run azure
[GitHub] [hudi] hudi-bot removed a comment on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar
hudi-bot removed a comment on pull request #4251: URL: https://github.com/apache/hudi/pull/4251#issuecomment-988694462 ## CI report: * 2a84f3eeef355177e32aebd62a8eb4ed2712a647 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4090)
[GitHub] [hudi] hudi-bot commented on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service
hudi-bot commented on pull request #4252: URL: https://github.com/apache/hudi/pull/4252#issuecomment-988776087 ## CI report: * b16a5686bd6a8a03aa2847624fa5cf3e2e9d36ec UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar
hudi-bot commented on pull request #4251: URL: https://github.com/apache/hudi/pull/4251#issuecomment-988776048 ## CI report: * 2a84f3eeef355177e32aebd62a8eb4ed2712a647 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4090) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4092)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service
hudi-bot removed a comment on pull request #4252: URL: https://github.com/apache/hudi/pull/4252#issuecomment-988776087 ## CI report: * b16a5686bd6a8a03aa2847624fa5cf3e2e9d36ec UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service
hudi-bot commented on pull request #4252: URL: https://github.com/apache/hudi/pull/4252#issuecomment-98863 ## CI report: * b16a5686bd6a8a03aa2847624fa5cf3e2e9d36ec Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4093)
[GitHub] [hudi] xiarixiaoyao opened a new pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType
xiarixiaoyao opened a new pull request #4253: URL: https://github.com/apache/hudi/pull/4253
## What is the purpose of the pull request
By default, ParquetWriteSupport writes DecimalType to Parquet as int32/int64 when the precision of the DecimalType is at most Decimal.MAX_LONG_DIGITS(), but AvroParquetReader, which is used by HoodieParquetReader, cannot read int32/int64 as DecimalType. This leads to the following error:
Caused by: java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
 at org.apache.parquet.column.Dictionary.decodeToBinary(Dictionary.java:41)
 at org.apache.parquet.avro.AvroConverters$BinaryConverter.setDictionary(AvroConverters.java:75)
 ..
We fix this problem by automatically setting spark.sql.parquet.writeLegacyFormat.
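The idea described in the pull request, turning on the legacy Parquet format whenever the incoming schema contains a decimal column, might be sketched as follows. This is a minimal illustration and not the code from the PR: the class `LegacyFormatSketch`, its methods, and the use of plain type strings such as `decimal(10,2)` are hypothetical. Only the option key `spark.sql.parquet.writeLegacyFormat` is a real Spark SQL setting. The sketch inspects top-level columns only and overrides any existing value for the key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch; not Hudi's actual implementation. */
final class LegacyFormatSketch {
    // Real Spark SQL option key; everything else in this class is illustrative.
    static final String WRITE_LEGACY_FORMAT = "spark.sql.parquet.writeLegacyFormat";

    /** Returns true if any column type string names a decimal, e.g. "decimal(10,2)". */
    static boolean containsDecimal(List<String> columnTypes) {
        return columnTypes.stream()
            .anyMatch(t -> t.trim().toLowerCase(Locale.ROOT).startsWith("decimal"));
    }

    /** Returns a copy of the options with the legacy flag set to "true" when a decimal is present. */
    static Map<String, String> withLegacyFormatIfNeeded(Map<String, String> options,
                                                        List<String> columnTypes) {
        Map<String, String> result = new HashMap<>(options);
        if (containsDecimal(columnTypes)) {
            result.put(WRITE_LEGACY_FORMAT, "true");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> opts =
            withLegacyFormatIfNeeded(new HashMap<>(), List.of("string", "decimal(10,2)"));
        System.out.println(opts);
    }
}
```

Returning a copy rather than mutating the caller's map keeps the sketch free of side effects; a real writer would also need to walk nested struct, array, and map fields.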
[jira] [Updated] (HUDI-2958) Automatically set spark.sql.parquet.writelegacyformat; When using bulkinsert to insert data which contains decimal Type.
[ https://issues.apache.org/jira/browse/HUDI-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-2958:
 Labels: pull-request-available (was: )
> Automatically set spark.sql.parquet.writelegacyformat; When using bulkinsert to insert data which contains decimal Type.
>
> Key: HUDI-2958
> URL: https://issues.apache.org/jira/browse/HUDI-2958
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Spark Integration
> Reporter: tao meng
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.11.0
>
> By default, ParquetWriteSupport writes DecimalType to Parquet as int32/int64 when the precision of the DecimalType is at most Decimal.MAX_LONG_DIGITS(), but AvroParquetReader, which is used by HoodieParquetReader, cannot read int32/int64 as DecimalType. This leads to the following error:
> Caused by: java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
>  at org.apache.parquet.column.Dictionary.decodeToBinary(Dictionary.java:41)
>  at org.apache.parquet.avro.AvroConverters$BinaryConverter.setDictionary(AvroConverters.java:75)
>  ..
-- This message was sent by Atlassian Jira (v8.20.1#820001)
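To make the failure mode concrete, the following sketch models how Spark, as far as I understand its Parquet writer, picks the physical type for a decimal column: the choice is driven by the decimal's precision and by the legacy-format flag. The class and method names are hypothetical; the thresholds 9 and 18 correspond to Spark's `Decimal.MAX_INT_DIGITS` and `Decimal.MAX_LONG_DIGITS`. A reader that only decodes decimals from binary values would fail on the INT32 and INT64 cases.

```java
/** Hypothetical model of the physical-type choice; not Spark's actual code. */
final class DecimalPhysicalTypeSketch {
    static final int MAX_INT_DIGITS = 9;   // mirrors Spark's Decimal.MAX_INT_DIGITS
    static final int MAX_LONG_DIGITS = 18; // mirrors Spark's Decimal.MAX_LONG_DIGITS

    /** Returns the Parquet physical type name assumed for a decimal of the given precision. */
    static String physicalType(int precision, boolean writeLegacyFormat) {
        if (writeLegacyFormat) {
            // Legacy mode: always a fixed-length byte array, which binary-based readers can decode.
            return "FIXED_LEN_BYTE_ARRAY";
        }
        if (precision <= MAX_INT_DIGITS) {
            return "INT32";
        }
        if (precision <= MAX_LONG_DIGITS) {
            return "INT64";
        }
        return "FIXED_LEN_BYTE_ARRAY";
    }

    public static void main(String[] args) {
        System.out.println(physicalType(10, false));
        System.out.println(physicalType(10, true));
    }
}
```

Under this model, a `decimal(10,2)` column is written as INT64 by default and as a fixed-length byte array once the legacy flag is on, which is why enabling the flag avoids the `decodeToBinary` error.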
[GitHub] [hudi] hudi-bot commented on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service
hudi-bot commented on pull request #4252: URL: https://github.com/apache/hudi/pull/4252#issuecomment-988783594 ## CI report: * b16a5686bd6a8a03aa2847624fa5cf3e2e9d36ec Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4093) * 15574f13d95fd781239bfb81dcd7ecef8213f6c9 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service
hudi-bot removed a comment on pull request #4252: URL: https://github.com/apache/hudi/pull/4252#issuecomment-98863 ## CI report: * b16a5686bd6a8a03aa2847624fa5cf3e2e9d36ec Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4093) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service
hudi-bot commented on pull request #4252: URL: https://github.com/apache/hudi/pull/4252#issuecomment-988787463 ## CI report: * b16a5686bd6a8a03aa2847624fa5cf3e2e9d36ec Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4093) * 15574f13d95fd781239bfb81dcd7ecef8213f6c9 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType
hudi-bot commented on pull request #4253: URL: https://github.com/apache/hudi/pull/4253#issuecomment-988787494 ## CI report: * 34dd491be3ce6d6f55627bbe3390fefbac674e8e UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service
hudi-bot removed a comment on pull request #4252: URL: https://github.com/apache/hudi/pull/4252#issuecomment-988783594 ## CI report: * b16a5686bd6a8a03aa2847624fa5cf3e2e9d36ec Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4093) * 15574f13d95fd781239bfb81dcd7ecef8213f6c9 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType
hudi-bot removed a comment on pull request #4253: URL: https://github.com/apache/hudi/pull/4253#issuecomment-988787494 ## CI report: * 34dd491be3ce6d6f55627bbe3390fefbac674e8e UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType
hudi-bot commented on pull request #4253: URL: https://github.com/apache/hudi/pull/4253#issuecomment-988789343 ## CI report: * 34dd491be3ce6d6f55627bbe3390fefbac674e8e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4094) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] yuzhaojing opened a new pull request #4254: [HUDI-2537] Schedule Flink compaction in service
yuzhaojing opened a new pull request #4254: URL: https://github.com/apache/hudi/pull/4254
[GitHub] [hudi] hudi-bot commented on pull request #4254: [HUDI-2537] Schedule Flink compaction in service
hudi-bot commented on pull request #4254: URL: https://github.com/apache/hudi/pull/4254#issuecomment-988805075 ## CI report: * 59cdd6413be9d029f175e06e12db7893f75e7af7 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4254: [HUDI-2537] Schedule Flink compaction in service
hudi-bot removed a comment on pull request #4254: URL: https://github.com/apache/hudi/pull/4254#issuecomment-988805075 ## CI report: * 59cdd6413be9d029f175e06e12db7893f75e7af7 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4254: [HUDI-2537] Schedule Flink compaction in service
hudi-bot commented on pull request #4254: URL: https://github.com/apache/hudi/pull/4254#issuecomment-988807064 ## CI report: * 59cdd6413be9d029f175e06e12db7893f75e7af7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4095) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] danny0405 commented on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar
danny0405 commented on pull request #4251: URL: https://github.com/apache/hudi/pull/4251#issuecomment-988814010 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar
hudi-bot removed a comment on pull request #4251: URL: https://github.com/apache/hudi/pull/4251#issuecomment-988776048 ## CI report: * 2a84f3eeef355177e32aebd62a8eb4ed2712a647 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4090) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4092) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar
hudi-bot commented on pull request #4251: URL: https://github.com/apache/hudi/pull/4251#issuecomment-988815109 ## CI report: * 2a84f3eeef355177e32aebd62a8eb4ed2712a647 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4090) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4092) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4096) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service
hudi-bot removed a comment on pull request #4252: URL: https://github.com/apache/hudi/pull/4252#issuecomment-988787463 ## CI report: * b16a5686bd6a8a03aa2847624fa5cf3e2e9d36ec Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4093) * 15574f13d95fd781239bfb81dcd7ecef8213f6c9 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service
hudi-bot commented on pull request #4252: URL: https://github.com/apache/hudi/pull/4252#issuecomment-988825206 ## CI report: * b16a5686bd6a8a03aa2847624fa5cf3e2e9d36ec Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4093) * 15574f13d95fd781239bfb81dcd7ecef8213f6c9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4097) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType
hudi-bot commented on pull request #4253: URL: https://github.com/apache/hudi/pull/4253#issuecomment-988825224 ## CI report: * 34dd491be3ce6d6f55627bbe3390fefbac674e8e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4094) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType
hudi-bot removed a comment on pull request #4253: URL: https://github.com/apache/hudi/pull/4253#issuecomment-988789343 ## CI report: * 34dd491be3ce6d6f55627bbe3390fefbac674e8e Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4094) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x
hudi-bot removed a comment on pull request #4020: URL: https://github.com/apache/hudi/pull/4020#issuecomment-978211197 ## CI report: * 548c193ffe432033be61ca5a592f6d9760b5ebb0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3697) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x
hudi-bot commented on pull request #4020: URL: https://github.com/apache/hudi/pull/4020#issuecomment-988827020 ## CI report: * 548c193ffe432033be61ca5a592f6d9760b5ebb0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3697) * 72ea77955da505b679945dc92ea0dd2d597bcedf UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x
hudi-bot commented on pull request #4020: URL: https://github.com/apache/hudi/pull/4020#issuecomment-988828994 ## CI report: * 548c193ffe432033be61ca5a592f6d9760b5ebb0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3697) * 72ea77955da505b679945dc92ea0dd2d597bcedf Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4098) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x
hudi-bot removed a comment on pull request #4020: URL: https://github.com/apache/hudi/pull/4020#issuecomment-988827020 ## CI report: * 548c193ffe432033be61ca5a592f6d9760b5ebb0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3697) * 72ea77955da505b679945dc92ea0dd2d597bcedf UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4254: [HUDI-2537] Schedule Flink compaction in service
hudi-bot removed a comment on pull request #4254: URL: https://github.com/apache/hudi/pull/4254#issuecomment-988807064 ## CI report: * 59cdd6413be9d029f175e06e12db7893f75e7af7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4095) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4254: [HUDI-2537] Schedule Flink compaction in service
hudi-bot commented on pull request #4254: URL: https://github.com/apache/hudi/pull/4254#issuecomment-988849669 ## CI report: * 59cdd6413be9d029f175e06e12db7893f75e7af7 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4095) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar
hudi-bot removed a comment on pull request #4251: URL: https://github.com/apache/hudi/pull/4251#issuecomment-988815109 ## CI report: * 2a84f3eeef355177e32aebd62a8eb4ed2712a647 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4090) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4092) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4096) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar
hudi-bot commented on pull request #4251: URL: https://github.com/apache/hudi/pull/4251#issuecomment-988861254 ## CI report: * 2a84f3eeef355177e32aebd62a8eb4ed2712a647 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4090) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4092) Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4096) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan closed issue #2934: [SUPPORT] Parquet file does not exist when trying to read hudi table incrementally
nsivabalan closed issue #2934: URL: https://github.com/apache/hudi/issues/2934 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #2934: [SUPPORT] Parquet file does not exist when trying to read hudi table incrementally
nsivabalan commented on issue #2934: URL: https://github.com/apache/hudi/issues/2934#issuecomment-90486 Closing the ticket: this behavior is expected given the interplay between archival and incremental queries, and we already have a patch addressing it.
[GitHub] [hudi] nsivabalan closed issue #3826: Deltastreamer not getting auto triggered in continuous mode
nsivabalan closed issue #3826: URL: https://github.com/apache/hudi/issues/3826 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #3826: Deltastreamer not getting auto triggered in continuous mode
nsivabalan commented on issue #3826: URL: https://github.com/apache/hudi/issues/3826#issuecomment-92722 Can you please respond? This is a very common use case, and many folks in the community have been running in continuous mode, so this is likely an environment-specific or configuration issue. Closing it for now. Feel free to re-open if need be; happy to help.
[GitHub] [hudi] hudi-bot removed a comment on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service
hudi-bot removed a comment on pull request #4252: URL: https://github.com/apache/hudi/pull/4252#issuecomment-988825206 ## CI report: * b16a5686bd6a8a03aa2847624fa5cf3e2e9d36ec Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4093) * 15574f13d95fd781239bfb81dcd7ecef8213f6c9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4097) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service
hudi-bot commented on pull request #4252: URL: https://github.com/apache/hudi/pull/4252#issuecomment-92409 ## CI report: * 15574f13d95fd781239bfb81dcd7ecef8213f6c9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4097) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #4146: [SUPPORT] Deltastreamer commits with a custom checkpoint configuration are skipped if the generated checkpoint matches the previous commit checkpoi
nsivabalan commented on issue #4146: URL: https://github.com/apache/hudi/issues/4146#issuecomment-988891080 Gotcha. I guess the way you set the checkpoint should work, based on this code block:
```
// Use the configured checkpoint when the last commit has no reset key,
// or when its reset key differs from the configured checkpoint.
if (cfg.checkpoint != null && (StringUtils.isNullOrEmpty(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY))
    || !cfg.checkpoint.equals(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY)))) {
  resumeCheckpointStr = Option.of(cfg.checkpoint);
}
```
Can you enable debug logs and post them here? In the meantime, I will try to reproduce this locally and post an update here.
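The condition quoted above can be exercised in isolation with a small stand-alone sketch. The class `CheckpointResolutionSketch` and its parameters are hypothetical, and the fallback to the last commit's checkpoint is an assumption about the surrounding logic rather than something shown in the quoted snippet. It illustrates the behavior discussed in the issue: a configured checkpoint is only applied when it differs from the reset key recorded in the previous commit.

```java
import java.util.Optional;

/** Hypothetical stand-alone model of the quoted condition; not Hudi's DeltaSync code. */
final class CheckpointResolutionSketch {
    static boolean isNullOrEmpty(String s) {
        return s == null || s.isEmpty();
    }

    /**
     * @param cfgCheckpoint  checkpoint passed in the job configuration, or null
     * @param lastResetKey   reset key stored in the previous commit's metadata, or null
     * @param lastCheckpoint checkpoint stored in the previous commit's metadata, or null
     */
    static Optional<String> resolve(String cfgCheckpoint, String lastResetKey, String lastCheckpoint) {
        if (cfgCheckpoint != null
            && (isNullOrEmpty(lastResetKey) || !cfgCheckpoint.equals(lastResetKey))) {
            // The configured checkpoint is new relative to the last commit: resume from it.
            return Optional.of(cfgCheckpoint);
        }
        // Assumption: otherwise resume from whatever the previous commit recorded.
        return Optional.ofNullable(lastCheckpoint);
    }

    public static void main(String[] args) {
        System.out.println(resolve("0", null, "5")); // configured checkpoint applies
        System.out.println(resolve("0", "0", "5"));  // same reset key: override is ignored
    }
}
```

In this model, passing the same configured checkpoint on consecutive runs has no effect the second time, because the value already matches the stored reset key.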
[GitHub] [hudi] nsivabalan commented on issue #4146: [SUPPORT] Deltastreamer commits with a custom checkpoint configuration are skipped if the generated checkpoint matches the previous commit checkpoi
nsivabalan commented on issue #4146: URL: https://github.com/apache/hudi/issues/4146#issuecomment-988892449 Can you try checkpoint = "val=0" -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (HUDI-2960) create hudi table may cause memory leak in spark thrift server
suheng.cloud created HUDI-2960:
 Summary: create hudi table may cause memory leak in spark thrift server
 Key: HUDI-2960
 URL: https://issues.apache.org/jira/browse/HUDI-2960
 Project: Apache Hudi
 Issue Type: Bug
 Components: Spark Integration
 Affects Versions: 0.10.0
 Reporter: suheng.cloud
Hi community, I am currently trying to use the Spark-Hudi integration in the Spark Thrift Server. After repeatedly testing Hudi table creation for a while, I found that it eventually results in a Metaspace OOM (in my case, the JVM option -XX:MaxMetaspaceSize=256m is set). After tracing the source, I found that every time a CreateHoodieTableCommand is executed, `HiveClientUtils.newClientForMetadata` is invoked, and thus a new IsolatedClientLoader is created. In my scenario, the OOM occurs after about 10 create statements have been executed. Why not use `sessionState.catalog.externalCatalog.asInstanceOf[ExternalCatalogWithListener].unwrapped.asInstanceOf[HiveExternalCatalog].client` instead? Does it have any side effects? env: hudi master/spark-3.1.2/hive-2.3.6 Thanks.
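The reporter's suggestion amounts to reusing one client instead of building a new one per statement. The sketch below illustrates that general pattern only; `ClientReuseSketch`, `FakeClient`, and `memoize` are hypothetical names, and the counter merely stands in for the class-loading cost of creating an isolated client loader. It makes no claim about how Hudi or Spark manage Hive clients.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical illustration of create-once-and-reuse; unrelated to Hudi's real classes. */
final class ClientReuseSketch {
    // Counts how many expensive clients have been constructed.
    static final AtomicInteger CLIENTS_CREATED = new AtomicInteger();

    /** Stand-in for a client whose construction loads many classes. */
    static final class FakeClient {
        FakeClient() {
            CLIENTS_CREATED.incrementAndGet();
        }
    }

    /** Wraps a factory so that it is invoked at most once; later calls return the cached value. */
    static <T> Supplier<T> memoize(Supplier<T> factory) {
        return new Supplier<T>() {
            private T value;

            @Override
            public synchronized T get() {
                if (value == null) {
                    value = factory.get();
                }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        Supplier<FakeClient> shared = memoize(FakeClient::new);
        for (int i = 0; i < 10; i++) {
            shared.get(); // ten "create table" statements, one client
        }
        System.out.println(CLIENTS_CREATED.get());
    }
}
```

Whether sharing the session's existing client is safe in the Thrift Server depends on isolation requirements that the thread leaves open.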
[GitHub] [hudi] Limess commented on issue #4146: [SUPPORT] Deltastreamer commits with a custom checkpoint configuration are skipped if the generated checkpoint matches the previous commit checkpoint
Limess commented on issue #4146: URL: https://github.com/apache/hudi/issues/4146#issuecomment-988898544 > Can you try checkpoint = "val=0" To clarify, you're asking me to try `--checkpoint "val=0"` using the Deltastreamer CLI?
[GitHub] [hudi] Limess edited a comment on issue #4146: [SUPPORT] Deltastreamer commits with a custom checkpoint configuration are skipped if the generated checkpoint matches the previous commit check
Limess edited a comment on issue #4146: URL: https://github.com/apache/hudi/issues/4146#issuecomment-988898544 > Can you try checkpoint = "val=0" To clarify, you're asking me to try `--checkpoint "val=0"` using the Deltastreamer CLI? Or `--checkpoint 0`?
[GitHub] [hudi] Limess edited a comment on issue #4146: [SUPPORT] Deltastreamer commits with a custom checkpoint configuration are skipped if the generated checkpoint matches the previous commit check
Limess edited a comment on issue #4146: URL: https://github.com/apache/hudi/issues/4146#issuecomment-988642069 > @Limess : So, is the expectation that, if you set checkpoint = 0, deltastreamer should start from scratch as though we are starting deltastreamer for the first time ? Yes, that's the expectation. This is also what currently happens, but where it doesn't match my expectations is that the subsequent commit is skipped, so nothing is actually written. Intuitively I'd expect that if the checkpoint < the commit timestamp, Hudi should always commit. > if you wish to override the checkpoint, probably you need to set --initial-checkpoint-provider. > > https://github.com/apache/hudi/blob/e8473b9a2b5bf0ad9370377899f6a7ea4d1ceba1/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java#L357 I did look into this and found it hard to understand and use; neither of the existing implementations matches this use case.
[GitHub] [hudi] hudi-bot commented on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x
hudi-bot commented on pull request #4020: URL: https://github.com/apache/hudi/pull/4020#issuecomment-988899676 ## CI report: * 72ea77955da505b679945dc92ea0dd2d597bcedf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4098)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x
hudi-bot removed a comment on pull request #4020: URL: https://github.com/apache/hudi/pull/4020#issuecomment-988828994 ## CI report: * 548c193ffe432033be61ca5a592f6d9760b5ebb0 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3697) * 72ea77955da505b679945dc92ea0dd2d597bcedf Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4098)
[GitHub] [hudi] danny0405 commented on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service
danny0405 commented on pull request #4252: URL: https://github.com/apache/hudi/pull/4252#issuecomment-988906058 @hudi-bot run azure
[GitHub] [hudi] hudi-bot commented on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service
hudi-bot commented on pull request #4252: URL: https://github.com/apache/hudi/pull/4252#issuecomment-988907016 ## CI report: * 15574f13d95fd781239bfb81dcd7ecef8213f6c9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4097) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4099)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service
hudi-bot removed a comment on pull request #4252: URL: https://github.com/apache/hudi/pull/4252#issuecomment-92409 ## CI report: * 15574f13d95fd781239bfb81dcd7ecef8213f6c9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4097)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x
hudi-bot removed a comment on pull request #4020: URL: https://github.com/apache/hudi/pull/4020#issuecomment-988899676 ## CI report: * 72ea77955da505b679945dc92ea0dd2d597bcedf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4098)
[GitHub] [hudi] hudi-bot commented on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x
hudi-bot commented on pull request #4020: URL: https://github.com/apache/hudi/pull/4020#issuecomment-988911467 ## CI report: * 72ea77955da505b679945dc92ea0dd2d597bcedf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4098) * a23ba86033f3215c9f57118742189ae844c6c850 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x
hudi-bot removed a comment on pull request #4020: URL: https://github.com/apache/hudi/pull/4020#issuecomment-988911467 ## CI report: * 72ea77955da505b679945dc92ea0dd2d597bcedf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4098) * a23ba86033f3215c9f57118742189ae844c6c850 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x
hudi-bot commented on pull request #4020: URL: https://github.com/apache/hudi/pull/4020#issuecomment-988914045 ## CI report: * 72ea77955da505b679945dc92ea0dd2d597bcedf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4098) * a23ba86033f3215c9f57118742189ae844c6c850 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4100)
[GitHub] [hudi] hudi-bot commented on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service
hudi-bot commented on pull request #4252: URL: https://github.com/apache/hudi/pull/4252#issuecomment-988947837 ## CI report: * 15574f13d95fd781239bfb81dcd7ecef8213f6c9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4097) Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4099)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service
hudi-bot removed a comment on pull request #4252: URL: https://github.com/apache/hudi/pull/4252#issuecomment-988907016 ## CI report: * 15574f13d95fd781239bfb81dcd7ecef8213f6c9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4097) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4099)
[GitHub] [hudi] nsivabalan commented on issue #4146: [SUPPORT] Deltastreamer commits with a custom checkpoint configuration are skipped if the generated checkpoint matches the previous commit checkpoi
nsivabalan commented on issue #4146: URL: https://github.com/apache/hudi/issues/4146#issuecomment-988951531 This worked for me. First run:
```
nsb$ grep "Checkpoint" /tmp/logs/log23.out
21/12/08 07:47:52 INFO DeltaSync: Checkpoint to resume from : Optional.empty
21/12/08 07:49:35 INFO DeltaSync: Checkpoint to resume from : Option{val=1638825407000}
21/12/08 07:50:35 INFO DeltaSync: Checkpoint to resume from : Option{val=1638825408000}
21/12/08 07:51:46 INFO DeltaSync: Checkpoint to resume from : Option{val=1638825413000}
```
Stopped deltastreamer and restarted with the additional config `--checkpoint 1638825407000`:
```
nsb$ tail -f /tmp/logs/log24.out | grep "Checkpoint"
21/12/08 07:54:36 INFO DeltaSync: Checkpoint to resume from : Option{val=1638825407000}
21/12/08 07:56:16 INFO DeltaSync: Checkpoint to resume from : Option{val=1638825408000}
21/12/08 07:57:56 INFO DeltaSync: Checkpoint to resume from : Option{val=1638825413000}
```
[GitHub] [hudi] hudi-bot removed a comment on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x
hudi-bot removed a comment on pull request #4020: URL: https://github.com/apache/hudi/pull/4020#issuecomment-988914045 ## CI report: * 72ea77955da505b679945dc92ea0dd2d597bcedf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4098) * a23ba86033f3215c9f57118742189ae844c6c850 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4100)
[GitHub] [hudi] hudi-bot commented on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x
hudi-bot commented on pull request #4020: URL: https://github.com/apache/hudi/pull/4020#issuecomment-988963081 ## CI report: * a23ba86033f3215c9f57118742189ae844c6c850 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4100)
[GitHub] [hudi] h7kanna commented on issue #4170: [SUPPORT] Understanding Clustering Behavior
h7kanna commented on issue #4170: URL: https://github.com/apache/hudi/issues/4170#issuecomment-988979839 I have hoodie.parquet.max.file.size=134217728 hoodie.parquet.small.file.limit=104857600 hoodie.clustering.plan.strategy.target.file.max.bytes=134217728 hoodie.clustering.plan.strategy.small.file.limit=104857600 In one of the partitions I could see a 176 MB file (larger than the max file size limit).
[GitHub] [hudi] h7kanna edited a comment on issue #4170: [SUPPORT] Understanding Clustering Behavior
h7kanna edited a comment on issue #4170: URL: https://github.com/apache/hudi/issues/4170#issuecomment-988979839 I have hoodie.parquet.max.file.size=134217728 hoodie.parquet.small.file.limit=67108864 hoodie.clustering.plan.strategy.target.file.max.bytes=134217728 hoodie.clustering.plan.strategy.small.file.limit=67108864 In one of the partitions I could see a 176 MB file (larger than the max file size limit).
[GitHub] [hudi] vinothchandar commented on a change in pull request #3173: [HUDI-1951] Add bucket hash index, compatible with the hive bucket
vinothchandar commented on a change in pull request #3173: URL: https://github.com/apache/hudi/pull/3173#discussion_r764911822 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndex.java ## @@ -122,13 +125,40 @@ public O updateLocation(O writeStatuses, HoodieEngineContext context, @PublicAPIMethod(maturity = ApiMaturityLevel.STABLE) public abstract boolean isImplicitWithStorage(); + /** + * An index might need customized partitioner other than general upsert and insert partitioner. + */ + @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING) + public Option getCustomizedPartitioner(WorkloadProfile profile, + HoodieEngineContext context, + HoodieTable table, + HoodieWriteConfig writeConfig) { +return Option.empty(); + } + + /** + * If the `getCustomizedPartitioner` returns a partitioner, it has to be true. + */ + @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING) Review comment: instead of thinking of it as custom partitioner, I would prefer we introduce a notion of "storage layout". bucketing is not just an attribute of writing but storage itself. once bucketed, any writer/reader needs to respect that. ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java ## @@ -200,6 +209,48 @@ .defaultValue("true") .withDocumentation("Similar to " + BLOOM_INDEX_UPDATE_PARTITION_PATH_ENABLE + ", but for simple index."); + /** + * * Bucket Index Configs * + * Bucket Index is targeted to locate the record fast by hash in big data scenarios. + * The current implementation is a basic version, so there are some constraints: + * 1. Unsupported operation: bulk insert, cluster and so on. + * 2. Bucket num change requires rewriting the partition. + * 3. Predict the table size and future data growth well to set a reasonable bucket num. + * 4. A bucket size is recommended less than 3GB and avoid bing too small. + */ + // Bucket num equals file groups num in each partition. 
+ // Bucket num can be set according to partition size and file group size. + public static final ConfigProperty BUCKET_INDEX_NUM_BUCKETS = ConfigProperty + .key("hoodie.bucket.index.num.buckets") + .defaultValue(256) + .withDocumentation("Only applies if index type is BUCKET_INDEX. Determine the bucket num of the hudi table, " + + "and each partition is divided to N buckets."); + + public static final ConfigProperty BUCKET_INDEX_HASH_FIELD = ConfigProperty + .key("hoodie.bucket.index.hash.field") + .noDefaultValue() + .withDocumentation("Index key. It is used to index the record and find its file group. " + + "If not set, use record key field as default"); + + public static final ConfigProperty BUCKET_INDEX_HASH_FUNCTION = ConfigProperty + .key("hoodie.bucket.index.hash.function") + .defaultValue("JVMHash") Review comment: we have standard utils for hashing now, that we intend to use broadly across. Can we reuse `HashID`. Do we need the HiveHash per se? I feel we should default to something other than JVMHash. wdyt ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java ## @@ -200,6 +209,48 @@ .defaultValue("true") .withDocumentation("Similar to " + BLOOM_INDEX_UPDATE_PARTITION_PATH_ENABLE + ", but for simple index."); + /** + * * Bucket Index Configs * + * Bucket Index is targeted to locate the record fast by hash in big data scenarios. + * The current implementation is a basic version, so there are some constraints: + * 1. Unsupported operation: bulk insert, cluster and so on. Review comment: Right, while bucketing helps for write perf and also join performance for UUID joins per e.g, it goes against clustering and other layout optimizations that can be useful for query performance. This is one of the reasons I did not prefer baking bucketing into the storage design. 
## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/BucketIdentifier.java ## @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the spec
[GitHub] [hudi] Limess commented on issue #4146: [SUPPORT] Deltastreamer commits with a custom checkpoint configuration are skipped if the generated checkpoint matches the previous commit checkpoint
Limess commented on issue #4146: URL: https://github.com/apache/hudi/issues/4146#issuecomment-989012890 We're not running in continuous mode; it looks like the run above might be? We're also using the DFS datasource.
[jira] [Created] (HUDI-2961) Async table services can race with metadata table updates
Manoj Govindassamy created HUDI-2961: Summary: Async table services can race with metadata table updates Key: HUDI-2961 URL: https://issues.apache.org/jira/browse/HUDI-2961 Project: Apache Hudi Issue Type: Task Components: Writer Core Reporter: Manoj Govindassamy Assignee: Manoj Govindassamy Fix For: 0.11.0 Today, metadata table updates are done inline (synchronously) with the data table updates. Metadata table updates can also sometimes trigger table services like compaction, which are also done inline with respect to the ongoing commit. So, updates in the metadata table are always serial. However, there can be async table services like clustering that run in parallel with single or multiple writers and can update the metadata table in parallel with the writer commits. In the multi-writer case, since we have the lock provider configured anyway, metadata table updates are guarded against races. But lock providers are not mandatory today for single writer + async table service deployments, leading to races in metadata table updates. An async table service like clustering can race with the metadata table compaction and can update the wrong delta log file instead of the correct next delta file from the compaction.
[GitHub] [hudi] alexeykudinkin commented on a change in pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
alexeykudinkin commented on a change in pull request #4178: URL: https://github.com/apache/hudi/pull/4178#discussion_r765092480 ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java ## @@ -95,8 +95,7 @@ public MultipleSparkJobExecutionStrategy(HoodieTable table, HoodieEngineContext .map(inputGroup -> runClusteringForGroupAsync(inputGroup, clusteringPlan.getStrategy().getStrategyParams(), Option.ofNullable(clusteringPlan.getPreserveHoodieMetadata()).orElse(false), -instantTime)) -.map(CompletableFuture::join); + instantTime)).collect(Collectors.toList()).stream().map(CompletableFuture::join); Review comment: How does this guarantee that the jobs will run in parallel? We simply collect the stream into a list, but then still join the futures sequentially. Instead, we should use the following util:
```
public static <T> CompletableFuture<List<T>> allOf(@Nonnull List<CompletableFuture<T>> futures) {
  return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
      .thenApply(aVoid ->
          futures.stream()
              // NOTE: This join wouldn't block, since all the
              //       futures are completed at this point
              .map(CompletableFuture::join)
              .collect(Collectors.toList())
      );
}
```
And then invoke it as follows:
```
allOf(
    clusteringPlan.getInputGroups()
        .stream()
        .map(...) // returns `CompletableFuture`
        .collect(Collectors.toList())
)
    .join();
```
This would guarantee parallel execution for each individual clustering group.
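The `allOf` pattern suggested in the review comment above can be tried in isolation. The following is a minimal, self-contained sketch using only `java.util.concurrent`. The class name `AllOfSketch` and the `squareAll` helper are illustrative assumptions and are not part of the pull request; the squaring task merely stands in for a clustering job.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class AllOfSketch {

    /** Combines a list of futures into one future holding all results, in list order. */
    static <T> CompletableFuture<List<T>> allOf(List<CompletableFuture<T>> futures) {
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> futures.stream()
                        // join() does not block here: every future has already completed
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }

    /** Hypothetical workload: square each input on the common pool, then wait once. */
    static List<Integer> squareAll(List<Integer> inputs) {
        List<CompletableFuture<Integer>> futures = inputs.stream()
                .map(n -> CompletableFuture.supplyAsync(() -> n * n))
                // Collecting first submits every task before any join happens
                .collect(Collectors.toList());
        return allOf(futures).join();
    }

    public static void main(String[] args) {
        System.out.println(squareAll(Arrays.asList(1, 2, 3))); // prints [1, 4, 9]
    }
}
```

The single `join()` at the end waits for the combined future, so the caller blocks once rather than once per task.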
[GitHub] [hudi] alexeykudinkin commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel
alexeykudinkin commented on pull request #4178: URL: https://github.com/apache/hudi/pull/4178#issuecomment-989038686 Great catch @xiarixiaoyao!
[jira] [Created] (HUDI-2962) Enable metadata table along with JVM local lock provider
Manoj Govindassamy created HUDI-2962: Summary: Enable metadata table along with JVM local lock provider Key: HUDI-2962 URL: https://issues.apache.org/jira/browse/HUDI-2962 Project: Apache Hudi Issue Type: Task Reporter: Manoj Govindassamy Assignee: Manoj Govindassamy Fix For: 0.11.0 The metadata table is disabled by default in master due to https://issues.apache.org/jira/browse/HUDI-2961. For the single writer + async table services deployment model, to protect against races, we can have a fairly lightweight JVM-local lock provider. This means all the writes and the table services have to run from a single JVM, as in the case of DeltaStreamer. This doesn't cover multi-JVM writes and async table services, though; a full fix for those will be covered by HUDI-2961. For now, to have the metadata table re-enabled at master, a JVM-local lock provider should be sufficient.
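The idea of a JVM-local lock described in the issue above can be illustrated with a plain `ReentrantLock` shared by all threads in one process. This is only a sketch of the general technique under that assumption. The class `JvmLocalLockSketch` and its methods are hypothetical and do not implement Hudi's lock provider interface; the shared counter stands in for a metadata table update.

```java
import java.util.concurrent.locks.ReentrantLock;

public class JvmLocalLockSketch {

    // One lock per JVM: it only protects threads running in this same process.
    private static final ReentrantLock TABLE_LOCK = new ReentrantLock();
    private static int sharedCounter = 0;

    /** Runs the critical section while holding the process-wide lock. */
    static void withLock(Runnable criticalSection) {
        TABLE_LOCK.lock();
        try {
            criticalSection.run();
        } finally {
            TABLE_LOCK.unlock();
        }
    }

    /** Hypothetical workload: several threads update shared state under the lock. */
    static int runConcurrentUpdates(int threads, int updatesPerThread) {
        sharedCounter = 0;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < updatesPerThread; j++) {
                    withLock(() -> sharedCounter++);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sharedCounter;
    }

    public static void main(String[] args) {
        System.out.println(runConcurrentUpdates(4, 1000)); // prints 4000
    }
}
```

As the issue notes, a lock of this kind gives no protection once writers or table services run in separate JVMs.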
[jira] [Created] (HUDI-2963) Update configs for 0.10.0
sivabalan narayanan created HUDI-2963: - Summary: Update configs for 0.10.0 Key: HUDI-2963 URL: https://issues.apache.org/jira/browse/HUDI-2963 Project: Apache Hudi Issue Type: Improvement Components: Docs Reporter: sivabalan narayanan