[GitHub] [hudi] hudi-bot removed a comment on pull request #3614: [HUDI-2370] Supports data encryption
hudi-bot removed a comment on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-961705568

## CI report:

* 661c2d45f1ce2eb1f973300092703a4f50c7736b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3158)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #3614: [HUDI-2370] Supports data encryption
hudi-bot commented on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-1020906330

## CI report:

* 661c2d45f1ce2eb1f973300092703a4f50c7736b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3158)
* 4ca88527a8eb5087f3cd43fe5ba0e3e2fcf3a382 UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #3614: [HUDI-2370] Supports data encryption
hudi-bot commented on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-1020908653

## CI report:

* 4ca88527a8eb5087f3cd43fe5ba0e3e2fcf3a382 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5500)
* f85aeac825205ef91e31ca4a12183c1501d12d9d UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #3614: [HUDI-2370] Supports data encryption
hudi-bot removed a comment on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-1020906330

## CI report:

* 661c2d45f1ce2eb1f973300092703a4f50c7736b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3158)
* 4ca88527a8eb5087f3cd43fe5ba0e3e2fcf3a382 UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] yihua opened a new pull request #4685: [WIP][DO NOT MERGE][CI Test Only] Remove hbase-server dependency, pull in HFile related classes, and upgrade HBase to 2.4.9
yihua opened a new pull request #4685:
URL: https://github.com/apache/hudi/pull/4685

## *Tips*
- *Thank you very much for contributing to Apache Hudi.*
- *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*

## What is the purpose of the pull request

*(For example: This pull request adds quick-start document.)*

## Brief change log

*(for example:)*
- *Modify AnnotationLocation checkstyle rule in checkstyle.xml*

## Verify this pull request

*(Please pick either of the following options)*

This pull request is a trivial rework / code cleanup without any test coverage.

*(or)*

This pull request is already covered by existing tests, such as *(please describe tests)*.

*(or)*

This change added tests and can be verified as follows:

*(example:)*
- *Added integration tests for end-to-end.*
- *Added HoodieClientWriteTest to verify the change.*
- *Manually verified the change by running a job locally.*

## Committer checklist

- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
[GitHub] [hudi] hudi-bot commented on pull request #4685: [WIP][DO NOT MERGE][CI Test Only] Remove hbase-server dependency, pull in HFile related classes, and upgrade HBase to 2.4.9
hudi-bot commented on pull request #4685:
URL: https://github.com/apache/hudi/pull/4685#issuecomment-1020911910

## CI report:

* 771777e12fa1b1d5416b62ba87ae693e7776b768 UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4685: [WIP][DO NOT MERGE][CI Test Only] Remove hbase-server dependency, pull in HFile related classes, and upgrade HBase to 2.4.9
hudi-bot commented on pull request #4685:
URL: https://github.com/apache/hudi/pull/4685#issuecomment-1020914155

## CI report:

* 771777e12fa1b1d5416b62ba87ae693e7776b768 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5501)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4685: [WIP][DO NOT MERGE][CI Test Only] Remove hbase-server dependency, pull in HFile related classes, and upgrade HBase to 2.4.9
hudi-bot removed a comment on pull request #4685:
URL: https://github.com/apache/hudi/pull/4685#issuecomment-1020911910

## CI report:

* 771777e12fa1b1d5416b62ba87ae693e7776b768 UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #3614: [HUDI-2370] Supports data encryption
hudi-bot removed a comment on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-1020908653

## CI report:

* 4ca88527a8eb5087f3cd43fe5ba0e3e2fcf3a382 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5500)
* f85aeac825205ef91e31ca4a12183c1501d12d9d UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #3614: [HUDI-2370] Supports data encryption
hudi-bot commented on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-1020915557

## CI report:

* 4ca88527a8eb5087f3cd43fe5ba0e3e2fcf3a382 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5500)
* f85aeac825205ef91e31ca4a12183c1501d12d9d UNKNOWN
* a4688a962fedeeab27ce030396ce86622e6083d2 UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] liujinhui1994 commented on a change in pull request #3614: [HUDI-2370] Supports data encryption
liujinhui1994 commented on a change in pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#discussion_r791458846

## File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/SystemUtils.java

@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.integ;
+
+import java.io.File;
+import java.net.MalformedURLException;
+import java.net.URL;
+import java.security.CodeSource;
+import java.security.ProtectionDomain;
+
+/**
+ * system util.
+ */
+public class SystemUtils {

Review comment:
ok
[GitHub] [hudi] hudi-bot commented on pull request #3614: [HUDI-2370] Supports data encryption
hudi-bot commented on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-1020925286

## CI report:

* 4ca88527a8eb5087f3cd43fe5ba0e3e2fcf3a382 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5500)
* f85aeac825205ef91e31ca4a12183c1501d12d9d UNKNOWN
* a4688a962fedeeab27ce030396ce86622e6083d2 UNKNOWN
* 1981bf744932c4a9dcffb8bdcf16432345df8ebb UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #3614: [HUDI-2370] Supports data encryption
hudi-bot removed a comment on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-1020915557

## CI report:

* 4ca88527a8eb5087f3cd43fe5ba0e3e2fcf3a382 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5500)
* f85aeac825205ef91e31ca4a12183c1501d12d9d UNKNOWN
* a4688a962fedeeab27ce030396ce86622e6083d2 UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4665: [HUDI-2733] Add support for Thrift sync
hudi-bot commented on pull request #4665:
URL: https://github.com/apache/hudi/pull/4665#issuecomment-1020935910

## CI report:

* c14b2f77c5afe3c2263809b16de4b0f2c11a8a63 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5415)
* b511f2a80f21204a28e3ac67211f2f1bff06d6c7 UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4676: Hudi 3304 support partial update on mor table
hudi-bot removed a comment on pull request #4676:
URL: https://github.com/apache/hudi/pull/4676#issuecomment-1020096894

## CI report:

* f5cf6cb6e7367ceb1e2f1a62189291c8487b004c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5470)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4676: Hudi 3304 support partial update on mor table
hudi-bot commented on pull request #4676:
URL: https://github.com/apache/hudi/pull/4676#issuecomment-1020935987

## CI report:

* f5cf6cb6e7367ceb1e2f1a62189291c8487b004c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5470)
* 20d796519bfe47455005a5513fda6df6af2b2901 UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4665: [HUDI-2733] Add support for Thrift sync
hudi-bot removed a comment on pull request #4665:
URL: https://github.com/apache/hudi/pull/4665#issuecomment-1018469331

## CI report:

* c14b2f77c5afe3c2263809b16de4b0f2c11a8a63 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5415)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4665: [HUDI-2733] Add support for Thrift sync
hudi-bot removed a comment on pull request #4665:
URL: https://github.com/apache/hudi/pull/4665#issuecomment-1020935910

## CI report:

* c14b2f77c5afe3c2263809b16de4b0f2c11a8a63 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5415)
* b511f2a80f21204a28e3ac67211f2f1bff06d6c7 UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4665: [HUDI-2733] Add support for Thrift sync
hudi-bot commented on pull request #4665:
URL: https://github.com/apache/hudi/pull/4665#issuecomment-1020938252

## CI report:

* c14b2f77c5afe3c2263809b16de4b0f2c11a8a63 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5415)
* b511f2a80f21204a28e3ac67211f2f1bff06d6c7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5502)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4676: Hudi 3304 support partial update on mor table
hudi-bot removed a comment on pull request #4676:
URL: https://github.com/apache/hudi/pull/4676#issuecomment-1020935987

## CI report:

* f5cf6cb6e7367ceb1e2f1a62189291c8487b004c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5470)
* 20d796519bfe47455005a5513fda6df6af2b2901 UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4676: Hudi 3304 support partial update on mor table
hudi-bot commented on pull request #4676:
URL: https://github.com/apache/hudi/pull/4676#issuecomment-1020938291

## CI report:

* f5cf6cb6e7367ceb1e2f1a62189291c8487b004c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5470)
* 20d796519bfe47455005a5513fda6df6af2b2901 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5503)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4684: [WIP][DO NOT MERGE][CI Test Only] Remove hbase-server dependency and pull in HFile related classes
hudi-bot commented on pull request #4684:
URL: https://github.com/apache/hudi/pull/4684#issuecomment-1020950240

## CI report:

* 138dbb37d4fbd8cfbf3151ec6d7ca7b466f0ed17 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5499)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4684: [WIP][DO NOT MERGE][CI Test Only] Remove hbase-server dependency and pull in HFile related classes
hudi-bot removed a comment on pull request #4684:
URL: https://github.com/apache/hudi/pull/4684#issuecomment-1020901214

## CI report:

* 138dbb37d4fbd8cfbf3151ec6d7ca7b466f0ed17 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5499)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #3614: [HUDI-2370] Supports data encryption
hudi-bot commented on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-1020957776

## CI report:

* 4ca88527a8eb5087f3cd43fe5ba0e3e2fcf3a382 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5500)
* f85aeac825205ef91e31ca4a12183c1501d12d9d UNKNOWN
* a4688a962fedeeab27ce030396ce86622e6083d2 UNKNOWN
* 1981bf744932c4a9dcffb8bdcf16432345df8ebb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5504)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #3614: [HUDI-2370] Supports data encryption
hudi-bot removed a comment on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-1020925286

## CI report:

* 4ca88527a8eb5087f3cd43fe5ba0e3e2fcf3a382 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5500)
* f85aeac825205ef91e31ca4a12183c1501d12d9d UNKNOWN
* a4688a962fedeeab27ce030396ce86622e6083d2 UNKNOWN
* 1981bf744932c4a9dcffb8bdcf16432345df8ebb UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4665: [HUDI-2733] Add support for Thrift sync
hudi-bot removed a comment on pull request #4665:
URL: https://github.com/apache/hudi/pull/4665#issuecomment-1020938252

## CI report:

* c14b2f77c5afe3c2263809b16de4b0f2c11a8a63 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5415)
* b511f2a80f21204a28e3ac67211f2f1bff06d6c7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5502)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4665: [HUDI-2733] Add support for Thrift sync
hudi-bot commented on pull request #4665:
URL: https://github.com/apache/hudi/pull/4665#issuecomment-1020965308

## CI report:

* c14b2f77c5afe3c2263809b16de4b0f2c11a8a63 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5415)
* b511f2a80f21204a28e3ac67211f2f1bff06d6c7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5502)
* b1f290dcf3d3c3e26e9401269f0e0cd330b7e247 UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4665: [HUDI-2733] Add support for Thrift sync
hudi-bot removed a comment on pull request #4665:
URL: https://github.com/apache/hudi/pull/4665#issuecomment-1020965308

## CI report:

* c14b2f77c5afe3c2263809b16de4b0f2c11a8a63 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5415)
* b511f2a80f21204a28e3ac67211f2f1bff06d6c7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5502)
* b1f290dcf3d3c3e26e9401269f0e0cd330b7e247 UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot removed a comment on pull request #4685: [WIP][DO NOT MERGE][CI Test Only] Remove hbase-server dependency, pull in HFile related classes, and upgrade HBase to 2.4.9
hudi-bot removed a comment on pull request #4685:
URL: https://github.com/apache/hudi/pull/4685#issuecomment-1020914155

## CI report:

* 771777e12fa1b1d5416b62ba87ae693e7776b768 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5501)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4665: [HUDI-2733] Add support for Thrift sync
hudi-bot commented on pull request #4665:
URL: https://github.com/apache/hudi/pull/4665#issuecomment-1020969696

## CI report:

* b511f2a80f21204a28e3ac67211f2f1bff06d6c7 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5502)
* b1f290dcf3d3c3e26e9401269f0e0cd330b7e247 UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4685: [WIP][DO NOT MERGE][CI Test Only] Remove hbase-server dependency, pull in HFile related classes, and upgrade HBase to 2.4.9
hudi-bot commented on pull request #4685:
URL: https://github.com/apache/hudi/pull/4685#issuecomment-1020969788

## CI report:

* 771777e12fa1b1d5416b62ba87ae693e7776b768 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5501)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] stym06 commented on issue #3747: [SUPPORT] Hive Sync process stuck and unable to exit
stym06 commented on issue #3747:
URL: https://github.com/apache/hudi/issues/3747#issuecomment-1020991975

How can I enable debug logs in run_sync_tool.sh?
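One common way to get DEBUG output from a log4j 1.x-based tool like the Hive sync tool is to supply a log4j properties file that sets the root logger to DEBUG. The fragment below is a generic log4j 1.x configuration, not taken from the Hudi repository; whether and how `run_sync_tool.sh` picks it up (for example via a `-Dlog4j.configuration=file:...` JVM property passed through an environment variable such as `HADOOP_CLIENT_OPTS`) is an assumption that should be checked against the script itself.

```properties
# Hypothetical log4j 1.x config for debugging the sync tool; the appender
# names are generic, and the Hudi package filter is an illustrative guess.
log4j.rootLogger=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p %c - %m%n
# Optionally limit noise to Hudi classes only:
# log4j.logger.org.apache.hudi=DEBUG
```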
[GitHub] [hudi] hudi-bot removed a comment on pull request #4665: [HUDI-2733] Add support for Thrift sync
hudi-bot removed a comment on pull request #4665:
URL: https://github.com/apache/hudi/pull/4665#issuecomment-1020969696

## CI report:

* b511f2a80f21204a28e3ac67211f2f1bff06d6c7 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5502)
* b1f290dcf3d3c3e26e9401269f0e0cd330b7e247 UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4665: [HUDI-2733] Add support for Thrift sync
hudi-bot commented on pull request #4665:
URL: https://github.com/apache/hudi/pull/4665#issuecomment-1020996507

## CI report:

* b511f2a80f21204a28e3ac67211f2f1bff06d6c7 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5502)
* b1f290dcf3d3c3e26e9401269f0e0cd330b7e247 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5505)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] liujinhui1994 commented on pull request #3614: [HUDI-2370] Supports data encryption
liujinhui1994 commented on pull request #3614: URL: https://github.com/apache/hudi/pull/3614#issuecomment-1021014765

> @liujinhui1994 some high-level feedback:
>
> * can you put some more info in the PR description to explain the high-level functionality we want to add here?
> * parquet 1.12 is only used when hudi is built with spark 3.2. the encryption feature needs to be somehow guarded by checking the intended spark version. so we need to find a way to make the functionality only available when people use spark 3.2+

Point 1: already added.
Point 2: Is it possible to add a description to the InMemoryKMS annotation so that only Spark 3.2+ is allowed to use it? If there is any other good suggestion, please let me know.
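The version gate discussed in this thread could be sketched roughly as follows. This is an illustration only: `EncryptionGuard` and `isEncryptionSupported` are hypothetical names, not Hudi or Spark APIs. The idea is simply that Parquet 1.12 (with its encryption support) is only on the classpath when Hudi is built against Spark 3.2+, so the feature should refuse to activate on older runtimes.

```java
// Hypothetical sketch of gating an encryption feature on the Spark version.
// Neither the class nor the method exists in Hudi; the version string would
// come from something like SparkContext#version at runtime.
public class EncryptionGuard {

    // Returns true only for Spark 3.2 and later, where Parquet 1.12's
    // modular encryption is available.
    static boolean isEncryptionSupported(String sparkVersion) {
        String[] parts = sparkVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        return major > 3 || (major == 3 && minor >= 2);
    }

    public static void main(String[] args) {
        System.out.println(isEncryptionSupported("3.2.0")); // true
        System.out.println(isEncryptionSupported("3.1.2")); // false
    }
}
```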
[GitHub] [hudi] hudi-bot commented on pull request #3614: [HUDI-2370] Supports data encryption
hudi-bot commented on pull request #3614: URL: https://github.com/apache/hudi/pull/3614#issuecomment-1021025113

## CI report:

* f85aeac825205ef91e31ca4a12183c1501d12d9d UNKNOWN
* a4688a962fedeeab27ce030396ce86622e6083d2 UNKNOWN
* 1981bf744932c4a9dcffb8bdcf16432345df8ebb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5504)
[GitHub] [hudi] hudi-bot removed a comment on pull request #3614: [HUDI-2370] Supports data encryption
hudi-bot removed a comment on pull request #3614: URL: https://github.com/apache/hudi/pull/3614#issuecomment-1020957776

## CI report:

* 4ca88527a8eb5087f3cd43fe5ba0e3e2fcf3a382 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5500)
* f85aeac825205ef91e31ca4a12183c1501d12d9d UNKNOWN
* a4688a962fedeeab27ce030396ce86622e6083d2 UNKNOWN
* 1981bf744932c4a9dcffb8bdcf16432345df8ebb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5504)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4676: Hudi 3304 support partial update on mor table
hudi-bot removed a comment on pull request #4676: URL: https://github.com/apache/hudi/pull/4676#issuecomment-1020938291

## CI report:

* f5cf6cb6e7367ceb1e2f1a62189291c8487b004c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5470)
* 20d796519bfe47455005a5513fda6df6af2b2901 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5503)
[GitHub] [hudi] hudi-bot commented on pull request #4676: Hudi 3304 support partial update on mor table
hudi-bot commented on pull request #4676: URL: https://github.com/apache/hudi/pull/4676#issuecomment-1021029353

## CI report:

* 20d796519bfe47455005a5513fda6df6af2b2901 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5503)
[GitHub] [hudi] hudi-bot removed a comment on pull request #3614: [HUDI-2370] Supports data encryption
hudi-bot removed a comment on pull request #3614: URL: https://github.com/apache/hudi/pull/3614#issuecomment-1021025113

## CI report:

* f85aeac825205ef91e31ca4a12183c1501d12d9d UNKNOWN
* a4688a962fedeeab27ce030396ce86622e6083d2 UNKNOWN
* 1981bf744932c4a9dcffb8bdcf16432345df8ebb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5504)
[GitHub] [hudi] hudi-bot commented on pull request #3614: [HUDI-2370] Supports data encryption
hudi-bot commented on pull request #3614: URL: https://github.com/apache/hudi/pull/3614#issuecomment-1021031693

## CI report:

* f85aeac825205ef91e31ca4a12183c1501d12d9d UNKNOWN
* a4688a962fedeeab27ce030396ce86622e6083d2 UNKNOWN
* 1981bf744932c4a9dcffb8bdcf16432345df8ebb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5504)
* c9e4ff7e7e4423f0e583904f4495f145afabcaf2 UNKNOWN
[GitHub] [hudi] hudi-bot removed a comment on pull request #3614: [HUDI-2370] Supports data encryption
hudi-bot removed a comment on pull request #3614: URL: https://github.com/apache/hudi/pull/3614#issuecomment-1021031693

## CI report:

* f85aeac825205ef91e31ca4a12183c1501d12d9d UNKNOWN
* a4688a962fedeeab27ce030396ce86622e6083d2 UNKNOWN
* 1981bf744932c4a9dcffb8bdcf16432345df8ebb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5504)
* c9e4ff7e7e4423f0e583904f4495f145afabcaf2 UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #3614: [HUDI-2370] Supports data encryption
hudi-bot commented on pull request #3614: URL: https://github.com/apache/hudi/pull/3614#issuecomment-1021077656

## CI report:

* f85aeac825205ef91e31ca4a12183c1501d12d9d UNKNOWN
* a4688a962fedeeab27ce030396ce86622e6083d2 UNKNOWN
* 1981bf744932c4a9dcffb8bdcf16432345df8ebb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5504)
* c9e4ff7e7e4423f0e583904f4495f145afabcaf2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5506)
[GitHub] [hudi] hudi-bot commented on pull request #4665: [HUDI-2733] Add support for Thrift sync
hudi-bot commented on pull request #4665: URL: https://github.com/apache/hudi/pull/4665#issuecomment-1021089258

## CI report:

* b1f290dcf3d3c3e26e9401269f0e0cd330b7e247 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5505)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4665: [HUDI-2733] Add support for Thrift sync
hudi-bot removed a comment on pull request #4665: URL: https://github.com/apache/hudi/pull/4665#issuecomment-1020996507

## CI report:

* b511f2a80f21204a28e3ac67211f2f1bff06d6c7 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5502)
* b1f290dcf3d3c3e26e9401269f0e0cd330b7e247 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5505)
[GitHub] [hudi] hudi-bot removed a comment on pull request #3614: [HUDI-2370] Supports data encryption
hudi-bot removed a comment on pull request #3614: URL: https://github.com/apache/hudi/pull/3614#issuecomment-1021077656

## CI report:

* f85aeac825205ef91e31ca4a12183c1501d12d9d UNKNOWN
* a4688a962fedeeab27ce030396ce86622e6083d2 UNKNOWN
* 1981bf744932c4a9dcffb8bdcf16432345df8ebb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5504)
* c9e4ff7e7e4423f0e583904f4495f145afabcaf2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5506)
[GitHub] [hudi] hudi-bot commented on pull request #3614: [HUDI-2370] Supports data encryption
hudi-bot commented on pull request #3614: URL: https://github.com/apache/hudi/pull/3614#issuecomment-1021126408

## CI report:

* f85aeac825205ef91e31ca4a12183c1501d12d9d UNKNOWN
* a4688a962fedeeab27ce030396ce86622e6083d2 UNKNOWN
* c9e4ff7e7e4423f0e583904f4495f145afabcaf2 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5506)
[GitHub] [hudi] andykrk commented on issue #4604: [SUPPORT] Archive functionality fails
andykrk commented on issue #4604: URL: https://github.com/apache/hudi/issues/4604#issuecomment-1021132470

@nsivabalan Just some additional information. I do have the data in the commit created prior to the two I posted here. The thing is that it was the one created with our old archival settings:

'hoodie.keep.min.commits': 999,
'hoodie.keep.max.commits': 1000,
'hoodie.cleaner.commits.retained': 998,

so in our case without the actual archival that we want to get. Once I try to process with these settings:

'hoodie.keep.min.commits': 29,
'hoodie.keep.max.commits': 30,
'hoodie.cleaner.commits.retained': 28,

we start to observe the issue.
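For context on the two configuration sets above: Hudi generally expects the ordering `hoodie.cleaner.commits.retained` < `hoodie.keep.min.commits` < `hoodie.keep.max.commits`, and both sets in the report satisfy it. A minimal sketch of that sanity check (illustrative only; `RetentionCheck` is a hypothetical helper, not a Hudi class):

```java
// Hypothetical helper validating the relationship between cleaning and
// archival settings: the cleaner must retain fewer commits than archival
// keeps at minimum, and the archival min must be below the max.
public class RetentionCheck {

    static boolean isValid(int cleanerRetained, int keepMin, int keepMax) {
        return cleanerRetained < keepMin && keepMin < keepMax;
    }

    public static void main(String[] args) {
        // Old settings from the report: 998 < 999 < 1000.
        System.out.println(isValid(998, 999, 1000)); // true
        // New settings from the report: 28 < 29 < 30.
        System.out.println(isValid(28, 29, 30));     // true
    }
}
```

Since both configurations pass this check, the failure reported in the issue is unlikely to be a simple misordering of these three values.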
[GitHub] [hudi] BenjMaq commented on issue #4208: [SUPPORT] On Hudi 0.9.0 - Alter table throws java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.alterTable(java.lang.String, org
BenjMaq commented on issue #4208: URL: https://github.com/apache/hudi/issues/4208#issuecomment-1021272485 I was able to solve this really dumb issue by simply shutting down my spark app once the sql statement completed. My fault. Closing because this is not a Hudi error.
[GitHub] [hudi] BenjMaq closed issue #4208: [SUPPORT] On Hudi 0.9.0 - Alter table throws java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.alterTable(java.lang.String, org.apach
BenjMaq closed issue #4208: URL: https://github.com/apache/hudi/issues/4208
[GitHub] [hudi] BenjMaq commented on issue #4131: [SUPPORT] org.apache.hudi.exception.HoodieException: The value of can not be null
BenjMaq commented on issue #4131: URL: https://github.com/apache/hudi/issues/4131#issuecomment-1021274709 Hi team, it works correctly by specifying a `preCombineField`. We are good to close!
[GitHub] [hudi] BenjMaq commented on issue #4154: [SUPPORT] INSERT OVERWRITE operation does not work when using Spark SQL
BenjMaq commented on issue #4154: URL: https://github.com/apache/hudi/issues/4154#issuecomment-1021303523

Hi again, Still trying to understand the issue :) Here's the create table statement I run to create the table using `spark-sql`:

```
CREATE TABLE IF NOT EXISTS public.test_partitioned (
  id bigint,
  name string,
  start_date string,
  dt string
) USING hudi
LOCATION 's3a:///hudi_data_lake/test_partitioned'
OPTIONS (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'start_date'
)
PARTITIONED BY (dt);
```

and here's the DDL when I run `show create table public.test_partitioned` in the Hive CLI:

```
CREATE EXTERNAL TABLE `public.test_partitioned`(
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  `id` bigint,
  `name` string,
  `start_date` string)
PARTITIONED BY (
  `dt` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'path'='s3a:///hudi_data_lake/test_partitioned',
  'preCombineField'='start_date',
  'primaryKey'='id',
  'type'='cow')
STORED AS INPUTFORMAT
  'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3a:///hudi_data_lake/test_partitioned'
TBLPROPERTIES (
  'last_commit_time_sync'='20220125071309',
  'spark.sql.create.version'='2.4.4-20210211',
  'spark.sql.sources.provider'='hudi',
  'spark.sql.sources.schema.numPartCols'='1',
  'spark.sql.sources.schema.numParts'='1',
  'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"id","type":"long","nullable":true,"metadata":{}},{"name":"name","type":"string","nullable":true,"metadata":{}},{"name":"start_date","type":"string","nullable":true,"metadata":{}},{"name":"dt","type":"string","nullable":true,"metadata":{}}]}',
  'spark.sql.sources.schema.partCol.0'='dt',
  'transient_lastDdlTime'='1643122431')
```

Are you maybe seeing something that looks off? I see for example

```
STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
```

which is neither `HiveInputFormat` nor `HoodieCombineHiveInputFormat` mentioned by @nsivabalan. Is that normal? Thank you
[GitHub] [hudi] BenjMaq edited a comment on issue #4154: [SUPPORT] INSERT OVERWRITE operation does not work when using Spark SQL
BenjMaq edited a comment on issue #4154: URL: https://github.com/apache/hudi/issues/4154#issuecomment-1021303523

Hi again, Still trying to understand the issue :) Here's the create table statement I run to create the table using `spark-sql`:

```
CREATE TABLE IF NOT EXISTS public.test_partitioned (
  id bigint,
  name string,
  start_date string,
  dt string
) USING hudi
LOCATION 's3a:///hudi_data_lake/test_partitioned'
OPTIONS (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'start_date'
)
PARTITIONED BY (dt);
```

and here's the DDL when I run `show create table public.test_partitioned` in the Hive CLI:

```
CREATE EXTERNAL TABLE `public.test_partitioned`(
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  `id` bigint,
  `name` string,
  `start_date` string)
PARTITIONED BY (
  `dt` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'path'='s3a:///hudi_data_lake/test_partitioned',
  'preCombineField'='start_date',
  'primaryKey'='id',
  'type'='cow')
STORED AS INPUTFORMAT
  'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3a:///hudi_data_lake/test_partitioned'
TBLPROPERTIES (
  'last_commit_time_sync'='20220125071309',
  'spark.sql.create.version'='2.4.4-20210211',
  'spark.sql.sources.provider'='hudi',
  'spark.sql.sources.schema.numPartCols'='1',
  'spark.sql.sources.schema.numParts'='1',
  'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"id","type":"long","nullable":true,"metadata":{}},{"name":"name","type":"string","nullable":true,"metadata":{}},{"name":"start_date","type":"string","nullable":true,"metadata":{}},{"name":"dt","type":"string","nullable":true,"metadata":{}}]}',
  'spark.sql.sources.schema.partCol.0'='dt',
  'transient_lastDdlTime'='1643122431')
```

Are you maybe seeing something that looks off? I see for example

```
STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
```

which is neither `HiveInputFormat` nor `HoodieCombineHiveInputFormat` mentioned by @nsivabalan. Is that normal?

As mentioned above, I've followed the steps in https://hudi.apache.org/docs/query_engine_setup#PrestoDB to be able to read from Presto and Hive. Thank you
[GitHub] [hudi] YannByron opened a new pull request #4686: [HUDI-3232] auto-reload active timeline incrementally
YannByron opened a new pull request #4686: URL: https://github.com/apache/hudi/pull/4686

This PR:
1. loads the latest active timeline automatically, so there is no need to call `reloadActiveTimeline` or `reload` explicitly at all.
2. for the new layout version, loads the latest instants incrementally.
[jira] [Updated] (HUDI-3232) support reload timeline Incrementally
[ https://issues.apache.org/jira/browse/HUDI-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HUDI-3232:
Labels: pull-request-available (was: )

> support reload timeline Incrementally
>
> Key: HUDI-3232
> URL: https://issues.apache.org/jira/browse/HUDI-3232
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Common Core, incremental-query, writer-core
> Reporter: Yann Byron
> Priority: Critical
> Labels: pull-request-available
> Fix For: 0.11.0
>
> Recently, `HoodieTableMetaClient.reloadActiveTimeline` is called many times in one operation, and this reloads the timeline fully. Supporting reload in incremental mode may improve performance.

-- This message was sent by Atlassian Jira (v8.20.1#820001)
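The incremental reload idea in this ticket can be sketched as follows. This is a simplified illustration, not Hudi's actual timeline code: `IncrementalTimeline` and its methods are hypothetical names. Instead of re-reading every instant file on each reload, the reader remembers which instants it has already seen and picks up only the newly added ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of incremental timeline reloading: track instants
// already loaded and, on reload, absorb only the ones not seen before.
public class IncrementalTimeline {
    private final Set<String> seen = new HashSet<>();
    private final List<String> instants = new ArrayList<>();

    // Returns only the instants that were not present at the last reload.
    public List<String> reloadIncrementally(List<String> instantsOnStorage) {
        List<String> added = new ArrayList<>();
        for (String instant : instantsOnStorage) {
            if (seen.add(instant)) {
                added.add(instant);
                instants.add(instant);
            }
        }
        return added;
    }

    public List<String> getInstants() {
        return instants;
    }

    public static void main(String[] args) {
        IncrementalTimeline timeline = new IncrementalTimeline();
        // First reload loads everything on storage.
        System.out.println(timeline.reloadIncrementally(Arrays.asList("001.commit", "002.commit")));
        // Second reload skips the two already-loaded instants and picks up only the new one.
        System.out.println(timeline.reloadIncrementally(Arrays.asList("001.commit", "002.commit", "003.commit")));
    }
}
```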
[GitHub] [hudi] hudi-bot commented on pull request #4686: [HUDI-3232] auto-reload active timeline incrementally
hudi-bot commented on pull request #4686: URL: https://github.com/apache/hudi/pull/4686#issuecomment-1021314712

## CI report:

* 3264a340cfac2614a919d34d01eb469bcc9a52ae UNKNOWN
[GitHub] [hudi] hudi-bot commented on pull request #4686: [HUDI-3232] auto-reload active timeline incrementally
hudi-bot commented on pull request #4686: URL: https://github.com/apache/hudi/pull/4686#issuecomment-1021317836

## CI report:

* 3264a340cfac2614a919d34d01eb469bcc9a52ae Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5507)
[GitHub] [hudi] hudi-bot removed a comment on pull request #4686: [HUDI-3232] auto-reload active timeline incrementally
hudi-bot removed a comment on pull request #4686: URL: https://github.com/apache/hudi/pull/4686#issuecomment-1021314712

## CI report:

* 3264a340cfac2614a919d34d01eb469bcc9a52ae UNKNOWN
[GitHub] [hudi] BenjMaq commented on issue #4154: [SUPPORT] INSERT OVERWRITE operation does not work when using Spark SQL
BenjMaq commented on issue #4154: URL: https://github.com/apache/hudi/issues/4154#issuecomment-1021327116 I found the problem. The JAR in my Hive aux path was outdated. I replaced it with the latest version and all was good. Sorry for the inconvenience, and thank you so much for supporting and trying to find the error! Good to close on my end!
[GitHub] [hudi] nsivabalan closed issue #4154: [SUPPORT] INSERT OVERWRITE operation does not work when using Spark SQL
nsivabalan closed issue #4154: URL: https://github.com/apache/hudi/issues/4154
[GitHub] [hudi] nsivabalan commented on issue #4154: [SUPPORT] INSERT OVERWRITE operation does not work when using Spark SQL
nsivabalan commented on issue #4154: URL: https://github.com/apache/hudi/issues/4154#issuecomment-1021348122 thanks for updating!
[GitHub] [hudi] stayrascal commented on a change in pull request #4141: [HUDI-2815] Support partial update for streaming change logs
stayrascal commented on a change in pull request #4141: URL: https://github.com/apache/hudi/pull/4141#discussion_r791887562

## File path: hudi-common/src/main/java/org/apache/hudi/common/model/PartialUpdateWithLatestAvroPayload.java

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hudi.common.model;

import org.apache.hudi.common.util.Option;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.IndexedRecord;

import java.io.IOException;
import java.util.List;
import java.util.Objects;
import java.util.Properties;

import static org.apache.hudi.avro.HoodieAvroUtils.bytesToAvro;

/**
 * The only difference with {@link DefaultHoodieRecordPayload} is that this updates only the partial
 * fields of the latest record whose values are not null into the existing record, instead of all fields.
 *
 * Assuming a {@link GenericRecord} has three fields: a int, b int, c int. The first record value: 1, 2, 3.
 * The second record value is: 4, 5, null, i.e. the field c value is null. After calling the
 * combineAndGetUpdateValue method, we will get the final record value: 4, 5, 3. The field c value
 * will not be overwritten because its value is null in the latest record.
 */
public class PartialUpdateWithLatestAvroPayload extends DefaultHoodieRecordPayload {
```

Review comment: It seems that partial update cannot support the case where the partition path changes. Let's assume:
- a record exists in the base file (a=1, b=2, c=null, dt=2022-01-15),
- and an incoming record (a=1, b=null, c=3, dt=2022-01-16).

The `BucketAssignFunction` will generate a deleted record with partition path `2022-01-15` and the incoming record with partition path `2022-01-16`. The original record in the base file will be removed while running `HoodieMergeHandle.write(GenericRecord oldRecord)`, but the incoming record cannot find the relevant record under `2022-01-16`, so the result will only keep the info from the incoming record (a=1, b=null, c=3, dt=2022-01-16), and the original value (b=2) will be missed. Any thoughts here?
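The null-skipping merge semantics described in this review thread can be sketched as follows. This is a simplified illustration of the idea only, not the actual Avro-based `PartialUpdateWithLatestAvroPayload` implementation: a null field in the incoming record keeps the existing value instead of overwriting it.

```java
import java.util.Arrays;
import java.util.List;

// Simplified sketch of partial-update merging: fields are represented as a
// positional list of values, and a null in the incoming record means
// "keep the existing value for this field".
public class PartialMerge {

    static List<Integer> merge(List<Integer> existing, List<Integer> incoming) {
        Integer[] out = new Integer[existing.size()];
        for (int i = 0; i < out.length; i++) {
            Integer v = incoming.get(i);
            // Non-null incoming values win; null falls back to the existing value.
            out[i] = (v != null) ? v : existing.get(i);
        }
        return Arrays.asList(out);
    }

    public static void main(String[] args) {
        // Fields (a, b, c): existing = (1, 2, 3), incoming = (4, 5, null).
        List<Integer> merged = merge(Arrays.asList(1, 2, 3), Arrays.asList(4, 5, null));
        System.out.println(merged); // [4, 5, 3]
    }
}
```

Note that this sketch merges within one partition only; as the review comment points out, when the partition path itself changes, the old record is deleted in its original partition and there is nothing left to merge against.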
[GitHub] [hudi] vinothchandar commented on pull request #4503: [HUDI-2438] [RFC-34] Added the implementation details for the BigQuery integration
vinothchandar commented on pull request #4503: URL: https://github.com/apache/hudi/pull/4503#issuecomment-1020614643 @vingov What are the current blockers for making this work end-end? just the `.hoodie_partition_metadata` filtering? @prashantwason shared an interesting idea to make that file an empty parquet file, and the contents of the current file put into the footers. We can provide an upgrade utility to do this as well in 0.11 and we should be good here.
[GitHub] [hudi] nsivabalan commented on issue #4600: [SUPPORT]When hive queries Hudi data, the query path is wrong
nsivabalan commented on issue #4600: URL: https://github.com/apache/hudi/issues/4600#issuecomment-1020690181 CC @yihua kafka connect related issue.
[GitHub] [hudi] hudi-bot removed a comment on pull request #4686: [HUDI-3232] auto-reload active timeline incrementally
hudi-bot removed a comment on pull request #4686: URL: https://github.com/apache/hudi/pull/4686#issuecomment-1021314712 ## CI report: * 3264a340cfac2614a919d34d01eb469bcc9a52ae UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] hudi-bot commented on pull request #4449: [HUDI-2763] Metadata table records - support for key deduplication based on hardcoded key field
hudi-bot commented on pull request #4449: URL: https://github.com/apache/hudi/pull/4449#issuecomment-1020395034 ## CI report: * 5b739e69a278d80431c89e0e4a3ef84e9a5d8842 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=5475) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
[GitHub] [hudi] nsivabalan commented on issue #4668: [SUPPORT] SaveMode.Append fails on renamed hudi tables
nsivabalan commented on issue #4668: URL: https://github.com/apache/hudi/issues/4668#issuecomment-1020590511 @YannByron: Can you please follow up on this, or loop in someone who can assist here?
[jira] [Updated] (HUDI-2323) Upsert of Case Class with single field causes SchemaParseException
[ https://issues.apache.org/jira/browse/HUDI-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-2323: -- Sprint: Cont' improve - 2021/01/10, Cont' improve - 2021/01/18, Cont' improve - 2021/01/24 (was: Cont' improve - 2021/01/10, Cont' improve - 2021/01/18) > Upsert of Case Class with single field causes SchemaParseException > -- > > Key: HUDI-2323 > URL: https://issues.apache.org/jira/browse/HUDI-2323 > Project: Apache Hudi > Issue Type: Bug > Components: spark, writer-core >Affects Versions: 0.8.0 >Reporter: Tyler Jackson >Assignee: Raymond Xu >Priority: Critical > Labels: hudi-on-call, schema, sev:critical > Fix For: 0.11.0 > > Attachments: HudiSchemaGenerationTest.scala > > Original Estimate: 1h > Remaining Estimate: 1h > > Additional background information: > Spark version 3.1.1 > Scala version 2.12 > Hudi version 0.8.0 (hudi-spark-bundle_2.12 artifact) > > While testing a spark job in EMR of inserting and then upserting data for a > fairly complex nested case class structure, I ran into an issue that I was > having a hard time tracking down. 
It seems that when part of the case class in the
> dataframe to be written has a single field in it, the avro schema generation
> fails with the following stacktrace, but only on the upsert:
> 21/08/19 15:08:34 ERROR BoundedInMemoryExecutor: error producing records
> org.apache.avro.SchemaParseException: Can't redefine: array
>   at org.apache.avro.Schema$Names.put(Schema.java:1128)
>   at org.apache.avro.Schema$NamedSchema.writeNameRef(Schema.java:562)
>   at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:690)
>   at org.apache.avro.Schema$ArraySchema.toJson(Schema.java:805)
>   at org.apache.avro.Schema$UnionSchema.toJson(Schema.java:882)
>   at org.apache.avro.Schema$RecordSchema.fieldsToJson(Schema.java:716)
>   at org.apache.avro.Schema$RecordSchema.toJson(Schema.java:701)
>   at org.apache.avro.Schema.toString(Schema.java:324)
>   at org.apache.avro.SchemaCompatibility.checkReaderWriterCompatibility(SchemaCompatibility.java:68)
>   at org.apache.parquet.avro.AvroRecordConverter.isElementType(AvroRecordConverter.java:866)
>   at org.apache.parquet.avro.AvroRecordConverter$AvroCollectionConverter.<init>(AvroRecordConverter.java:475)
>   at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:289)
>   at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:141)
>   at org.apache.parquet.avro.AvroRecordConverter.newConverter(AvroRecordConverter.java:279)
>   at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:141)
>   at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:95)
>   at org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
>   at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:138)
>   at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:183)
>   at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
>   at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
>   at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
>   at org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
>   at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
>
> I am able to replicate the problem in my local IntelliJ setup using the test
> that has been attached to this issue. The problem can be observed in the
> DummyStepParent case class. Simply adding an additional field to the case
> class eliminates the problem altogether (which is an acceptable workaround
> for our purposes, but shouldn't ultimately be necessary).
> case class DummyObject (
>   fieldOne: String,
>   listTwo: Seq[String],
>   listThree: Seq[DummyChild],
>   listFour: Seq[DummyStepChild],
>   fieldFive: Boolean,
>   listSix: Seq[
[jira] [Updated] (HUDI-2899) Fix DataFormatter usages removed in Spark 3.2
[ https://issues.apache.org/jira/browse/HUDI-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-2899: -- Sprint: Cont' improve - 2021/01/10, Cont' improve - 2021/01/18, Cont' improve - 2021/01/24 (was: Cont' improve - 2021/01/10, Cont' improve - 2021/01/18) > Fix DataFormatter usages removed in Spark 3.2 > - > > Key: HUDI-2899 > URL: https://issues.apache.org/jira/browse/HUDI-2899 > Project: Apache Hudi > Issue Type: Bug > Components: spark >Reporter: Alexey Kudinkin >Assignee: Yann Byron >Priority: Major > Fix For: 0.11.0 > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > > Trying to read a table that is partitioned on a string field ("product_category") > rather than by date leads to `NoSuchMethodError`: > {code:java} > scala> val readDf: DataFrame = > | spark.read.option(DataSourceReadOptions.ENABLE_DATA_SKIPPING.key(), > "false").format("hudi").load(outputPath) > java.lang.NoSuchMethodError: > org.apache.spark.sql.catalyst.util.DateFormatter$.apply(Ljava/time/ZoneId;)Lorg/apache/spark/sql/catalyst/util/DateFormatter; > at > org.apache.spark.sql.execution.datasources.Spark3ParsePartitionUtil.parsePartition(Spark3ParsePartitionUtil.scala:32) > at > org.apache.hudi.HoodieFileIndex.$anonfun$getAllQueryPartitionPaths$3(HoodieFileIndex.scala:559) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) > at scala.collection.TraversableLike.map(TraversableLike.scala:286) > at scala.collection.TraversableLike.map$(TraversableLike.scala:279) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.hudi.HoodieFileIndex.getAllQueryPartitionPaths(HoodieFileIndex.scala:511) > at >
org.apache.hudi.HoodieFileIndex.loadPartitionPathFiles(HoodieFileIndex.scala:575) > at org.apache.hudi.HoodieFileIndex.refresh0(HoodieFileIndex.scala:360) > at org.apache.hudi.HoodieFileIndex.(HoodieFileIndex.scala:157) > at > org.apache.hudi.DefaultSource.getBaseFileOnlyView(DefaultSource.scala:199) > at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:119) > at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:69) > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350) > at > org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:274) > at > org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:245) > at scala.Option.getOrElse(Option.scala:189) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:245) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:188) > ... 68 elided {code} > > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Updated] (HUDI-2941) Show _hoodie_operation in spark sql results
[ https://issues.apache.org/jira/browse/HUDI-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-2941: -- Sprint: Cont' improve - 2021/01/10, Cont' improve - 2021/01/18, Cont' improve - 2021/01/24 (was: Cont' improve - 2021/01/10, Cont' improve - 2021/01/18) > Show _hoodie_operation in spark sql results > --- > > Key: HUDI-2941 > URL: https://issues.apache.org/jira/browse/HUDI-2941 > Project: Apache Hudi > Issue Type: Task > Components: spark-sql >Reporter: Raymond Xu >Assignee: Forward Xu >Priority: Critical > Labels: hudi-on-call, pull-request-available, sev:critical, > user-support-issues > Fix For: 0.11.0 > > Original Estimate: 1h > Time Spent: 0.5h > Remaining Estimate: 0.5h > > Details in > [https://github.com/apache/hudi/issues/4160]
[jira] [Updated] (HUDI-2917) Rollback may be incorrect for canIndexLogFile index
[ https://issues.apache.org/jira/browse/HUDI-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-2917: -- Sprint: Cont' improve - 2021/01/10, Cont' improve - 2021/01/18, Cont' improve - 2021/01/24 (was: Cont' improve - 2021/01/10, Cont' improve - 2021/01/18) > Rollback may be incorrect for canIndexLogFile index > --- > > Key: HUDI-2917 > URL: https://issues.apache.org/jira/browse/HUDI-2917 > Project: Apache Hudi > Issue Type: Bug > Components: Common Core >Reporter: ZiyueGuan >Assignee: ZiyueGuan >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Original Estimate: 1h > Remaining Estimate: 1h > > Problem: > We may find data in a Hudi table that should have been rolled back. > Root cause: > Let's first recall how the rollback plan is generated for the log blocks of a > deltaCommit. Hudi takes two cases into consideration. > # Log files with no base file are assumed to consist entirely of inserted > records, so they are deleted directly; the assumption is that all inserted > records are covered this way. > # For file IDs that were updated according to the inflight commit metadata of the > instant being rolled back, a command block is appended to their log files to roll > them back; this handles all updated records. > However, the first assumption does not always hold. Indexes that can index > log files may insert records into an existing log file, because in the current > process the inflight hoodieCommitMeta is generated before records are assigned to > a specific file group. > > Fix: > To fix this problem, we need to use the result of the partitioner > to generate the hoodieCommitMeta rather than the workProfile. We may also need more > comments in the rollback code to call out this case.
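The two-case rollback planning described above can be sketched as follows (a deliberate simplification; `RollbackPlanSketch` and its types are illustrative, not the actual Hudi planner):

```java
import java.util.ArrayList;
import java.util.List;

public class RollbackPlanSketch {

    enum Action { DELETE_LOG_FILE, APPEND_ROLLBACK_BLOCK }

    static class LogFileGroup {
        final String fileId;
        final boolean hasBaseFile;
        LogFileGroup(String fileId, boolean hasBaseFile) {
            this.fileId = fileId;
            this.hasBaseFile = hasBaseFile;
        }
    }

    // Case 1: no base file => assume the log file holds only inserts, delete it.
    // Case 2: base file present and updated by the commit => append a rollback
    // command block. The flawed assumption is case 1: with an index that can
    // index log files, inserts may have been routed into an existing log file,
    // so "no base file" is not a safe proxy for "pure inserts".
    static List<Action> plan(List<LogFileGroup> groups) {
        List<Action> actions = new ArrayList<>();
        for (LogFileGroup g : groups) {
            actions.add(g.hasBaseFile ? Action.APPEND_ROLLBACK_BLOCK
                                      : Action.DELETE_LOG_FILE);
        }
        return actions;
    }
}
```

The proposed fix keeps this shape but derives the commit metadata from the partitioner's final record-to-file-group assignment instead of the work profile.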
[jira] [Updated] (HUDI-2987) event time not recorded in commit metadata when insert or bulk insert
[ https://issues.apache.org/jira/browse/HUDI-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-2987: -- Sprint: Cont' improve - 2021/01/10, Cont' improve - 2021/01/18, Cont' improve - 2021/01/24 (was: Cont' improve - 2021/01/10, Cont' improve - 2021/01/18) > event time not recorded in commit metadata when insert or bulk insert > - > > Key: HUDI-2987 > URL: https://issues.apache.org/jira/browse/HUDI-2987 > Project: Apache Hudi > Issue Type: Bug > Components: writer-core >Reporter: Raymond Xu >Assignee: sivabalan narayanan >Priority: Critical > Labels: hudi-on-call, pull-request-available, sev:high > Fix For: 0.11.0 > > Original Estimate: 4h > Remaining Estimate: 4h
[jira] [Updated] (HUDI-3215) Solve UT in Spark3.2
[ https://issues.apache.org/jira/browse/HUDI-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3215: -- Sprint: Cont' improve - 2021/01/10, Cont' improve - 2021/01/18, Cont' improve - 2021/01/24 (was: Cont' improve - 2021/01/10, Cont' improve - 2021/01/18) > Solve UT in Spark3.2 > > > Key: HUDI-3215 > URL: https://issues.apache.org/jira/browse/HUDI-3215 > Project: Apache Hudi > Issue Type: Bug > Components: spark >Reporter: Yann Byron >Assignee: Yann Byron >Priority: Critical > Labels: pull-request-available > Fix For: 0.11.0 > > Attachments: ut.spark31, ut.spark32 > > Original Estimate: 1h > Time Spent: 0.5h > Remaining Estimate: 0.5h
[jira] [Updated] (HUDI-3267) On-call team to triage GH issues, PRs, and JIRAs
[ https://issues.apache.org/jira/browse/HUDI-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3267: -- Sprint: Cont' improve - 2021/01/18, Cont' improve - 2021/01/24 (was: Cont' improve - 2021/01/18) > On-call team to triage GH issues, PRs, and JIRAs > > > Key: HUDI-3267 > URL: https://issues.apache.org/jira/browse/HUDI-3267 > Project: Apache Hudi > Issue Type: Task > Components: dev-experience >Reporter: Raymond Xu >Priority: Major > Original Estimate: 8h > Time Spent: 8h 10m > Remaining Estimate: 0h > > h4. triaged GH issues > # > h4. triaged critical PRs > # https://github.com/apache/hudi/pull/3745 > # https://github.com/apache/hudi/pull/4644 > # https://github.com/apache/hudi/pull/4649 > # https://github.com/apache/hudi/pull/4630 > #
[jira] [Updated] (HUDI-1847) Add ability to decouple configs for scheduling inline and running async
[ https://issues.apache.org/jira/browse/HUDI-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1847: -- Sprint: Cont' improve - 2021/01/10, Cont' improve - 2021/01/18, Cont' improve - 2021/01/24 (was: Cont' improve - 2021/01/10, Cont' improve - 2021/01/18) > Add ability to decouple configs for scheduling inline and running async > --- > > Key: HUDI-1847 > URL: https://issues.apache.org/jira/browse/HUDI-1847 > Project: Apache Hudi > Issue Type: Improvement > Components: compaction >Reporter: Nishith Agarwal >Assignee: sivabalan narayanan >Priority: Major > Labels: core-flow-ds, pull-request-available, sev:critical > Fix For: 0.11.0 > > Original Estimate: 3h > Time Spent: 3h > Remaining Estimate: 2h > > Currently, there are 2 ways to enable compaction: > > # Inline - This schedules compaction inline and executes it inline. > # Async - This option is only available for HoodieDeltaStreamer based jobs. > It turns on scheduling inline and running async as part of the same spark > job. > > Users need a config to schedule compaction inline only, while retaining the > ability to execute it in their own spark job.
[jira] [Updated] (HUDI-3204) spark on TimestampBasedKeyGenerator has no result when query by partition column
[ https://issues.apache.org/jira/browse/HUDI-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3204: -- Sprint: Cont' improve - 2021/01/18, Cont' improve - 2021/01/24 (was: Cont' improve - 2021/01/18)
> spark on TimestampBasedKeyGenerator has no result when query by partition column
>
> Key: HUDI-3204
> URL: https://issues.apache.org/jira/browse/HUDI-3204
> Project: Apache Hudi
> Issue Type: Bug
> Components: spark
> Reporter: Yann Byron
> Assignee: Yann Byron
> Priority: Critical
> Labels: hudi-on-call, sev:critical
> Fix For: 0.11.0
>
> Original Estimate: 3h
> Remaining Estimate: 3h
>
> {code:java}
> import org.apache.hudi.DataSourceWriteOptions
> import org.apache.hudi.config.HoodieWriteConfig
> import org.apache.hudi.keygen.constant.KeyGeneratorOptions._
> import org.apache.hudi.hive.MultiPartKeysValueExtractor
> val df = Seq((1, "z3", 30, "v1", "2018-09-23"), (2, "z3", 35, "v1", "2018-09-24")).toDF("id", "name", "age", "ts", "data_date")
> // mor
> df.write.format("hudi").
>   option(HoodieWriteConfig.TABLE_NAME, "issue_4417_mor").
>   option("hoodie.datasource.write.table.type", "MERGE_ON_READ").
>   option("hoodie.datasource.write.recordkey.field", "id").
>   option("hoodie.datasource.write.partitionpath.field", "data_date").
>   option("hoodie.datasource.write.precombine.field", "ts").
>   option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.TimestampBasedKeyGenerator").
>   option("hoodie.deltastreamer.keygen.timebased.timestamp.type", "DATE_STRING").
>   option("hoodie.deltastreamer.keygen.timebased.output.dateformat", "yyyy/MM/dd").
>   option("hoodie.deltastreamer.keygen.timebased.timezone", "GMT+8:00").
>   option("hoodie.deltastreamer.keygen.timebased.input.dateformat", "yyyy-MM-dd").
>   mode(org.apache.spark.sql.SaveMode.Append).
>   save("file:///tmp/hudi/issue_4417_mor")
>
> +-------------------+--------------------+------------------+----------------------+--------------------+---+----+---+---+----------+
> |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| id|name|age| ts| data_date|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+----+---+---+----------+
> |  20220110172709324|20220110172709324...|                 2|            2018/09/24|703e56d3-badb-40b...|  2|  z3| 35| v1|2018-09-24|
> |  20220110172709324|20220110172709324...|                 1|            2018/09/23|58fde2b3-db0e-464...|  1|  z3| 30| v1|2018-09-23|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+----+---+---+----------+
>
> // can not query any data
> spark.read.format("hudi").load("file:///tmp/hudi/issue_4417_mor").where("data_date = '2018-09-24'")
> // still can not query any data
> spark.read.format("hudi").load("file:///tmp/hudi/issue_4417_mor").where("data_date = '2018/09/24'").show
>
> // cow
> df.write.format("hudi").
>   option(HoodieWriteConfig.TABLE_NAME, "issue_4417_cow").
>   option("hoodie.datasource.write.table.type", "COPY_ON_WRITE").
>   option("hoodie.datasource.write.recordkey.field", "id").
>   option("hoodie.datasource.write.partitionpath.field", "data_date").
>   option("hoodie.datasource.write.precombine.field", "ts").
>   option("hoodie.datasource.write.keygenerator.class", "org.apache.hudi.keygen.TimestampBasedKeyGenerator").
>   option("hoodie.deltastreamer.keygen.timebased.timestamp.type", "DATE_STRING").
>   option("hoodie.deltastreamer.keygen.timebased.output.dateformat", "yyyy/MM/dd").
>   option("hoodie.deltastreamer.keygen.timebased.timezone", "GMT+8:00").
>   option("hoodie.deltastreamer.keygen.timebased.input.dateformat", "yyyy-MM-dd").
>   mode(org.apache.spark.sql.SaveMode.Append).
>   save("file:///tmp/hudi/issue_4417_cow")
>
> +-------------------+--------------------+------------------+----------------------+--------------------+---+----+---+---+----------+
> |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name| id|name|age| ts| data_date|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+----+---+---+----------+
> |  20220110172721896|20220110172721896...|                 2|            2018/09/24|81cc7819-a0d1-4e6...|  2|  z3| 35| v1|2018/09/24|
> |  20220110172721896|20220110172721896...|                 1|            2018/09/23|d428019b-a829-41a...|  1|  z3| 30| v1|2018/09/23|
> +-------------------+--------------------+------------------+----------------------+--------------------+---+----+---+---+----------+
>
> // can not query any data
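Outside Hudi, the mismatch driving the report above can be reproduced with the two date formats alone (an illustrative sketch; the class name is made up):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class KeyGenFormatMismatch {
    public static void main(String[] args) throws ParseException {
        // input.dateformat parses the raw field value;
        // output.dateformat produces the partition path.
        SimpleDateFormat input = new SimpleDateFormat("yyyy-MM-dd");
        SimpleDateFormat output = new SimpleDateFormat("yyyy/MM/dd");

        String fieldValue = "2018-09-24";
        Date parsed = input.parse(fieldValue);
        String partitionPath = output.format(parsed);

        System.out.println(partitionPath); // 2018/09/24
        // The partition path on disk ("2018/09/24") no longer equals the
        // stored column value, so a predicate on data_date cannot match both
        // the partition directory and the column consistently.
        System.out.println(partitionPath.equals(fieldValue)); // false
    }
}
```

This is why neither `data_date = '2018-09-24'` nor `data_date = '2018/09/24'` returns rows: partition pruning and row-level filtering each see a different representation of the same date.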
[GitHub] [hudi] yanghua commented on pull request #4669: [HUDI-3239] Convert `BaseHoodieTableFileIndex` to Java
yanghua commented on pull request #4669: URL: https://github.com/apache/hudi/pull/4669#issuecomment-1020696601 @alexeykudinkin Thanks for your contribution! Can you figure out why CI failed?
[jira] [Updated] (HUDI-2151) Make performant out-of-box configs
[ https://issues.apache.org/jira/browse/HUDI-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-2151: -- Sprint: Cont' improve - 2021/01/18, Cont' improve - 2021/01/24 (was: Cont' improve - 2021/01/18) > Make performant out-of-box configs > -- > > Key: HUDI-2151 > URL: https://issues.apache.org/jira/browse/HUDI-2151 > Project: Apache Hudi > Issue Type: Task > Components: Code Cleanup, docs, writer-core >Reporter: Vinoth Chandar >Assignee: sivabalan narayanan >Priority: Critical > Labels: pull-request-available > Fix For: 0.11.0 > > Original Estimate: 2h > Remaining Estimate: 2h > > We have quite a few configs which deliver better performance or usability > but are guarded by flags. > This task is to identify them, change them, test them (functionally and for > performance), and make them the defaults. > > We also need to ensure we capture all the backwards-compatibility issues that > can arise.
[jira] [Updated] (HUDI-3135) Fix Show Partitions Command's Result after drop partition
[ https://issues.apache.org/jira/browse/HUDI-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3135: -- Sprint: Cont' improve - 2021/01/10, Cont' improve - 2021/01/18, Cont' improve - 2021/01/24 (was: Cont' improve - 2021/01/10, Cont' improve - 2021/01/18) > Fix Show Partitions Command's Result after drop partition > - > > Key: HUDI-3135 > URL: https://issues.apache.org/jira/browse/HUDI-3135 > Project: Apache Hudi > Issue Type: Bug > Components: spark, spark-sql >Reporter: Forward Xu >Assignee: Forward Xu >Priority: Major > Labels: pull-request-available, user-support-issues > Fix For: 0.11.0 > > Original Estimate: 2h > Remaining Estimate: 2h > > # add two partitions dt='2021-10-01', dt='2021-10-02' > # drop one partition dt='2021-10-01' > # show partitions; the query result is dt='2021-10-01', dt='2021-10-02', but the > expected result is dt='2021-10-02'
[jira] [Updated] (HUDI-3240) ALTER TABLE rename breaks with managed table in Spark 2.4
[ https://issues.apache.org/jira/browse/HUDI-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3240: -- Sprint: Cont' improve - 2021/01/18, Cont' improve - 2021/01/24 (was: Cont' improve - 2021/01/18) > ALTER TABLE rename breaks with managed table in Spark 2.4 > - > > Key: HUDI-3240 > URL: https://issues.apache.org/jira/browse/HUDI-3240 > Project: Apache Hudi > Issue Type: Bug > Components: spark-sql >Affects Versions: 0.10.1 >Reporter: Raymond Xu >Assignee: Yann Byron >Priority: Major > Fix For: 0.11.0 > > Original Estimate: 1h > Remaining Estimate: 1h > > {code:sql} > create table if not exists cow_nonpt_nonpcf_tbl ( > id int, > name string, > price double > ) using hudi > options ( > type = 'cow', > primaryKey = 'id' > ); > insert into cow_nonpt_nonpcf_tbl select 1, 'a1', 20; > ALTER TABLE cow_nonpt_nonpcf_tbl RENAME TO cow_nonpt_nonpcf_tbl_2; > desc cow_nonpt_nonpcf_tbl_2; > -- desc works fine > select * from cow_nonpt_nonpcf_tbl_2; > -- throws exception{code} > {code:java} > 22/01/13 03:48:18 ERROR SparkSQLDriver: Failed in [select * from > cow_nonpt_nonpcf_tbl_2] > java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File > file:/user/hive/warehouse/cow_nonpt_nonpcf_tbl_2 does not exist > at > org.spark_project.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306) > at > org.spark_project.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293) > at > org.spark_project.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) > at > org.spark_project.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135) > at > org.spark_project.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410) > at > org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380) > at > org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) > at > 
org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) > at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000) > at > org.spark_project.guava.cache.LocalCache$LocalManualCache.get(LocalCache.java:4789) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.getCachedPlan(SessionCatalog.scala:141) > at > org.apache.spark.sql.execution.datasources.FindDataSourceTable.org$apache$spark$sql$execution$datasources$FindDataSourceTable$$readDataSourceTable(DataSourceStrategy.scala:227) > at > org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:264) > at > org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:255) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1$$anonfun$2.apply(AnalysisHelper.scala:108) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1$$anonfun$2.apply(AnalysisHelper.scala:108) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1.apply(AnalysisHelper.scala:107) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1.apply(AnalysisHelper.scala:106) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsDown(AnalysisHelper.scala:106) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1$$anonfun$apply$6.apply(AnalysisHelper.scala:113) > at > 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1$$anonfun$apply$6.apply(AnalysisHelper.scala:113) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:329) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:327) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1.apply(AnalysisHelper.scala:113) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$res
[jira] [Updated] (HUDI-3253) preferred to use table's location
[ https://issues.apache.org/jira/browse/HUDI-3253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3253: -- Sprint: Cont' improve - 2021/01/18, Cont' improve - 2021/01/24 (was: Cont' improve - 2021/01/18) > preferred to use table's location > - > > Key: HUDI-3253 > URL: https://issues.apache.org/jira/browse/HUDI-3253 > Project: Apache Hudi > Issue Type: Bug > Components: spark-sql >Reporter: Yann Byron >Assignee: Yann Byron >Priority: Major > Labels: pull-request-available > Fix For: 0.11.0 > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > When we create a hudi table with a specified location that isn't a subpath > of the current database's location, and then turn the table into a managed > table, Hudi fails to find the right table path. > Steps to reproduce: > > {code:java} > // create table in SPARK > create table if not exists cow_nonpt_nonpcf_tbl ( > id int, > name string, > price double > ) using hudi > options ( > type = 'cow', > primaryKey = 'id' > ) > location '/user/hudi/cow_nonpt_nonpcf_tbl'; > // turn it to a managed table in HIVE > alter table cow_nonpt_nonpcf_tbl set tblproperties ('EXTERNAL'='false'); > // insert some data in SPARK > insert into cow_nonpt_nonpcf_tbl select 1, 'a1', 20; > // will throw FileNotFoundException{code}
[jira] [Updated] (HUDI-3213) compaction should not change the commit time
[ https://issues.apache.org/jira/browse/HUDI-3213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3213: -- Sprint: Cont' improve - 2021/01/18, Cont' improve - 2021/01/24 (was: Cont' improve - 2021/01/18) > compaction should not change the commit time > > > Key: HUDI-3213 > URL: https://issues.apache.org/jira/browse/HUDI-3213 > Project: Apache Hudi > Issue Type: Bug > Components: spark, writer-core >Reporter: Yann Byron >Assignee: Yann Byron >Priority: Critical > Labels: hudi-on-call, pull-request-available, sev:critical > Fix For: 0.11.0 > > Original Estimate: 2h > Remaining Estimate: 2h > > After the sixth operation in `TestMORDataSource.testCount`, in which two records > are inserted and a `compaction` runs, `hudiIncDF6.count()` returns 152: the 150 > records that just went through `compaction` (100 records updated in the second and > third operations plus 50 records updated in the fifth), and the 2 records inserted > in the sixth. The right answer should be 2; the 150 compacted records should not be > counted. The reason is that `compaction` changed the commit time of records that > were updated earlier and stored in log files. > {code:java} > val hudiIncDF6 = spark.read.format("org.apache.hudi") > .option(DataSourceReadOptions.QUERY_TYPE.key, > DataSourceReadOptions.QUERY_TYPE_INCREMENTAL_OPT_VAL) > .option(DataSourceReadOptions.BEGIN_INSTANTTIME.key, commit5Time) > .option(DataSourceReadOptions.END_INSTANTTIME.key, commit6Time) > .load(basePath) > // compaction updated 150 rows + inserted 2 new row > assertEquals(152, hudiIncDF6.count()) {code}
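The miscount reduces to a plain filter over per-record commit times (an illustrative sketch, not Hudi's actual incremental read path; the class and method names are made up):

```java
import java.util.ArrayList;
import java.util.List;

public class IncrementalCountSketch {

    // Incremental pull keeps records whose _hoodie_commit_time lies in
    // (beginInstant, endInstant]. Instants compare lexicographically.
    static long incrementalCount(List<String> commitTimes, String begin, String end) {
        return commitTimes.stream()
                .filter(t -> t.compareTo(begin) > 0 && t.compareTo(end) <= 0)
                .count();
    }

    public static void main(String[] args) {
        List<String> commitTimes = new ArrayList<>();
        // 150 previously written records whose commit time was rewritten by
        // compaction to the instant "6"...
        for (int i = 0; i < 150; i++) {
            commitTimes.add("6");
        }
        // ...plus the 2 records genuinely inserted at instant "6".
        commitTimes.add("6");
        commitTimes.add("6");
        // Pulling between instants "5" and "6" matches all 152 records,
        // although only the 2 inserts are actually new in that window.
        System.out.println(incrementalCount(commitTimes, "5", "6")); // 152
    }
}
```

If compaction preserved each record's original commit time, the 150 rewritten records would keep times at or before "5" and fall outside the window, giving the expected count of 2.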
[GitHub] [hudi] h7kanna commented on pull request #4650: [MINOR] Added log to debug checkpoint resumption when set to 0
h7kanna commented on pull request #4650: URL: https://github.com/apache/hudi/pull/4650#issuecomment-1020679595 Yes, fixed it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot commented on pull request #4676: Hudi 3304 support partial update on mor table
hudi-bot commented on pull request #4676: URL: https://github.com/apache/hudi/pull/4676#issuecomment-1020935987
[GitHub] [hudi] hudi-bot commented on pull request #4667: [HUDI-3276][Stacked on 4559] Rebased Parquet-based `FileInputFormat` impls to inherit from `MapredParquetInputFormat`
hudi-bot commented on pull request #4667: URL: https://github.com/apache/hudi/pull/4667#issuecomment-1020638631
[GitHub] [hudi] hudi-bot commented on pull request #4669: [HUDI-3239] Convert `BaseHoodieTableFileIndex` to Java
hudi-bot commented on pull request #4669: URL: https://github.com/apache/hudi/pull/4669#issuecomment-1020641005
[GitHub] [hudi] nsivabalan closed issue #4170: [SUPPORT] Understanding Clustering Behavior
nsivabalan closed issue #4170: URL: https://github.com/apache/hudi/issues/4170
[GitHub] [hudi] hudi-bot commented on pull request #4677: [HUDI-3237] gracefully fail to change column datatype
hudi-bot commented on pull request #4677: URL: https://github.com/apache/hudi/pull/4677#issuecomment-1020687079
[GitHub] [hudi] alexeykudinkin commented on a change in pull request #4556: [HUDI-3191] Removing duplicating file-listing process w/in Hive's MOR `FileInputFormat`s
alexeykudinkin commented on a change in pull request #4556: URL: https://github.com/apache/hudi/pull/4556#discussion_r791350112 ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/TestHoodieMergeOnReadTable.java ## @@ -189,8 +190,10 @@ public void testUpsertPartitioner(boolean populateMetaFields) throws Exception { assertTrue(fileIdToNewSize.entrySet().stream().anyMatch(entry -> fileIdToSize.get(entry.getKey()) < entry.getValue())); - List dataFiles = roView.getLatestBaseFiles().map(HoodieBaseFile::getPath).collect(Collectors.toList()); - List recordsRead = HoodieMergeOnReadTestUtils.getRecordsUsingInputFormat(hadoopConf(), dataFiles, + List inputPaths = roView.getLatestBaseFiles() + .map(baseFile -> new Path(baseFile.getPath()).getParent().toString()) + .collect(Collectors.toList()); Review comment: Yes, these are correct -- previously they were actually working correctly just b/c we did the double file-listing (w/in `getRealtimeSplits`). I have to pass _partition paths_, not base-file paths. ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableRollback.java ## @@ -166,10 +169,12 @@ void testRollbackWithDeltaAndCompactionCommit(boolean rollbackUsingMarkers) thro JavaRDD writeRecords = jsc().parallelize(records, 1); JavaRDD writeStatusJavaRDD = client.upsert(writeRecords, newCommitTime); - client.commit(newCommitTime, writeStatusJavaRDD); + List statuses = writeStatusJavaRDD.collect(); assertNoWriteErrors(statuses); + client.commit(newCommitTime, jsc().parallelize(statuses)); + Review comment: These tests are actually written incorrectly -- they're dereferencing RDDs twice w/in `commit` and when the collect w/in the state itself. This leads to same base-files being double-written, which in turn fails assertion that i currently put in place to make sure that legacy flow and the new one yield identical results. 
## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableRollback.java ## @@ -201,8 +206,10 @@ void testRollbackWithDeltaAndCompactionCommit(boolean rollbackUsingMarkers) thro copyOfRecords = dataGen.generateUpdates(commitTime1, copyOfRecords); copyOfRecords.addAll(dataGen.generateInserts(commitTime1, 200)); -List dataFiles = tableView.getLatestBaseFiles().map(HoodieBaseFile::getPath).collect(Collectors.toList()); -List recordsRead = HoodieMergeOnReadTestUtils.getRecordsUsingInputFormat(hadoopConf(), dataFiles, +List inputPaths = tableView.getLatestBaseFiles() +.map(baseFile -> new Path(baseFile.getPath()).getParent().toString()) +.collect(Collectors.toList()); Review comment: Preserving existing behavior ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableRollback.java ## @@ -201,8 +206,10 @@ void testRollbackWithDeltaAndCompactionCommit(boolean rollbackUsingMarkers) thro copyOfRecords = dataGen.generateUpdates(commitTime1, copyOfRecords); copyOfRecords.addAll(dataGen.generateInserts(commitTime1, 200)); -List dataFiles = tableView.getLatestBaseFiles().map(HoodieBaseFile::getPath).collect(Collectors.toList()); -List recordsRead = HoodieMergeOnReadTestUtils.getRecordsUsingInputFormat(hadoopConf(), dataFiles, +List inputPaths = tableView.getLatestBaseFiles() +.map(baseFile -> new Path(baseFile.getPath()).getParent().toString()) +.collect(Collectors.toList()); Review comment: Preserving existing behavior (hence keeping lists instead of Sets) ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableRollback.java ## @@ -268,11 +281,13 @@ void testRollbackWithDeltaAndCompactionCommit(boolean rollbackUsingMarkers) thro thirdClient.startCommitWithTime(newCommitTime); writeStatusJavaRDD = thirdClient.upsert(writeRecords, newCommitTime); + statuses = 
writeStatusJavaRDD.collect(); -thirdClient.commit(newCommitTime, writeStatusJavaRDD); // Verify there are no errors assertNoWriteErrors(statuses); +thirdClient.commit(newCommitTime, jsc().parallelize(statuses)); + Review comment: Replied above ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableRollback.java ## @@ -361,15 +386,19 @@ void testMultiRollbackWithDeltaAndCompactionCommit() throws Exception { copyOfRecords = dataGen.generateUpdates(newCommitTime, copyOfRecords); copyOfRecords.addAll(dataGen.generateInserts(newCommitT
[GitHub] [hudi] BenjMaq edited a comment on issue #4154: [SUPPORT] INSERT OVERWRITE operation does not work when using Spark SQL
BenjMaq edited a comment on issue #4154: URL: https://github.com/apache/hudi/issues/4154#issuecomment-1021303523 Hi again, Still trying to understand the issue :) Here's the create table statement I run to create the table using `spark-sql`: ``` CREATE TABLE IF NOT EXISTS public.test_partitioned ( id bigint, name string, start_date string, dt string ) USING hudi LOCATION 's3a:///hudi_data_lake/test_partitioned' OPTIONS ( type = 'cow', primaryKey = 'id', preCombineField = 'start_date' ) PARTITIONED BY (dt); ``` and here's the DDL when I run `show create table public.test_partitioned` in Hive cli: ``` CREATE EXTERNAL TABLE `public.test_partitioned`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `id` bigint, `name` string, `start_date` string) PARTITIONED BY ( `dt` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' WITH SERDEPROPERTIES ( 'path'='s3a:///hudi_data_lake/test_partitioned', 'preCombineField'='start_date', 'primaryKey'='id', 'type'='cow') STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a:///hudi_data_lake/test_partitioned' TBLPROPERTIES ( 'last_commit_time_sync'='20220125071309', 'spark.sql.create.version'='2.4.4-20210211', 'spark.sql.sources.provider'='hudi', 'spark.sql.sources.schema.numPartCols'='1', 'spark.sql.sources.schema.numParts'='1', 
'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"id","type":"long","nullable":true,"metadata":{}},{"name":"name","type":"string","nullable":true,"metadata":{}},{"name":"start_date","type":"string","nullable":true,"metadata":{}},{"name":"dt","type":"string","nullable":true,"metadata":{}}]}', 'spark.sql.sources.schema.partCol.0'='dt', 'transient_lastDdlTime'='1643122431') ``` Are you maybe seeing something that looks off? I see for example ``` STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' ``` which is neither `HiveInputFormat` nor `HoodieCombineHiveInputFormat` mentioned by @nsivabalan. Is that normal? As mentioned above, I've followed the steps in https://hudi.apache.org/docs/query_engine_setup#PrestoDB to be able to read from Presto and Hive. Thank you -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] hudi-bot removed a comment on pull request #4650: [MINOR] Added log to debug checkpoint resumption when set to 0
hudi-bot removed a comment on pull request #4650: URL: https://github.com/apache/hudi/pull/4650#issuecomment-1017422670
[GitHub] [hudi] nsivabalan commented on issue #3870: [SUPPORT] Hudi v0.8.0 Savepoint rollback failure
nsivabalan commented on issue #3870: URL: https://github.com/apache/hudi/issues/3870#issuecomment-1020607736 Closing this due to inactivity. We have the savepoint rollback feature added for both table types. Unless we have further info, we can't help triage. Thanks for reporting.
[GitHub] [hudi] liujinhui1994 commented on a change in pull request #3614: [HUDI-2370] Supports data encryption
liujinhui1994 commented on a change in pull request #3614: URL: https://github.com/apache/hudi/pull/3614#discussion_r791425017 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/HoodieFileWriterFactory.java ## @@ -67,6 +72,10 @@ String instantTime, Path path, HoodieWriteConfig config, Schema schema, HoodieTable hoodieTable, TaskContextSupplier taskContextSupplier, boolean populateMetaFields, boolean enableBloomFilter) throws IOException { Option filter = enableBloomFilter ? Option.of(createBloomFilter(config)) : Option.empty(); + +LOG.error("--123" + SystemUtils.getClassLocation(AvroSchemaConverter.class)); +LOG.error("--123" + SystemUtils.getClassLocation(Types.class)); + Review comment: Yes, this is only used to check for conflicts before, it will be deleted ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/storage/SystemUtils.java ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.io.storage; + +import java.io.File; +import java.net.MalformedURLException; +import java.net.URL; +import java.security.CodeSource; +import java.security.ProtectionDomain; + +/** + * system util. 
+ */ +public class SystemUtils { + + public static URL getClassLocation(final Class cls) { Review comment: will be deleted, here is for troubleshooting ## File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/SystemUtils.java ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.integ; + +import java.io.File; +import java.net.MalformedURLException; +import java.net.URL; +import java.security.CodeSource; +import java.security.ProtectionDomain; + +/** + * system util. + */ +public class SystemUtils { Review comment: ok -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] alexeykudinkin commented on a change in pull request #3893: [HUDI-2656] Generalize HoodieIndex for flexible record data type
alexeykudinkin commented on a change in pull request #3893: URL: https://github.com/apache/hudi/pull/3893#discussion_r791267751 ## File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecord.java ## @@ -18,21 +18,21 @@ package org.apache.hudi.common.model; -import java.util.Map; -import java.util.stream.Collectors; -import java.util.stream.IntStream; import org.apache.hudi.common.util.CollectionUtils; import org.apache.hudi.common.util.Option; +import org.apache.hudi.common.util.collection.Pair; import java.io.Serializable; import java.util.List; +import java.util.Map; import java.util.Objects; -import org.apache.hudi.common.util.collection.Pair; +import java.util.stream.Collectors; +import java.util.stream.IntStream; /** * A Single Record managed by Hoodie. */ -public class HoodieRecord implements Serializable { +public abstract class HoodieRecord implements Serializable { Review comment: @yihua @xushiyan let's chat more on this to make sure we're aligned on the approach going f/w: I was thinking of keeping this component file-format agnostic and instead make it engine-specific, while refactoring MOR table read-path for efficient querying. Can you elaborate what's the goal you're striving for w/ `HoodieAvroRecord`? P.S. Putting this context in here for somebody who might not be aware of previous conversations ## File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieAvroRecord.java ## @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. 
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.hudi.common.model; + +public class HoodieAvroRecord extends HoodieRecord { Review comment: @yihua sorry, not sure i understood your point. Can you elaborate? Why do we want to extend `HoodieAvroRecord` with format-specific impl? ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java ## @@ -87,14 +87,14 @@ * @return the tagged {@link HoodieRecord} */ public static HoodieRecord getTaggedRecord(HoodieRecord inputRecord, Option location) { -HoodieRecord record = inputRecord; +HoodieRecord record = inputRecord; Review comment: Thanks for fixing this! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
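The review thread above debates making `HoodieRecord` an abstract, payload-generic base with `HoodieAvroRecord` as one concrete representation. A hedged, heavily simplified sketch of that direction (these are not the actual Hudi classes; a `String` stands in for an Avro payload purely for illustration):

```java
// Abstract base generic in its payload type, as discussed in the review.
abstract class RecordBase<T> {
    private final T data;
    protected RecordBase(T data) { this.data = data; }
    T getData() { return data; }
}

// One concrete, Avro-backed representation among possibly engine-specific ones.
class AvroBackedRecord extends RecordBase<String> {
    AvroBackedRecord(String data) { super(data); }
}

public class Main {
    public static void main(String[] args) {
        RecordBase<String> r = new AvroBackedRecord("payload");
        System.out.println(r.getData());
    }
}
```

The design question in the thread is whether such subclasses should be engine-specific rather than file-format-specific; the sketch is neutral on that point.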
[GitHub] [hudi] BruceKellan closed issue #4648: [SUPPORT] Upgrade Hudi to 0.10.1-rc2 from 0.10.0 using spark
BruceKellan closed issue #4648: URL: https://github.com/apache/hudi/issues/4648
[jira] [Updated] (HUDI-2488) Support async metadata index creation while regular writers and table services are in progress
[ https://issues.apache.org/jira/browse/HUDI-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-2488: -- Issue Type: Epic (was: New Feature) > Support async metadata index creation while regular writers and table > services are in progress > -- > > Key: HUDI-2488 > URL: https://issues.apache.org/jira/browse/HUDI-2488 > Project: Apache Hudi > Issue Type: Epic >Reporter: sivabalan narayanan >Assignee: Sagar Sumit >Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > Attachments: image-2021-11-17-11-04-09-713.png > > > For now, we have only FILES partition in metadata table. and our suggestion > is to stop all processes and then restart one by one by enabling metadata > table. first process to start back will invoke bootstrapping of the metadata > table. > > But this may not work out well as we add more and more partitions to metadata > table. > We need to support bootstrapping a single or more partitions in metadata > table while regular writers and table services are in progress. > > > Penning down my thoughts/idea. > I tried to find a way to get this done w/o adding an additional lock, but > could not crack that. So, here is one way to support async bootstrap. > > Introducing a file called "available_partitions" in some special file under > metadata table. This file will contain the list of partitions that are > available to apply updates from data table. i.e. when we do synchronous > updates from data table to metadata table, when we have N no of partitions in > metadata table, we need to know what partitions are fully bootstrapped and > ready to take updates. this file will assist in maintaining that info. We can > debate on how to maintain this info (tbl props, or separate file etc, but for > now let's say this file is the source of truth). Idea here is that, any async > bootstrap process will update this file with the new partition that got > bootstrapped once the bootstrap is fully complete. 
So that all other writers > will know what partitions to update. > And we need to introduce a metadata_lock as well. > > Here is how writers and async bootstrap will pan out. > > Regular writer or any async table service (compaction, etc): > when changes are required to be applied to metadata table: // fyi. as of > today this already happens within data table lock. > Take metadata_lock > read contents of available_partitions. > prep records and apply updates to metadata table. > release lock. > > Async bootstrap process: > Start bootstrapping of a given partition (eg files) in metadata table. > do it in a loop. i.e. first iteration of bootstrap could take 10 mins > for eg. and then again catch up new commits that happened in the last 10 mins > which could take 1 min for instance. and then again go for another loop. > Whenever total bootstrap time for a round is ~1 min or less, in the next > round, we can go in for the final iteration. > During the final iteration, take the metadata_lock. // this lock > should not be held for more than a few seconds. > apply any new commits that happened while the last iteration > of bootstrap was happening. > update "available_partitions" file with this partition > info that got fully bootstrapped. > release lock. > > metadata_lock: will ensure that when async bootstrap is in the final stages of > bootstrapping, we do not miss any commits that are nearing completion. So, > we ought to take a lock to ensure we don't miss out on any commits. Either > async bootstrap will apply the update, or the actual writer itself will > update directly if bootstrap is fully complete. > > Regarding "available_partitions": > I was looking for a way to know what partitions are fully ready to take in > direct updates from regular writers and hence chose this way. We can also > think about creating a temp_partition (files_temp or something) while > bootstrap is in progress and then renaming to the original partition name once > bootstrap is fully complete. 
If we can ensure reliably renaming of these > partitions(i.e, once files partition is available, it is fully ready to take > in direct updates), we can take this route as well. > Here is how it might pan out w/ folder/partition renaming. > > Regular writer or any async table service(compaction, etc): > when changes are required to be applied to metadata table: // fyi. as of > today this already happens within data table lock. > Take metadata_lock > list partitions in metadata table. ignore temp partitions. >
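The lock protocol described in the HUDI-2488 proposal above can be sketched as follows (names and structure are illustrative, not Hudi code; an in-memory set stands in for the proposed "available_partitions" file): writers consult the set of fully bootstrapped partitions under the metadata lock, and the bootstrap process publishes a partition under the same lock after its final catch-up iteration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

public class Main {
    private final ReentrantLock metadataLock = new ReentrantLock();
    private final Set<String> availablePartitions = new HashSet<>();

    // Regular writer (or async table service): apply updates only to ready partitions.
    List<String> partitionsToUpdate(List<String> candidates) {
        metadataLock.lock();
        try {
            List<String> ready = new ArrayList<>();
            for (String p : candidates) {
                if (availablePartitions.contains(p)) {
                    ready.add(p);
                }
            }
            return ready;
        } finally {
            metadataLock.unlock();
        }
    }

    // Async bootstrap: the final iteration runs under the lock, then publishes.
    void finishBootstrap(String partition) {
        metadataLock.lock();
        try {
            // ...apply commits that landed during the last bootstrap iteration...
            availablePartitions.add(partition);
        } finally {
            metadataLock.unlock();
        }
    }

    public static void main(String[] args) {
        Main coordinator = new Main();
        // Before bootstrap completes, writers see no ready partitions.
        System.out.println(coordinator.partitionsToUpdate(Arrays.asList("files", "bloom_filters")));
        coordinator.finishBootstrap("files");
        // After publication, only the bootstrapped partition takes direct updates.
        System.out.println(coordinator.partitionsToUpdate(Arrays.asList("files", "bloom_filters")));
    }
}
```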
[jira] [Updated] (HUDI-2488) Support async metadata index creation while regular writers and table services are in progress
[ https://issues.apache.org/jira/browse/HUDI-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-2488: -- Epic Link: (was: HUDI-1292) > Support async metadata index creation while regular writers and table > services are in progress > -- > > Key: HUDI-2488 > URL: https://issues.apache.org/jira/browse/HUDI-2488 > Project: Apache Hudi > Issue Type: Epic > Reporter: sivabalan narayanan > Assignee: Sagar Sumit > Priority: Blocker > Labels: pull-request-available > Fix For: 0.11.0 > > Attachments: image-2021-11-17-11-04-09-713.png > > > (Description unchanged; quoted in full in the previous update of this issue above.)
[GitHub] [hudi] stym06 commented on pull request #4665: [HUDI-2733] Add support for Thrift sync
stym06 commented on pull request #4665: URL: https://github.com/apache/hudi/pull/4665#issuecomment-1020492537 @nsivabalan can someone check whether this looks good?
[GitHub] [hudi] hudi-bot commented on pull request #4685: [WIP][DO NOT MERGE][CI Test Only] Remove hbase-server dependency, pull in HFile related classes, and upgrade HBase to 2.4.9
hudi-bot commented on pull request #4685: URL: https://github.com/apache/hudi/pull/4685#issuecomment-1020911910
[GitHub] [hudi] liujinhui1994 commented on pull request #4483: [HUDI-2370] [TEST] Parquet Encryption
liujinhui1994 commented on pull request #4483: URL: https://github.com/apache/hudi/pull/4483#issuecomment-1020885886 > @liujinhui1994 can you help clarify the plan here please? why we need a separate PR for testing, as there is already #3614 ; we shouldn't separate test from the implementation into another PR That PR (#3614) is the main one; this one is only for testing. I will close this PR.