[GitHub] [hudi] guoke111 opened a new pull request #4482: Branch 0.10.0

2021-12-31 Thread GitBox


guoke111 opened a new pull request #4482:
URL: https://github.com/apache/hudi/pull/4482


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot commented on pull request #4482: Branch 0.10.0

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4482:
URL: https://github.com/apache/hudi/pull/4482#issuecomment-1003302439


   
   ## CI report:
   
   * ef9923fc5551851ec4ee71896a62c0615da45ee8 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4482: Branch 0.10.0

2021-12-31 Thread GitBox


hudi-bot removed a comment on pull request #4482:
URL: https://github.com/apache/hudi/pull/4482#issuecomment-1003302439


   
   ## CI report:
   
   * ef9923fc5551851ec4ee71896a62c0615da45ee8 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4482: Branch 0.10.0

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4482:
URL: https://github.com/apache/hudi/pull/4482#issuecomment-1003303166


   
   ## CI report:
   
   * ef9923fc5551851ec4ee71896a62c0615da45ee8 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4829)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4480: [HUDI-3123] consistent hashing index: basic write path (upsert/insert)

2021-12-31 Thread GitBox


hudi-bot removed a comment on pull request #4480:
URL: https://github.com/apache/hudi/pull/4480#issuecomment-1003295706


   
   ## CI report:
   
   * c4a2ace5e28fafb29394a1448e1a6c2a0645dda9 UNKNOWN
   * 107569356716649b14ba96674ee9559989906f2b Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4825)
 
   * b8335b109c31a3defc5a93d0e08bcb77f5567192 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4827)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4480: [HUDI-3123] consistent hashing index: basic write path (upsert/insert)

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4480:
URL: https://github.com/apache/hudi/pull/4480#issuecomment-1003305890


   
   ## CI report:
   
   * c4a2ace5e28fafb29394a1448e1a6c2a0645dda9 UNKNOWN
   * b8335b109c31a3defc5a93d0e08bcb77f5567192 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4827)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4480: [HUDI-3123] consistent hashing index: basic write path (upsert/insert)

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4480:
URL: https://github.com/apache/hudi/pull/4480#issuecomment-1003309138


   
   ## CI report:
   
   * c4a2ace5e28fafb29394a1448e1a6c2a0645dda9 UNKNOWN
   * b8335b109c31a3defc5a93d0e08bcb77f5567192 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4827)
 
   * 874be971351618f3f7024eb7836428f6d10d2d7c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4480: [HUDI-3123] consistent hashing index: basic write path (upsert/insert)

2021-12-31 Thread GitBox


hudi-bot removed a comment on pull request #4480:
URL: https://github.com/apache/hudi/pull/4480#issuecomment-1003305890


   
   ## CI report:
   
   * c4a2ace5e28fafb29394a1448e1a6c2a0645dda9 UNKNOWN
   * b8335b109c31a3defc5a93d0e08bcb77f5567192 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4827)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] guoke111 closed pull request #4482: 0.10.0

2021-12-31 Thread GitBox


guoke111 closed pull request #4482:
URL: https://github.com/apache/hudi/pull/4482


   






[GitHub] [hudi] liujinhui1994 opened a new pull request #4483: [HUDI-2370] [TEST] Parquet Encryption

2021-12-31 Thread GitBox


liujinhui1994 opened a new pull request #4483:
URL: https://github.com/apache/hudi/pull/4483


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] hudi-bot commented on pull request #4483: [HUDI-2370] [TEST] Parquet Encryption

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4483:
URL: https://github.com/apache/hudi/pull/4483#issuecomment-1003318673


   
   ## CI report:
   
   * ba83b967abf427a81e378ad97b2801ec9e539ec1 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4483: [HUDI-2370] [TEST] Parquet Encryption

2021-12-31 Thread GitBox


hudi-bot removed a comment on pull request #4483:
URL: https://github.com/apache/hudi/pull/4483#issuecomment-1003318673


   
   ## CI report:
   
   * ba83b967abf427a81e378ad97b2801ec9e539ec1 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4483: [HUDI-2370] [TEST] Parquet Encryption

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4483:
URL: https://github.com/apache/hudi/pull/4483#issuecomment-1003319395


   
   ## CI report:
   
   * ba83b967abf427a81e378ad97b2801ec9e539ec1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4830)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4480: [HUDI-3123] consistent hashing index: basic write path (upsert/insert)

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4480:
URL: https://github.com/apache/hudi/pull/4480#issuecomment-1003323368


   
   ## CI report:
   
   * c4a2ace5e28fafb29394a1448e1a6c2a0645dda9 UNKNOWN
   * b8335b109c31a3defc5a93d0e08bcb77f5567192 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4827)
 
   * 874be971351618f3f7024eb7836428f6d10d2d7c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4831)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4480: [HUDI-3123] consistent hashing index: basic write path (upsert/insert)

2021-12-31 Thread GitBox


hudi-bot removed a comment on pull request #4480:
URL: https://github.com/apache/hudi/pull/4480#issuecomment-1003309138


   
   ## CI report:
   
   * c4a2ace5e28fafb29394a1448e1a6c2a0645dda9 UNKNOWN
   * b8335b109c31a3defc5a93d0e08bcb77f5567192 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4827)
 
   * 874be971351618f3f7024eb7836428f6d10d2d7c UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Created] (HUDI-3133) Add partition-level num groups control in PartitionAwareClusteringPlanStrategy

2021-12-31 Thread liujinhui (Jira)
liujinhui created HUDI-3133:
---

 Summary: Add partition-level num groups control in 
PartitionAwareClusteringPlanStrategy
 Key: HUDI-3133
 URL: https://issues.apache.org/jira/browse/HUDI-3133
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: liujinhui






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] dongkelun commented on pull request #4083: [HUDI-2837] The original hoodie.table.name should be maintained in Spark SQL

2021-12-31 Thread GitBox


dongkelun commented on pull request #4083:
URL: https://github.com/apache/hudi/pull/4083#issuecomment-1003327348


   > > @dongkelun @xushiyan I'd like to offer another solution for discussion.
   > > Incremental queries in Hive require setting 
`hoodie.%s.consume.start.timestamp`, which is read in 
`HoodieHiveUtils.readStartCommitTime`. Currently, we pass the 
`hoodie.table.name` value, named `tableName`, to this function. We can add the configs 
`hoodie.datasource.write.database.name` in `DataSourceWriteOptions` and 
`hoodie.database.name` in `HoodieTableConfig`. If `database.name` is provided, 
we join `database.name` and `table.name` and pass the result to 
`readStartCommitTime`. Users can then set 
`hoodie.dbName.tableName.consume.start.timestamp` in Hive and run the query.
   > > Also, `hoodie.datasource.write.database.name` and `hoodie.database.name` 
can be reused in other scenarios.
   > > @xushiyan what do you think?
   > 
   > @xushiyan @YannByron I think I understand the solution.
   > 
   > SQL will persist the database name to `hoodie.properties` by default; DF 
persists it selectively through an optional database parameter. Then, in an 
incremental query, if `databaseName.tableName` is set, we match against 
`databaseName.tableName`. If it is inconsistent or there is no databaseName, 
the incremental query will not be performed. If it is consistent, we perform the 
incremental query. If the incremental query does not have a database name set, 
we do not match the database name, only the table name.
   > 
   > So, which parameter should DF use to persist the database name?
   
   @xushiyan Hello, do you think this idea is OK? If so, I'll submit a first 
version based on it.
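
To make the proposal above concrete, here is a minimal sketch of how the lookup key could be built. Only the pattern `hoodie.%s.consume.start.timestamp` comes from the discussion; the class `ConsumeStartKeySketch` and the method `startTimestampKey` are hypothetical names for illustration, not Hudi APIs, and the fallback to the bare table name is an assumption based on the last case described above.

```java
final class ConsumeStartKeySketch {
    // Key pattern quoted in the discussion; %s is the table identifier.
    static final String START_TS_PATTERN = "hoodie.%s.consume.start.timestamp";

    // Hypothetical helper: qualify the table name with the database name
    // when one is provided, otherwise fall back to the table name alone.
    static String startTimestampKey(String databaseName, String tableName) {
        boolean hasDb = databaseName != null && !databaseName.isEmpty();
        String identifier = hasDb ? databaseName + "." + tableName : tableName;
        return String.format(START_TS_PATTERN, identifier);
    }

    public static void main(String[] args) {
        // prints "hoodie.dbName.tableName.consume.start.timestamp"
        System.out.println(startTimestampKey("dbName", "tableName"));
        // prints "hoodie.tableName.consume.start.timestamp"
        System.out.println(startTimestampKey(null, "tableName"));
    }
}
```

Keeping the unqualified key as a fallback would let existing Hive sessions that set only the table-name form keep working, which seems to be the intent of the last case in the quoted summary.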






[GitHub] [hudi] hudi-bot removed a comment on pull request #4480: [HUDI-3123] consistent hashing index: basic write path (upsert/insert)

2021-12-31 Thread GitBox


hudi-bot removed a comment on pull request #4480:
URL: https://github.com/apache/hudi/pull/4480#issuecomment-1003323368


   
   ## CI report:
   
   * c4a2ace5e28fafb29394a1448e1a6c2a0645dda9 UNKNOWN
   * b8335b109c31a3defc5a93d0e08bcb77f5567192 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4827)
 
   * 874be971351618f3f7024eb7836428f6d10d2d7c Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4831)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4480: [HUDI-3123] consistent hashing index: basic write path (upsert/insert)

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4480:
URL: https://github.com/apache/hudi/pull/4480#issuecomment-1003336882


   
   ## CI report:
   
   * c4a2ace5e28fafb29394a1448e1a6c2a0645dda9 UNKNOWN
   * 874be971351618f3f7024eb7836428f6d10d2d7c Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4831)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4483: [HUDI-2370] [TEST] Parquet Encryption

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4483:
URL: https://github.com/apache/hudi/pull/4483#issuecomment-1003336891


   
   ## CI report:
   
   * ba83b967abf427a81e378ad97b2801ec9e539ec1 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4830)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4483: [HUDI-2370] [TEST] Parquet Encryption

2021-12-31 Thread GitBox


hudi-bot removed a comment on pull request #4483:
URL: https://github.com/apache/hudi/pull/4483#issuecomment-1003319395


   
   ## CI report:
   
   * ba83b967abf427a81e378ad97b2801ec9e539ec1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4830)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] codope opened a new pull request #4484: [HUDI-3097][WIP] Allow hbase-shaded-server in trino bundle

2021-12-31 Thread GitBox


codope opened a new pull request #4484:
URL: https://github.com/apache/hudi/pull/4484


   ## What is the purpose of the pull request
   
   Necessary changes so that we can use hudi-trino-bundle in the new Trino 
connector. Do not merge yet. When we upgrade to hbase-2.x, we need to 
replace hbase-shaded-server with hbase-server and relocate it in the pom to 
avoid other conflicts, e.g. guava. hbase-shaded-server is no longer maintained 
after hbase 1.7.1.
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] hudi-bot commented on pull request #4484: [HUDI-3097][WIP] Allow hbase-shaded-server in trino bundle

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4484:
URL: https://github.com/apache/hudi/pull/4484#issuecomment-1003341292


   
   ## CI report:
   
   * f841e4cdbe10e893d7244789ff4e74ab5c73617b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4484: [HUDI-3097][WIP] Allow hbase-shaded-server in trino bundle

2021-12-31 Thread GitBox


hudi-bot removed a comment on pull request #4484:
URL: https://github.com/apache/hudi/pull/4484#issuecomment-1003341292


   
   ## CI report:
   
   * f841e4cdbe10e893d7244789ff4e74ab5c73617b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4484: [HUDI-3097][WIP] Allow hbase-shaded-server in trino bundle

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4484:
URL: https://github.com/apache/hudi/pull/4484#issuecomment-1003341916


   
   ## CI report:
   
   * f841e4cdbe10e893d7244789ff4e74ab5c73617b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4832)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4484: [HUDI-3097][WIP] Allow hbase-shaded-server in trino bundle

2021-12-31 Thread GitBox


hudi-bot removed a comment on pull request #4484:
URL: https://github.com/apache/hudi/pull/4484#issuecomment-1003341916


   
   ## CI report:
   
   * f841e4cdbe10e893d7244789ff4e74ab5c73617b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4832)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4484: [HUDI-3097][WIP] Allow hbase-shaded-server in trino bundle

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4484:
URL: https://github.com/apache/hudi/pull/4484#issuecomment-1003354178


   
   ## CI report:
   
   * f841e4cdbe10e893d7244789ff4e74ab5c73617b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4832)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] nsivabalan opened a new pull request #4485: [HUDI-2947] Fixing checkpoint fetch in deltastreamer

2021-12-31 Thread GitBox


nsivabalan opened a new pull request #4485:
URL: https://github.com/apache/hudi/pull/4485


   ## What is the purpose of the pull request
   
   Some operations in Hudi, such as inline clustering and cleaning, may not add 
CHECKPOINT_KEY and CHECKPOINT_RESET_KEY to their commit metadata. Deltastreamer 
should therefore skip those commits and walk back to find the valid commit 
metadata from which Hudi can fetch the checkpoint info. 
   Actual issue reported: a user starts deltastreamer in continuous mode with 
inline clustering and provides a checkpoint to start from. After a few commits, 
the user shuts down the job and then restarts it. Suppose the last commit 
happened to be a replace commit (clustering). When deltastreamer starts, it 
checks for CHECKPOINT_KEY in the last commit metadata, which is not present in 
a replace commit, and so it resumes from the user-provided checkpoint, which 
should not happen. 
   With the fix in this patch, we walk back to find the latest commit metadata 
that has CHECKPOINT_KEY or CHECKPOINT_RESET_KEY and use it to determine 
whether to go with the checkpoint from that commit or with the user-provided one. 
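
The walk-back idea described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this patch: commit metadata is modeled as plain maps ordered oldest first, `CheckpointWalkBackSketch` and its methods are hypothetical names, and the value used for `CHECKPOINT_RESET_KEY` is assumed.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class CheckpointWalkBackSketch {
    static final String CHECKPOINT_KEY = "deltastreamer.checkpoint.key";
    // Assumed key name, used here only for illustration.
    static final String CHECKPOINT_RESET_KEY = "deltastreamer.checkpoint.reset_key";

    static boolean isNullOrEmpty(String s) {
        return s == null || s.isEmpty();
    }

    // Walk the timeline from newest to oldest and return the first commit
    // metadata that carries either checkpoint key (e.g. skipping replacecommits).
    static Optional<Map<String, String>> latestCheckpointMetadata(List<Map<String, String>> oldestFirst) {
        for (int i = oldestFirst.size() - 1; i >= 0; i--) {
            Map<String, String> extra = oldestFirst.get(i);
            if (!isNullOrEmpty(extra.get(CHECKPOINT_KEY)) || !isNullOrEmpty(extra.get(CHECKPOINT_RESET_KEY))) {
                return Optional.of(extra);
            }
        }
        return Optional.empty();
    }

    // Decide between the user-provided checkpoint and the one stored in the timeline.
    static Optional<String> resumeCheckpoint(String cfgCheckpoint, List<Map<String, String>> oldestFirst) {
        Optional<Map<String, String>> found = latestCheckpointMetadata(oldestFirst);
        if (!found.isPresent()) {
            return Optional.ofNullable(cfgCheckpoint); // nothing recorded yet
        }
        String resetKey = found.get().get(CHECKPOINT_RESET_KEY);
        if (cfgCheckpoint != null && (isNullOrEmpty(resetKey) || !cfgCheckpoint.equals(resetKey))) {
            return Optional.of(cfgCheckpoint); // user supplied a new checkpoint
        }
        return Optional.ofNullable(found.get().get(CHECKPOINT_KEY));
    }

    public static void main(String[] args) {
        Map<String, String> commit = new HashMap<>();
        commit.put(CHECKPOINT_KEY, "5");
        commit.put(CHECKPOINT_RESET_KEY, "0");
        Map<String, String> replaceCommit = Collections.emptyMap(); // no checkpoint keys
        List<Map<String, String>> timeline = Arrays.asList(commit, replaceCommit);
        // prints "5": the replacecommit is skipped and the stored checkpoint is used
        System.out.println(resumeCheckpoint("0", timeline).orElse("<none>"));
    }
}
```

In this model, looking only at the last entry (the replacecommit) would find no reset key and fall back to the user-provided "0", which is the behavior the issue describes.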
   
   
   ## Brief change log
   
   - Fixed the way Deltastreamer fetches checkpoint info from commits. It now 
walks back to find the right commit metadata. 
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[jira] [Updated] (HUDI-2947) HoodieDeltaStreamer/DeltaSync can improperly pick up the checkpoint config from CLI in continuous mode

2021-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2947:
-
Labels: pull-request-available sev:high  (was: sev:high)

> HoodieDeltaStreamer/DeltaSync can improperly pick up the checkpoint config 
> from CLI in continuous mode
> --
>
> Key: HUDI-2947
> URL: https://issues.apache.org/jira/browse/HUDI-2947
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Ethan Guo
>Assignee: sivabalan narayanan
>Priority: Critical
>  Labels: pull-request-available, sev:high
> Fix For: 0.11.0
>
>
> *Problem:*
> When deltastreamer is started with a given checkpoint, e.g., `--checkpoint 
> 0`, in the continuous mode, the deltastreamer job may pick up the wrong 
> checkpoint later on.  The wrong checkpoint (for 20211206203551080 commit) 
> happens after the replacecommit and clean, which is reset to "0", instead of 
> "5" after 20211206202728233.commit.  More details below.
>  
> The bug is due to the check here: 
> [https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java#L335]
> {code:java}
> if (cfg.checkpoint != null
>     && (StringUtils.isNullOrEmpty(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY))
>         || !cfg.checkpoint.equals(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY)))) {
>   resumeCheckpointStr = Option.of(cfg.checkpoint);
> }
> {code}
> When resuming after a clustering commit, both "cfg.checkpoint != null" and 
> "StringUtils.isNullOrEmpty(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY))" 
> are true, because "--checkpoint 0" is configured and the last commit is a 
> replacecommit without checkpoint keys.  As a result, the resume checkpoint 
> string is reset to the configured checkpoint and the timeline walk-back 
> logic below the check is skipped, which is wrong.
>  
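The behaviour described in the quoted report can be reproduced with a small standalone model. This is only a sketch under stated assumptions, not the DeltaSync code and not the change made in the linked pull request: the class and method names are hypothetical, the timeline is reduced to a list of metadata maps, the string value of CHECKPOINT_RESET_KEY is assumed, and the commits are assumed to have recorded "0" under that key. The walk-back variant is one possible reading of "timeline walk-back logic", shown for contrast only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the resume decision; not the actual DeltaSync code.
class CheckpointResumeSketch {
    static final String CHECKPOINT_KEY = "deltastreamer.checkpoint.key";
    // Assumed value of CHECKPOINT_RESET_KEY; the report does not show it.
    static final String CHECKPOINT_RESET_KEY = "deltastreamer.checkpoint.reset_key";

    static boolean isNullOrEmpty(String s) {
        return s == null || s.isEmpty();
    }

    // Mirrors the quoted check: only the latest instant's metadata is inspected.
    static String resumeFromLatestInstantOnly(String cfgCheckpoint, List<Map<String, String>> timeline) {
        Map<String, String> latest = timeline.get(timeline.size() - 1);
        String resetKey = latest.get(CHECKPOINT_RESET_KEY);
        if (cfgCheckpoint != null && (isNullOrEmpty(resetKey) || !cfgCheckpoint.equals(resetKey))) {
            return cfgCheckpoint; // the configured checkpoint wins, even in the middle of a run
        }
        return latest.get(CHECKPOINT_KEY);
    }

    // Hypothetical alternative: skip instants without checkpoint metadata before applying the check.
    static String resumeWithWalkBack(String cfgCheckpoint, List<Map<String, String>> timeline) {
        for (int i = timeline.size() - 1; i >= 0; i--) {
            Map<String, String> meta = timeline.get(i);
            if (isNullOrEmpty(meta.get(CHECKPOINT_KEY)) && isNullOrEmpty(meta.get(CHECKPOINT_RESET_KEY))) {
                continue; // e.g. a replacecommit written without checkpoint keys
            }
            String resetKey = meta.get(CHECKPOINT_RESET_KEY);
            if (cfgCheckpoint != null && (isNullOrEmpty(resetKey) || !cfgCheckpoint.equals(resetKey))) {
                return cfgCheckpoint;
            }
            return meta.get(CHECKPOINT_KEY);
        }
        return cfgCheckpoint; // no instant carries checkpoint metadata
    }

    // Builds commit metadata with a checkpoint and the reset key recorded at write time.
    static Map<String, String> commit(String checkpoint, String resetKey) {
        Map<String, String> meta = new HashMap<>();
        meta.put(CHECKPOINT_KEY, checkpoint);
        meta.put(CHECKPOINT_RESET_KEY, resetKey);
        return meta;
    }

    // Three commits with checkpoints "2", "3", "4", followed by a replacecommit without keys.
    static List<Map<String, String>> exampleTimeline() {
        List<Map<String, String>> timeline = new ArrayList<>();
        timeline.add(commit("2", "0"));
        timeline.add(commit("3", "0"));
        timeline.add(commit("4", "0"));
        timeline.add(new HashMap<>()); // replacecommit
        return timeline;
    }

    public static void main(String[] args) {
        List<Map<String, String>> timeline = exampleTimeline();
        System.out.println("latest instant only: " + resumeFromLatestInstantOnly("0", timeline));
        System.out.println("with walk-back:      " + resumeWithWalkBack("0", timeline));
    }
}
```

Under these assumptions the first method returns the configured "0" as soon as a replacecommit is the latest instant, while the walk-back variant returns "4", the checkpoint stored by the last regular commit.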
> Timeline:
>  
> {code:java}
>  189069 Dec  6 12:19 20211206201238649.commit
>       0 Dec  6 12:12 20211206201238649.commit.requested
>       0 Dec  6 12:12 20211206201238649.inflight
>  189069 Dec  6 12:27 20211206201959151.commit
>       0 Dec  6 12:20 20211206201959151.commit.requested
>       0 Dec  6 12:20 20211206201959151.inflight
>  189069 Dec  6 12:34 20211206202728233.commit
>       0 Dec  6 12:27 20211206202728233.commit.requested
>       0 Dec  6 12:27 20211206202728233.inflight
>   36662 Dec  6 12:35 20211206203449899.replacecommit
>       0 Dec  6 12:35 20211206203449899.replacecommit.inflight
>   34656 Dec  6 12:35 20211206203449899.replacecommit.requested
>   28013 Dec  6 12:35 20211206203503574.clean
>   19024 Dec  6 12:35 20211206203503574.clean.inflight
>   19024 Dec  6 12:35 20211206203503574.clean.requested
>  189069 Dec  6 12:43 20211206203551080.commit
>       0 Dec  6 12:35 20211206203551080.commit.requested
>       0 Dec  6 12:35 20211206203551080.inflight
>  189069 Dec  6 12:50 20211206204311612.commit
>       0 Dec  6 12:43 20211206204311612.commit.requested
>       0 Dec  6 12:43 20211206204311612.inflight
>       0 Dec  6 12:50 20211206205044595.commit.requested
>       0 Dec  6 12:50 20211206205044595.inflight
>     128 Dec  6 12:56 archived
>     483 Dec  6 11:52 hoodie.properties
>  {code}
>  
> Checkpoints in commits:
>  
> {code:java}
> grep "deltastreamer.checkpoint.key" *
> 20211206201238649.commit:    "deltastreamer.checkpoint.key" : "2"
> 20211206201959151.commit:    "deltastreamer.checkpoint.key" : "3"
> 20211206202728233.commit:    "deltastreamer.checkpoint.key" : "4"
> 20211206203551080.commit:    "deltastreamer.checkpoint.key" : "1"
> 20211206204311612.commit:    "deltastreamer.checkpoint.key" : "2" {code}
>  
> *Steps to reproduce:*
> Run HoodieDeltaStreamer in the continuous mode, by providing both 
> "--checkpoint 0" and "--continuous", with inline clustering and sync clean 
> enabled (some configs are masked).
>  
> {code:java}
> spark-submit \
>   --master yarn \
>   --driver-memory 8g --executor-memory 8g --num-executors 3 --executor-cores 4 \
>   --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
>   --conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.DefaultAWSCredentialsProviderChain \
>   --conf spark.speculation=true \
>   --conf spark.speculation.multiplier=1.0 \
>   --conf spark.speculation.quantile=0.5 \
>   --packages org.apache.spark:spark-avro_2.12:3.2.0 \
>   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
>   file:/home/hadoop/ethan/hudi-utilities-bundle_2.12-0.10.0-rc3.jar \
>   --props file:/home/hadoop/ethan/test.properties \
>   --source-class ... \
>   --source-ordering-field ts \
>   --target-base-path s3a://hudi-testing/test_hoodie_table_11/ \
>   --target-table test_table \
>   --tab

[GitHub] [hudi] hudi-bot commented on pull request #4485: [HUDI-2947] Fixing checkpoint fetch in detlastreamer

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4485:
URL: https://github.com/apache/hudi/pull/4485#issuecomment-1003388310


   
   ## CI report:
   
   * 4987bc78d52f97537a77e4dbdc7e0b1e294ee3ec UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4485: [HUDI-2947] Fixing checkpoint fetch in detlastreamer

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4485:
URL: https://github.com/apache/hudi/pull/4485#issuecomment-1003388923


   
   ## CI report:
   
   * 4987bc78d52f97537a77e4dbdc7e0b1e294ee3ec Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4833)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4485: [HUDI-2947] Fixing checkpoint fetch in detlastreamer

2021-12-31 Thread GitBox


hudi-bot removed a comment on pull request #4485:
URL: https://github.com/apache/hudi/pull/4485#issuecomment-1003388310


   
   ## CI report:
   
   * 4987bc78d52f97537a77e4dbdc7e0b1e294ee3ec UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4485: [HUDI-2947] Fixing checkpoint fetch in detlastreamer

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4485:
URL: https://github.com/apache/hudi/pull/4485#issuecomment-1003399247


   
   ## CI report:
   
   * 4987bc78d52f97537a77e4dbdc7e0b1e294ee3ec Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4833)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4485: [HUDI-2947] Fixing checkpoint fetch in detlastreamer

2021-12-31 Thread GitBox


hudi-bot removed a comment on pull request #4485:
URL: https://github.com/apache/hudi/pull/4485#issuecomment-1003388923


   
   ## CI report:
   
   * 4987bc78d52f97537a77e4dbdc7e0b1e294ee3ec Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4833)
 
   
   




[GitHub] [hudi] lsyldliu opened a new pull request #4486: [HUDI-3132] Minor fixes for HoodieCatalog

2021-12-31 Thread GitBox


lsyldliu opened a new pull request #4486:
URL: https://github.com/apache/hudi/pull/4486








[jira] [Updated] (HUDI-3132) Minor fixes for HoodieCatalog

2021-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-3132:
-
Labels: pull-request-available  (was: )

> Minor fixes for HoodieCatalog
> -
>
> Key: HUDI-3132
> URL: https://issues.apache.org/jira/browse/HUDI-3132
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: dalongliu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0, 0.10.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] hudi-bot commented on pull request #4486: [HUDI-3132] Minor fixes for HoodieCatalog

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4486:
URL: https://github.com/apache/hudi/pull/4486#issuecomment-1003401574


   
   ## CI report:
   
   * 6d3607ea94783da1e4d0725b3702387dbe685c62 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4486: [HUDI-3132] Minor fixes for HoodieCatalog

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4486:
URL: https://github.com/apache/hudi/pull/4486#issuecomment-1003402145


   
   ## CI report:
   
   * 6d3607ea94783da1e4d0725b3702387dbe685c62 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4834)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4486: [HUDI-3132] Minor fixes for HoodieCatalog

2021-12-31 Thread GitBox


hudi-bot removed a comment on pull request #4486:
URL: https://github.com/apache/hudi/pull/4486#issuecomment-1003401574


   
   ## CI report:
   
   * 6d3607ea94783da1e4d0725b3702387dbe685c62 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4350: [HUDI-3047] Basic Implementation of Spark Datasource V2

2021-12-31 Thread GitBox


hudi-bot removed a comment on pull request #4350:
URL: https://github.com/apache/hudi/pull/4350#issuecomment-1003299314


   
   ## CI report:
   
   * 5f2bceb6f745b359ba7b5691ef1f2ab02eddde06 UNKNOWN
   * 3855884f4791a45fa3a973e1e540e6988e863223 UNKNOWN
   * 78e8080c9d530e1e54799afbef69edb67394bb29 UNKNOWN
   * daaabf8b5843585fa2cc4a4414ae287a8cd36dae UNKNOWN
   * 082742e8794ec236f63d45ba5780305045babefb UNKNOWN
   * f984f3a9e4f4b7cde1371c9f03e77e3fffd622ed UNKNOWN
   * 649579c1a0fb99e4e448f74cc6dd77ca13c661c3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4826)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4350: [HUDI-3047] Basic Implementation of Spark Datasource V2

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4350:
URL: https://github.com/apache/hudi/pull/4350#issuecomment-1003403150


   
   ## CI report:
   
   * 5f2bceb6f745b359ba7b5691ef1f2ab02eddde06 UNKNOWN
   * 3855884f4791a45fa3a973e1e540e6988e863223 UNKNOWN
   * 78e8080c9d530e1e54799afbef69edb67394bb29 UNKNOWN
   * daaabf8b5843585fa2cc4a4414ae287a8cd36dae UNKNOWN
   * 082742e8794ec236f63d45ba5780305045babefb UNKNOWN
   * f984f3a9e4f4b7cde1371c9f03e77e3fffd622ed UNKNOWN
   * 649579c1a0fb99e4e448f74cc6dd77ca13c661c3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4826)
 
   * a1a2979897b4eac3620b6d7a751ae2fc7cec95de UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4203:
URL: https://github.com/apache/hudi/pull/4203#issuecomment-1003405928


   
   ## CI report:
   
   * e12df0fe85613a48adfec2f57b0a6a27bd3919f4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3986)
 
   * 5b68cadeeec7a6482f9c5a9eeadad1ad816aa962 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

2021-12-31 Thread GitBox


hudi-bot removed a comment on pull request #4203:
URL: https://github.com/apache/hudi/pull/4203#issuecomment-985323557


   
   ## CI report:
   
   * e12df0fe85613a48adfec2f57b0a6a27bd3919f4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3986)
 
   
   




[GitHub] [hudi] codope commented on pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

2021-12-31 Thread GitBox


codope commented on pull request #4203:
URL: https://github.com/apache/hudi/pull/4203#issuecomment-1003405944


   > @codope : is this good to review? If there is any pending work, I can 
take it up as I am focusing on all issues and jiras. Let me know.
   
   @nsivabalan It is now ready for review. Both the row-writer and 
non-row-writer paths will emit the same value for a logical timestamp type 
column if the config is enabled.






[GitHub] [hudi] hudi-bot commented on pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4203:
URL: https://github.com/apache/hudi/pull/4203#issuecomment-1003406520


   
   ## CI report:
   
   * e12df0fe85613a48adfec2f57b0a6a27bd3919f4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3986)
 
   * 5b68cadeeec7a6482f9c5a9eeadad1ad816aa962 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4835)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

2021-12-31 Thread GitBox


hudi-bot removed a comment on pull request #4203:
URL: https://github.com/apache/hudi/pull/4203#issuecomment-1003405928


   
   ## CI report:
   
   * e12df0fe85613a48adfec2f57b0a6a27bd3919f4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3986)
 
   * 5b68cadeeec7a6482f9c5a9eeadad1ad816aa962 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4203:
URL: https://github.com/apache/hudi/pull/4203#issuecomment-1003406987


   
   ## CI report:
   
   * 5b68cadeeec7a6482f9c5a9eeadad1ad816aa962 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4835)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4203: [HUDI-2909] Handle logical type in TimestampBasedKeyGenerator

2021-12-31 Thread GitBox


hudi-bot removed a comment on pull request #4203:
URL: https://github.com/apache/hudi/pull/4203#issuecomment-1003406520


   
   ## CI report:
   
   * e12df0fe85613a48adfec2f57b0a6a27bd3919f4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3986)
 
   * 5b68cadeeec7a6482f9c5a9eeadad1ad816aa962 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4835)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4350: [HUDI-3047] Basic Implementation of Spark Datasource V2

2021-12-31 Thread GitBox


hudi-bot removed a comment on pull request #4350:
URL: https://github.com/apache/hudi/pull/4350#issuecomment-1003403150


   
   ## CI report:
   
   * 5f2bceb6f745b359ba7b5691ef1f2ab02eddde06 UNKNOWN
   * 3855884f4791a45fa3a973e1e540e6988e863223 UNKNOWN
   * 78e8080c9d530e1e54799afbef69edb67394bb29 UNKNOWN
   * daaabf8b5843585fa2cc4a4414ae287a8cd36dae UNKNOWN
   * 082742e8794ec236f63d45ba5780305045babefb UNKNOWN
   * f984f3a9e4f4b7cde1371c9f03e77e3fffd622ed UNKNOWN
   * 649579c1a0fb99e4e448f74cc6dd77ca13c661c3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4826)
 
   * a1a2979897b4eac3620b6d7a751ae2fc7cec95de UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4350: [HUDI-3047] Basic Implementation of Spark Datasource V2

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4350:
URL: https://github.com/apache/hudi/pull/4350#issuecomment-1003410143


   
   ## CI report:
   
   * 5f2bceb6f745b359ba7b5691ef1f2ab02eddde06 UNKNOWN
   * 3855884f4791a45fa3a973e1e540e6988e863223 UNKNOWN
   * 78e8080c9d530e1e54799afbef69edb67394bb29 UNKNOWN
   * daaabf8b5843585fa2cc4a4414ae287a8cd36dae UNKNOWN
   * 082742e8794ec236f63d45ba5780305045babefb UNKNOWN
   * f984f3a9e4f4b7cde1371c9f03e77e3fffd622ed UNKNOWN
   * 649579c1a0fb99e4e448f74cc6dd77ca13c661c3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4826)
 
   * a1a2979897b4eac3620b6d7a751ae2fc7cec95de Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4836)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4486: [HUDI-3132] Minor fixes for HoodieCatalog

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4486:
URL: https://github.com/apache/hudi/pull/4486#issuecomment-1003413179


   
   ## CI report:
   
   * 6d3607ea94783da1e4d0725b3702387dbe685c62 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4834)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4486: [HUDI-3132] Minor fixes for HoodieCatalog

2021-12-31 Thread GitBox


hudi-bot removed a comment on pull request #4486:
URL: https://github.com/apache/hudi/pull/4486#issuecomment-1003402145


   
   ## CI report:
   
   * 6d3607ea94783da1e4d0725b3702387dbe685c62 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4834)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4350: [HUDI-3047] Basic Implementation of Spark Datasource V2

2021-12-31 Thread GitBox


hudi-bot removed a comment on pull request #4350:
URL: https://github.com/apache/hudi/pull/4350#issuecomment-1003410143


   
   ## CI report:
   
   * 5f2bceb6f745b359ba7b5691ef1f2ab02eddde06 UNKNOWN
   * 3855884f4791a45fa3a973e1e540e6988e863223 UNKNOWN
   * 78e8080c9d530e1e54799afbef69edb67394bb29 UNKNOWN
   * daaabf8b5843585fa2cc4a4414ae287a8cd36dae UNKNOWN
   * 082742e8794ec236f63d45ba5780305045babefb UNKNOWN
   * f984f3a9e4f4b7cde1371c9f03e77e3fffd622ed UNKNOWN
   * 649579c1a0fb99e4e448f74cc6dd77ca13c661c3 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4826)
 
   * a1a2979897b4eac3620b6d7a751ae2fc7cec95de Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4836)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4350: [HUDI-3047] Basic Implementation of Spark Datasource V2

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4350:
URL: https://github.com/apache/hudi/pull/4350#issuecomment-1003419258


   
   ## CI report:
   
   * 5f2bceb6f745b359ba7b5691ef1f2ab02eddde06 UNKNOWN
   * 3855884f4791a45fa3a973e1e540e6988e863223 UNKNOWN
   * 78e8080c9d530e1e54799afbef69edb67394bb29 UNKNOWN
   * daaabf8b5843585fa2cc4a4414ae287a8cd36dae UNKNOWN
   * 082742e8794ec236f63d45ba5780305045babefb UNKNOWN
   * f984f3a9e4f4b7cde1371c9f03e77e3fffd622ed UNKNOWN
   * a1a2979897b4eac3620b6d7a751ae2fc7cec95de Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4836)
 
   
   




[GitHub] [hudi] kywe665 opened a new pull request #4487: [MINOR] - Updated articles doc page through 2021

2021-12-31 Thread GitBox


kywe665 opened a new pull request #4487:
URL: https://github.com/apache/hudi/pull/4487


   ## What is the purpose of the pull request
   
   Updated recent article references
   
   ## Brief change log
   
   Updated recent article references
   
   ## Verify this pull request
   
   docs change only
   
   ## Committer checklist
   
- [X] Has a corresponding JIRA in PR title & commit

- [X] Commit message is descriptive of the change

- [X] CI is green
   
- [X] Necessary doc changes done or have another open PR
  
- [X] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[jira] [Updated] (HUDI-3047) Basic Implementation of Spark Datasource V2

2021-12-31 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-3047:

Priority: Blocker  (was: Major)

> Basic Implementation of Spark Datasource V2
> ---
>
> Key: HUDI-3047
> URL: https://issues.apache.org/jira/browse/HUDI-3047
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: leesf
>Assignee: leesf
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> Introduce HoodieCatalog and HoodieInternalTableV2 to implement the read and 
> write paths.





[jira] [Updated] (HUDI-3047) Basic Implementation of Spark Datasource V2

2021-12-31 Thread leesf (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leesf updated HUDI-3047:

Fix Version/s: 0.11.0

> Basic Implementation of Spark Datasource V2
> ---
>
> Key: HUDI-3047
> URL: https://issues.apache.org/jira/browse/HUDI-3047
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> Introduce HoodieCatalog and HoodieInternalTableV2 to implement the read and 
> write paths.





[GitHub] [hudi] yihua commented on pull request #4341: [HUDI-3040] Fix HoodieSparkBootstrapExample error info for usage

2021-12-31 Thread GitBox


yihua commented on pull request #4341:
URL: https://github.com/apache/hudi/pull/4341#issuecomment-1003520345


   Merging this since it only changes the error log.






[GitHub] [hudi] yihua merged pull request #4341: [HUDI-3040] Fix HoodieSparkBootstrapExample error info for usage

2021-12-31 Thread GitBox


yihua merged pull request #4341:
URL: https://github.com/apache/hudi/pull/4341


   






[hudi] branch master updated: [HUDI-3040] Fix HoodieSparkBootstrapExample error info for usage (#4341)

2021-12-31 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new bfa169d  [HUDI-3040] Fix HoodieSparkBootstrapExample error info for 
usage (#4341)
bfa169d is described below

commit bfa169d808c72d09e1370a2b1ecd6a080d45fe02
Author: Aimiyoo 
AuthorDate: Sat Jan 1 15:38:38 2022 +0800

[HUDI-3040] Fix HoodieSparkBootstrapExample error info for usage (#4341)
---
 .../org/apache/hudi/examples/spark/HoodieSparkBootstrapExample.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/hudi-examples/src/main/java/org/apache/hudi/examples/spark/HoodieSparkBootstrapExample.java
 
b/hudi-examples/src/main/java/org/apache/hudi/examples/spark/HoodieSparkBootstrapExample.java
index e385e47..d11d2ed 100644
--- 
a/hudi-examples/src/main/java/org/apache/hudi/examples/spark/HoodieSparkBootstrapExample.java
+++ 
b/hudi-examples/src/main/java/org/apache/hudi/examples/spark/HoodieSparkBootstrapExample.java
@@ -40,7 +40,7 @@ public class HoodieSparkBootstrapExample {
 
   public static void main(String[] args) throws Exception {
 if (args.length < 5) {
-  System.err.println("Usage: HoodieWriteClientExample  
");
+  System.err.println("Usage: HoodieSparkBootstrapExample  
   ");
   System.exit(1);
 }
 String recordKey = args[0];


[jira] [Created] (HUDI-3134) Fix Insert error after adding columns on Spark 3.2.0

2021-12-31 Thread leesf (Jira)
leesf created HUDI-3134:
---

 Summary: Fix Insert error after adding columns on Spark 3.2.0
 Key: HUDI-3134
 URL: https://issues.apache.org/jira/browse/HUDI-3134
 Project: Apache Hudi
  Issue Type: Sub-task
Reporter: leesf
Assignee: leesf


On Spark 3.2.0, after altering a table to add columns, an insert statement 
fails with the following exception.

Caused by: org.apache.hudi.exception.HoodieException: 
java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieException: operation has failed
  at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:147)
  at 
org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:100)
  ... 31 more
Caused by: java.util.concurrent.ExecutionException: 
org.apache.hudi.exception.HoodieException: operation has failed
  at java.util.concurrent.FutureTask.report(FutureTask.java:122)
  at java.util.concurrent.FutureTask.get(FutureTask.java:192)
  at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
  ... 32 more
Caused by: org.apache.hudi.exception.HoodieException: operation has failed
  at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:248)
  at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:226)
  at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
  at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:278)
  at 
org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
  at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  ... 3 more
Caused by: java.lang.NoSuchMethodError: 
org.apache.avro.Schema$Field.defaultValue()Lorg/codehaus/jackson/JsonNode;
  at 
org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:168)
  at 
org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:95)
  at 
org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
  at 
org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:138)
  at 
org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:185)
  at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
  at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
  at 
org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
  at 
org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
  at 
org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  ... 4 more
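
The innermost cause points at an Avro API mismatch. As far as I know, the method named in the NoSuchMethodError, Schema.Field.defaultValue() returning a Jackson 1 JsonNode, was removed from Avro's public API in the 1.9 line in favour of defaultVal(), so code compiled against the older signature fails at runtime when a newer Avro is on the classpath. Which jar versions were actually present is not stated in this report. A minimal sketch for checking what the running classpath offers, using only reflection (AvroApiProbe and returnTypeOf are hypothetical names, not part of Hudi or Avro):

```java
import java.lang.reflect.Method;

public class AvroApiProbe {

    /**
     * Returns the return-type name of a public no-arg method,
     * or null if the class or the method is not available.
     */
    static String returnTypeOf(String className, String methodName) {
        try {
            // Load without initializing the class; we only inspect its signature.
            Class<?> cls = Class.forName(className, false, AvroApiProbe.class.getClassLoader());
            Method m = cls.getMethod(methodName);
            return m.getReturnType().getName();
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String field = "org.apache.avro.Schema$Field";
        // Older accessor named in the stack trace; null means it is absent.
        System.out.println("defaultValue() -> " + returnTypeOf(field, "defaultValue"));
        // Newer accessor; null means it is absent (or Avro is not on the classpath).
        System.out.println("defaultVal()   -> " + returnTypeOf(field, "defaultVal"));
    }
}
```

Run it with the same classpath as the failing job; if defaultValue() reports null while defaultVal() does not, the caller in the trace was likely built against an older Avro than the one being loaded.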





[GitHub] [hudi] yihua merged pull request #4487: [MINOR] - Updated articles doc page through 2021

2021-12-31 Thread GitBox


yihua merged pull request #4487:
URL: https://github.com/apache/hudi/pull/4487


   






[hudi] branch asf-site updated: [MINOR] Update articles doc page through 2021 (#4487)

2021-12-31 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new e133699  [MINOR] Update articles doc page through 2021 (#4487)
e133699 is described below

commit e13369995e45c6f41ee53fb5afa284ca933e79f4
Author: Kyle Weller 
AuthorDate: Fri Dec 31 23:44:20 2021 -0800

[MINOR] Update articles doc page through 2021 (#4487)
---
 website/src/pages/talks-articles.md | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/website/src/pages/talks-articles.md 
b/website/src/pages/talks-articles.md
index 48367de..5b8ffab 100644
--- a/website/src/pages/talks-articles.md
+++ b/website/src/pages/talks-articles.md
@@ -125,4 +125,10 @@ You can check out [our blog 
pages](https://hudi.apache.org/blog.html) for conten
 29. ["Cost-Efficient Open Source Big Data Platform at 
Uber"](https://eng.uber.com/cost-efficient-big-data-platform/) - By Zheng Shao 
and Mohammad Islam. Aug, 2021
 30. ["Data Platform 2.0 - Part 
I"](https://blogs.halodoc.io/data-platform-2-0-part-1/) - By Jitendra Shah. Oct 
5, 2021
 31. ["How Amazon Transportation Service enabled near-real-time event analytics 
at petabyte scale using AWS Glue with Apache Hudi"](
-
https://aws.amazon.com/blogs/big-data/how-amazon-transportation-service-enabled-near-real-time-event-analytics-at-petabyte-scale-using-aws-glue-with-apache-hudi/)
 - Madhavan Sriram, Diego Menin, Gabriele Cacciola, and Kunal Gautam. Oct 14, 
2021
\ No newline at end of file
+
https://aws.amazon.com/blogs/big-data/how-amazon-transportation-service-enabled-near-real-time-event-analytics-at-petabyte-scale-using-aws-glue-with-apache-hudi/)
 - Madhavan Sriram, Diego Menin, Gabriele Cacciola, and Kunal Gautam. Oct 14, 
2021
+32. ["Practice of Apache Hudi in building real-time data lake at station 
B"](https://developpaper.com/practice-of-apache-hudi-in-building-real-time-data-lake-at-station-b/)
 by Yu Zhaojing. Oct 21, 2021
+33. ["How GE Aviation built cloud-native data pipelines at enterprise scale 
using the AWS 
platform"](https://aws.amazon.com/blogs/big-data/how-ge-aviation-built-cloud-native-data-pipelines-at-enterprise-scale-using-the-aws-platform/)
 by Alcuin Weidus and Suresh Patnam. Nov 16, 2021
+34. 
["https://www.xenonstack.com/insights/what-is-hudi"](https://www.xenonstack.com/insights/what-is-hudi)
 by Chandan Gaur. Nov 22, 2021
+35. 
["https://aws.amazon.com/blogs/big-data/new-features-from-apache-hudi-0-7-0-and-0-8-0-available-on-amazon-emr/"](https://aws.amazon.com/blogs/big-data/new-features-from-apache-hudi-0-7-0-and-0-8-0-available-on-amazon-emr/)
 by Udit Mehotra and Gagan Brahmi. Dec 20, 2021
+36. ["Designing the Analytics patterns using a Lake House approach on 
AWS"](https://dev.to/aws-builders/designing-the-analytics-patterns-using-a-lake-house-approach-on-aws-2hh6)
 by Adit Modi. Dec 30, 2021
+37. ["The Art of Building Open Data Lakes with Apache Hudi, Kafka, Hive, and 
Debezium"](https://garystafford.medium.com/the-art-of-building-open-data-lakes-with-apache-hudi-kafka-hive-and-debezium-3d2f71c5981f)
 by Gary Stafford. Dec 31, 2021
\ No newline at end of file


[GitHub] [hudi] leesf opened a new pull request #4488: [HUDI-3134] Fix insert error after adding columns on Spark 3.2.0

2021-12-31 Thread GitBox


leesf opened a new pull request #4488:
URL: https://github.com/apache/hudi/pull/4488








[jira] [Updated] (HUDI-3134) Fix Insert error after adding columns on Spark 3.2.0

2021-12-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-3134:
-
Labels: pull-request-available  (was: )

> Fix Insert error after adding columns on Spark 3.2.0
> 
>
> Key: HUDI-3134
> URL: https://issues.apache.org/jira/browse/HUDI-3134
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: leesf
>Assignee: leesf
>Priority: Major
>  Labels: pull-request-available
>
> On Spark 3.2.0, after altering a table to add columns, an insert statement 
> fails with the following exception.
> Caused by: org.apache.hudi.exception.HoodieException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieException: operation has failed
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:147)
>   at 
> org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:100)
>   ... 31 more
> Caused by: java.util.concurrent.ExecutionException: 
> org.apache.hudi.exception.HoodieException: operation has failed
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.execute(BoundedInMemoryExecutor.java:141)
>   ... 32 more
> Caused by: org.apache.hudi.exception.HoodieException: operation has failed
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:248)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:226)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:278)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   ... 3 more
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.avro.Schema$Field.defaultValue()Lorg/codehaus/jackson/JsonNode;
>   at 
> org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:168)
>   at 
> org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:95)
>   at 
> org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
>   at 
> org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:138)
>   at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:185)
>   at 
> org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
>   at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
>   at 
> org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
>   at 
> org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
>   at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   ... 4 more





[GitHub] [hudi] leesf commented on a change in pull request #4488: [HUDI-3134] Fix insert error after adding columns on Spark 3.2.0

2021-12-31 Thread GitBox


leesf commented on a change in pull request #4488:
URL: https://github.com/apache/hudi/pull/4488#discussion_r777090621



##
File path: pom.xml
##
@@ -1612,7 +1614,7 @@
   
   
 
-  <id>spark3</id>
+  <id>spark3.1.x</id>

Review comment:
   Fix the name conflict with the `spark3` profile; otherwise the build uses 
Spark 3.1.2 when running `mvn clean package -Dscala-2.12 -Dspark3 -DskipTests`.








[GitHub] [hudi] hudi-bot commented on pull request #4488: [HUDI-3134] Fix insert error after adding columns on Spark 3.2.0

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4488:
URL: https://github.com/apache/hudi/pull/4488#issuecomment-1003521227


   
   ## CI report:
   
   * 447998fa43f5a816ab01dd37da9c9ed4e0b9f11e UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4488: [HUDI-3134] Fix insert error after adding columns on Spark 3.2.0

2021-12-31 Thread GitBox


hudi-bot removed a comment on pull request #4488:
URL: https://github.com/apache/hudi/pull/4488#issuecomment-1003521227


   
   ## CI report:
   
   * 447998fa43f5a816ab01dd37da9c9ed4e0b9f11e UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4488: [HUDI-3134] Fix insert error after adding columns on Spark 3.2.0

2021-12-31 Thread GitBox


hudi-bot commented on pull request #4488:
URL: https://github.com/apache/hudi/pull/4488#issuecomment-1003521441


   
   ## CI report:
   
   * 447998fa43f5a816ab01dd37da9c9ed4e0b9f11e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4838)
 
   
   




[jira] [Commented] (HUDI-3119) KafkaConnect Always has a rollback transaction

2021-12-31 Thread Ethan Guo (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17467372#comment-17467372
 ] 

Ethan Guo commented on HUDI-3119:
-

This is related to HUDI-2735 and HUDI-2672.  Currently, if there are no new 
messages from the Kafka topic, the connector sink first makes a commit and then 
rolls it back, adding a new rollback in the timeline, which is expected at this 
point.  This is going to be fixed to avoid confusion for the users.
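
The behaviour described above, committing and then rolling back when a commit interval saw no new Kafka messages, could in principle be avoided by checking for empty write results before committing. The following is only a sketch of that idea under stated assumptions: EmptyCommitSketch, Action, and decide are hypothetical names, per-participant record counts stand in for Hudi's write statuses, and this is neither the code of ConnectTransactionCoordinator nor the fix tracked in HUDI-2735.

```java
import java.util.Arrays;
import java.util.List;

public class EmptyCommitSketch {

    /** Hypothetical outcomes for the end of one commit interval. */
    enum Action { COMMIT, SKIP }

    /**
     * Commit only if at least one participant wrote records;
     * otherwise skip, so neither a commit nor a later rollback
     * is added to the timeline for an empty interval.
     */
    static Action decide(List<Integer> recordsWrittenPerParticipant) {
        long total = 0;
        for (int written : recordsWrittenPerParticipant) {
            total += written;
        }
        return total > 0 ? Action.COMMIT : Action.SKIP;
    }

    public static void main(String[] args) {
        // Four participants, none of which received new messages.
        System.out.println(decide(Arrays.asList(0, 0, 0, 0)));
        // One participant wrote records, so the interval is committed.
        System.out.println(decide(Arrays.asList(0, 12, 0, 0)));
    }
}
```

The design choice in the sketch is to make the empty case an explicit outcome rather than a failed commit, so an idle topic leaves no trace in the timeline.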

> KafkaConnect Always has a rollback transaction
> --
>
> Key: HUDI-3119
> URL: https://issues.apache.org/jira/browse/HUDI-3119
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: cdmikechen
>Priority: Critical
> Attachments: screenshot-1.png
>
>
> Transaction rollbacks often occur while Kafka Connect is running. 
> This is part of the log where the rollback transaction occurred.
> {code}
> [2021-12-28 10:13:49,946] DEBUG WorkerSinkTask\{id=hudi-sink-0} Skipping 
> offset commit, no change since last commit 
> (org.apache.kafka.connect.runtime.WorkerSinkTask)
> [2021-12-28 10:13:49,946] DEBUG WorkerSinkTask\{id=hudi-sink-0} Finished 
> offset commit successfully in 0 ms for sequence number 49: null 
> (org.apache.kafka.connect.runtime.WorkerSinkTask)
> [2021-12-28 10:13:49,948] WARN Empty write statuses were received from all 
> Participants 
> (org.apache.hudi.connect.transaction.ConnectTransactionCoordinator)
> [2021-12-28 10:13:49,948] WARN Current commit 20211228101151176 failed, so 
> starting a new commit after recovery delay 
> (org.apache.hudi.connect.transaction.ConnectTransactionCoordinator)
> [2021-12-28 10:13:50,448] INFO AdminClientConfig values:
>     bootstrap.servers = [10.3.101.60:9092]
>     client.dns.lookup = use_all_dns_ips
>     client.id =
>     connections.max.idle.ms = 30
>     default.api.timeout.ms = 6
>     metadata.max.age.ms = 30
>     metric.reporters = []
>     metrics.num.samples = 2
>     metrics.recording.level = INFO
>     metrics.sample.window.ms = 3
>     receive.buffer.bytes = 65536
>     reconnect.backoff.max.ms = 1000
>     reconnect.backoff.ms = 50
>     request.timeout.ms = 3
>     retries = 2147483647
>     retry.backoff.ms = 100
>     sasl.client.callback.handler.class = null
>     sasl.jaas.config = null
>     sasl.kerberos.kinit.cmd = /usr/bin/kinit
>     sasl.kerberos.min.time.before.relogin = 6
>     sasl.kerberos.service.name = null
>     sasl.kerberos.ticket.renew.jitter = 0.05
>     sasl.kerberos.ticket.renew.window.factor = 0.8
>     sasl.login.callback.handler.class = null
>     sasl.login.class = null
>     sasl.login.refresh.buffer.seconds = 300
>     sasl.login.refresh.min.period.seconds = 60
>     sasl.login.refresh.window.factor = 0.8
>     sasl.login.refresh.window.jitter = 0.05
>     sasl.mechanism = GSSAPI
>     security.protocol = PLAINTEXT
>     security.providers = null
>     send.buffer.bytes = 131072
>     socket.connection.setup.timeout.max.ms = 127000
>     socket.connection.setup.timeout.ms = 1
>     ssl.cipher.suites = null
>     ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
>     ssl.endpoint.identification.algorithm = https
>     ssl.engine.factory.class = null
>     ssl.key.password = null
>     ssl.keymanager.algorithm = SunX509
>     ssl.keystore.certificate.chain = null
>     ssl.keystore.key = null
>     ssl.keystore.location = null
>     ssl.keystore.password = null
>     ssl.keystore.type = JKS
>     ssl.protocol = TLSv1.3
>     ssl.provider = null
>     ssl.secure.random.implementation = null
>     ssl.trustmanager.algorithm = PKIX
>     ssl.truststore.certificates = null
>     ssl.truststore.location = null
>     ssl.truststore.password = null
>     ssl.truststore.type = JKS
>  (org.apache.kafka.clients.admin.AdminClientConfig)
> [2021-12-28 10:13:50,450] INFO Kafka version: 6.1.1-ccs 
> (org.apache.kafka.common.utils.AppInfoParser)
> [2021-12-28 10:13:50,450] INFO Kafka commitId: c209f70c6c2e52ae 
> (org.apache.kafka.common.utils.AppInfoParser)
> [2021-12-28 10:13:50,450] INFO Kafka startTimeMs: 1640686430450 
> (org.apache.kafka.common.utils.AppInfoParser)
> [2021-12-28 10:13:50,454] INFO Latest number of partitions for topic 
> hudi-test-topic is 4 (org.apache.hudi.connect.utils.KafkaConnectUtils)
> [2021-12-28 10:13:50,454] INFO Loading HoodieTableMetaClient from 
> hdfs://hdp-syzh-cluster/hive/warehouse/default.db/hudi_test_topic 
> (org.apache.hudi.common.table.HoodieTableMetaClient)
> [2021-12-28 10:13:50,464] INFO Loading table properties from 
> hdfs://hdp-syzh-cluster/hive/warehouse/default.db/hudi_test_topic/.hoodie/hoodie.properties
>  (org.apache.hudi.common.table.HoodieTableConfig)
> [2021-12-28 10:13:50,469] INFO Finished Loading Table of type 
> MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from 
> hdfs://hdp-syzh

[jira] [Commented] (HUDI-3119) KafkaConnect Always has a rollback transaction

2021-12-31 Thread Ethan Guo (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17467373#comment-17467373
 ] 

Ethan Guo commented on HUDI-3119:
-

Closing this as a duplicate of HUDI-2672 and tracking the progress in HUDI-2735.

> KafkaConnect Always has a rollback transaction
> --
>
> Key: HUDI-3119
> URL: https://issues.apache.org/jira/browse/HUDI-3119
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: cdmikechen
>Priority: Critical
> Attachments: screenshot-1.png
>
>

[jira] [Closed] (HUDI-3119) KafkaConnect Always has a rollback transaction

2021-12-31 Thread Ethan Guo (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Guo closed HUDI-3119.
---
Resolution: Duplicate

> KafkaConnect Always has a rollback transaction
> --
>
> Key: HUDI-3119
> URL: https://issues.apache.org/jira/browse/HUDI-3119
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: cdmikechen
>Priority: Critical
> Attachments: screenshot-1.png
>
>
> Transaction rollbacks often occur while Kafka Connect is running. 
> This is part of the log where the rollback transaction occurred.
> {code}
> [2021-12-28 10:13:49,946] DEBUG WorkerSinkTask\{id=hudi-sink-0} Skipping 
> offset commit, no change since last commit 
> (org.apache.kafka.connect.runtime.WorkerSinkTask)
> [2021-12-28 10:13:49,946] DEBUG WorkerSinkTask\{id=hudi-sink-0} Finished 
> offset commit successfully in 0 ms for sequence number 49: null 
> (org.apache.kafka.connect.runtime.WorkerSinkTask)
> [2021-12-28 10:13:49,948] WARN Empty write statuses were received from all 
> Participants 
> (org.apache.hudi.connect.transaction.ConnectTransactionCoordinator)
> [2021-12-28 10:13:49,948] WARN Current commit 20211228101151176 failed, so 
> starting a new commit after recovery delay 
> (org.apache.hudi.connect.transaction.ConnectTransactionCoordinator)
> [2021-12-28 10:13:50,448] INFO AdminClientConfig values:
>     bootstrap.servers = [10.3.101.60:9092]
>     client.dns.lookup = use_all_dns_ips
>     client.id =
>     connections.max.idle.ms = 30
>     default.api.timeout.ms = 6
>     metadata.max.age.ms = 30
>     metric.reporters = []
>     metrics.num.samples = 2
>     metrics.recording.level = INFO
>     metrics.sample.window.ms = 3
>     receive.buffer.bytes = 65536
>     reconnect.backoff.max.ms = 1000
>     reconnect.backoff.ms = 50
>     request.timeout.ms = 3
>     retries = 2147483647
>     retry.backoff.ms = 100
>     sasl.client.callback.handler.class = null
>     sasl.jaas.config = null
>     sasl.kerberos.kinit.cmd = /usr/bin/kinit
>     sasl.kerberos.min.time.before.relogin = 6
>     sasl.kerberos.service.name = null
>     sasl.kerberos.ticket.renew.jitter = 0.05
>     sasl.kerberos.ticket.renew.window.factor = 0.8
>     sasl.login.callback.handler.class = null
>     sasl.login.class = null
>     sasl.login.refresh.buffer.seconds = 300
>     sasl.login.refresh.min.period.seconds = 60
>     sasl.login.refresh.window.factor = 0.8
>     sasl.login.refresh.window.jitter = 0.05
>     sasl.mechanism = GSSAPI
>     security.protocol = PLAINTEXT
>     security.providers = null
>     send.buffer.bytes = 131072
>     socket.connection.setup.timeout.max.ms = 127000
>     socket.connection.setup.timeout.ms = 1
>     ssl.cipher.suites = null
>     ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
>     ssl.endpoint.identification.algorithm = https
>     ssl.engine.factory.class = null
>     ssl.key.password = null
>     ssl.keymanager.algorithm = SunX509
>     ssl.keystore.certificate.chain = null
>     ssl.keystore.key = null
>     ssl.keystore.location = null
>     ssl.keystore.password = null
>     ssl.keystore.type = JKS
>     ssl.protocol = TLSv1.3
>     ssl.provider = null
>     ssl.secure.random.implementation = null
>     ssl.trustmanager.algorithm = PKIX
>     ssl.truststore.certificates = null
>     ssl.truststore.location = null
>     ssl.truststore.password = null
>     ssl.truststore.type = JKS
>  (org.apache.kafka.clients.admin.AdminClientConfig)
> [2021-12-28 10:13:50,450] INFO Kafka version: 6.1.1-ccs 
> (org.apache.kafka.common.utils.AppInfoParser)
> [2021-12-28 10:13:50,450] INFO Kafka commitId: c209f70c6c2e52ae 
> (org.apache.kafka.common.utils.AppInfoParser)
> [2021-12-28 10:13:50,450] INFO Kafka startTimeMs: 1640686430450 
> (org.apache.kafka.common.utils.AppInfoParser)
> [2021-12-28 10:13:50,454] INFO Latest number of partitions for topic 
> hudi-test-topic is 4 (org.apache.hudi.connect.utils.KafkaConnectUtils)
> [2021-12-28 10:13:50,454] INFO Loading HoodieTableMetaClient from 
> hdfs://hdp-syzh-cluster/hive/warehouse/default.db/hudi_test_topic 
> (org.apache.hudi.common.table.HoodieTableMetaClient)
> [2021-12-28 10:13:50,464] INFO Loading table properties from 
> hdfs://hdp-syzh-cluster/hive/warehouse/default.db/hudi_test_topic/.hoodie/hoodie.properties
>  (org.apache.hudi.common.table.HoodieTableConfig)
> [2021-12-28 10:13:50,469] INFO Finished Loading Table of type 
> MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from 
> hdfs://hdp-syzh-cluster/hive/warehouse/default.db/hudi_test_topic 
> (org.apache.hudi.common.table.HoodieTableMetaClient)
> [2021-12-28 10:13:50,469] INFO Loading Active commit timeline for 
> hdfs://hdp-syzh-cluster/hive/warehouse/default.db/hudi_test_topic 
> (org.apache.hudi.common.table.HoodieTableMetaClient)
> [2021-12-28 10:13:50,474] INFO Loaded i