[GitHub] [hudi] leesf commented on a change in pull request #4222: [HUDI-2849] improve SparkUI job description for write path

2021-12-08 Thread GitBox


leesf commented on a change in pull request #4222:
URL: https://github.com/apache/hudi/pull/4222#discussion_r764625760



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
##
@@ -441,9 +441,10 @@ public void refreshTimeline() throws IOException {
   return null;
 }
 
+jssc.setJobGroup(this.getClass().getSimpleName(), "Checking if input is empty");
 if ((!avroRDDOptional.isPresent()) || (avroRDDOptional.get().isEmpty())) {
   LOG.info("No new data, perform empty commit.");
-  return Pair.of(schemaProvider, Pair.of(checkpointStr, jssc.emptyRDD()));
+  return Pair.of(schemaProvider, Pair.of(checkpointStr, null));

Review comment:
   Why do we need to modify this?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] leesf closed pull request #4193: [HUDI-2915] Fix field not found error for sparksql

2021-12-08 Thread GitBox


leesf closed pull request #4193:
URL: https://github.com/apache/hudi/pull/4193


   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4246: [MINOR] Update DOAP with 0.10.0 Release

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4246:
URL: https://github.com/apache/hudi/pull/4246#issuecomment-988561713


   
   ## CI report:
   
   * b40dde1704cd0c69ec1981bfd411e64bd46831a4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4087)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #4246: [MINOR] Update DOAP with 0.10.0 Release

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4246:
URL: https://github.com/apache/hudi/pull/4246#issuecomment-988584778


   
   ## CI report:
   
   * b40dde1704cd0c69ec1981bfd411e64bd46831a4 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4087)
 
   
   




[GitHub] [hudi] YuweiXiao commented on a change in pull request #4222: [HUDI-2849] improve SparkUI job description for write path

2021-12-08 Thread GitBox


YuweiXiao commented on a change in pull request #4222:
URL: https://github.com/apache/hudi/pull/4222#discussion_r764627374



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
##
@@ -441,9 +441,10 @@ public void refreshTimeline() throws IOException {
   return null;
 }
 
+jssc.setJobGroup(this.getClass().getSimpleName(), "Checking if input is empty");
 if ((!avroRDDOptional.isPresent()) || (avroRDDOptional.get().isEmpty())) {
   LOG.info("No new data, perform empty commit.");
-  return Pair.of(schemaProvider, Pair.of(checkpointStr, jssc.emptyRDD()));
+  return Pair.of(schemaProvider, Pair.of(checkpointStr, null));

Review comment:
   It saves one call to `isEmpty()` in `DeltaSync::writeToSink`, which could 
in turn eliminate one Spark job.
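
   A minimal sketch of the idea, not the actual DeltaSync code. 
`CountingDataset` is a hypothetical stand-in for an RDD whose `isEmpty()` 
launches a job on every call, and `readBatch` / `writeToSink` here are 
simplified, assumed shapes of the real methods. Returning `null` as the 
"empty" marker lets the downstream step use a plain null check instead of a 
second `isEmpty()` call:

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class EmptyCheckSketch {

  /** Hypothetical stand-in for an RDD: every isEmpty() call counts as one launched job. */
  public static final class CountingDataset {
    static int jobsLaunched = 0;
    private final List<String> records;

    CountingDataset(List<String> records) {
      this.records = records;
    }

    boolean isEmpty() {
      jobsLaunched++; // on a real RDD this is where a Spark job would be triggered
      return records.isEmpty();
    }
  }

  /** Mirrors the changed branch: return null as the "empty" marker instead of an empty dataset. */
  public static CountingDataset readBatch(Optional<CountingDataset> input) {
    if (!input.isPresent() || input.get().isEmpty()) {
      return null; // the caller can test for null without launching another job
    }
    return input.get();
  }

  /** Assumed downstream step: a null check replaces a second isEmpty() call. */
  public static String writeToSink(CountingDataset batch) {
    if (batch == null) {
      return "empty commit";
    }
    return "wrote batch";
  }

  /** Runs the empty-input path and returns how many "jobs" it launched. */
  public static int jobsForEmptyInput() {
    CountingDataset.jobsLaunched = 0;
    CountingDataset empty = new CountingDataset(Collections.emptyList());
    writeToSink(readBatch(Optional.of(empty)));
    return CountingDataset.jobsLaunched;
  }

  public static void main(String[] args) {
    System.out.println("jobs launched for empty input: " + jobsForEmptyInput());
  }
}
```

   With an empty dataset marker instead of `null`, the downstream step would 
have to call `isEmpty()` again to detect the empty case, which is the extra 
job this change is meant to avoid.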








[GitHub] [hudi] kywe665 opened a new pull request #4250: [HUDI-2956] - Updating write docs for deletes and full write path description

2021-12-08 Thread GitBox


kywe665 opened a new pull request #4250:
URL: https://github.com/apache/hudi/pull/4250


   ## What is the purpose of the pull request
   
   Added more details on deletes and described the full write path at a high 
level, following this deep dive: https://www.youtube.com/watch?v=N2eDfU_rQ_U
   
   ## Verify this pull request
   
   Docs change only
   
   ## Committer checklist
   
- [X] Has a corresponding JIRA in PR title & commit

- [X] Commit message is descriptive of the change

- [X] CI is green
   
- [X] Necessary doc changes done or have another open PR
  
- [X] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[jira] [Updated] (HUDI-2956) Improve Write docs

2021-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2956:
-
Labels: pull-request-available  (was: )

> Improve Write docs
> --
>
> Key: HUDI-2956
> URL: https://issues.apache.org/jira/browse/HUDI-2956
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Kyle Weller
>Assignee: Kyle Weller
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> add each substep of writing from 
> https://docs.google.com/presentation/d/1GpJ27IVtefqLbcGMvVDKDfoNMHv2-CFNc2-0BoAh3Ik/edit#slide=id.g8d35d881f3_0_58



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HUDI-2956) Improve Write docs

2021-12-08 Thread Kyle Weller (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weller updated HUDI-2956:
--
Status: Patch Available  (was: In Progress)

> Improve Write docs
> --
>
> Key: HUDI-2956
> URL: https://issues.apache.org/jira/browse/HUDI-2956
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Kyle Weller
>Assignee: Kyle Weller
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> add each substep of writing from 
> https://docs.google.com/presentation/d/1GpJ27IVtefqLbcGMvVDKDfoNMHv2-CFNc2-0BoAh3Ik/edit#slide=id.g8d35d881f3_0_58





[GitHub] [hudi] guanlisheng commented on issue #4055: [SUPPORT] Hudi with SqlQueryBasedTransformer fails-> spark error exit 134 or exit 143 in "isEmpty at DeltaSync.java:344" : Container from a bad n

2021-12-08 Thread GitBox


guanlisheng commented on issue #4055:
URL: https://github.com/apache/hudi/issues/4055#issuecomment-988592379


   Hi there, 
   I have an identical issue when enabling my customized transformer class in 
Hudi 7.0 on EMR. The transformer class performs a `mapPartitions` operation.






[GitHub] [hudi] hudi-bot removed a comment on pull request #4245: [MINOR] remove unuse construction method

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4245:
URL: https://github.com/apache/hudi/pull/4245#issuecomment-988581448


   
   ## CI report:
   
   * 6f4e9f5fd7387cc3ec4dfa8d7f7a83a3abcbd0c0 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4088)
 
   * 2b1b0fcc6c35bd83846bf0914babc2199165f5c5 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4245: [MINOR] remove unuse construction method

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4245:
URL: https://github.com/apache/hudi/pull/4245#issuecomment-988610403


   
   ## CI report:
   
   * 6f4e9f5fd7387cc3ec4dfa8d7f7a83a3abcbd0c0 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4088)
 
   * 2b1b0fcc6c35bd83846bf0914babc2199165f5c5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4089)
 
   
   




[jira] [Created] (HUDI-2957) Shade kryo jar for flink bundle jar

2021-12-08 Thread Danny Chen (Jira)
Danny Chen created HUDI-2957:


 Summary: Shade kryo jar for flink bundle jar
 Key: HUDI-2957
 URL: https://issues.apache.org/jira/browse/HUDI-2957
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.11.0








[GitHub] [hudi] hudi-bot removed a comment on pull request #4245: [MINOR] remove unuse construction method

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4245:
URL: https://github.com/apache/hudi/pull/4245#issuecomment-988610403


   
   ## CI report:
   
   * 6f4e9f5fd7387cc3ec4dfa8d7f7a83a3abcbd0c0 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4088)
 
   * 2b1b0fcc6c35bd83846bf0914babc2199165f5c5 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4089)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4245: [MINOR] remove unuse construction method

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4245:
URL: https://github.com/apache/hudi/pull/4245#issuecomment-988641881


   
   ## CI report:
   
   * 2b1b0fcc6c35bd83846bf0914babc2199165f5c5 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4089)
 
   
   




[GitHub] [hudi] Limess commented on issue #4146: [SUPPORT] Deltastreamer commits with a custom checkpoint configuration are skipped if the generated checkpoint matches the previous commit checkpoint

2021-12-08 Thread GitBox


Limess commented on issue #4146:
URL: https://github.com/apache/hudi/issues/4146#issuecomment-988642069


   > @Limess : So, is the expectation that, if you set checkpoint = 0, 
deltastreamer should start from scratch as though we are starting deltastreamer 
for the first time ?
   
   Yes, that's the expectation, and it is also what currently happens. Where 
it doesn't match my expectations is that the subsequent commit is skipped, so 
nothing is actually written.
   
   Intuitively, I'd expect that if the checkpoint < the commit timestamp, Hudi 
should always commit.
   
   > if you wish to override the checkpoint, probably you need to set 
--initial-checkpoint-provider.
   > 
   > 
https://github.com/apache/hudi/blob/e8473b9a2b5bf0ad9370377899f6a7ea4d1ceba1/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java#L357
   
   I did look into this and found it hard to understand and use; neither of 
the existing implementations matches this use case.






[GitHub] [hudi] danny0405 opened a new pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar

2021-12-08 Thread GitBox


danny0405 opened a new pull request #4251:
URL: https://github.com/apache/hudi/pull/4251


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[jira] [Updated] (HUDI-2957) Shade kryo jar for flink bundle jar

2021-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2957:
-
Labels: pull-request-available  (was: )

> Shade kryo jar for flink bundle jar
> ---
>
> Key: HUDI-2957
> URL: https://issues.apache.org/jira/browse/HUDI-2957
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>






[GitHub] [hudi] hudi-bot commented on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4251:
URL: https://github.com/apache/hudi/pull/4251#issuecomment-988657584


   
   ## CI report:
   
   * 2a84f3eeef355177e32aebd62a8eb4ed2712a647 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4251:
URL: https://github.com/apache/hudi/pull/4251#issuecomment-988657584


   
   ## CI report:
   
   * 2a84f3eeef355177e32aebd62a8eb4ed2712a647 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4251:
URL: https://github.com/apache/hudi/pull/4251#issuecomment-988659621


   
   ## CI report:
   
   * 2a84f3eeef355177e32aebd62a8eb4ed2712a647 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4090)
 
   
   




[GitHub] [hudi] danny0405 commented on pull request #4246: [MINOR] Update DOAP with 0.10.0 Release

2021-12-08 Thread GitBox


danny0405 commented on pull request #4246:
URL: https://github.com/apache/hudi/pull/4246#issuecomment-988661990


   The test failure should not be caused by this patch, so I would just merge 
it.






[GitHub] [hudi] danny0405 merged pull request #4246: [MINOR] Update DOAP with 0.10.0 Release

2021-12-08 Thread GitBox


danny0405 merged pull request #4246:
URL: https://github.com/apache/hudi/pull/4246


   






[hudi] branch master updated (c9e18d1 -> c56d93e)

2021-12-08 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from c9e18d1  [HUDI-2942] add error message log in 
HoodieCombineHiveInputFormat (#4224)
 add c56d93e  [MINOR] Update DOAP with 0.10.0 Release (#4246)

No new revisions were added by this update.

Summary of changes:
 doap_HUDI.rdf | 5 +
 1 file changed, 5 insertions(+)


[GitHub] [hudi] hudi-bot removed a comment on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4251:
URL: https://github.com/apache/hudi/pull/4251#issuecomment-988659621


   
   ## CI report:
   
   * 2a84f3eeef355177e32aebd62a8eb4ed2712a647 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4090)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4251:
URL: https://github.com/apache/hudi/pull/4251#issuecomment-988694462


   
   ## CI report:
   
   * 2a84f3eeef355177e32aebd62a8eb4ed2712a647 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4090)
 
   
   




[jira] [Updated] (HUDI-2958) Automatically set spark.sql.parquet.writelegacyformat; When using bulkinsert to insert data which contains decimal Type.

2021-12-08 Thread tao meng (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tao meng updated HUDI-2958:
---
Summary: Automatically set spark.sql.parquet.writelegacyformat; When using 
bulkinsert to insert data which contains decimal Type.  (was: Automatically set 
spark.sql.parquet.writelegacyformat. When using bulkinsert to insert data will 
contains decimal Type.)

> Automatically set spark.sql.parquet.writelegacyformat; When using bulkinsert 
> to insert data which contains decimal Type.
> 
>
> Key: HUDI-2958
> URL: https://issues.apache.org/jira/browse/HUDI-2958
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: tao meng
>Priority: Minor
> Fix For: 0.11.0
>
>
> By default, ParquetWriteSupport now writes DecimalType to parquet as 
> int32/int64 when the precision of the DecimalType is at most 
> Decimal.MAX_LONG_DIGITS(), but AvroParquetReader, which is used by 
> HoodieParquetReader, cannot read int32/int64 as DecimalType. This leads to 
> the following error:
> Caused by: java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
>     at org.apache.parquet.column.Dictionary.decodeToBinary(Dictionary.java:41)
>     at org.apache.parquet.avro.AvroConverters$BinaryConverter.setDictionary(AvroConverters.java:75)
>     ..





[jira] [Created] (HUDI-2958) Automatically set spark.sql.parquet.writelegacyformat. When using bulkinsert to insert data will contains decimal Type.

2021-12-08 Thread tao meng (Jira)
tao meng created HUDI-2958:
--

 Summary: Automatically set spark.sql.parquet.writelegacyformat. 
When using bulkinsert to insert data will contains decimal Type.
 Key: HUDI-2958
 URL: https://issues.apache.org/jira/browse/HUDI-2958
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Spark Integration
Reporter: tao meng
 Fix For: 0.11.0


By default, ParquetWriteSupport now writes DecimalType to parquet as 
int32/int64 when the precision of the DecimalType is at most 
Decimal.MAX_LONG_DIGITS(), but AvroParquetReader, which is used by 
HoodieParquetReader, cannot read int32/int64 as DecimalType. This leads to 
the following error:

Caused by: java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
    at org.apache.parquet.column.Dictionary.decodeToBinary(Dictionary.java:41)
    at org.apache.parquet.avro.AvroConverters$BinaryConverter.setDictionary(AvroConverters.java:75)
    ..
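
The write-side choice described above can be sketched as a small pure 
function. This is an illustration only, based on my understanding that Spark 
picks the Parquet physical type from the decimal's precision and the 
spark.sql.parquet.writeLegacyFormat setting; the class and method names are 
hypothetical, and the constants are assumed to match Spark's 
Decimal.MAX_INT_DIGITS (9) and Decimal.MAX_LONG_DIGITS (18).

```java
public class DecimalPhysicalTypeSketch {

  // Assumed to mirror Spark's Decimal.MAX_INT_DIGITS and Decimal.MAX_LONG_DIGITS.
  static final int MAX_INT_DIGITS = 9;
  static final int MAX_LONG_DIGITS = 18;

  /**
   * Returns the Parquet physical type a decimal of the given precision is
   * expected to be written as. Hypothetical helper, not a Spark or Hudi API.
   */
  public static String physicalType(int precision, boolean writeLegacyFormat) {
    if (!writeLegacyFormat && precision <= MAX_INT_DIGITS) {
      return "INT32";
    }
    if (!writeLegacyFormat && precision <= MAX_LONG_DIGITS) {
      return "INT64";
    }
    // Legacy mode, or precision too large for a long: fixed-length bytes.
    return "FIXED_LEN_BYTE_ARRAY";
  }

  public static void main(String[] args) {
    for (int precision : new int[] {5, 12, 25}) {
      System.out.println("precision=" + precision
          + " standard=" + physicalType(precision, false)
          + " legacy=" + physicalType(precision, true));
    }
  }
}
```

Under these assumptions, enabling the legacy format keeps small decimals out 
of int32/int64, which is the encoding the Avro-based reader fails on in the 
stack trace above.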





[jira] [Created] (HUDI-2959) Fix the thread leak of cleaning service

2021-12-08 Thread Danny Chen (Jira)
Danny Chen created HUDI-2959:


 Summary: Fix the thread leak of cleaning service
 Key: HUDI-2959
 URL: https://issues.apache.org/jira/browse/HUDI-2959
 Project: Apache Hudi
  Issue Type: Bug
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.11.0








[GitHub] [hudi] danny0405 opened a new pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service

2021-12-08 Thread GitBox


danny0405 opened a new pull request #4252:
URL: https://github.com/apache/hudi/pull/4252








[GitHub] [hudi] danny0405 commented on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service

2021-12-08 Thread GitBox


danny0405 commented on pull request #4252:
URL: https://github.com/apache/hudi/pull/4252#issuecomment-988774462


   @vinothchandar , can you take a look, thanks so much ~






[jira] [Updated] (HUDI-2959) Fix the thread leak of cleaning service

2021-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2959:
-
Labels: pull-request-available  (was: )

> Fix the thread leak of cleaning service
> ---
>
> Key: HUDI-2959
> URL: https://issues.apache.org/jira/browse/HUDI-2959
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>






[GitHub] [hudi] danny0405 commented on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar

2021-12-08 Thread GitBox


danny0405 commented on pull request #4251:
URL: https://github.com/apache/hudi/pull/4251#issuecomment-988775610


   @hudi-bot run azure






[GitHub] [hudi] hudi-bot removed a comment on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4251:
URL: https://github.com/apache/hudi/pull/4251#issuecomment-988694462


   
   ## CI report:
   
   * 2a84f3eeef355177e32aebd62a8eb4ed2712a647 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4090)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4252:
URL: https://github.com/apache/hudi/pull/4252#issuecomment-988776087


   
   ## CI report:
   
   * b16a5686bd6a8a03aa2847624fa5cf3e2e9d36ec UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4251:
URL: https://github.com/apache/hudi/pull/4251#issuecomment-988776048


   
   ## CI report:
   
   * 2a84f3eeef355177e32aebd62a8eb4ed2712a647 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4090)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4092)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4252:
URL: https://github.com/apache/hudi/pull/4252#issuecomment-988776087


   
   ## CI report:
   
   * b16a5686bd6a8a03aa2847624fa5cf3e2e9d36ec UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4252:
URL: https://github.com/apache/hudi/pull/4252#issuecomment-98863


   
   ## CI report:
   
   * b16a5686bd6a8a03aa2847624fa5cf3e2e9d36ec Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4093)
 
   
   




[GitHub] [hudi] xiarixiaoyao opened a new pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType

2021-12-08 Thread GitBox


xiarixiaoyao opened a new pull request #4253:
URL: https://github.com/apache/hudi/pull/4253


   
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   By default, ParquetWriteSupport writes DecimalType to parquet as 
int32/int64 when the scale of the decimalType < Decimal.MAX_LONG_DIGITS(),
   but AvroParquetReader, which is used by HoodieParquetReader, cannot read 
int32/int64 as DecimalType. This leads to the following error:
   
   Caused by: java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
       at org.apache.parquet.column.Dictionary.decodeToBinary(Dictionary.java:41)
       at org.apache.parquet.avro.AvroConverters$BinaryConverter.setDictionary(AvroConverters.java:75)
       ..
   
   This PR fixes the problem by automatically setting 
spark.sql.parquet.writeLegacyFormat.
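
The decision described above can be sketched roughly as follows. This is not the code from the pull request: it is a dependency-free Java model, and the class and method names (`DecimalEncodingSketch`, `physicalType`, `shouldEnableLegacyFormat`) are hypothetical. It assumes the cutoffs are based on decimal precision, with 9 and 18 digits mirroring Spark's `Decimal.MAX_INT_DIGITS` and `Decimal.MAX_LONG_DIGITS`, and that the legacy format stores every decimal as a fixed-length byte array.

```java
// Illustrative sketch only: models, without any Spark dependency, how the
// physical parquet type chosen for a decimal column depends on its precision
// and on the legacy-format flag. All names here are hypothetical.
class DecimalEncodingSketch {
    // Assumed to mirror Spark's Decimal.MAX_INT_DIGITS and Decimal.MAX_LONG_DIGITS.
    static final int MAX_INT_DIGITS = 9;
    static final int MAX_LONG_DIGITS = 18;

    // Physical type used for a decimal of the given precision.
    static String physicalType(int precision, boolean writeLegacyFormat) {
        if (writeLegacyFormat) {
            return "FIXED_LEN_BYTE_ARRAY"; // legacy format never uses int32/int64
        }
        if (precision <= MAX_INT_DIGITS) {
            return "INT32";
        }
        if (precision <= MAX_LONG_DIGITS) {
            return "INT64";
        }
        return "FIXED_LEN_BYTE_ARRAY";
    }

    // True when at least one decimal column would otherwise be written as
    // int32/int64, which is the case the Avro-based reader cannot decode.
    static boolean shouldEnableLegacyFormat(int[] decimalPrecisions) {
        for (int precision : decimalPrecisions) {
            if (!physicalType(precision, false).equals("FIXED_LEN_BYTE_ARRAY")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] precisions = {10, 38}; // e.g. decimal(10,2) and decimal(38,10)
        boolean legacy = shouldEnableLegacyFormat(precisions);
        System.out.println("writeLegacyFormat = " + legacy);
        for (int precision : precisions) {
            System.out.println(precision + " -> " + physicalType(precision, legacy));
        }
    }
}
```

In a real job the flag would be applied to the Spark configuration before the bulk insert runs; the sketch only shows when turning it on changes the on-disk encoding.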
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[jira] [Updated] (HUDI-2958) Automatically set spark.sql.parquet.writelegacyformat; When using bulkinsert to insert data which contains decimal Type.

2021-12-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2958:
-
Labels: pull-request-available  (was: )

> Automatically set spark.sql.parquet.writelegacyformat; When using bulkinsert 
> to insert data which contains decimal Type.
> 
>
> Key: HUDI-2958
> URL: https://issues.apache.org/jira/browse/HUDI-2958
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Spark Integration
>Reporter: tao meng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.11.0
>
>
> By default, ParquetWriteSupport writes DecimalType to parquet as 
> int32/int64 when the scale of the decimalType < Decimal.MAX_LONG_DIGITS(),
> but AvroParquetReader, which is used by HoodieParquetReader, cannot read 
> int32/int64 as DecimalType. This leads to the following error:
> Caused by: java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
>     at org.apache.parquet.column.Dictionary.decodeToBinary(Dictionary.java:41)
>     at org.apache.parquet.avro.AvroConverters$BinaryConverter.setDictionary(AvroConverters.java:75)
>     ..



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] hudi-bot commented on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4252:
URL: https://github.com/apache/hudi/pull/4252#issuecomment-988783594


   
   ## CI report:
   
   * b16a5686bd6a8a03aa2847624fa5cf3e2e9d36ec Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4093)
 
   * 15574f13d95fd781239bfb81dcd7ecef8213f6c9 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4252:
URL: https://github.com/apache/hudi/pull/4252#issuecomment-98863


   
   ## CI report:
   
   * b16a5686bd6a8a03aa2847624fa5cf3e2e9d36ec Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4093)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4252:
URL: https://github.com/apache/hudi/pull/4252#issuecomment-988787463


   
   ## CI report:
   
   * b16a5686bd6a8a03aa2847624fa5cf3e2e9d36ec Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4093)
 
   * 15574f13d95fd781239bfb81dcd7ecef8213f6c9 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4253:
URL: https://github.com/apache/hudi/pull/4253#issuecomment-988787494


   
   ## CI report:
   
   * 34dd491be3ce6d6f55627bbe3390fefbac674e8e UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4252:
URL: https://github.com/apache/hudi/pull/4252#issuecomment-988783594


   
   ## CI report:
   
   * b16a5686bd6a8a03aa2847624fa5cf3e2e9d36ec Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4093)
 
   * 15574f13d95fd781239bfb81dcd7ecef8213f6c9 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4253:
URL: https://github.com/apache/hudi/pull/4253#issuecomment-988787494


   
   ## CI report:
   
   * 34dd491be3ce6d6f55627bbe3390fefbac674e8e UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4253:
URL: https://github.com/apache/hudi/pull/4253#issuecomment-988789343


   
   ## CI report:
   
   * 34dd491be3ce6d6f55627bbe3390fefbac674e8e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4094)
 
   
   




[GitHub] [hudi] yuzhaojing opened a new pull request #4254: [HUDI-2537] Schedule Flink compaction in service

2021-12-08 Thread GitBox


yuzhaojing opened a new pull request #4254:
URL: https://github.com/apache/hudi/pull/4254


   






[GitHub] [hudi] hudi-bot commented on pull request #4254: [HUDI-2537] Schedule Flink compaction in service

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4254:
URL: https://github.com/apache/hudi/pull/4254#issuecomment-988805075


   
   ## CI report:
   
   * 59cdd6413be9d029f175e06e12db7893f75e7af7 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4254: [HUDI-2537] Schedule Flink compaction in service

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4254:
URL: https://github.com/apache/hudi/pull/4254#issuecomment-988805075


   
   ## CI report:
   
   * 59cdd6413be9d029f175e06e12db7893f75e7af7 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4254: [HUDI-2537] Schedule Flink compaction in service

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4254:
URL: https://github.com/apache/hudi/pull/4254#issuecomment-988807064


   
   ## CI report:
   
   * 59cdd6413be9d029f175e06e12db7893f75e7af7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4095)
 
   
   




[GitHub] [hudi] danny0405 commented on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar

2021-12-08 Thread GitBox


danny0405 commented on pull request #4251:
URL: https://github.com/apache/hudi/pull/4251#issuecomment-988814010


   @hudi-bot run azure






[GitHub] [hudi] hudi-bot removed a comment on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4251:
URL: https://github.com/apache/hudi/pull/4251#issuecomment-988776048


   
   ## CI report:
   
   * 2a84f3eeef355177e32aebd62a8eb4ed2712a647 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4090)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4092)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4251:
URL: https://github.com/apache/hudi/pull/4251#issuecomment-988815109


   
   ## CI report:
   
   * 2a84f3eeef355177e32aebd62a8eb4ed2712a647 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4090)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4092)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4096)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4252:
URL: https://github.com/apache/hudi/pull/4252#issuecomment-988787463


   
   ## CI report:
   
   * b16a5686bd6a8a03aa2847624fa5cf3e2e9d36ec Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4093)
 
   * 15574f13d95fd781239bfb81dcd7ecef8213f6c9 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4252:
URL: https://github.com/apache/hudi/pull/4252#issuecomment-988825206


   
   ## CI report:
   
   * b16a5686bd6a8a03aa2847624fa5cf3e2e9d36ec Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4093)
 
   * 15574f13d95fd781239bfb81dcd7ecef8213f6c9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4097)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4253:
URL: https://github.com/apache/hudi/pull/4253#issuecomment-988825224


   
   ## CI report:
   
   * 34dd491be3ce6d6f55627bbe3390fefbac674e8e Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4094)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4253: [HUDI-2958] Automatically set spark.sql.parquet.writelegacyformat, when using bulkinsert to insert data which contains decimalType

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4253:
URL: https://github.com/apache/hudi/pull/4253#issuecomment-988789343


   
   ## CI report:
   
   * 34dd491be3ce6d6f55627bbe3390fefbac674e8e Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4094)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4020:
URL: https://github.com/apache/hudi/pull/4020#issuecomment-978211197


   
   ## CI report:
   
   * 548c193ffe432033be61ca5a592f6d9760b5ebb0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3697)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4020:
URL: https://github.com/apache/hudi/pull/4020#issuecomment-988827020


   
   ## CI report:
   
   * 548c193ffe432033be61ca5a592f6d9760b5ebb0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3697)
 
   * 72ea77955da505b679945dc92ea0dd2d597bcedf UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4020:
URL: https://github.com/apache/hudi/pull/4020#issuecomment-988828994


   
   ## CI report:
   
   * 548c193ffe432033be61ca5a592f6d9760b5ebb0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3697)
 
   * 72ea77955da505b679945dc92ea0dd2d597bcedf Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4098)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4020:
URL: https://github.com/apache/hudi/pull/4020#issuecomment-988827020


   
   ## CI report:
   
   * 548c193ffe432033be61ca5a592f6d9760b5ebb0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3697)
 
   * 72ea77955da505b679945dc92ea0dd2d597bcedf UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4254: [HUDI-2537] Schedule Flink compaction in service

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4254:
URL: https://github.com/apache/hudi/pull/4254#issuecomment-988807064


   
   ## CI report:
   
   * 59cdd6413be9d029f175e06e12db7893f75e7af7 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4095)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4254: [HUDI-2537] Schedule Flink compaction in service

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4254:
URL: https://github.com/apache/hudi/pull/4254#issuecomment-988849669


   
   ## CI report:
   
   * 59cdd6413be9d029f175e06e12db7893f75e7af7 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4095)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4251:
URL: https://github.com/apache/hudi/pull/4251#issuecomment-988815109


   
   ## CI report:
   
   * 2a84f3eeef355177e32aebd62a8eb4ed2712a647 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4090)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4092)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4096)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4251: [HUDI-2957] Shade kryo jar for flink bundle jar

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4251:
URL: https://github.com/apache/hudi/pull/4251#issuecomment-988861254


   
   ## CI report:
   
   * 2a84f3eeef355177e32aebd62a8eb4ed2712a647 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4090)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4092)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4096)
 
   
   




[GitHub] [hudi] nsivabalan closed issue #2934: [SUPPORT] Parquet file does not exist when trying to read hudi table incrementally

2021-12-08 Thread GitBox


nsivabalan closed issue #2934:
URL: https://github.com/apache/hudi/issues/2934


   






[GitHub] [hudi] nsivabalan commented on issue #2934: [SUPPORT] Parquet file does not exist when trying to read hudi table incrementally

2021-12-08 Thread GitBox


nsivabalan commented on issue #2934:
URL: https://github.com/apache/hudi/issues/2934#issuecomment-90486


   Will close out the ticket, as this is expected given the interplay between archival and incremental queries, and since we have a patch addressing it.






[GitHub] [hudi] nsivabalan closed issue #3826: Deltastreamer not getting auto triggered in continuous mode

2021-12-08 Thread GitBox


nsivabalan closed issue #3826:
URL: https://github.com/apache/hudi/issues/3826


   






[GitHub] [hudi] nsivabalan commented on issue #3826: Deltastreamer not getting auto triggered in continuous mode

2021-12-08 Thread GitBox


nsivabalan commented on issue #3826:
URL: https://github.com/apache/hudi/issues/3826#issuecomment-92722


   Can you please respond? This is a very common use case, and many folks in the community have been running in continuous mode, so this is likely an environment-specific or config issue. Closing it for now. Feel free to re-open if need be; happy to help.






[GitHub] [hudi] hudi-bot removed a comment on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4252:
URL: https://github.com/apache/hudi/pull/4252#issuecomment-988825206


   
   ## CI report:
   
   * b16a5686bd6a8a03aa2847624fa5cf3e2e9d36ec Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4093)
 
   * 15574f13d95fd781239bfb81dcd7ecef8213f6c9 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4097)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4252:
URL: https://github.com/apache/hudi/pull/4252#issuecomment-92409


   
   ## CI report:
   
   * 15574f13d95fd781239bfb81dcd7ecef8213f6c9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4097)
 
   
   




[GitHub] [hudi] nsivabalan commented on issue #4146: [SUPPORT] Deltastreamer commits with a custom checkpoint configuration are skipped if the generated checkpoint matches the previous commit checkpoi

2021-12-08 Thread GitBox


nsivabalan commented on issue #4146:
URL: https://github.com/apache/hudi/issues/4146#issuecomment-988891080


   Gotcha. I guess the way you set the checkpoint should work, based on this code block:
   ```
   if (cfg.checkpoint != null
       && (StringUtils.isNullOrEmpty(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY))
           || !cfg.checkpoint.equals(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY)))) {
     resumeCheckpointStr = Option.of(cfg.checkpoint);
   }
   ```
   
   Can you enable debug logs and post them here? In the meantime, I will try to reproduce this locally and post an update here.
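
A minimal, self-contained sketch of the condition in the code block above, assuming plain strings stand in for `cfg.checkpoint` and for the `CHECKPOINT_RESET_KEY` value stored in the last commit metadata. The class and method names are hypothetical and are not Hudi APIs; the sketch only illustrates when the configured checkpoint would be picked up.

```java
public final class CheckpointOverrideSketch {

  // Stand-in for StringUtils.isNullOrEmpty used in the snippet above.
  public static boolean isNullOrEmpty(String s) {
    return s == null || s.isEmpty();
  }

  /**
   * Returns true when a checkpoint is configured and either no reset key was
   * recorded in the last commit, or the recorded reset key differs from it.
   */
  public static boolean shouldOverride(String cfgCheckpoint, String lastResetKey) {
    return cfgCheckpoint != null
        && (isNullOrEmpty(lastResetKey) || !cfgCheckpoint.equals(lastResetKey));
  }

  public static void main(String[] args) {
    System.out.println(shouldOverride("0", null));            // true: no reset key recorded
    System.out.println(shouldOverride("0", "0"));             // false: same value already recorded
    System.out.println(shouldOverride("0", "1638825407000")); // true: values differ
    System.out.println(shouldOverride(null, "0"));            // false: nothing configured
  }
}
```

Under this reading, passing the same `--checkpoint` value that was recorded as the reset key in the previous commit would not take the override branch.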
   






[GitHub] [hudi] nsivabalan commented on issue #4146: [SUPPORT] Deltastreamer commits with a custom checkpoint configuration are skipped if the generated checkpoint matches the previous commit checkpoi

2021-12-08 Thread GitBox


nsivabalan commented on issue #4146:
URL: https://github.com/apache/hudi/issues/4146#issuecomment-988892449


   Can you try checkpoint = "val=0"






[jira] [Created] (HUDI-2960) create hudi table may cause memory leak in spark thrift server

2021-12-08 Thread suheng.cloud (Jira)
suheng.cloud created HUDI-2960:
--

 Summary: create hudi table may cause memory leak in spark thrift 
server
 Key: HUDI-2960
 URL: https://issues.apache.org/jira/browse/HUDI-2960
 Project: Apache Hudi
  Issue Type: Bug
  Components: Spark Integration
Affects Versions: 0.10.0
Reporter: suheng.cloud


Hi, community

I am currently trying to use the spark-hudi integration in spark-thrift-server. After testing Hudi table creation for a while, I found that it eventually results in a Metaspace OOM (in my case, the JVM option -XX:MaxMetaspaceSize=256m is set).

After tracing the source, I found that every time a CreateHoodieTableCommand is performed, `HiveClientUtils.newClientForMetadata` is invoked, and thus an IsolatedClientLoader is created. In my scenario, the OOM occurs after about 10 create statements have been executed.

Why not use 
`sessionState.catalog.externalCatalog.asInstanceOf[ExternalCatalogWithListener].unwrapped.asInstanceOf[HiveExternalCatalog].client` 
instead? Does it have any side effects?
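
The general pattern behind the question (create an expensive client once and share it, rather than building a new one per command) could be sketched in Java as follows. `Holder` and `creationsAfter` are hypothetical names, not Spark or Hudi APIs, and the sketch says nothing about whether sharing the Hive client is actually safe in this code path.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public final class SharedClientSketch {

  /** Lazily creates one instance and hands the same one to every caller. */
  public static final class Holder<T> {
    private final Supplier<T> factory;
    private T instance;

    public Holder(Supplier<T> factory) {
      this.factory = factory;
    }

    public synchronized T get() {
      if (instance == null) {
        instance = factory.get();
      }
      return instance;
    }
  }

  /** Counts how many times the expensive factory runs across n lookups. */
  public static int creationsAfter(int n) {
    AtomicInteger created = new AtomicInteger();
    Holder<Object> holder = new Holder<>(() -> {
      created.incrementAndGet(); // stands in for building a new isolated client
      return new Object();
    });
    for (int i = 0; i < n; i++) {
      holder.get();
    }
    return created.get();
  }

  public static void main(String[] args) {
    System.out.println(creationsAfter(10)); // 1: the factory ran only once
  }
}
```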

env: hudi master/spark-3.1.2/hive-2.3.6

Thanks.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] Limess commented on issue #4146: [SUPPORT] Deltastreamer commits with a custom checkpoint configuration are skipped if the generated checkpoint matches the previous commit checkpoint

2021-12-08 Thread GitBox


Limess commented on issue #4146:
URL: https://github.com/apache/hudi/issues/4146#issuecomment-988898544


   > Can you try checkpoint = "val=0"
   
   To clarify, you're asking me to try  `--checkpoint "val=0"` using the 
Deltastreamer CLI?
   






[GitHub] [hudi] Limess edited a comment on issue #4146: [SUPPORT] Deltastreamer commits with a custom checkpoint configuration are skipped if the generated checkpoint matches the previous commit check

2021-12-08 Thread GitBox


Limess edited a comment on issue #4146:
URL: https://github.com/apache/hudi/issues/4146#issuecomment-988898544


   > Can you try checkpoint = "val=0"
   
   To clarify, you're asking me to try  `--checkpoint "val=0"` using the 
Deltastreamer CLI? Or `--checkpoint 0`?
   






[GitHub] [hudi] Limess edited a comment on issue #4146: [SUPPORT] Deltastreamer commits with a custom checkpoint configuration are skipped if the generated checkpoint matches the previous commit check

2021-12-08 Thread GitBox


Limess edited a comment on issue #4146:
URL: https://github.com/apache/hudi/issues/4146#issuecomment-988642069


   > @Limess : So, is the expectation that, if you set checkpoint = 0, 
deltastreamer should start from scratch as though we are starting deltastreamer 
for the first time ?
   
   Yes that's the expectation - this is also what currently happens, but where 
it doesn't match my expectations is that the subsequent commit is skipped so 
nothing is actually written.
   
   Intuitively I'd expect that if the checkpoint < the commit timestamp, Hudi 
should always commit.
   
   > if you wish to override the checkpoint, probably you need to set 
--initial-checkpoint-provider.
   > 
   > 
https://github.com/apache/hudi/blob/e8473b9a2b5bf0ad9370377899f6a7ea4d1ceba1/hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java#L357
   
   I did look into this and found it hard to understand and use; neither of the existing implementations matches this use case.






[GitHub] [hudi] hudi-bot commented on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4020:
URL: https://github.com/apache/hudi/pull/4020#issuecomment-988899676


   
   ## CI report:
   
   * 72ea77955da505b679945dc92ea0dd2d597bcedf Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4098)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4020:
URL: https://github.com/apache/hudi/pull/4020#issuecomment-988828994


   
   ## CI report:
   
   * 548c193ffe432033be61ca5a592f6d9760b5ebb0 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3697)
 
   * 72ea77955da505b679945dc92ea0dd2d597bcedf Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4098)
 
   
   




[GitHub] [hudi] danny0405 commented on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service

2021-12-08 Thread GitBox


danny0405 commented on pull request #4252:
URL: https://github.com/apache/hudi/pull/4252#issuecomment-988906058


   @hudi-bot run azure






[GitHub] [hudi] hudi-bot commented on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4252:
URL: https://github.com/apache/hudi/pull/4252#issuecomment-988907016


   
   ## CI report:
   
   * 15574f13d95fd781239bfb81dcd7ecef8213f6c9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4097)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4099)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4252:
URL: https://github.com/apache/hudi/pull/4252#issuecomment-92409


   
   ## CI report:
   
   * 15574f13d95fd781239bfb81dcd7ecef8213f6c9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4097)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4020:
URL: https://github.com/apache/hudi/pull/4020#issuecomment-988899676


   
   ## CI report:
   
   * 72ea77955da505b679945dc92ea0dd2d597bcedf Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4098)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4020:
URL: https://github.com/apache/hudi/pull/4020#issuecomment-988911467


   
   ## CI report:
   
   * 72ea77955da505b679945dc92ea0dd2d597bcedf Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4098)
 
   * a23ba86033f3215c9f57118742189ae844c6c850 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4020:
URL: https://github.com/apache/hudi/pull/4020#issuecomment-988911467


   
   ## CI report:
   
   * 72ea77955da505b679945dc92ea0dd2d597bcedf Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4098)
 
   * a23ba86033f3215c9f57118742189ae844c6c850 UNKNOWN
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4020:
URL: https://github.com/apache/hudi/pull/4020#issuecomment-988914045


   
   ## CI report:
   
   * 72ea77955da505b679945dc92ea0dd2d597bcedf Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4098)
 
   * a23ba86033f3215c9f57118742189ae844c6c850 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4100)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4252:
URL: https://github.com/apache/hudi/pull/4252#issuecomment-988947837


   
   ## CI report:
   
   * 15574f13d95fd781239bfb81dcd7ecef8213f6c9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4097)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4099)
 
   
   




[GitHub] [hudi] hudi-bot removed a comment on pull request #4252: [HUDI-2959] Fix the thread leak of cleaning service

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4252:
URL: https://github.com/apache/hudi/pull/4252#issuecomment-988907016


   
   ## CI report:
   
   * 15574f13d95fd781239bfb81dcd7ecef8213f6c9 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4097)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4099)
 
   
   




[GitHub] [hudi] nsivabalan commented on issue #4146: [SUPPORT] Deltastreamer commits with a custom checkpoint configuration are skipped if the generated checkpoint matches the previous commit checkpoi

2021-12-08 Thread GitBox


nsivabalan commented on issue #4146:
URL: https://github.com/apache/hudi/issues/4146#issuecomment-988951531


   This worked for me. 
   First run:
   ```
   nsb$ grep "Checkpoint" /tmp/logs/log23.out 
   21/12/08 07:47:52 INFO DeltaSync: Checkpoint to resume from : Optional.empty
   21/12/08 07:49:35 INFO DeltaSync: Checkpoint to resume from : Option{val=1638825407000}
   21/12/08 07:50:35 INFO DeltaSync: Checkpoint to resume from : Option{val=1638825408000}
   21/12/08 07:51:46 INFO DeltaSync: Checkpoint to resume from : Option{val=1638825413000}
   ```
   
   Stopped deltastreamer and restarted with additional config `--checkpoint 1638825407000`:
   
   ```
   nsb$ tail -f /tmp/logs/log24.out | grep "Checkpoint" 
   21/12/08 07:54:36 INFO DeltaSync: Checkpoint to resume from : Option{val=1638825407000}
   21/12/08 07:56:16 INFO DeltaSync: Checkpoint to resume from : Option{val=1638825408000}
   21/12/08 07:57:56 INFO DeltaSync: Checkpoint to resume from : Option{val=1638825413000}
   ```
   
   






[GitHub] [hudi] hudi-bot removed a comment on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x

2021-12-08 Thread GitBox


hudi-bot removed a comment on pull request #4020:
URL: https://github.com/apache/hudi/pull/4020#issuecomment-988914045


   
   ## CI report:
   
   * 72ea77955da505b679945dc92ea0dd2d597bcedf Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4098)
 
   * a23ba86033f3215c9f57118742189ae844c6c850 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4100)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #4020: [WIP][HUDI-2783] Upgrade HBase to 2.x

2021-12-08 Thread GitBox


hudi-bot commented on pull request #4020:
URL: https://github.com/apache/hudi/pull/4020#issuecomment-988963081


   
   ## CI report:
   
   * a23ba86033f3215c9f57118742189ae844c6c850 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=4100)
 
   
   




[GitHub] [hudi] h7kanna commented on issue #4170: [SUPPORT] Understanding Clustering Behavior

2021-12-08 Thread GitBox


h7kanna commented on issue #4170:
URL: https://github.com/apache/hudi/issues/4170#issuecomment-988979839


   I have:
   hoodie.parquet.max.file.size=134217728
   hoodie.parquet.small.file.limit=104857600
   hoodie.clustering.plan.strategy.target.file.max.bytes=134217728
   hoodie.clustering.plan.strategy.small.file.limit=104857600
   
   In one of the partitions I could see a 176 MB file (larger than the max file size limit).






[GitHub] [hudi] h7kanna edited a comment on issue #4170: [SUPPORT] Understanding Clustering Behavior

2021-12-08 Thread GitBox


h7kanna edited a comment on issue #4170:
URL: https://github.com/apache/hudi/issues/4170#issuecomment-988979839


   I have:
   hoodie.parquet.max.file.size=134217728
   hoodie.parquet.small.file.limit=67108864
   hoodie.clustering.plan.strategy.target.file.max.bytes=134217728
   hoodie.clustering.plan.strategy.small.file.limit=67108864
   
   In one of the partitions I could see a 176 MB file (larger than the max file size limit).
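
For reference, the byte values in these settings can be converted with a few lines of Java. This is only a worked-arithmetic sketch: the class name is hypothetical, and it assumes the observed "176" figure means 176 MiB.

```java
public final class FileSizeSketch {
  private static final long MIB = 1024L * 1024L;

  /** Converts a byte count to whole mebibytes. */
  public static long toMiB(long bytes) {
    return bytes / MIB;
  }

  public static void main(String[] args) {
    long maxFileSize = 134217728L;   // hoodie.parquet.max.file.size
    long smallFileLimit = 67108864L; // hoodie.parquet.small.file.limit (edited value)
    long observed = 176L * MIB;      // assumption: the observed file is 176 MiB

    System.out.println(toMiB(maxFileSize));     // 128
    System.out.println(toMiB(smallFileLimit));  // 64
    System.out.println(observed > maxFileSize); // true: 48 MiB over the configured max
  }
}
```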






[GitHub] [hudi] vinothchandar commented on a change in pull request #3173: [HUDI-1951] Add bucket hash index, compatible with the hive bucket

2021-12-08 Thread GitBox


vinothchandar commented on a change in pull request #3173:
URL: https://github.com/apache/hudi/pull/3173#discussion_r764911822



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndex.java
##
@@ -122,13 +125,40 @@ public O updateLocation(O writeStatuses, 
HoodieEngineContext context,
   @PublicAPIMethod(maturity = ApiMaturityLevel.STABLE)
   public abstract boolean isImplicitWithStorage();
 
+  /**
+   * An index might need customized partitioner other than general upsert and 
insert partitioner.
+   */
+  @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
+  public Option getCustomizedPartitioner(WorkloadProfile profile,
+  HoodieEngineContext context,
+  HoodieTable table,
+  HoodieWriteConfig writeConfig) {
+return Option.empty();
+  }
+
+  /**
+   * If the `getCustomizedPartitioner` returns a partitioner, it has to be 
true.
+   */
+  @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)

Review comment:
   Instead of thinking of it as a custom partitioner, I would prefer we introduce a notion of "storage layout". Bucketing is not just an attribute of writing but of the storage itself. Once bucketed, any writer/reader needs to respect that. 

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java
##
@@ -200,6 +209,48 @@
   .defaultValue("true")
   .withDocumentation("Similar to " + 
BLOOM_INDEX_UPDATE_PARTITION_PATH_ENABLE + ", but for simple index.");
 
+  /**
+   * * Bucket Index Configs *
+   * Bucket Index is targeted to locate the record fast by hash in big data 
scenarios.
+   * The current implementation is a basic version, so there are some 
constraints:
+   * 1. Unsupported operation: bulk insert, cluster and so on.
+   * 2. Bucket num change requires rewriting the partition.
+   * 3. Predict the table size and future data growth well to set a reasonable 
bucket num.
+   * 4. A bucket size is recommended less than 3GB and avoid bing too small.
+   */
+  // Bucket num equals file groups num in each partition.
+  // Bucket num can be set according to partition size and file group size.
+  public static final ConfigProperty BUCKET_INDEX_NUM_BUCKETS = 
ConfigProperty
+  .key("hoodie.bucket.index.num.buckets")
+  .defaultValue(256)
+  .withDocumentation("Only applies if index type is BUCKET_INDEX. 
Determine the bucket num of the hudi table, "
+  + "and each partition is divided to N buckets.");
+
+  public static final ConfigProperty BUCKET_INDEX_HASH_FIELD = 
ConfigProperty
+  .key("hoodie.bucket.index.hash.field")
+  .noDefaultValue()
+  .withDocumentation("Index key. It is used to index the record and find 
its file group. "
+  + "If not set, use record key field as default");
+
+  public static final ConfigProperty BUCKET_INDEX_HASH_FUNCTION = 
ConfigProperty
+  .key("hoodie.bucket.index.hash.function")
+  .defaultValue("JVMHash")

Review comment:
   We have standard utils for hashing now that we intend to use broadly across the codebase. Can we reuse `HashID`? Do we need the HiveHash per se? I feel we should default to something other than JVMHash. WDYT?
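
For readers following the thread, a minimal sketch of how a hash-based bucket id can be derived from an index key. This is not the PR's `BucketIdentifier` or the `HashID` utility; it assumes `String.hashCode()` as a stand-in hash function, and the class and method names are hypothetical.

```java
public final class BucketIdSketch {

  /**
   * Maps a key to a bucket in [0, numBuckets). Math.floorMod keeps the result
   * non-negative even when hashCode() is negative.
   */
  public static int bucketId(String hashFieldValue, int numBuckets) {
    if (numBuckets <= 0) {
      throw new IllegalArgumentException("numBuckets must be positive");
    }
    return Math.floorMod(hashFieldValue.hashCode(), numBuckets);
  }

  public static void main(String[] args) {
    int numBuckets = 256; // default of hoodie.bucket.index.num.buckets in the diff above
    System.out.println(bucketId("a", numBuckets)); // 97, since "a".hashCode() == 97
    System.out.println(bucketId("some-record-key", numBuckets));
  }
}
```

Because the bucket id depends on both the hash function and the bucket count, changing either one remaps existing keys, which is consistent with the constraint in the diff that a bucket number change requires rewriting the partition.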

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java
##
@@ -200,6 +209,48 @@
   .defaultValue("true")
   .withDocumentation("Similar to " + 
BLOOM_INDEX_UPDATE_PARTITION_PATH_ENABLE + ", but for simple index.");
 
+  /**
+   * * Bucket Index Configs *
+   * Bucket Index is targeted to locate the record fast by hash in big data 
scenarios.
+   * The current implementation is a basic version, so there are some 
constraints:
+   * 1. Unsupported operation: bulk insert, cluster and so on.

Review comment:
   Right, while bucketing helps write performance and also join performance (for UUID joins, for example), it goes against clustering and other layout optimizations that can be useful for query performance. This is one of the reasons I did not prefer baking bucketing into the storage design.

##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/BucketIdentifier.java
##
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the spec

[GitHub] [hudi] Limess commented on issue #4146: [SUPPORT] Deltastreamer commits with a custom checkpoint configuration are skipped if the generated checkpoint matches the previous commit checkpoint

2021-12-08 Thread GitBox


Limess commented on issue #4146:
URL: https://github.com/apache/hudi/issues/4146#issuecomment-989012890


   We're not running in continuous mode; it looks like the run above might be? We're also using the DFS datasource.






[jira] [Created] (HUDI-2961) Async table services can race with metadata table updates

2021-12-08 Thread Manoj Govindassamy (Jira)
Manoj Govindassamy created HUDI-2961:


 Summary: Async table services can race with metadata table updates
 Key: HUDI-2961
 URL: https://issues.apache.org/jira/browse/HUDI-2961
 Project: Apache Hudi
  Issue Type: Task
  Components: Writer Core
Reporter: Manoj Govindassamy
Assignee: Manoj Govindassamy
 Fix For: 0.11.0


Today, metadata table updates are done inline (synchronously) with the data 
table updates. Metadata table updates can also sometimes trigger table services 
such as compaction, which are also done inline with respect to the ongoing 
commit. So updates to the metadata table are always serial. However, async 
table services such as clustering can run in parallel with one or more writers 
and can update the metadata table in parallel with the writer commits. 

In the multi-writer case, a lock provider is configured anyway, so metadata 
table updates are guarded against races. But lock providers are not mandatory 
today for single writer + async table service deployments, which leads to races 
in metadata table updates. An async table service such as clustering can race 
with the metadata table compaction and can update the wrong delta log file 
instead of the correct next delta file from the compaction.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [hudi] alexeykudinkin commented on a change in pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-08 Thread GitBox


alexeykudinkin commented on a change in pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#discussion_r765092480



##
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java
##
@@ -95,8 +95,7 @@ public MultipleSparkJobExecutionStrategy(HoodieTable table, 
HoodieEngineContext
 .map(inputGroup -> runClusteringForGroupAsync(inputGroup,
 clusteringPlan.getStrategy().getStrategyParams(),
 
Option.ofNullable(clusteringPlan.getPreserveHoodieMetadata()).orElse(false),
-instantTime))
-.map(CompletableFuture::join);
+
instantTime)).collect(Collectors.toList()).stream().map(CompletableFuture::join);

Review comment:
   How does this guarantee the jobs will run in parallel? We simply collect the 
stream into a list, but then still join the futures sequentially.
   
   Instead, we should use the following util: 
   
   ```
   public static <T> CompletableFuture<List<T>> allOf(@Nonnull List<CompletableFuture<T>> futures) {
     return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
         .thenApply(aVoid ->
             futures.stream()
                 // NOTE: This join wouldn't block, since all the
                 //       futures are completed at this point
                 .map(CompletableFuture::join)
                 .collect(Collectors.toList())
         );
   }
   ```
   
   And then invoke it as follows: 
   
   ```
   allOf(
 clusteringPlan.getInputGroups()
   .stream()
   .map(...) // returns `CompletableFuture`
   .collect(Collectors.toList())
   )
 .join();
   ```
   
   This would guarantee parallel execution for each individual clustering group
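
   The stream laziness at issue in this thread can be observed with a small standalone sketch. It records how many futures have been submitted at the moment each `join()` is called, once for a purely lazy pipeline and once with the futures collected into a list first. The class `StreamJoinSketch` and its method are hypothetical and use plain `CompletableFuture.supplyAsync` in place of the clustering jobs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical demo class; it stands in for the clustering job submission.
final class StreamJoinSketch {

    // Returns, for each join() call, how many futures had been submitted by then.
    static List<Integer> submittedAtEachJoin(boolean collectFirst) {
        AtomicInteger submitted = new AtomicInteger();
        List<Integer> observed = new ArrayList<>();

        // Lazy stage: nothing is submitted until a terminal operation pulls elements.
        Stream<CompletableFuture<Integer>> futures = IntStream.range(0, 3).boxed()
            .map(i -> {
                submitted.incrementAndGet();
                return CompletableFuture.supplyAsync(() -> i);
            });

        if (collectFirst) {
            // Materializing the list forces every future to be submitted up front.
            futures = futures.collect(Collectors.toList()).stream();
        }

        futures
            .map(f -> {
                observed.add(submitted.get());
                return f.join();
            })
            .collect(Collectors.toList());
        return observed;
    }

    public static void main(String[] args) {
        System.out.println("lazy pipeline:  " + submittedAtEachJoin(false));
        System.out.println("collect first:  " + submittedAtEachJoin(true));
    }
}
```

   In the lazy pipeline each element is submitted and joined before the next one is created, so the observed counts grow one at a time. With the intermediate list, all three futures are already submitted when the first `join()` runs. Waiting on `CompletableFuture.allOf(...)` additionally makes the wait explicit rather than relying on how the stream is evaluated.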








[GitHub] [hudi] alexeykudinkin commented on pull request #4178: [HUDI-2901] Fixed the bug clustering jobs cannot running in parallel

2021-12-08 Thread GitBox


alexeykudinkin commented on pull request #4178:
URL: https://github.com/apache/hudi/pull/4178#issuecomment-989038686


   Great catch @xiarixiaoyao!






[jira] [Created] (HUDI-2962) Enable metadata table along with JVM local lock provider

2021-12-08 Thread Manoj Govindassamy (Jira)
Manoj Govindassamy created HUDI-2962:


 Summary: Enable metadata table along with JVM local lock provider
 Key: HUDI-2962
 URL: https://issues.apache.org/jira/browse/HUDI-2962
 Project: Apache Hudi
  Issue Type: Task
Reporter: Manoj Govindassamy
Assignee: Manoj Govindassamy
 Fix For: 0.11.0


Metadata table is disabled by default in master due to 
https://issues.apache.org/jira/browse/HUDI-2961. 

For the single writer + async table services deployment model, to protect 
against races, we can have a fairly lightweight JVM-local lock provider. This 
means all the writes and the table services have to run from a single JVM, as 
in the case of DeltaStreamer. This doesn't cover multi-JVM writers and async 
table services, though; a full fix for that will be covered by HUDI-2961. For 
now, to have the metadata table re-enabled at master, a JVM-local lock 
provider should be sufficient. 
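
A minimal sketch of what a JVM-local lock could look like, assuming a single static `ReentrantLock` shared by the writer and the async table services in the same process. The class `JvmLocalLockSketch` and its methods are hypothetical names for illustration and are not Hudi's lock provider interface.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; names are illustrative and not part of Hudi's API.
final class JvmLocalLockSketch {

    // Static, so every thread in this JVM (writer, clustering, compaction) shares it.
    private static final ReentrantLock LOCK = new ReentrantLock();

    // Runs a metadata table update while holding the JVM-wide lock.
    static void withLock(Runnable update) {
        LOCK.lock();
        try {
            update.run();
        } finally {
            LOCK.unlock();
        }
    }

    // Two threads stand in for a writer and an async table service that both
    // update shared state; the lock serializes the unsynchronized increments.
    static int runDemo() {
        int[] counter = {0};
        Runnable work = () -> {
            for (int i = 0; i < 1000; i++) {
                withLock(() -> counter[0]++);
            }
        };
        Thread writer = new Thread(work, "writer");
        Thread clustering = new Thread(work, "async-clustering");
        writer.start();
        clustering.start();
        try {
            writer.join();
            clustering.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println("updates applied: " + runDemo());
    }
}
```

Such a lock only coordinates threads inside one process, which matches the limitation stated above: writers or table services running in separate JVMs would still need an external lock provider.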





[jira] [Created] (HUDI-2963) Update configs for 0.10.0

2021-12-08 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-2963:

 Summary: Update configs for 0.10.0 
 Key: HUDI-2963
 URL: https://issues.apache.org/jira/browse/HUDI-2963
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Docs
Reporter: sivabalan narayanan







