[ https://issues.apache.org/jira/browse/HUDI-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Istvan Darvas updated HUDI-3362:
--------------------------------
    Description: 
Hi Guys,

 

Environment: AWS EMR 6.4 / Hudi v0.8.0

Problem: I have a CoW table which is ingested by DeltaStreamer (batch style: 
every 5 minutes from Kafka). After a certain time, DeltaStreamer stops 
working with a message like this:

 

{{diagnostics: User class threw exception: 
org.apache.hudi.exception.HoodieRollbackException: Found commits after time 
:20220131215051, please rollback greater commits first}}

 

The offending commit is usually a replace commit; I am fairly sure of this.

I have the following commits in the timeline:

 

20220131214354 <- before
20220131215051 <- error message
20220131215514 <- after
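For context, the failure follows from how the timeline is ordered: Hudi refuses to roll back an instant while later instants still exist. A minimal sketch in Python (my own illustration; `can_rollback` is a hypothetical helper, not Hudi's actual API — instant times sort lexicographically, so plain string comparison is enough):

```python
# Sketch of the rollback guard (hypothetical helper, not Hudi code):
# an instant cannot be rolled back while later instants remain on the timeline.
timeline = ["20220131214354", "20220131215051", "20220131215514"]

def can_rollback(instant, timeline):
    # Instant times are yyyyMMddHHmmss strings, so lexicographic
    # comparison matches chronological order.
    return not any(t > instant for t in timeline)

print(can_rollback("20220131215051", timeline))  # False: 20220131215514 is later
print(can_rollback("20220131215514", timeline))  # True: nothing later exists
```

This matches the error text above: 20220131215051 has a "greater" commit (20220131215514) after it.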

 

So, as I was advised, I tried to roll back with the following steps in hudi-cli:

1.) connect --path s3://scgps-datalake/iot_raw/ingress_pkg_decoded_rep / SUCCESS

2.) savepoint create --commit 20220131214354 --sparkMaster local[2] / SUCCESS

3.) savepoint rollback --savepoint 20220131214354 --sparkMaster local[2] / 
FAILED

4.) savepoint create --commit 20220131215514 --sparkMaster local[2] / SUCCESS

5.) savepoint rollback --savepoint 20220131215514 --sparkMaster local[2] / 
FAILED
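If it helps with debugging: my understanding is that a stuck replace commit can be spotted directly in the `.hoodie` folder, since pending instants leave `.requested` / `.inflight` files behind instead of a completed file. A rough Python sketch (the filenames below are made up for illustration, not taken from my actual table):

```python
# Illustrative .hoodie listing (made-up filenames). Pending instants are the
# ones that only have .requested / .inflight state files, no completed file.
hoodie_files = [
    "20220131214354.commit",
    "20220131215051.replacecommit.requested",
    "20220131215051.replacecommit.inflight",
    "20220131215514.commit",
]

pending = sorted({f.split(".")[0] for f in hoodie_files
                  if f.endswith((".requested", ".inflight"))})
print(pending)  # ['20220131215051']
```

In my case the instant from the error message shows up exactly like this, which is why I believe the replace commit is the one blocking the rollback.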

 

Long story short, when I end up in a situation like this, I am not able to 
solve it with the known methods ;) - my use case is a work in progress, but I 
cannot go to prod with an issue like this.

 

My question: what would be the right steps / commands to solve an issue like 
this and be able to restart DeltaStreamer again?

 

This table does not have dimension data, so I am happy to share the whole 
table if someone is curious (if that is needed or would be helpful, let's talk 
in a private mail / Slack about the sharing). It is ~15GB ;) and ingestion 
stopped after a few runs, actually right after the 1st clustering.

 

I use this clustering config in DeltaStreamer:

hoodie.clustering.inline=true
hoodie.clustering.inline.enabled=true
hoodie.clustering.inline.max.commits=36
hoodie.clustering.plan.strategy.sort.columns=correlation_id
hoodie.clustering.plan.strategy.daybased.lookback.partitions=7
hoodie.clustering.plan.strategy.target.file.max.bytes=268435456
hoodie.clustering.plan.strategy.small.file.limit=134217728
hoodie.clustering.plan.strategy.max.bytes.per.group=671088640
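For readability, the raw byte values above are plain powers-of-two sizes (that was my intent when picking them); spelled out:

```python
# The clustering size settings above, expressed in MiB.
MiB = 1024 ** 2

target_file_max_bytes = 268435456   # hoodie.clustering.plan.strategy.target.file.max.bytes
small_file_limit      = 134217728   # hoodie.clustering.plan.strategy.small.file.limit
max_bytes_per_group   = 671088640   # hoodie.clustering.plan.strategy.max.bytes.per.group

print(target_file_max_bytes // MiB)  # 256
print(small_file_limit // MiB)       # 128
print(max_bytes_per_group // MiB)    # 640
```

So: 256 MiB target files, a 128 MiB small-file limit, and 640 MiB per clustering group.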

 

I hope there is someone who can help me tackle this, because if I am able to 
solve this manually, I would be confident to go to prod.

So thanks in advance,

Darvi

Slack Hudi: istvan darvas / U02NTACPHPU

  was:
Hi Guys,

 

Environment: AWS EMR 6.4 / Hudi v0.8.0

Problem: I have a MoR table which is ingested by DeltaStreamer (batch style: 
every 5 minutes from Kafka). After a certain time, DeltaStreamer stops 
working with a message like this:

 

{{diagnostics: User class threw exception: 
org.apache.hudi.exception.HoodieRollbackException: Found commits after time 
:20220131215051, please rollback greater commits first}}

 

The offending commit is usually a replace commit; I am fairly sure of this.

I have the following commits in the timeline:

 

20220131214354 <- before
20220131215051 <- error message
20220131215514 <- after

 

So, as I was advised, I tried to roll back with the following steps in hudi-cli:

1.) connect --path s3://scgps-datalake/iot_raw/ingress_pkg_decoded_rep / SUCCESS

2.) savepoint create --commit 20220131214354 --sparkMaster local[2] / SUCCESS

3.) savepoint rollback --savepoint 20220131214354 --sparkMaster local[2] / 
FAILED

4.) savepoint create --commit 20220131215514 --sparkMaster local[2] / SUCCESS

5.) savepoint rollback --savepoint 20220131215514 --sparkMaster local[2] / 
FAILED

 

Long story short, when I end up in a situation like this, I am not able to 
solve it with the known methods ;) - my use case is a work in progress, but I 
cannot go to prod with an issue like this.

 

My question: what would be the right steps / commands to solve an issue like 
this and be able to restart DeltaStreamer again?

 

This table does not have dimension data, so I am happy to share the whole 
table if someone is curious (if that is needed or would be helpful, let's talk 
in a private mail / Slack about the sharing). It is ~15GB ;) and ingestion 
stopped after a few runs, actually right after the 1st clustering.

 

I use this clustering config in DeltaStreamer:

hoodie.clustering.inline=true
hoodie.clustering.inline.enabled=true
hoodie.clustering.inline.max.commits=36
hoodie.clustering.plan.strategy.sort.columns=correlation_id
hoodie.clustering.plan.strategy.daybased.lookback.partitions=7
hoodie.clustering.plan.strategy.target.file.max.bytes=268435456
hoodie.clustering.plan.strategy.small.file.limit=134217728
hoodie.clustering.plan.strategy.max.bytes.per.group=671088640

 

I hope there is someone who can help me tackle this, because if I am able to 
solve this manually, I would be confident to go to prod.

So thanks in advance,

Darvi

Slack Hudi: istvan darvas / U02NTACPHPU


> Hudi 0.8.0 cannot rollback CoW table
> ------------------------------------
>
>                 Key: HUDI-3362
>                 URL: https://issues.apache.org/jira/browse/HUDI-3362
>             Project: Apache Hudi
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Istvan Darvas
>            Assignee: sivabalan narayanan
>            Priority: Blocker
>         Attachments: hoodie.zip, rollback-on-a-not-damaged-table-SUCCESS.pdf, 
> rollback-on-a-not-damaged-table-SUCCESS.txt, rollback_20220131215514.txt, 
> rollback_log.txt, rollback_log_v2.txt
>
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
