[ 
https://issues.apache.org/jira/browse/SPARK-51717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-51717.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 50512
[https://github.com/apache/spark/pull/50512]

> Possible SST mismatch error for the second snapshot created for a new query
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-51717
>                 URL: https://issues.apache.org/jira/browse/SPARK-51717
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 4.0.0, 4.1.0
>            Reporter: B. Micheal Okutubo
>            Assignee: B. Micheal Okutubo
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> Fix this error: Sst file size mismatch ... MANIFEST-000005 may be corrupted
> This is an edge case in SST file reuse that can only happen for the first 
> ever RocksDB checkpoint, when the following sequence occurs:
>  # The first ever RocksDB checkpoint (e.g. for version 10) was created with 
> x.sst, but not yet uploaded by maintenance
>  # The next batch using RocksDB at v10 fails and rolls the store back to -1 
> (which invalidates RocksDB)
>  # A new request to load RocksDB at v10 comes in, but the v10 checkpoint is 
> still not uploaded, hence we have to replay the changelog starting from 
> checkpoint v0.
>  # We create a new v11 and a new checkpoint with a new x*.sst. v10 is now 
> uploaded by maintenance. Then, during the upload of x*.sst for v11, we reuse 
> the x.sst DFS file, wrongly assuming it is the same as x*.sst.
> The problem originates in step 3: the file manager loads v0 differently from 
> how it loads other versions. When loading any other version, deleting an 
> existing local file also removes it from the file mapping. For v0, however, 
> the file manager just deletes the local dir, and we missed clearing the file 
> mapping in this case. Hence the old x.sst was still present in the file 
> mapping at step 4. We need to fix this and also add an additional size 
> check.
>  
> This can only happen when changelog checkpointing is in use.
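
The sequence described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Spark's actual file manager: the class ToyFileManager, its methods, the DFS file name x-uuid1.sst, and the byte sizes are all hypothetical and chosen for illustration. It models a mapping from local SST file names to DFS files, shows how a mapping that survives the v0 load leads to reusing the old x.sst, and how clearing the mapping or checking the size avoids the reuse.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class DfsFile:
    """A DFS file associated with a local SST file (hypothetical model)."""
    dfs_name: str
    size_bytes: int


@dataclass
class ToyFileManager:
    """Toy stand-in for a RocksDB file manager; not Spark's real class."""
    # local SST file name -> DFS file recorded for it
    file_mapping: Dict[str, DfsFile] = field(default_factory=dict)
    clear_mapping_on_v0_load: bool = True  # models the fix for step 3
    check_size_on_reuse: bool = True       # models the additional size check

    def record(self, local_name: str, dfs_file: DfsFile) -> None:
        """Remember which DFS file corresponds to a local SST file."""
        self.file_mapping[local_name] = dfs_file

    def load_version_zero(self) -> None:
        """Loading v0 wipes the local dir; the mapping must be wiped too."""
        if self.clear_mapping_on_v0_load:
            self.file_mapping.clear()

    def find_reusable(self, local_name: str, local_size: int) -> Optional[DfsFile]:
        """Return a DFS file to reuse for the local file, or None to upload."""
        candidate = self.file_mapping.get(local_name)
        if candidate is None:
            return None
        if self.check_size_on_reuse and candidate.size_bytes != local_size:
            return None  # same name, different content: do not reuse
        return candidate


def run_scenario(manager: ToyFileManager) -> Optional[DfsFile]:
    """Replay steps 1-4 and return what would be reused for the v11 upload."""
    old_x = DfsFile(dfs_name="x-uuid1.sst", size_bytes=1000)
    manager.record("x.sst", old_x)   # step 1: v10 snapshot contains x.sst
    manager.load_version_zero()      # steps 2-3: rollback, then reload from v0
    # step 4: v11 has a new file with the same local name but other content
    return manager.find_reusable("x.sst", local_size=1200)


if __name__ == "__main__":
    buggy = ToyFileManager(clear_mapping_on_v0_load=False, check_size_on_reuse=False)
    print("without fix:", run_scenario(buggy))
    print("with fix:   ", run_scenario(ToyFileManager()))
```

In this model either safeguard alone prevents the stale reuse. Applying both, as the description proposes, means the size check still catches a mismatch if some other path leaves a stale entry in the mapping.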



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
