micheal-o opened a new pull request, #50512:
URL: https://github.com/apache/spark/pull/50512

   ### What changes were proposed in this pull request?
   Fix the error `Sst file size mismatch ... MANIFEST-000005 may be corrupted`.
   
   This is an edge case in SST file reuse that can only affect the first-ever 
RocksDB checkpoint, and only when all of the following happen in order:
   
   1. The first-ever RocksDB checkpoint (e.g. for version 10) is created with 
x.sst, but has not yet been uploaded by maintenance.
   2. The next batch using RocksDB at v10 fails and rolls the store back to -1 
(which invalidates RocksDB).
   3. A new request to load RocksDB at v10 comes in, but the v10 checkpoint is 
still not uploaded, so we have to replay the changelog starting from 
checkpoint v0.
   4. We create a new v11 and a new checkpoint with a new x*.sst. v10 is now 
uploaded by maintenance. Then, during the upload of x*.sst for v11, we reuse the 
DFS file for x.sst, thinking it is the same as x*.sst.
   
   The problem originates in step 3: the file manager loads v0 differently 
from other versions. When loading any other version, deleting an existing local 
file also removes it from the file mapping. For v0, however, the file manager 
just deletes the local dir, and the file mapping was not cleared. As a result, 
the old x.sst was still present in the file mapping at step 4. This PR clears 
the file mapping in that case and also adds an additional size check.
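
The scenario above can be illustrated with a small toy model. This is only a sketch, not Spark's actual file manager: `ToyFileManager`, `DfsFile`, `record_checkpoint_file`, `load_v0`, `reusable_dfs_file` and the file sizes are all hypothetical names and values. It also assumes that the mapping entry is recorded when the checkpoint is created (before upload) and that x*.sst ends up with the same local name as x.sst.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class DfsFile:
    """A DFS file assigned to a local SST file (hypothetical model)."""
    dfs_name: str
    size_bytes: int


@dataclass
class ToyFileManager:
    """Toy stand-in for the file manager; NOT Spark's real class."""
    clear_mapping_on_v0: bool = True   # models the fix described in the PR
    check_size: bool = True            # models the additional size check
    # local SST file name -> DFS file assigned to it
    local_to_dfs: Dict[str, DfsFile] = field(default_factory=dict)

    def record_checkpoint_file(self, local_name: str, size_bytes: int) -> DfsFile:
        # Assumption: the mapping is recorded at checkpoint creation time.
        dfs = DfsFile(f"dfs-{local_name}", size_bytes)
        self.local_to_dfs[local_name] = dfs
        return dfs

    def load_v0(self) -> None:
        # Loading v0 wipes the local dir; the fix also drops the mapping.
        if self.clear_mapping_on_v0:
            self.local_to_dfs.clear()

    def reusable_dfs_file(self, local_name: str, size_bytes: int) -> Optional[DfsFile]:
        # Return a DFS file to reuse for this local file, or None to upload anew.
        existing = self.local_to_dfs.get(local_name)
        if existing is None:
            return None
        if self.check_size and existing.size_bytes != size_bytes:
            return None  # same name but different size: not the same file
        return existing


def simulate(clear_mapping_on_v0: bool, check_size: bool) -> Optional[DfsFile]:
    fm = ToyFileManager(clear_mapping_on_v0=clear_mapping_on_v0, check_size=check_size)
    fm.record_checkpoint_file("x.sst", size_bytes=1000)    # step 1: v10 checkpoint
    fm.load_v0()                                           # steps 2-3: rollback, reload from v0
    return fm.reusable_dfs_file("x.sst", size_bytes=1200)  # step 4: v11 has a different x.sst


if __name__ == "__main__":
    print(simulate(clear_mapping_on_v0=False, check_size=False))
    print(simulate(clear_mapping_on_v0=True, check_size=False))
    print(simulate(clear_mapping_on_v0=False, check_size=True))
```

In this model, only the run with both safeguards disabled returns the stale entry for reuse. Either clearing the mapping on the v0 load or comparing sizes makes the lookup return `None`, so the new file would be uploaded instead of reused.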
   
   
   ### Why are the changes needed?
   This bug can corrupt the checkpoint, which causes the query to fail.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   A new test is included.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

