github-actions[bot] commented on code in PR #64578:
URL: https://github.com/apache/doris/pull/64578#discussion_r3427467086


##########
be/src/io/cache/block_file_cache.cpp:
##########
@@ -734,7 +743,11 @@ std::string BlockFileCache::clear_file_cache_async() {
             }
         }
 
-        // we cannot delete the element in the loop above, because it will 
break the iterator
+        // Do not erase while walking _files above: remove() may erase the 
current map element.
+        //
+        // sync_remove only changes how already releasable DOWNLOADED blocks 
are deleted from
+        // storage. Busy blocks keep the existing holder lifecycle: mark them 
deleting and leave

Review Comment:
   Leaving busy blocks in `_files` makes them visible to later lookups. 
`clear_file_cache_sync()` can mark a block deleting because an existing 
`FileBlocksHolder` still references it, return to the HTTP caller, and then a 
new read of the same hash/range will go through `get_impl()`/`use_cell()` and 
receive that same `FileBlockSPtr` because those paths do not check 
`is_deleting()`. That means sync clear can still serve the old cached data, and 
each later holder can extend the lifetime of the block that was supposed to be 
cleared. The old direct sync clear detached all entries from `_files` before 
returning, so please either detach deleting blocks from the lookup/LRU 
structures while deferring physical removal to holder cleanup, or make all 
lookup/list paths skip `is_deleting()` cells.



##########
be/src/io/cache/block_file_cache.cpp:
##########
@@ -746,15 +759,15 @@ std::string BlockFileCache::clear_file_cache_async() {
             FileBlockSPtr file_block = cell->file_block;
             if (file_block) {
                 std::lock_guard block_lock(file_block->_mutex);

Review Comment:
   This new sync path calls `remove()` for each block, but `remove()` only logs 
`_storage->remove(key)` failures and then still erases the in-memory cell. If a 
disk delete or meta-store cleanup returns an error, 
`FileCacheFactory::clear_file_caches(sync, &clear_msg)` still returns 
`Status::OK`, so the HTTP response says `"status":"OK"` even though the cache 
file/meta can remain on disk and may be reloaded later. The previous direct 
path at least checked `_storage->clear()` before clearing memory. Please 
propagate storage-remove failures for sync clear, or keep/queue failed blocks 
for retry and return a non-OK status to the caller.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to