github-actions[bot] commented on code in PR #62722:
URL: https://github.com/apache/doris/pull/62722#discussion_r3308313012


##########
be/src/storage/index/inverted/inverted_index_fs_directory.h:
##########
@@ -161,14 +161,29 @@ class CLUCENE_EXPORT DorisRAMFSDirectory : public 
DorisFSDirectory {
     const char* getObjectName() const override;
 };
 
+// Thread-safety contract (see AIR-12 design doc ยง3-A):
+//   - FSIndexInput instances are NOT thread-safe. Use clone() to produce
+//     per-thread copies.
+//   - Concurrent read() + clone() on the SAME instance is undefined.
+//   - Different clones sharing the same SharedHandle are safe WHEN the
+//     underlying `io::FileReader::read_at` is stateless (pread-like).
+//     The USE_HADOOP_HDFS build path (libhadoop / `hdfsPread`), as well as

Review Comment:
   This new contract is too broad for the shared `_reader` that clones now use 
concurrently. `S3FileReader::read_at_impl()` is not stateless: it updates 
`_s3_stats` (`SCOPED_RAW_TIMER(&_s3_stats.total_get_request_time_ns)`, 
`_s3_stats.total_get_request_counter++`, error counters, etc.) without 
synchronization, and `CachedRemoteFileReader` can dispatch concurrent cache 
misses to the same `_remote_file_reader`. `HdfsFileReader::read_at_impl()` also 
resets `_handle` and destroys the accessor on any non-OK status, so in the 
`USE_HADOOP_HDFS` build one clone can close/destroy the shared HDFS handle 
while another clone is inside `hdfsPread()`. Before this PR the per-handle lock 
serialized all of those calls; after this change cached searchers can run 
sibling clones in parallel and hit data races/UAF on remote storage. Please 
keep serialization around `_reader->read_at` for these backends, or first 
make/prove every shared FileReader implementation thread-safe (including 
mutable stats and er
 ror/close lifecycle), and add remote/TSAN coverage for that path.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to