github-actions[bot] commented on code in PR #62722:
URL: https://github.com/apache/doris/pull/62722#discussion_r3308313012
##########
be/src/storage/index/inverted/inverted_index_fs_directory.h:
##########
@@ -161,14 +161,29 @@ class CLUCENE_EXPORT DorisRAMFSDirectory : public
DorisFSDirectory {
const char* getObjectName() const override;
};
+// Thread-safety contract (see AIR-12 design doc ยง3-A):
+// - FSIndexInput instances are NOT thread-safe. Use clone() to produce
+// per-thread copies.
+// - Concurrent read() + clone() on the SAME instance is undefined.
+// - Different clones sharing the same SharedHandle are safe WHEN the
+// underlying `io::FileReader::read_at` is stateless (pread-like).
+// The USE_HADOOP_HDFS build path (libhadoop / `hdfsPread`), as well as
Review Comment:
This new contract is too broad for the shared `_reader` that clones now use
concurrently. `S3FileReader::read_at_impl()` is not stateless: it updates
`_s3_stats` (`SCOPED_RAW_TIMER(&_s3_stats.total_get_request_time_ns)`,
`_s3_stats.total_get_request_counter++`, error counters, etc.) without
synchronization, and `CachedRemoteFileReader` can dispatch concurrent cache
misses to the same `_remote_file_reader`. `HdfsFileReader::read_at_impl()` also
resets `_handle` and destroys the accessor on any non-OK status, so in the
`USE_HADOOP_HDFS` build one clone can close/destroy the shared HDFS handle
while another clone is inside `hdfsPread()`. Before this PR the per-handle lock
serialized all of those calls; after this change cached searchers can run
sibling clones in parallel and hit data races/UAF on remote storage. Please
keep serialization around `_reader->read_at` for these backends, or first
make/prove every shared FileReader implementation thread-safe (including
mutable stats and er
ror/close lifecycle), and add remote/TSAN coverage for that path.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]