siddharthteotia commented on code in PR #11558:
URL: https://github.com/apache/pinot/pull/11558#discussion_r1325020854


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/realtime/impl/invertedindex/RealtimeLuceneTextIndex.java:
##########
@@ -109,27 +112,43 @@ public ImmutableRoaringBitmap getDictIds(String 
searchQuery) {
   @Override
   public MutableRoaringBitmap getDocIds(String searchQuery) {
     MutableRoaringBitmap docIDs = new MutableRoaringBitmap();
-    Collector docIDCollector = new RealtimeLuceneDocIdCollector(docIDs);
-    IndexSearcher indexSearcher = null;
-    try {
-      Query query = new QueryParser(_column, _analyzer).parse(searchQuery);
-      indexSearcher = _searcherManager.acquire();
-      indexSearcher.search(query, docIDCollector);
-      return getPinotDocIds(indexSearcher, docIDs);
-    } catch (Exception e) {
-      LOGGER
-          .error("Failed while searching the realtime text index for column 
{}, search query {}, exception {}", _column,
-              searchQuery, e.getMessage());
-      throw new RuntimeException(e);
-    } finally {
+    RealtimeLuceneDocIdCollector docIDCollector = new 
RealtimeLuceneDocIdCollector(docIDs);
+    // A thread interrupt during indexSearcher.search() can break the 
underlying FSDirectory used by the IndexWriter
+    // which the SearcherManager is created with. To ensure the index is never 
corrupted the search is executed
+    // in a child thread and the interrupt is handled in the current thread by 
canceling the search gracefully.
+    // See https://github.com/apache/lucene/issues/3315 and 
https://github.com/apache/lucene/issues/9309
+    Callable<MutableRoaringBitmap> searchCallable = () -> {
+      IndexSearcher indexSearcher = null;
       try {
-        if (indexSearcher != null) {
-          _searcherManager.release(indexSearcher);
+        Query query = new QueryParser(_column, _analyzer).parse(searchQuery);
+        indexSearcher = _searcherManager.acquire();
+        indexSearcher.search(query, docIDCollector);
+        return getPinotDocIds(indexSearcher, docIDs);
+      } finally {
+        try {
+          if (indexSearcher != null) {
+            _searcherManager.release(indexSearcher);
+          }
+        } catch (Exception e) {
+          LOGGER.error(
+              "Failed while releasing the searcher manager for realtime text 
index for column {}, exception {}",
+              _column, e.getMessage());
         }
-      } catch (Exception e) {
-        LOGGER.error("Failed while releasing the searcher manager for realtime 
text index for column {}, exception {}",
-            _column, e.getMessage());
       }
+    };
+    Future<MutableRoaringBitmap> searchFuture =
+        SEARCHER_POOL.getExecutorService().submit(searchCallable);
+    try {
+      return searchFuture.get();
+    } catch (InterruptedException e) {
+      docIDCollector.markShouldCancel();
+      LOGGER.warn("Lucene query timed out while searching the realtime text 
index for segment {}, column {},"
+              + " search query {}", _segmentName, _column, searchQuery);
+      throw new RuntimeException("Lucene query was cancelled after timeout was 
reached");

Review Comment:
   However, I do strongly suggest to change the error message though - `Lucene 
query was cancelled after timeout was reached`
   
   Not a good idea to leak this implementation detail of Lucene to the user. 
Suggest making it generic / SQL specific. Something along the lines of 
`"TEXT_MATCH query timedout / cancelled"` or `"TEXT_MATCH query timeout / 
cancelled on realtime consuming segment"`
   
   Similarly for logged warning



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to