Re: [PR] Support to create combined Lucene Text Index Files and merge into columns.psf [pinot]

via GitHub Wed, 03 Sep 2025 03:10:13 -0700


Copilot commented on code in PR #16688:
URL: https://github.com/apache/pinot/pull/16688#discussion_r2318476829



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/text/TextIndexType.java:
##########
@@ -149,18 +149,23 @@ private ReaderFactory() {
     @Override
     public TextIndexReader createIndexReader(SegmentDirectory.Reader 
segmentReader, FieldIndexConfigs fieldIndexConfigs,
         ColumnMetadata metadata)
-        throws IndexReaderConstraintException {
+        throws IndexReaderConstraintException, IOException {
       if (metadata.getDataType() != FieldSpec.DataType.STRING) {
         throw new IndexReaderConstraintException(metadata.getColumnName(), 
StandardIndexes.text(),
             "Text index is currently only supported on STRING type columns");
       }
       File segmentDir = segmentReader.toSegmentDirectory().getPath().toFile();
+      TextIndexConfig indexConfig = 
fieldIndexConfigs.getConfig(StandardIndexes.text());
+      if (indexConfig.isUseCombineFiles()) {
+        PinotDataBuffer textIndexBuffer = 
segmentReader.getIndexFor(metadata.getColumnName(), StandardIndexes.text());
+        return new LuceneTextIndexReader(metadata.getColumnName(), 
textIndexBuffer, metadata.getTotalDocs(),
+            indexConfig);
+      }
       FSTType textIndexFSTType = TextIndexUtils.getFSTTypeOfIndex(segmentDir, 
metadata.getColumnName());
       if (textIndexFSTType == FSTType.NATIVE) {
         // TODO: Support loading native text index from a PinotDataBuffer
         return new NativeTextIndexReader(metadata.getColumnName(), segmentDir);
       }
-      TextIndexConfig indexConfig = 
fieldIndexConfigs.getConfig(StandardIndexes.text());
       return new LuceneTextIndexReader(metadata.getColumnName(), segmentDir, 
metadata.getTotalDocs(), indexConfig);

Review Comment:
   Adding IOException to the method signature is a breaking change to the 
IndexReaderFactory interface. This could break existing implementations and 
callers that don't expect IOException to be thrown. Consider using a runtime 
exception wrapper or handling the IOException internally.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/text/LuceneTextIndexReader.java:
##########
@@ -73,7 +77,7 @@ public class LuceneTextIndexReader implements TextIndexReader 
{
   private final Analyzer _analyzer;
   private boolean _useANDForMultiTermQueries = false;
   private final String _queryParserClass;
-  private Constructor<QueryParserBase> _queryParserClassConstructor;
+  private final Constructor<QueryParserBase> _queryParserClassConstructor;

Review Comment:
   The field is declared as final but was previously non-final. This change 
could break existing code that might have been modifying this field after 
initialization. Ensure this change is intentional and doesn't break existing 
functionality.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/invertedindex/TextIndexHandler.java:
##########
@@ -78,11 +83,20 @@ public TextIndexHandler(SegmentDirectory segmentDirectory, 
Map<String, FieldInde
   }
 
   @Override
-  public boolean needUpdateIndices(SegmentDirectory.Reader segmentReader) {
+  public boolean needUpdateIndices(SegmentDirectory.Reader segmentReader)
+      throws Exception {

Review Comment:
   Adding 'throws Exception' to the method signature is a breaking change to 
the BaseIndexHandler interface. This overly broad exception declaration 
violates best practices and could break existing implementations. Consider 
throwing more specific exceptions or handling exceptions internally.
   ```suggestion
     public boolean needUpdateIndices(SegmentDirectory.Reader segmentReader) {
   ```



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/TextIndexConfig.java:
##########
@@ -112,7 +114,32 @@ public TextIndexConfig(Boolean disabled, FSTType fstType, 
Object rawValueForText
         luceneAnalyzerClassArgs, luceneAnalyzerClassArgTypes, 
luceneQueryParserClass,
         enablePrefixSuffixMatchingInPhraseQueries, reuseMutableIndex,
         luceneNRTCachingDirectoryMaxBufferSizeMB, useLogByteSizeMergePolicy, 
docIdTranslatorMode,
-        LUCENE_INDEX_DEFAULT_CASE_SENSITIVE_INDEX);
+        LUCENE_INDEX_DEFAULT_CASE_SENSITIVE_INDEX, 
LUCENE_INDEX_DEFAULT_USE_COMBINE_FILES);
+  }
+
+  /**
+   * @deprecated Use the new constructor with useCombineFiles parameter 
instead.
+   * This constructor will be removed in a future version.
+   */
+  @Deprecated
+  public TextIndexConfig(Boolean luceneUseCompoundFile, FSTType fstType, 
Object rawValue,
+                        boolean noRawData, boolean enableQueryCache, 
List<String> stopWordsInclude,
+                        List<String> stopWordsExclude, Boolean 
useAndForMultiTermQueries,
+                        Integer maxResultCacheSize, String 
stopWordsIncludeKey, String stopWordsExcludeKey,
+                        String useAndForMultiTermQueriesKey, String 
maxResultCacheSizeKey,
+                        Boolean enablePrefixSuffixMatching, Boolean 
enablePrefixSuffixPhraseMatching,
+                        Integer maxResultCacheSizeKeyInt, Boolean 
useCombineFiles,
+                        DocIdTranslatorMode docIdTranslatorMode, Boolean 
enablePrefixSuffixMatchingKey) {
+    // Call the new constructor with default useCombineFiles value
+    this(useCombineFiles, fstType, rawValue, enableQueryCache,
+         useAndForMultiTermQueries != null ? useAndForMultiTermQueries : false,
+         stopWordsInclude != null ? stopWordsInclude : new ArrayList<>(),
+         stopWordsExclude != null ? stopWordsExclude : new ArrayList<>(),
+         luceneUseCompoundFile, maxResultCacheSize, null, null, null, null,
+         enablePrefixSuffixPhraseMatching != null ? 
enablePrefixSuffixPhraseMatching : false,
+         false, 0, false, docIdTranslatorMode,
+         enablePrefixSuffixMatchingKey != null ? enablePrefixSuffixMatchingKey 
: false,
+         useCombineFiles != null ? useCombineFiles : false);
   }

Review Comment:
   The deprecated constructor has confusing parameter mapping and incorrect 
parameter ordering. The constructor appears to mix parameter names and actual 
values, making it difficult to maintain. The parameter `useCombineFiles` 
appears twice with different positions and the mapping logic is unclear. 
Consider removing this deprecated constructor or fixing the parameter mapping 
to be more explicit.
   ```suggestion
   
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Support to create combined Lucene Text Index Files and merge into columns.psf [pinot]

Reply via email to