[GitHub] [pinot] gortiz commented on a diff in pull request #10687: Implement mutable index using index SPI

via GitHub Tue, 25 Apr 2023 08:03:49 -0700


gortiz commented on code in PR #10687:
URL: https://github.com/apache/pinot/pull/10687#discussion_r1176657786



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java:
##########
@@ -287,111 +281,67 @@ public boolean isMutableSegment() {
         partitions.add(config.getPartitionId());
       }
 
+      // TODO (mutable-index-spi): The comment above was here, but no check 
was done.
+      //  It seems the code that apply that check was removed around 2020. 
Should we remove the comment?
       // Check whether to generate raw index for the column while consuming
       // Only support generating raw index on single-value columns that do not 
have inverted index while
       // consuming. After consumption completes and the segment is built, all 
single-value columns can have raw index
-      DataType storedType = fieldSpec.getDataType().getStoredType();
-      boolean isFixedWidthColumn = storedType.isFixedWidth();
-      MutableIndexProvider indexProvider = 
IndexingOverrides.getMutableIndexProvider();
-      MutableForwardIndex forwardIndex = 
indexProvider.newForwardIndex(context.forForwardIndex(avgNumMultiValues));
 
       // Dictionary-encoded column
-      MutableDictionary dictionary = null;
+      MutableDictionary dictionary;
       if (isDictionary) {
-        int dictionaryColumnSize =
-            isFixedWidthColumn ? storedType.size() : 
_statsHistory.getEstimatedAvgColSize(column);
-        // NOTE: preserve 10% buffer for cardinality to reduce the chance of 
re-sizing the dictionary
-        int estimatedCardinality = (int) 
(_statsHistory.getEstimatedCardinality(column) * 1.1);
-        dictionary = 
indexProvider.newDictionary(context.forDictionary(dictionaryColumnSize, 
estimatedCardinality));
-        // Even though the column is defined as 'no-dictionary' in the config, 
we did create dictionary for consuming
-        // segment.
-        noDictionaryColumns.remove(column);
-      }
-
-      // Inverted index
-      MutableInvertedIndex invertedIndexReader =
-          invertedIndexColumns.contains(column) ? 
indexProvider.newInvertedIndex(context.forInvertedIndex()) : null;
-
-      MutableTextIndex fstIndex = null;
-      // FST Index
-      if (_fieldConfigList != null && fstIndexColumns.contains(column)) {
-        for (FieldConfig fieldConfig : _fieldConfigList) {
-          if (fieldConfig.getName().equals(column)) {
-            Map<String, String> properties = fieldConfig.getProperties();
-            if (TextIndexUtils.isFstTypeNative(properties)) {
-              fstIndex = new NativeMutableFSTIndex(column);
-            }
-          }
+        DictionaryIndexConfig dictionaryIndexConfig = 
indexConfigs.getConfig(StandardIndexes.dictionary());
+        if (dictionaryIndexConfig.isDisabled()) {
+          dictionaryIndexConfig = DictionaryIndexConfig.DEFAULT;
         }
-      }
-
-      // Text index
-      MutableTextIndex textIndex;
-      List<String> stopWordsInclude = null;
-      List<String> stopWordsExclude = null;
-      if (textIndexColumns.contains(column)) {
-        boolean useNativeTextIndex = false;
-        if (_fieldConfigList != null) {
-          for (FieldConfig fieldConfig : _fieldConfigList) {
-            if (fieldConfig.getName().equals(column)) {
-              Map<String, String> properties = fieldConfig.getProperties();
-              stopWordsInclude = 
TextIndexUtils.extractStopWordsInclude(properties);
-              stopWordsExclude = 
TextIndexUtils.extractStopWordsExclude(properties);
-              if (TextIndexUtils.isFstTypeNative(properties)) {
-                useNativeTextIndex = true;
-              }
-            }
+        dictionary = DictionaryIndexType.createMutableDictionary(context, 
dictionaryIndexConfig);
+      } else {
+        dictionary = null;
+        if (!fieldSpec.isSingleValueField()) {
+          // Raw MV columns
+          DataType dataType = fieldSpec.getDataType().getStoredType();
+          switch (dataType) {
+            case INT:
+            case LONG:
+            case FLOAT:
+            case DOUBLE:
+              break;
+            default:
+              throw new UnsupportedOperationException(
+                  "Unsupported data type: " + dataType + " for MV 
no-dictionary column: " + column);
           }
         }
+      }
 
-        if (useNativeTextIndex) {
-          textIndex = new NativeMutableTextIndex(column);
-        } else {
+      // Null value vector
+      MutableNullValueVector nullValueVector = _nullHandlingEnabled ? new 
MutableNullValueVector() : null;
 
-          // TODO - this logic is in the wrong place and belongs in a 
Lucene-specific submodule,
-          //  it is beyond the scope of realtime index pluggability to do this 
refactoring, so realtime
-          //  text indexes remain statically defined. Revisit this after this 
refactoring has been done.
-          RealtimeLuceneTextIndex luceneTextIndex =
-              new RealtimeLuceneTextIndex(column, new 
File(config.getConsumerDir()), _segmentName, stopWordsInclude,
-                  stopWordsExclude);
-          if (_realtimeLuceneReaders == null) {
-            _realtimeLuceneReaders = new 
RealtimeLuceneIndexRefreshState.RealtimeLuceneReaders(_segmentName);
-          }
-          _realtimeLuceneReaders.addReader(luceneTextIndex);
-          textIndex = luceneTextIndex;
+      Map<IndexType, MutableIndex> mutableIndexes = new HashMap<>();
+      for (IndexType<?, ?, ?> indexType : 
IndexService.getInstance().getAllIndexes()) {
+        if (!specialIndexes.contains(indexType)) {
+          addMutableIndex(mutableIndexes, indexType, context, indexConfigs);
         }
-      } else {
-        textIndex = null;
       }
 
-      // Json index
-      JsonIndexConfig jsonIndexConfig = jsonIndexConfigs.get(column);
-      MutableJsonIndex jsonIndex =
-          jsonIndexConfig != null ? 
indexProvider.newJsonIndex(context.forJsonIndex(jsonIndexConfig)) : null;
-
-      // H3 index
-      // TODO consider making this overridable
-      MutableH3Index h3Index;
-      try {
-        H3IndexConfig h3IndexConfig = h3IndexConfigs.get(column);
-        h3Index = h3IndexConfig != null ? new 
MutableH3Index(h3IndexConfig.getResolution()) : null;
-      } catch (IOException e) {
-        throw new RuntimeException(String.format("Failed to initiate H3 index 
for column: %s", column), e);
+      // TODO - this logic is in the wrong place and belongs in a 
Lucene-specific submodule,
+      //  it is beyond the scope of realtime index pluggability to do this 
refactoring, so realtime
+      //  text indexes remain statically defined. Revisit this after this 
refactoring has been done.
+      MutableIndex textIndex = mutableIndexes.get(StandardIndexes.text());
+      if (textIndex instanceof RealtimeLuceneTextIndex) {
+        if (_realtimeLuceneReaders == null) {
+          _realtimeLuceneReaders = new 
RealtimeLuceneIndexRefreshState.RealtimeLuceneReaders(_segmentName);
+        }
+        _realtimeLuceneReaders.addReader((RealtimeLuceneTextIndex) textIndex);

Review Comment:
   This is the poor man's way to add the index to lucene readers now. The TODO 
was originally created by Richard (although it has been moved and now git 
attributes o me) and still applies. This logic doesn't belong here, but this 
should be treated in a better way once we decide to invest resources in 
reducing this hack



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [pinot] gortiz commented on a diff in pull request #10687: Implement mutable index using index SPI

Reply via email to