klsince commented on code in PR #12945:
URL: https://github.com/apache/pinot/pull/12945#discussion_r1591624294
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/fwd/SingleValueVarByteRawIndexCreator.java:
##########
@@ -70,23 +68,32 @@ public SingleValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType
* @param maxLength length of longest entry (in bytes)
* @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per chunk
* @param writerVersion writer format version
+ * @param targetMaxChunkSizeBytes target max chunk size in bytes, applicable only for V4 or when
+ * deriveNumDocsPerChunk is true
+ * @param targetDocsPerChunk target number of docs per chunk
* @throws IOException
*/
public SingleValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressionType compressionType, String column,
- int totalDocs, DataType valueType, int maxLength, boolean deriveNumDocsPerChunk, int writerVersion)
+ int totalDocs, DataType valueType, int maxLength, boolean deriveNumDocsPerChunk, int writerVersion,
+ int targetMaxChunkSizeBytes, int targetDocsPerChunk)
throws IOException {
File file = new File(baseIndexDir, column + V1Constants.Indexes.RAW_SV_FORWARD_INDEX_FILE_EXTENSION);
- int numDocsPerChunk = deriveNumDocsPerChunk ? getNumDocsPerChunk(maxLength) : DEFAULT_NUM_DOCS_PER_CHUNK;
+ int numDocsPerChunk =
+ deriveNumDocsPerChunk ? getNumDocsPerChunk(maxLength, targetMaxChunkSizeBytes) : targetDocsPerChunk;
+
+ // For columns with very small max value, target chunk size should also be capped to reduce memory during read
+ int dynamicTargetChunkSize =
+ ForwardIndexUtils.getDynamicTargetChunkSize(maxLength, targetDocsPerChunk, targetMaxChunkSizeBytes);
Review Comment:
Should this method take numDocsPerChunk instead of targetDocsPerChunk here?
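The question can be illustrated with a minimal sketch. Both helpers below are hypothetical stand-ins, not the Pinot implementations. The sketch assumes that getNumDocsPerChunk divides the target chunk size by (maxLength + a 4-byte row offset). It also assumes that getDynamicTargetChunkSize returns maxLength * docsPerChunk, capped at targetMaxChunkSizeBytes and floored at an assumed 4 KiB minimum. Under those assumptions, when deriveNumDocsPerChunk is true, the two possible arguments can produce very different chunk sizes.

```java
// Hypothetical sketch: neither method is the real Pinot code; the constants
// and formulas are assumptions chosen only to illustrate the review question.
final class ChunkSizeSketch {
  // Assumed per-entry overhead: one int row offset in the chunk header.
  static final int ASSUMED_ROW_OFFSET_BYTES = Integer.BYTES;
  // Assumed lower bound for the dynamic target chunk size (hypothetical).
  static final int ASSUMED_MIN_CHUNK_SIZE_BYTES = 4 * 1024;

  // Stand-in for getNumDocsPerChunk(maxLength, targetMaxChunkSizeBytes):
  // how many max-length entries (plus offsets) fit in the target chunk size.
  public static int numDocsPerChunk(int maxLength, int targetMaxChunkSizeBytes) {
    return Math.max(targetMaxChunkSizeBytes / (maxLength + ASSUMED_ROW_OFFSET_BYTES), 1);
  }

  // Stand-in for ForwardIndexUtils.getDynamicTargetChunkSize: bytes needed for
  // docsPerChunk max-length entries, capped at the target and floored at the minimum.
  public static int dynamicTargetChunkSize(int maxLength, int docsPerChunk, int targetMaxChunkSizeBytes) {
    long bytesNeeded = (long) maxLength * docsPerChunk; // long avoids int overflow
    long capped = Math.min(bytesNeeded, targetMaxChunkSizeBytes);
    return (int) Math.max(capped, ASSUMED_MIN_CHUNK_SIZE_BYTES);
  }

  public static void main(String[] args) {
    int maxLength = 8;
    int targetMaxChunkSizeBytes = 1024 * 1024;
    int targetDocsPerChunk = 1000;

    // deriveNumDocsPerChunk == true: numDocsPerChunk comes from the byte target.
    int derived = numDocsPerChunk(maxLength, targetMaxChunkSizeBytes);
    System.out.println("derived numDocsPerChunk = " + derived);
    System.out.println("size using targetDocsPerChunk = "
        + dynamicTargetChunkSize(maxLength, targetDocsPerChunk, targetMaxChunkSizeBytes));
    System.out.println("size using numDocsPerChunk = "
        + dynamicTargetChunkSize(maxLength, derived, targetMaxChunkSizeBytes));
  }
}
```

With maxLength = 8, a 1 MiB (1024 * 1024 byte) target and targetDocsPerChunk = 1000, the derived numDocsPerChunk is 87381. Passing targetDocsPerChunk yields an 8000-byte target, while passing numDocsPerChunk yields 699048 bytes. Whether that mismatch matters in practice depends on which of the two values the selected writer version actually consumes.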