aho135 commented on code in PR #19571:
URL: https://github.com/apache/druid/pull/19571#discussion_r3384832372


##########
docs/ingestion/kafka-ingestion.md:
##########
@@ -263,6 +264,46 @@ The following example shows a supervisor spec with idle 
configuration enabled:
 ```
 </details>
 
+#### Partition filter dimensions
+
+When you set `partitionFilterDimensions` in the IO config, the supervisor 
tracks the distinct values observed for each listed dimension during ingestion. 
At segment publish time, each segment is annotated with only the values it 
actually ingested. The broker then uses these annotations to skip segments at 
query time when the query filter doesn't intersect the segment's declared 
values.
+
+This enables segment pruning for streaming-ingested data without waiting for 
compaction to produce hash or range-partitioned segments.
+
+**Usage guidelines:**
+
+- Use only low-to-medium cardinality dimensions (for example, `tenant_id`, 
`region`, `environment`). High-cardinality dimensions bloat segment metadata 
with no pruning benefit.
+- Most effective when Kafka partitions are keyed by the tracked dimension (for 
example, using tenant ID as the message key). Each task naturally sees a subset 
of values, and segments get tight filter annotations.
+- Also works with multiple supervisors reading from separate topics into one 
datasource.
+- After compaction, the `StreamRangeShardSpec` annotations are replaced by the 
compaction output's shard spec (hash or range partitioning), which provides its 
own pruning.

Review Comment:
   It may be worth mentioning that the dynamic compaction strategy should not be 
used when `partitionFilterDimensions` is set.
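
For context, the pruning described in the hunk above amounts to a set-intersection check between a segment's declared values and the query filter. The following is a minimal sketch of that idea only, not Druid's implementation: `can_skip_segment`, the dict-of-sets annotation layout, and the segment names are all hypothetical, and the query filter is assumed to be a conjunction of simple `IN`-style filters.

```python
from typing import Dict, Set


def can_skip_segment(
    segment_values: Dict[str, Set[str]],
    query_filter: Dict[str, Set[str]],
) -> bool:
    """Return True when the segment provably holds no rows matching the filter.

    Hypothetical model: segment_values maps a tracked dimension to the
    distinct values the segment ingested; query_filter maps a dimension to
    the values the query asks for (all dimensions ANDed together).
    """
    for dim, wanted in query_filter.items():
        declared = segment_values.get(dim)
        if declared is None:
            # No annotation for this dimension, so it cannot be used to prune.
            continue
        if declared.isdisjoint(wanted):
            # The filter on this dimension matches nothing in the segment.
            return True
    return False


# Hypothetical annotations for three segments of one datasource.
segments = {
    "seg_a": {"tenant_id": {"t1", "t2"}},
    "seg_b": {"tenant_id": {"t3"}},
    "seg_c": {},  # no annotation, e.g. written without the feature enabled
}
query = {"tenant_id": {"t3"}}

to_scan = sorted(
    name for name, values in segments.items()
    if not can_skip_segment(values, query)
)
print(to_scan)
```

In this sketch, a segment without an annotation is always scanned, which is the conservative choice: pruning is only safe when the declared values are known to be complete for that segment.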



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

