yashmayya commented on code in PR #15817:
URL: https://github.com/apache/pinot/pull/15817#discussion_r2120150368
##########
pinot-common/src/main/java/org/apache/pinot/common/utils/SegmentUtils.java:
##########
@@ -33,38 +34,42 @@ public class SegmentUtils {
private SegmentUtils() {
}
- // Returns the partition id of a realtime segment based segment name and
segment metadata info retrieved via Helix.
- // Important: The method is costly because it may read data from zookeeper.
Do not use it in any query execution
- // path.
+ /// Returns the partition id of a segment based on segment name or ZK
metadata.
+ /// Can return `null` if the partition id cannot be determined.
+ /// Important: The method is costly because it may read data from zookeeper.
Do not use it in query execution path.
@Nullable
- public static Integer getRealtimeSegmentPartitionId(String segmentName,
String realtimeTableName,
- HelixManager helixManager, @Nullable String partitionColumn) {
- Integer partitionId = getPartitionIdFromRealtimeSegmentName(segmentName);
+ public static Integer getSegmentPartitionId(String segmentName, String
tableNameWithType, HelixManager helixManager,
Review Comment:
This util method is now being used in certain contexts for `OFFLINE` tables
as well so we'll first try to get the partition ID from the segment name using
the realtime table logic (LLC or uploaded segment). Is that intentional in
order to consolidate the logic? What if the offline segment name happens to
have 4 or 5 parts separated by `__`? Looks like
`UploadedRealtimeSegmentName.of` handles parse exceptions (expected number of
parts, but token in `partitionId` position not an `int` for instance) and
returns `null`, but `LLCSegmentName.of` does not.
##########
pinot-common/src/main/java/org/apache/pinot/common/utils/SegmentUtils.java:
##########
@@ -76,25 +81,61 @@ public static Integer
getPartitionIdFromRealtimeSegmentName(String segmentName)
return null;
}
+ /// Returns the partition id of a segment based on segment ZK metadata.
+ /// Can return `null` if the partition id cannot be determined.
@Nullable
- private static Integer getRealtimeSegmentPartitionId(SegmentZKMetadata
segmentZKMetadata,
+ private static Integer getPartitionIdFromSegmentZKMetadata(SegmentZKMetadata
segmentZKMetadata,
@Nullable String partitionColumn) {
SegmentPartitionMetadata segmentPartitionMetadata =
segmentZKMetadata.getPartitionMetadata();
- if (segmentPartitionMetadata != null) {
- Map<String, ColumnPartitionMetadata> columnPartitionMap =
segmentPartitionMetadata.getColumnPartitionMap();
- ColumnPartitionMetadata columnPartitionMetadata = null;
- if (partitionColumn != null) {
- columnPartitionMetadata = columnPartitionMap.get(partitionColumn);
- } else {
- if (columnPartitionMap.size() == 1) {
- columnPartitionMetadata =
columnPartitionMap.values().iterator().next();
- }
- }
- if (columnPartitionMetadata != null &&
columnPartitionMetadata.getPartitions().size() == 1) {
- return columnPartitionMetadata.getPartitions().iterator().next();
+ return segmentPartitionMetadata != null ?
getPartitionIdFromSegmentPartitionMetadata(segmentPartitionMetadata,
+ partitionColumn) : null;
+ }
+
+ /// Returns the partition id of a segment based on
[SegmentPartitionMetadata].
+ /// Can return `null` if the partition id cannot be determined.
+ @VisibleForTesting
+ @Nullable
+ static Integer
getPartitionIdFromSegmentPartitionMetadata(SegmentPartitionMetadata
segmentPartitionMetadata,
+ @Nullable String partitionColumn) {
+ Map<String, ColumnPartitionMetadata> columnPartitionMap =
segmentPartitionMetadata.getColumnPartitionMap();
+ ColumnPartitionMetadata columnPartitionMetadata = null;
+ if (partitionColumn != null) {
+ columnPartitionMetadata = columnPartitionMap.get(partitionColumn);
+ } else {
+ if (columnPartitionMap.size() == 1) {
+ columnPartitionMetadata =
columnPartitionMap.values().iterator().next();
}
}
- return null;
+ if (columnPartitionMetadata != null &&
columnPartitionMetadata.getPartitions().size() == 1) {
+ return columnPartitionMetadata.getPartitions().iterator().next();
+ } else {
+ return null;
+ }
+ }
+
+ /// Returns the partition id of a segment based on segment name or ZK
metadata, or a default partition id based on the
+ /// hash of the segment name.
+ /// Important: The method is costly because it may read data from zookeeper.
Do not use it in query execution path.
+ public static int getSegmentPartitionIdOrDefault(String segmentName, String
tableNameWithType,
+ HelixManager helixManager, @Nullable String partitionColumn) {
+ Integer partitionId = getSegmentPartitionId(segmentName,
tableNameWithType, helixManager, partitionColumn);
+ return partitionId != null ? partitionId :
getDefaultPartitionId(segmentName);
+ }
+
+ /// Returns the partition id of a segment based on segment name or ZK
metadata, or a default partition id based on the
+ /// hash of the segment name.
+ public static int getSegmentPartitionIdOrDefault(SegmentZKMetadata
segmentZKMetadata,
Review Comment:
Unused?
##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/TableRebalancer.java:
##########
@@ -542,9 +542,19 @@ private RebalanceResult doRebalance(TableConfig
tableConfig, RebalanceConfig reb
if (segmentsToMoveChanged) {
try {
// Re-calculate the instance partitions in case the instance
configs changed during the rebalance
- instancePartitionsMap =
- getInstancePartitionsMap(tableConfig, reassignInstances,
bootstrap, false,
- minimizeDataMovement, tableRebalanceLogger).getLeft();
+ Pair<Map<InstancePartitionsType, InstancePartitions>, Boolean>
instancePartitionsMapAndUnchanged =
+ getInstancePartitionsMap(tableConfig, reassignInstances,
bootstrap, false, minimizeDataMovement,
+ tableRebalanceLogger);
+ instancePartitionsMap =
instancePartitionsMapAndUnchanged.getLeft();
+ instancePartitionsUnchanged =
instancePartitionsMapAndUnchanged.getRight();
+ // If the instance partitions have changed, clear the
segmentPartitionIdMap as the number of partitions
+ // may have changed, resulting in a different partitionId
calculation. This change will only make a
+ // difference for the scenario when it was changed from or to 1
partition. The numPartitions is not used
+ // otherwise.
+ if (!instancePartitionsUnchanged) {
+ LOGGER.info("Clear the cached segmentPartitionIdMap as the
instance partitions has changed");
+ segmentPartitionIdMap.clear();
+ }
Review Comment:
I don't follow this assertion:
> This change will only make a difference for the scenario when it was
changed from or to 1 partition. The numPartitions is not used otherwise.
Isn't `numPartitions` still used to calculate the actual partition ID to
assign a segment to within a replica group (using the segment's partition ID)?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]