Jackie-Jiang commented on code in PR #16149:
URL: https://github.com/apache/pinot/pull/16149#discussion_r2175779492


##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java:
##########
@@ -232,15 +250,63 @@ public void setNotNull(boolean notNull) {
     _notNull = notNull;
   }
 
-  public int getMaxLength() {
+  /**
+   * Returns the effective max length to be used.
+   * This method should be used in business logic instead of {@code 
getMaxLength()},
+   * as it falls back to data type-specific or global defaults when the field 
is unset.
+   */
+  @JsonIgnore

Review Comment:
   Suggest adding a unit test to ensure only the expected fields are 
serialized. We also want to ensure `null` fields are not serialized.



##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java:
##########
@@ -1805,6 +1805,13 @@ public static class ForwardIndexConfigs {
         "pinot.forward.index.default.target.docs.per.chunk";
   }
 
+  public static class FieldSpecConfigs {
+    public static final String 
CONFIG_OF_DEFAULT_JSON_MAX_LENGTH_EXCEED_STRATEGY =
+        "pinot.fieldspec.default.json.sanitization.strategy";

Review Comment:
   ```suggestion
           "pinot.field.spec.default.json.max.length.exceed.strategy";
   ```



##########
pinot-server/src/main/java/org/apache/pinot/server/api/resources/TablesResource.java:
##########
@@ -130,6 +129,7 @@ public class TablesResource {
   private static final Logger LOGGER = 
LoggerFactory.getLogger(TablesResource.class);
   private static final String PEER_SEGMENT_DOWNLOAD_DIR = 
"peerSegmentDownloadDir";
   private static final String SEGMENT_UPLOAD_DIR = "segmentUploadDir";
+  private static final int DEFAULT_MAX_LENGTH = 512;

Review Comment:
   Suggest still reference the one from `FieldSpec`. Same for other places



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java:
##########
@@ -94,6 +92,26 @@ public abstract class FieldSpec implements 
Comparable<FieldSpec>, Serializable {
 
   public static final Map DEFAULT_COMPLEX_NULL_VALUE_OF_MAP = Map.of();
   public static final List DEFAULT_COMPLEX_NULL_VALUE_OF_LIST = List.of();
+  private static final int DEFAULT_MAX_LENGTH = 512;

Review Comment:
   This is referenced. We probably still want to keep it public, and we can add 
some javadoc explaining it can be overridden for JSON



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/DimensionFieldSpec.java:
##########
@@ -72,6 +72,6 @@ public FieldType getFieldType() {
   public String toString() {
     return "< field type: DIMENSION, field name: " + _name + ", data type: " + 
_dataType + ", is single-value field: "
         + _singleValueField + ", default null value: " + _defaultNullValue + 
", max length exceed strategy: "
-        + _maxLengthExceedStrategy + " >";
+        + _maxLengthExceedStrategy + ", max length: " + _maxLength + " >";

Review Comment:
   Put max length in front of exceed strategy



##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java:
##########
@@ -1805,6 +1805,13 @@ public static class ForwardIndexConfigs {
         "pinot.forward.index.default.target.docs.per.chunk";
   }
 
+  public static class FieldSpecConfigs {
+    public static final String 
CONFIG_OF_DEFAULT_JSON_MAX_LENGTH_EXCEED_STRATEGY =
+        "pinot.fieldspec.default.json.sanitization.strategy";
+    public static final String CONFIG_OF_DEFAULT_JSON_MAX_LENGTH =
+        "pinot.fieldspec.default.json.max.length";

Review Comment:
   ```suggestion
           "pinot.field.spec.default.json.max.length";
   ```



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SanitizationTransformer.java:
##########
@@ -53,24 +53,19 @@ public class SanitizationTransformer implements 
RecordTransformer {
   private final Map<String, SanitizedColumnInfo> _columnToColumnInfoMap = new 
HashMap<>();
 
   public SanitizationTransformer(Schema schema) {
-    FieldSpec.MaxLengthExceedStrategy maxLengthExceedStrategy;
     for (FieldSpec fieldSpec : schema.getAllFieldSpecs()) {
       if (!fieldSpec.isVirtualColumn()) {
-        if (fieldSpec.getDataType().equals(FieldSpec.DataType.STRING)) {
-          maxLengthExceedStrategy =
-              fieldSpec.getMaxLengthExceedStrategy() == null ? 
FieldSpec.MaxLengthExceedStrategy.TRIM_LENGTH
-                  : fieldSpec.getMaxLengthExceedStrategy();
-          _columnToColumnInfoMap.put(fieldSpec.getName(), new 
SanitizedColumnInfo(fieldSpec.getName(),
-              fieldSpec.getMaxLength(), maxLengthExceedStrategy, 
fieldSpec.getDefaultNullValue()));
-        } else if (fieldSpec.getDataType().equals(FieldSpec.DataType.JSON) || 
fieldSpec.getDataType()
-            .equals(FieldSpec.DataType.BYTES)) {
-          maxLengthExceedStrategy = fieldSpec.getMaxLengthExceedStrategy() == 
null
-              ? FieldSpec.MaxLengthExceedStrategy.NO_ACTION : 
fieldSpec.getMaxLengthExceedStrategy();
-          if 
(maxLengthExceedStrategy.equals(FieldSpec.MaxLengthExceedStrategy.NO_ACTION)) {
-            continue;
+        FieldSpec.DataType dataType = fieldSpec.getDataType();
+        if (dataType.equals(FieldSpec.DataType.STRING)
+            || dataType.equals(FieldSpec.DataType.JSON)
+            || dataType.equals(FieldSpec.DataType.BYTES)) {
+
+          FieldSpec.MaxLengthExceedStrategy strategy = 
fieldSpec.getEffectiveMaxLengthExceedStrategy();
+          if (!strategy.equals(FieldSpec.MaxLengthExceedStrategy.NO_ACTION) || 
dataType.equals(
+              FieldSpec.DataType.STRING)) {

Review Comment:
   Do we need to check for `STRING`?



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java:
##########
@@ -232,15 +250,63 @@ public void setNotNull(boolean notNull) {
     _notNull = notNull;
   }
 
-  public int getMaxLength() {
+  /**
+   * Returns the effective max length to be used.
+   * This method should be used in business logic instead of {@code 
getMaxLength()},
+   * as it falls back to data type-specific or global defaults when the field 
is unset.
+   */
+  @JsonIgnore
+  public int getEffectiveMaxLength() {
+    // If explicitly set, return that value
+    if (_maxLength != null) {
+      return _maxLength;
+    }
+
+    // For JSON fields, use configurable default
+    if (_dataType == DataType.JSON) {
+      return getDefaultJsonMaxLength();
+    }
+
+    // For all other fields, use the standard default
+    return DEFAULT_MAX_LENGTH;
+  }
+
+  // Required by JSON de-serializer. DO NOT REMOVE.
+  // Use getEffectiveMaxLength() for default-aware access.
+  @Nullable
+  public Integer getMaxLength() {
     return _maxLength;
   }
 
   // Required by JSON de-serializer. DO NOT REMOVE.
-  public void setMaxLength(int maxLength) {
+  public void setMaxLength(Integer maxLength) {

Review Comment:
   ```suggestion
     public void setMaxLength(@Nullable Integer maxLength) {
   ```



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -599,10 +599,14 @@ public static void 
addColumnMetadataInfo(PropertiesConfiguration properties, Str
     properties.setProperty(getKeyFor(column, IS_AUTO_GENERATED),
         String.valueOf(columnIndexCreationInfo.isAutoGenerated()));
     if (dataType.equals(DataType.STRING) || dataType.equals(DataType.BYTES) || 
dataType.equals(DataType.JSON)) {
-      properties.setProperty(getKeyFor(column, SCHEMA_MAX_LENGTH), 
fieldSpec.getMaxLength());
-      FieldSpec.MaxLengthExceedStrategy maxLengthExceedStrategy = 
fieldSpec.getMaxLengthExceedStrategy();
-      if (maxLengthExceedStrategy != null) {
-        properties.setProperty(getKeyFor(column, 
SCHEMA_MAX_LENGTH_EXCEED_STRATEGY), maxLengthExceedStrategy);
+      Integer maxLength = fieldSpec.getMaxLength();

Review Comment:
   Let's put the effective max length here. In case the JSON max length 
changes, we can detect the difference. Ideally we don't want segment CRC to 
change because of this PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to