Jackie-Jiang commented on code in PR #16149:
URL: https://github.com/apache/pinot/pull/16149#discussion_r2175779492
##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java:
##########
@@ -232,15 +250,63 @@ public void setNotNull(boolean notNull) {
_notNull = notNull;
}
- public int getMaxLength() {
+ /**
+ * Returns the effective max length to be used.
+ * This method should be used in business logic instead of {@code
getMaxLength()},
+ * as it falls back to data type-specific or global defaults when the field
is unset.
+ */
+ @JsonIgnore
Review Comment:
Suggest adding a unit test to ensure only the expected fields are
serialized. We also want to ensure `null` fields are not serialized.
##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java:
##########
@@ -1805,6 +1805,13 @@ public static class ForwardIndexConfigs {
"pinot.forward.index.default.target.docs.per.chunk";
}
+ public static class FieldSpecConfigs {
+ public static final String
CONFIG_OF_DEFAULT_JSON_MAX_LENGTH_EXCEED_STRATEGY =
+ "pinot.fieldspec.default.json.sanitization.strategy";
Review Comment:
```suggestion
"pinot.field.spec.default.json.max.length.exceed.strategy";
```
##########
pinot-server/src/main/java/org/apache/pinot/server/api/resources/TablesResource.java:
##########
@@ -130,6 +129,7 @@ public class TablesResource {
private static final Logger LOGGER =
LoggerFactory.getLogger(TablesResource.class);
private static final String PEER_SEGMENT_DOWNLOAD_DIR =
"peerSegmentDownloadDir";
private static final String SEGMENT_UPLOAD_DIR = "segmentUploadDir";
+ private static final int DEFAULT_MAX_LENGTH = 512;
Review Comment:
Suggest still reference the one from `FieldSpec`. Same for other places
##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java:
##########
@@ -94,6 +92,26 @@ public abstract class FieldSpec implements
Comparable<FieldSpec>, Serializable {
public static final Map DEFAULT_COMPLEX_NULL_VALUE_OF_MAP = Map.of();
public static final List DEFAULT_COMPLEX_NULL_VALUE_OF_LIST = List.of();
+ private static final int DEFAULT_MAX_LENGTH = 512;
Review Comment:
This is referenced. We probably still want to keep it public, and we can add
some javadoc explaining it can be overridden for JSON
##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/DimensionFieldSpec.java:
##########
@@ -72,6 +72,6 @@ public FieldType getFieldType() {
public String toString() {
return "< field type: DIMENSION, field name: " + _name + ", data type: " +
_dataType + ", is single-value field: "
+ _singleValueField + ", default null value: " + _defaultNullValue +
", max length exceed strategy: "
- + _maxLengthExceedStrategy + " >";
+ + _maxLengthExceedStrategy + ", max length: " + _maxLength + " >";
Review Comment:
Put max length in front of exceed strategy
##########
pinot-spi/src/main/java/org/apache/pinot/spi/utils/CommonConstants.java:
##########
@@ -1805,6 +1805,13 @@ public static class ForwardIndexConfigs {
"pinot.forward.index.default.target.docs.per.chunk";
}
+ public static class FieldSpecConfigs {
+ public static final String
CONFIG_OF_DEFAULT_JSON_MAX_LENGTH_EXCEED_STRATEGY =
+ "pinot.fieldspec.default.json.sanitization.strategy";
+ public static final String CONFIG_OF_DEFAULT_JSON_MAX_LENGTH =
+ "pinot.fieldspec.default.json.max.length";
Review Comment:
```suggestion
"pinot.field.spec.default.json.max.length";
```
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SanitizationTransformer.java:
##########
@@ -53,24 +53,19 @@ public class SanitizationTransformer implements
RecordTransformer {
private final Map<String, SanitizedColumnInfo> _columnToColumnInfoMap = new
HashMap<>();
public SanitizationTransformer(Schema schema) {
- FieldSpec.MaxLengthExceedStrategy maxLengthExceedStrategy;
for (FieldSpec fieldSpec : schema.getAllFieldSpecs()) {
if (!fieldSpec.isVirtualColumn()) {
- if (fieldSpec.getDataType().equals(FieldSpec.DataType.STRING)) {
- maxLengthExceedStrategy =
- fieldSpec.getMaxLengthExceedStrategy() == null ?
FieldSpec.MaxLengthExceedStrategy.TRIM_LENGTH
- : fieldSpec.getMaxLengthExceedStrategy();
- _columnToColumnInfoMap.put(fieldSpec.getName(), new
SanitizedColumnInfo(fieldSpec.getName(),
- fieldSpec.getMaxLength(), maxLengthExceedStrategy,
fieldSpec.getDefaultNullValue()));
- } else if (fieldSpec.getDataType().equals(FieldSpec.DataType.JSON) ||
fieldSpec.getDataType()
- .equals(FieldSpec.DataType.BYTES)) {
- maxLengthExceedStrategy = fieldSpec.getMaxLengthExceedStrategy() ==
null
- ? FieldSpec.MaxLengthExceedStrategy.NO_ACTION :
fieldSpec.getMaxLengthExceedStrategy();
- if
(maxLengthExceedStrategy.equals(FieldSpec.MaxLengthExceedStrategy.NO_ACTION)) {
- continue;
+ FieldSpec.DataType dataType = fieldSpec.getDataType();
+ if (dataType.equals(FieldSpec.DataType.STRING)
+ || dataType.equals(FieldSpec.DataType.JSON)
+ || dataType.equals(FieldSpec.DataType.BYTES)) {
+
+ FieldSpec.MaxLengthExceedStrategy strategy =
fieldSpec.getEffectiveMaxLengthExceedStrategy();
+ if (!strategy.equals(FieldSpec.MaxLengthExceedStrategy.NO_ACTION) ||
dataType.equals(
+ FieldSpec.DataType.STRING)) {
Review Comment:
Do we need to check for `STRING`?
##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java:
##########
@@ -232,15 +250,63 @@ public void setNotNull(boolean notNull) {
_notNull = notNull;
}
- public int getMaxLength() {
+ /**
+ * Returns the effective max length to be used.
+ * This method should be used in business logic instead of {@code
getMaxLength()},
+ * as it falls back to data type-specific or global defaults when the field
is unset.
+ */
+ @JsonIgnore
+ public int getEffectiveMaxLength() {
+ // If explicitly set, return that value
+ if (_maxLength != null) {
+ return _maxLength;
+ }
+
+ // For JSON fields, use configurable default
+ if (_dataType == DataType.JSON) {
+ return getDefaultJsonMaxLength();
+ }
+
+ // For all other fields, use the standard default
+ return DEFAULT_MAX_LENGTH;
+ }
+
+ // Required by JSON de-serializer. DO NOT REMOVE.
+ // Use getEffectiveMaxLength() for default-aware access.
+ @Nullable
+ public Integer getMaxLength() {
return _maxLength;
}
// Required by JSON de-serializer. DO NOT REMOVE.
- public void setMaxLength(int maxLength) {
+ public void setMaxLength(Integer maxLength) {
Review Comment:
```suggestion
public void setMaxLength(@Nullable Integer maxLength) {
```
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -599,10 +599,14 @@ public static void
addColumnMetadataInfo(PropertiesConfiguration properties, Str
properties.setProperty(getKeyFor(column, IS_AUTO_GENERATED),
String.valueOf(columnIndexCreationInfo.isAutoGenerated()));
if (dataType.equals(DataType.STRING) || dataType.equals(DataType.BYTES) ||
dataType.equals(DataType.JSON)) {
- properties.setProperty(getKeyFor(column, SCHEMA_MAX_LENGTH),
fieldSpec.getMaxLength());
- FieldSpec.MaxLengthExceedStrategy maxLengthExceedStrategy =
fieldSpec.getMaxLengthExceedStrategy();
- if (maxLengthExceedStrategy != null) {
- properties.setProperty(getKeyFor(column,
SCHEMA_MAX_LENGTH_EXCEED_STRATEGY), maxLengthExceedStrategy);
+ Integer maxLength = fieldSpec.getMaxLength();
Review Comment:
Let's put the effective max length here. In case the JSON max length
changes, we can detect the difference. Ideally we don't want segment CRC to
change because of this PR
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]