rhodo commented on code in PR #16149:
URL: https://github.com/apache/pinot/pull/16149#discussion_r2175888171
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -599,10 +599,14 @@ public static void
addColumnMetadataInfo(PropertiesConfiguration properties, Str
properties.setProperty(getKeyFor(column, IS_AUTO_GENERATED),
String.valueOf(columnIndexCreationInfo.isAutoGenerated()));
if (dataType.equals(DataType.STRING) || dataType.equals(DataType.BYTES) ||
dataType.equals(DataType.JSON)) {
- properties.setProperty(getKeyFor(column, SCHEMA_MAX_LENGTH),
fieldSpec.getMaxLength());
- FieldSpec.MaxLengthExceedStrategy maxLengthExceedStrategy =
fieldSpec.getMaxLengthExceedStrategy();
- if (maxLengthExceedStrategy != null) {
- properties.setProperty(getKeyFor(column,
SCHEMA_MAX_LENGTH_EXCEED_STRATEGY), maxLengthExceedStrategy);
+ Integer maxLength = fieldSpec.getMaxLength();
Review Comment:
Thanks for the suggestion!
A couple of follow-ups:
- Should we apply the same idea to SCHEMA_MAX_LENGTH_EXCEED_STRATEGY as well?
- In the equals() method
([link](https://github.com/apache/pinot/pull/16149/files#diff-05c7d0209f5495c4bb9b5f93d37d8bf5a706114aa0e9afa11c367000651fd395L500-L502)),
should we compare the effective max length or just the configured max length?
We have a number of unit tests that compare a field spec to one reconstructed
from segment metadata, so the decision here may decide wether we should update
those tests depending on which semantics we prefer.
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/SanitizationTransformer.java:
##########
@@ -53,24 +53,19 @@ public class SanitizationTransformer implements
RecordTransformer {
private final Map<String, SanitizedColumnInfo> _columnToColumnInfoMap = new
HashMap<>();
public SanitizationTransformer(Schema schema) {
- FieldSpec.MaxLengthExceedStrategy maxLengthExceedStrategy;
for (FieldSpec fieldSpec : schema.getAllFieldSpecs()) {
if (!fieldSpec.isVirtualColumn()) {
- if (fieldSpec.getDataType().equals(FieldSpec.DataType.STRING)) {
- maxLengthExceedStrategy =
- fieldSpec.getMaxLengthExceedStrategy() == null ?
FieldSpec.MaxLengthExceedStrategy.TRIM_LENGTH
- : fieldSpec.getMaxLengthExceedStrategy();
- _columnToColumnInfoMap.put(fieldSpec.getName(), new
SanitizedColumnInfo(fieldSpec.getName(),
- fieldSpec.getMaxLength(), maxLengthExceedStrategy,
fieldSpec.getDefaultNullValue()));
- } else if (fieldSpec.getDataType().equals(FieldSpec.DataType.JSON) ||
fieldSpec.getDataType()
- .equals(FieldSpec.DataType.BYTES)) {
- maxLengthExceedStrategy = fieldSpec.getMaxLengthExceedStrategy() ==
null
- ? FieldSpec.MaxLengthExceedStrategy.NO_ACTION :
fieldSpec.getMaxLengthExceedStrategy();
- if
(maxLengthExceedStrategy.equals(FieldSpec.MaxLengthExceedStrategy.NO_ACTION)) {
- continue;
+ FieldSpec.DataType dataType = fieldSpec.getDataType();
+ if (dataType.equals(FieldSpec.DataType.STRING)
+ || dataType.equals(FieldSpec.DataType.JSON)
+ || dataType.equals(FieldSpec.DataType.BYTES)) {
+
+ FieldSpec.MaxLengthExceedStrategy strategy =
fieldSpec.getEffectiveMaxLengthExceedStrategy();
+ if (!strategy.equals(FieldSpec.MaxLengthExceedStrategy.NO_ACTION) ||
dataType.equals(
+ FieldSpec.DataType.STRING)) {
Review Comment:
Yes, try to make this part a pure refactor rather than changing the existing
behavior here: even if MaxLengthExceedStrategy.NO_ACTION is used for string
type, other sanitization logic is still able to be applied when it is String.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]