xiaokang commented on code in PR #23498: URL: https://github.com/apache/doris/pull/23498#discussion_r1314143669
########## gensrc/proto/segment_v2.proto: ########## @@ -137,6 +137,20 @@ message ZoneMapPB { optional bool pass_all = 5 [default = false]; } +// For semi-structure column info, perisist info for PathInData +message ColumnPathPartInfo { + optional string key = 1; Review Comment: what's the meaning of key? Add comment for each field. ########## fe/fe-core/src/main/java/org/apache/doris/analysis/SlotRef.java: ########## @@ -72,6 +73,14 @@ public SlotRef(TableName tblName, String col) { this.label = "`" + col + "`"; } + public SlotRef(TableName tblName, String col, List<String> subColLables) { + super(); + this.tblName = tblName; + this.col = col; + this.label = "`" + col + "`"; + this.subColLables = subColLables; Review Comment: It seems that label is `` escaped column name. What's the meaing of subColLables? ########## fe/fe-core/src/main/java/org/apache/doris/analysis/SlotDescriptor.java: ########## @@ -304,6 +304,9 @@ public TSlotDescriptor toThrift() { tSlotDescriptor.setColUniqueId(column.getUniqueId()); tSlotDescriptor.setIsKey(column.isKey()); } + if (subColLabels != null) { + tSlotDescriptor.setColumnPaths(subColLabels); Review Comment: The name is not consistent between thrift and java object. ########## gensrc/proto/segment_v2.proto: ########## @@ -161,7 +175,12 @@ message ColumnMetaPB { // required by array/struct/map reader to create child reader. optional uint64 num_rows = 11; repeated string children_column_names = 12; + // persist info for PathInData that represents path in document, e.g. JSON. + optional ColumnPathInfo column_path_info = 13; Review Comment: Does column_path_info mean a single subcolumn of variant or all sub columns? ########## fe/fe-core/src/main/java/org/apache/doris/planner/FileLoadScanNode.java: ########## @@ -313,10 +313,6 @@ protected void finalizeParamsForLoad(ParamCreateContext context, String name = "jsonb_parse_" + nullable + "_error_to_null"; expr = new FunctionCallExpr(name, args); expr.analyze(analyzer); - } else if (dstType == PrimitiveType.VARIANT) { - // Generate SchemaChange expr for dynamicly generating columns - TableIf targetTbl = desc.getTable(); - expr = new SchemaChangeExpr((SlotRef) expr, (int) targetTbl.getId()); Review Comment: old dynamic table code? ########## fe/fe-core/src/main/java/org/apache/doris/analysis/SlotDescriptor.java: ########## @@ -64,7 +66,6 @@ public class SlotDescriptor { private ColumnStats stats; // only set if 'column' isn't set private boolean isAgg; - private boolean isMultiRef; Review Comment: Why delete it? ########## gensrc/thrift/Descriptors.thrift: ########## @@ -59,6 +59,8 @@ struct TSlotDescriptor { // materialize them.Used to optmize to read less data and less memory usage 13: optional bool need_materialize = true 14: optional bool is_auto_increment = false; + // `$.a.b.c` => ['$', 'a', 'b', 'c'] + 15: optional list<string> column_paths Review Comment: submit a small seperate pr to change the thrift/pb for variant to be merged quickly ########## fe/fe-common/src/main/java/org/apache/doris/catalog/Type.java: ########## @@ -725,6 +726,8 @@ public static boolean canCastTo(Type sourceType, Type targetType) { return true; } else if (sourceType.isStructType() && targetType.isStructType()) { return StructType.canCastTo((StructType) sourceType, (StructType) targetType); + } else if (sourceType.isVariantType() && targetType.isArrayType()) { Review Comment: Can variant be cast to struct or map? ########## be/src/common/config.h: ########## @@ -1109,6 +1107,12 @@ DECLARE_mInt64(lookup_connection_cache_bytes_limit); // level of compression when using LZ4_HC, whose defalut value is LZ4HC_CLEVEL_DEFAULT DECLARE_mInt64(LZ4_HC_compression_level); +// Whether flatten nested arrays in variant column +// Notice: TEST ONLY +DECLARE_mBool(enable_flatten_nested_for_variant); Review Comment: use variant_ prefix for variant related configurations ########## gensrc/proto/segment_v2.proto: ########## @@ -137,6 +137,20 @@ message ZoneMapPB { optional bool pass_all = 5 [default = false]; } +// For semi-structure column info, perisist info for PathInData +message ColumnPathPartInfo { + optional string key = 1; + optional bool is_nested = 2; + optional uint32 anonymous_array_level = 3; +} + +message ColumnPathInfo { + optional string path = 1; + repeated ColumnPathPartInfo path_part_infos = 2; Review Comment: why path part info is a list? ########## gensrc/proto/olap_file.proto: ########## @@ -206,6 +206,8 @@ message ColumnPB { repeated ColumnPB children_columns = 17; repeated string children_column_names = 18; optional bool result_is_nullable = 19; + // persist info for PathInData that represents path in document, e.g. JSON. + optional segment_v2.ColumnPathInfo column_path_info = 20; Review Comment: Is column_path_info attached to parent vairant column or seperated from parent variant column? ########## fe/fe-core/src/main/java/org/apache/doris/analysis/SlotRef.java: ########## @@ -359,6 +377,13 @@ public boolean notCheckDescIdEquals(Object obj) { if (col != null && !col.equalsIgnoreCase(other.col)) { return false; } + if ((subColLables == null) != (other.subColLables == null)) { + return false; + } + if (subColLables != null + && String.join(".", subColLables).equals(String.join(".", other.subColLables))) { Review Comment: subColLables.equals(other.subColLables) should be enough. ########## fe/fe-common/src/main/java/org/apache/doris/catalog/Type.java: ########## @@ -544,10 +549,6 @@ public boolean isComplexType() { return isStructType() || isCollectionType(); } - public boolean isVariantType() { - return this instanceof VariantType; Review Comment: It seems to more reasonable than isScalarType(VARIANT). ########## fe/fe-core/src/main/java/org/apache/doris/analysis/DescriptorTable.java: ########## @@ -69,7 +69,15 @@ public TupleDescriptor createTupleDescriptor(String debugName) { return d; } + public static void printStackTrace() { + StackTraceElement[] stackTrace = Thread.currentThread().getStackTrace(); + for (StackTraceElement element : stackTrace) { + System.out.println(element.toString()); + } + } + public SlotDescriptor addSlotDescriptor(TupleDescriptor d) { + printStackTrace(); Review Comment: debug code? ########## gensrc/thrift/Descriptors.thrift: ########## @@ -59,6 +59,8 @@ struct TSlotDescriptor { // materialize them.Used to optmize to read less data and less memory usage 13: optional bool need_materialize = true 14: optional bool is_auto_increment = false; + // `$.a.b.c` => ['$', 'a', 'b', 'c'] + 15: optional list<string> column_paths Review Comment: Do you change TSlotDescriptor from a column to a non-variant column or a variant subcolumn? ########## fe/fe-core/src/main/java/org/apache/doris/analysis/Analyzer.java: ########## @@ -745,6 +751,18 @@ public TupleDescriptor registerOlapTable(Table table, TableName tableName, List< slot.setIsNullable(col.isAllowNull()); String key = tableRef.aliases[0] + "." + col.getName(); slotRefMap.put(key, slot); + + if (col.getType().isVariantType()) { + LOG.debug("add subColumnSlotRefMap, key {}, column {}", key, col.toThrift()); + subColumnSlotRefMap.put(key, Maps.newTreeMap( + new Comparator<List<String>>() { + public int compare(List<String> lst1, List<String> lst2) { Review Comment: Maybe there is a compare method for list. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org