eldenmoon opened a new pull request, #63970: URL: https://github.com/apache/doris/pull/63970
## Proposed changes

This patch reduces parse-time memory for sparse plain dynamic Variant columns.

- Parse plain dynamic non-doc Variant object JSON into doc-value KV during storage parse instead of eagerly expanding every path into subcolumns.
- Keep the old eager subcolumn parse path for cases that still depend on parse-time path/type metadata: nested group, deprecated flatten nested, predefined typed paths, and parent inverted index columns.
- Add a writer-side doc-value plan in `VariantColumnWriterImpl` to choose materialized paths, write them through the materialized subcolumn flow, and write the remaining paths to sparse columns.
- Move sparse handling for this path into `VariantColumnWriterImpl` and add focused BE UT coverage.

The sparse parse memory UT simulates the CIR-20431 shape and shows:

```text
old_subcolumns=1001
new_subcolumns=1
old_bytes=6224384
new_bytes=45056
```

This is `ColumnVariant::allocated_bytes()` in the unit test, not process RSS.

## Testing

Current head `f971585d87bd73b0e4d447f2760004fc3a5f2051` on latest `upstream/master`:

- `git diff --check upstream/master...HEAD`
- `env DORIS_CLANG_HOME=/mnt/disk1/claude-max/ldb_toolchain20 PATH=/mnt/disk1/claude-max/ldb_toolchain20/bin:$PATH ./run-be-ut.sh --run --filter='VariantUtilTest.ParseVariantColumns_StorageNonDocScalarJsonToDocValueKv:VariantUtilTest.SparseStorageParseUsesDocValueKvInsteadOfManySubcolumns:VariantUtilTest.ParseVariantColumns_StorageNonDocDocValueKvSkipsInvalidRoot:VariantColumnWriterReaderTest.test_storage_parse_kv_write_materialized_and_sparse'`

Also verified before rebasing to the latest master:

- Release BE build passed.
- `./run-be-ut.sh --run --filter='*Variant*'`: 193 passed, 1 skipped.
- Targeted `variant_p0` suites passed, including `desc`, `test_types_in_variant`, delete/update, predefine typed-to-sparse, schema change, and external meta edge cases.
- Full `variant_p0` attempted: 139 suites, 7 failed.
The failures were unrelated environment/framework issues: one OSS `InvalidAccessKeyId` for outfile, and six `/api/debug_point/remove/...` HTTP 500 failures while cleaning debug points. No product assertion mismatch was found in the modified Variant writer/parse paths.
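
For readers unfamiliar with the two layouts, here is a toy Python model of the idea under Proposed changes. It is a sketch, not Doris code: `flatten`, `parse_eager`, `parse_doc_value_kv`, and `plan_write` are hypothetical names, the input shape is made up, and the "most frequent paths first" selection policy is an assumption for illustration only. The PR does not state how `VariantColumnWriterImpl` picks materialized paths.

```python
import json
from collections import Counter


def flatten(obj, prefix=""):
    """Yield (dotted_path, leaf_value) pairs for every leaf of a JSON object."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def parse_eager(rows):
    """Eager model: one dense subcolumn per distinct path, padded with None."""
    subcolumns = {}
    for i, row in enumerate(rows):
        for path, value in flatten(json.loads(row)):
            column = subcolumns.setdefault(path, [None] * len(rows))
            column[i] = value
    return subcolumns


def parse_doc_value_kv(rows):
    """KV model: a single column holding per-row (path, value) pairs sorted by path."""
    return [sorted(flatten(json.loads(row)), key=lambda kv: kv[0]) for row in rows]


def plan_write(kv_rows, max_materialized):
    """Split KV rows into materialized columns and per-row sparse maps.

    Assumed policy (not taken from the PR): keep the most frequent paths,
    breaking ties by path name.
    """
    counts = Counter(path for row in kv_rows for path, _ in row)
    ranked = sorted(counts, key=lambda p: (-counts[p], p))
    materialized = set(ranked[:max_materialized])
    columns = {p: [None] * len(kv_rows) for p in materialized}
    sparse = []
    for i, row in enumerate(kv_rows):
        rest = {}
        for path, value in row:
            if path in materialized:
                columns[path][i] = value
            else:
                rest[path] = value
        sparse.append(rest)
    return columns, sparse


# Made-up sparse input: one shared key plus one unique key per row.
rows = [json.dumps({"id": i, f"k{i}": i}) for i in range(100)]

eager = parse_eager(rows)
print(len(eager), sum(len(c) for c in eager.values()))  # 101 subcolumns, 10100 cells

kv = parse_doc_value_kv(rows)
print(sum(len(r) for r in kv))  # 200 (path, value) pairs in one column

columns, sparse = plan_write(kv, max_materialized=1)
print(sorted(columns), sparse[5])  # ['id'] {'k5': 5}
```

In this model the eager layout grows with rows × distinct paths, while the KV layout grows with the number of values actually present. The real `old_bytes`/`new_bytes` figures above come from the BE unit test, not from this sketch.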
