Gabriel39 opened a new pull request, #63830:
URL: https://github.com/apache/doris/pull/63830

   ### What problem does this PR solve?
   
   Issue Number: close #26035
   
   Related PR: None
   
   Problem Summary: Iceberg v3 row lineage metadata columns could return 
correct values in plain projections but wrong results in DISTINCT, GROUP BY, 
COUNT(DISTINCT), and NDV. For data files that do not physically contain row 
lineage columns, Doris filled in generated values only where the placeholder 
was NULL. Some vectorized aggregation paths can supply a non-null placeholder 
column, so the fill was skipped and all rows from the same file kept the same 
first-row value. This change distinguishes physical row lineage columns from 
generated ones. Missing row lineage columns are now fully generated for every 
row in a batch as first_row_id plus row position, while physical row lineage 
columns still preserve stored values and only backfill NULLs.
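
The fill rule described above can be sketched in a few lines. This is only an illustration of the intended semantics, not the actual Doris BE code: the function name `fill_row_ids`, its parameters, and the list-based column representation are all hypothetical.

```python
from typing import List, Optional


def fill_row_ids(
    column: List[Optional[int]],
    first_row_id: int,
    batch_start: int,
    is_physical: bool,
) -> List[int]:
    """Hypothetical sketch of the row lineage fill rule.

    column       -- values handed to the reader for this batch
    first_row_id -- first_row_id of the data file
    batch_start  -- position of the batch's first row within the file
    is_physical  -- True if the data file physically stores the column
    """
    if is_physical:
        # Stored values win; only NULLs are backfilled from the row position.
        return [
            first_row_id + batch_start + i if value is None else value
            for i, value in enumerate(column)
        ]
    # Column is missing from the file: ignore whatever the placeholder holds
    # (it may be non-null) and generate a value for every row.
    return [first_row_id + batch_start + i for i in range(len(column))]


# A non-null placeholder no longer leaks the same value into every row.
generated = fill_row_ids([100, 100, 100], first_row_id=100, batch_start=0,
                         is_physical=False)

# A physical column keeps stored values and only fills the NULL slot.
physical = fill_row_ids([7, None, 9], first_row_id=100, batch_start=0,
                        is_physical=True)
```

With the earlier "fill only when NULL" rule applied to the generated case, the placeholder `[100, 100, 100]` would pass through unchanged, which is why DISTINCT and GROUP BY collapsed all rows of a file into one value.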
   
   ### Release note
   
   Fix incorrect DISTINCT/GROUP BY/COUNT(DISTINCT)/NDV results on the 
generated Iceberg v3 _row_id metadata column.
   
   ### Check List (For Author)
   
   - Test: Regression test
       - Added test_iceberg_v3_row_lineage_uniqueness_stability covering 
DISTINCT, GROUP BY, COUNT(DISTINCT), and NDV on _row_id.
       - Ran git diff --check and git diff --cached --check.
       - Could not run build-support/clang-format.sh because llvm@16 is not 
installed.
       - Could not run ./run-regression-test.sh --run -d 
external_table_p0/iceberg -s test_iceberg_v3_row_lineage_uniqueness_stability 
because mvn is not installed.
       - Could not run ./run-be-ut.sh --run iceberg because JAVA_HOME points to 
JDK 11 and JDK_17 is unset.
   - Behavior changed: Yes. Generated Iceberg v3 row lineage columns now 
produce per-row values for aggregation as well as projection.
   - Does this need documentation: No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

