[ 
https://issues.apache.org/jira/browse/HIVE-25690?focusedWorklogId=680335&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-680335
 ]

ASF GitHub Bot logged work on HIVE-25690:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 11/Nov/21 16:14
            Start Date: 11/Nov/21 16:14
    Worklog Time Spent: 10m 
      Work Description: marton-bod opened a new pull request #2779:
URL: https://github.com/apache/hive/pull/2779


   ### What changes were proposed in this pull request?
   Revamp the algorithm that detects reordered columns. The current 
implementation is faulty. The new idea is to look for the column that has the 
highest index difference between its positions in the HMS and Iceberg schemas.
   E.g. Current schema: A, B, C, D
   New schema (A moved to the end): B, C, D, A
   Index difference for each column: A: 3, B: 1, C: 1, D: 1
   So we know that A was the one that got moved.
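
   The index differences in this example can be reproduced with a short 
sketch. It is written in Python for illustration only and is not the code from 
the PR; `index_diffs` is a hypothetical helper, and it assumes both schemas 
contain the same set of unique column names.

```python
def index_diffs(current, new):
    """Return {column: abs(position in current - position in new)}.

    Hypothetical helper; assumes both lists hold the same unique column names.
    """
    new_pos = {name: i for i, name in enumerate(new)}
    return {name: abs(i - new_pos[name]) for i, name in enumerate(current)}


# Example from the description: A is moved to the end.
diffs = index_diffs(["A", "B", "C", "D"], ["B", "C", "D", "A"])
moved = max(diffs, key=diffs.get)  # column with the highest index difference
print(diffs)  # {'A': 3, 'B': 1, 'C': 1, 'D': 1}
print(moved)  # A
```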
   
   In general, there are 3 scenarios: 
   1) highest index diff = 0 -> there were no reorders
   2) highest index diff = 1 -> two adjacent columns got swapped with each 
other:
   E.g. A, B, C, D -> A, C, B, D
   In this case we cannot tell for sure which one was moved by the user, 
but that's okay because we can do either moveAfter(C, A) or moveAfter(B, C); 
they should be equivalent operations
   3) highest index diff > 1 -> the reordered column can be identified 
definitively
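
   The three scenarios can be sketched as a single function. This is a 
minimal Python illustration, not the PR's implementation: `detect_reorder` is 
a hypothetical name, it assumes at most one column was moved and that both 
schemas hold the same unique column names, and returning `None` as the 
predecessor for a column moved to the front is an assumption the description 
does not cover.

```python
def detect_reorder(current, new):
    """Return None if nothing moved, else (moved_column, column_it_now_follows).

    Hypothetical sketch; the second element is None when the moved column
    is now first (an assumption, not described in the source).
    """
    new_pos = {name: i for i, name in enumerate(new)}
    diffs = {name: abs(i - new_pos[name]) for i, name in enumerate(current)}
    if not diffs:
        return None  # empty schema, nothing to compare

    moved = max(diffs, key=diffs.get)
    if diffs[moved] == 0:
        return None  # scenario 1: no reorders

    # Scenarios 2 and 3: place the chosen column after its new predecessor.
    # With an adjacent swap (diff == 1) either column is a valid choice;
    # max() picks the first one in the current schema's order.
    idx = new_pos[moved]
    after = new[idx - 1] if idx > 0 else None
    return moved, after


cols = ["A", "B", "C", "D"]
print(detect_reorder(cols, ["A", "B", "C", "D"]))  # None
print(detect_reorder(cols, ["A", "C", "B", "D"]))  # ('B', 'C')
print(detect_reorder(cols, ["B", "C", "D", "A"]))  # ('A', 'D')
```

   For the adjacent swap, the result corresponds to moveAfter(B, C) from the 
description above.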
   
   ### Why are the changes needed?
   Fix a correctness problem in column reorder detection
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   new unit tests
   



Issue Time Tracking
-------------------

            Worklog Id:     (was: 680335)
    Remaining Estimate: 0h
            Time Spent: 10m

> Fix column reorder detection for Iceberg schema evolution
> ---------------------------------------------------------
>
>                 Key: HIVE-25690
>                 URL: https://issues.apache.org/jira/browse/HIVE-25690
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Marton Bod
>            Assignee: Marton Bod
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The current algorithm for detecting differences between the HMS and Iceberg 
> schemas is broken when it comes to column reorders. This patch should fix 
> that and add more extensive testing.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
