[ 
https://issues.apache.org/jira/browse/IMPALA-14328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated IMPALA-14328:
---------------------------------
    Description: 
We found that Impala does not produce a structurally identical column lineage 
graph when Calcite is the planner. For instance, consider the following query in 
[lineage.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/lineage.test].
{code:java}
select * from (
  select tinyint_col + int_col x from functional.alltypes
  union all
  select sum(bigint_col) y from (select bigint_col from functional.alltypes) 
v1) v2
{code}
We expect Impala to produce a graph with 4 vertices and 1 edge. However, when 
Calcite is the planner, we get only one vertex, and the single edge has no sources.
{code:java}
{
    "edges": [
        {
            "edgeType": "PROJECTION",
            "sources": [], 
            "targets": [
                0
            ]
        }
    ],  
    "endTime": 1755630445,
    "hash": "3968bd65781e9e856eaca799f4501513",
    "queryId": "fb443702ac817ecc:c432854600000000",
    "queryText": "select * from (   select tinyint_col + int_col x from 
functional.alltypes   union all   select sum(bigint_col) y from (select 
bigint_col from functional.alltypes) 
    "timestamp": 1755630437,
    "user": "fangyurao",
    "vertices": [
        {
            "id": 0,
            "vertexId": "X",
            "vertexType": "COLUMN"
        }
    ]   
}
{code}
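To make the structural difference easy to check, here is a minimal Python sketch, not part of Impala, that summarizes a lineage graph like the one above. The helper name {{summarize_lineage}} is hypothetical, and the sketch relies only on the {{edges}}, {{sources}}, and {{vertices}} fields visible in the output above.

```python
import json


def summarize_lineage(graph):
    """Return (vertex count, edge count, number of edges with no sources)."""
    vertices = graph.get("vertices", [])
    edges = graph.get("edges", [])
    # An edge whose "sources" list is empty carries no lineage information.
    sourceless = sum(1 for edge in edges if not edge.get("sources"))
    return len(vertices), len(edges), sourceless


# The graph produced with the Calcite planner, reduced to its structural fields.
calcite_graph = json.loads("""
{
  "edges": [{"edgeType": "PROJECTION", "sources": [], "targets": [0]}],
  "vertices": [{"id": 0, "vertexId": "X", "vertexType": "COLUMN"}]
}
""")

print(summarize_lineage(calcite_graph))  # prints (1, 1, 1)
```

A graph matching our expectation would report 4 vertices and 1 edge instead.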
 

To reproduce this issue, perform the following steps.
 - Execute "{{export USE_CALCITE_PLANNER=true}}" on the command line.
 - Start the Impala service from the command line using the following command.
{code:java}
$IMPALA_HOME/bin/start-impala-cluster.py \
'--impalad_args=--lineage_event_log_dir=/tmp/impala_test_lineage_U_j964 --use_local_catalog=false' \
'--catalogd_args=--catalog_topic_mode=full'
{code}

 - Execute "{{impala-shell}}" on the command line to enter the Impala shell.
 - Execute "{{set use_calcite_planner=1}}" in the Impala shell.
 - Submit the query to the Impala server from the Impala shell.
 - Collect the lineage file from the folder 
"{{/tmp/impala_test_lineage_U_j964}}", as specified when we started the 
Impala service.
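
The last step could be scripted roughly as follows. This is a minimal sketch under an assumption that is not stated above: that each file in the lineage directory holds one JSON lineage record per line. The function name {{load_lineage_records}} is hypothetical.

```python
import json
import os


def load_lineage_records(lineage_dir):
    """Parse every non-empty line of every regular file in lineage_dir as JSON.

    Assumption: each line of a lineage file is one complete JSON record.
    """
    records = []
    for name in sorted(os.listdir(lineage_dir)):
        path = os.path.join(lineage_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path) as log_file:
            for line in log_file:
                line = line.strip()
                if line:
                    records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # Directory passed via --lineage_event_log_dir when starting the cluster.
    lineage_dir = "/tmp/impala_test_lineage_U_j964"
    if os.path.isdir(lineage_dir):
        for record in load_lineage_records(lineage_dir):
            print(record.get("queryText"),
                  len(record.get("vertices", [])),
                  len(record.get("edges", [])))
```

Printing the vertex and edge counts next to the query text makes it easy to spot the record for the query above and compare it with the expected 4 vertices and 1 edge.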

  was:
We found that Impala does not produce a structurally identical column lineage 
graph when Calcite is the planner. For instance, consider the following query in 
[lineage.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/lineage.test].
{code:java}
select * from (
  select tinyint_col + int_col x from functional.alltypes
  union all
  select sum(bigint_col) y from (select bigint_col from functional.alltypes) 
v1) v2
{code}

We expect Impala to produce a graph with 4 vertices and 1 edge. However, when 
Calcite is the planner, we get only one vertex, and the single edge has no sources.
{code}
{
    "edges": [
        {
            "edgeType": "PROJECTION",
            "sources": [], 
            "targets": [
                0
            ]
        }
    ],  
    "endTime": 1755630445,
    "hash": "3968bd65781e9e856eaca799f4501513",
    "queryId": "fb443702ac817ecc:c432854600000000",
    "queryText": "select * from (   select tinyint_col + int_col x from 
functional.alltypes   union all   select sum(bigint_col) y from (select 
bigint_col from functional.alltypes) 
    "timestamp": 1755630437,
    "user": "fangyurao",
    "vertices": [
        {
            "id": 0,
            "vertexId": "X",
            "vertexType": "COLUMN"
        }
    ]   
}
{code}


> Calcite Planner: Produce column lineage graph when Calcite is the planner
> -------------------------------------------------------------------------
>
>                 Key: IMPALA-14328
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14328
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Frontend
>            Reporter: Fang-Yu Rao
>            Assignee: Fang-Yu Rao
>            Priority: Major
>
> We found that Impala does not produce a structurally identical column lineage 
> graph when Calcite is the planner. For instance, consider the following query 
> in 
> [lineage.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/lineage.test].
> {code:java}
> select * from (
>   select tinyint_col + int_col x from functional.alltypes
>   union all
>   select sum(bigint_col) y from (select bigint_col from functional.alltypes) 
> v1) v2
> {code}
> We expect Impala to produce a graph with 4 vertices and 1 edge. However, when 
> Calcite is the planner, we get only one vertex, and the single edge has no sources.
> {code:java}
> {
>     "edges": [
>         {
>             "edgeType": "PROJECTION",
>             "sources": [], 
>             "targets": [
>                 0
>             ]
>         }
>     ],  
>     "endTime": 1755630445,
>     "hash": "3968bd65781e9e856eaca799f4501513",
>     "queryId": "fb443702ac817ecc:c432854600000000",
>     "queryText": "select * from (   select tinyint_col + int_col x from 
> functional.alltypes   union all   select sum(bigint_col) y from (select 
> bigint_col from functional.alltypes) 
>     "timestamp": 1755630437,
>     "user": "fangyurao",
>     "vertices": [
>         {
>             "id": 0,
>             "vertexId": "X",
>             "vertexType": "COLUMN"
>         }
>     ]   
> }
> {code}
>  
> To reproduce this issue, perform the following steps.
>  - Execute "{{export USE_CALCITE_PLANNER=true}}" on the command line.
>  - Start the Impala service from the command line using the following command.
> {code:java}
> $IMPALA_HOME/bin/start-impala-cluster.py \
> '--impalad_args=--lineage_event_log_dir=/tmp/impala_test_lineage_U_j964 --use_local_catalog=false' \
> '--catalogd_args=--catalog_topic_mode=full'
> {code}
>  - Execute "{{impala-shell}}" on the command line to enter the Impala shell.
>  - Execute "{{set use_calcite_planner=1}}" in the Impala shell.
>  - Submit the query to the Impala server from the Impala shell.
>  - Collect the lineage file from the folder 
> "{{/tmp/impala_test_lineage_U_j964}}", as specified when we started the 
> Impala service.


