[
https://issues.apache.org/jira/browse/IMPALA-14328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Fang-Yu Rao updated IMPALA-14328:
---------------------------------
Description:
We found that Impala does not produce a structurally identical column lineage
graph when Calcite is the planner. For instance, consider the following query in
[lineage.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/lineage.test].
{code:sql}
select * from (
  select tinyint_col + int_col x from functional.alltypes
  union all
  select sum(bigint_col) y from (select bigint_col from functional.alltypes) v1
) v2
{code}
We expect Impala to produce a graph with 4 vertices and 1 edge. However, we
only get one vertex when Calcite is the planner.
{code:java}
{
"edges": [
{
"edgeType": "PROJECTION",
"sources": [],
"targets": [
0
]
}
],
"endTime": 1755630445,
"hash": "3968bd65781e9e856eaca799f4501513",
"queryId": "fb443702ac817ecc:c432854600000000",
"queryText": "select * from ( select tinyint_col + int_col x from
functional.alltypes union all select sum(bigint_col) y from (select
bigint_col from functional.alltypes)
"timestamp": 1755630437,
"user": "fangyurao",
"vertices": [
{
"id": 0,
"vertexId": "X",
"vertexType": "COLUMN"
}
]
}
{code}
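The mismatch can be checked mechanically. The following is a minimal Python sketch, not part of Impala, that summarizes a lineage record shaped like the one above. The helper name summarize_lineage is hypothetical, and the field names ("vertices", "edges", "sources") are taken only from the output shown in this description.

```python
import json


def summarize_lineage(record):
    """Return vertex/edge counts and the indices of edges that have no sources."""
    vertices = record.get("vertices", [])
    edges = record.get("edges", [])
    # An edge with an empty "sources" list points at a target column
    # without saying where its data comes from.
    edges_without_sources = [
        i for i, edge in enumerate(edges) if not edge.get("sources")
    ]
    return {
        "num_vertices": len(vertices),
        "num_edges": len(edges),
        "edges_without_sources": edges_without_sources,
    }


# Trimmed copy of the record above (queryText and timing fields omitted).
calcite_record = json.loads("""
{
  "edges": [{"edgeType": "PROJECTION", "sources": [], "targets": [0]}],
  "vertices": [{"id": 0, "vertexId": "X", "vertexType": "COLUMN"}]
}
""")

summary = summarize_lineage(calcite_record)
print(summary)
```

For the record above, the summary reports a single vertex and a single edge whose "sources" list is empty, which is the structural difference from the expected 4 vertices and 1 edge.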
To reproduce this issue, perform the following steps.
- Execute "{{export USE_CALCITE_PLANNER=true}}" on the command line.
- Start the Impala service from the command line using the following command.
{code:bash}
$IMPALA_HOME/bin/start-impala-cluster.py \
'--impalad_args=--lineage_event_log_dir=/tmp/impala_test_lineage_U_j964 --use_local_catalog=false' \
'--catalogd_args=--catalog_topic_mode=full'
{code}
- Execute "{{impala-shell}}" on the command line to enter the Impala shell.
- Execute "{{set use_calcite_planner=1}}" in the Impala shell.
- Submit the query to the Impala server from the Impala shell.
- Collect the lineage file from the folder
"{{/tmp/impala_test_lineage_U_j964}}", as specified when we started the
Impala service.
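For the last step, a small script can gather the lineage records instead of inspecting the files by hand. This is only a sketch under one assumption that should be verified against the actual files: each lineage file in the directory holds one JSON record per line. The function names load_lineage_records and find_by_query_id are hypothetical helpers, not Impala APIs.

```python
import json
import os


def load_lineage_records(log_dir):
    """Parse every non-empty line of every regular file in log_dir as JSON.

    Assumption: each lineage file contains one JSON record per line.
    """
    records = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line:
                    records.append(json.loads(line))
    return records


def find_by_query_id(records, query_id):
    """Return the records whose "queryId" field equals query_id."""
    return [r for r in records if r.get("queryId") == query_id]
```

With the cluster started as above, load_lineage_records("/tmp/impala_test_lineage_U_j964") would return the collected records, and find_by_query_id can then pick out the record for the query that was just submitted.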
was:
We found that Impala does not produce a structurally identical column lineage
graph when Calcite is the planner. For instance, consider the following query in
[lineage.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/lineage.test].
{code:sql}
select * from (
  select tinyint_col + int_col x from functional.alltypes
  union all
  select sum(bigint_col) y from (select bigint_col from functional.alltypes) v1
) v2
{code}
We expect Impala to produce a graph with 4 vertices and 1 edge. However, we
only get one vertex when Calcite is the planner.
{code}
{
"edges": [
{
"edgeType": "PROJECTION",
"sources": [],
"targets": [
0
]
}
],
"endTime": 1755630445,
"hash": "3968bd65781e9e856eaca799f4501513",
"queryId": "fb443702ac817ecc:c432854600000000",
"queryText": "select * from ( select tinyint_col + int_col x from
functional.alltypes union all select sum(bigint_col) y from (select
bigint_col from functional.alltypes)
"timestamp": 1755630437,
"user": "fangyurao",
"vertices": [
{
"id": 0,
"vertexId": "X",
"vertexType": "COLUMN"
}
]
}
{code}
> Calcite Planner: Produce column lineage graph when Calcite is the planner
> -------------------------------------------------------------------------
>
> Key: IMPALA-14328
> URL: https://issues.apache.org/jira/browse/IMPALA-14328
> Project: IMPALA
> Issue Type: Sub-task
> Components: Frontend
> Reporter: Fang-Yu Rao
> Assignee: Fang-Yu Rao
> Priority: Major