njalan opened a new issue, #5253: URL: https://github.com/apache/hudi/issues/5253
I am trying to get column lineage from a Spark SQL query plan. Below is my SQL for testing; all the tables are Hudi tables.

    insert into test.datahub_3
    select a.email, b.phone
    from test.datahub_1 a, test.datahub_2 b
    where a.phone = b.phone

Below is the code:

    import org.apache.spark.sql.execution.QueryExecution

    def lineageParser(qe: QueryExecution): Unit = {
      val analyzedLogicPlan = qe.analyzed
      logInfo("----------- start analyzed plan --------")
      // Print the class of every node reached by the plan traversal.
      analyzedLogicPlan.foreach(plan => {
        plan match {
          case _ => println(plan.getClass)
        }
      })
      logInfo("----------- end analyzed plan --------")
    }

Below is the output for the query on Hive tables:

    class org.apache.spark.sql.hive.execution.InsertIntoHiveTable
    class org.apache.spark.sql.catalyst.plans.logical.Project
    class org.apache.spark.sql.catalyst.plans.logical.Join
    class org.apache.spark.sql.catalyst.plans.logical.Project
    class org.apache.spark.sql.catalyst.plans.logical.Filter
    class org.apache.spark.sql.execution.datasources.LogicalRelation
    class org.apache.spark.sql.catalyst.plans.logical.Project
    class org.apache.spark.sql.catalyst.plans.logical.Filter
    class org.apache.spark.sql.execution.datasources.LogicalRelation

Below is the output for the query on Hudi tables:

    class org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand

The two outputs are totally different: the Hive query prints the whole plan tree, while the Hudi query prints only a single command node.

Environment Description

* Hudi version : 0.9
* Spark version : 3.01
* Hive version : 3.2
* Hadoop version : 3.2.1
* Storage (HDFS/S3/GCS..) : s3
* Running on Docker? (yes/no) : no
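
One possible reading of the two outputs is that the Hive insert node exposes its source query as a child, while the Hudi command keeps the query only as a constructor field, so a children-only traversal stops at the command. That is an assumption, not something confirmed in this issue. The sketch below illustrates the idea with a toy tree and no Spark dependency; `Node`, `InsertWithChild`, `InsertAsLeaf`, and both `collect...` helpers are hypothetical names, not Spark or Hudi classes. Since Spark plan nodes are also `Product`s, the same `productIterator` approach should carry over to `LogicalPlan`, but that has not been verified against Hudi 0.9.

```scala
object PlanWalkSketch {
  // Hypothetical stand-in for a logical plan node; not a Spark or Hudi class.
  sealed trait Node extends Product {
    def children: Seq[Node]
  }

  final case class Relation(name: String) extends Node {
    def children: Seq[Node] = Nil
  }

  final case class Project(columns: Seq[String], child: Node) extends Node {
    def children: Seq[Node] = Seq(child)
  }

  final case class Join(left: Node, right: Node) extends Node {
    def children: Seq[Node] = Seq(left, right)
  }

  // Exposes its query as a child, so a children-only traversal sees the whole tree.
  final case class InsertWithChild(table: String, query: Node) extends Node {
    def children: Seq[Node] = Seq(query)
  }

  // Keeps its query only as a field. ASSUMPTION: this mirrors the Hudi command.
  final case class InsertAsLeaf(table: String, query: Node) extends Node {
    def children: Seq[Node] = Nil
  }

  // Pre-order traversal that follows `children` only.
  def collectChildrenOnly(node: Node): List[Node] =
    node :: node.children.toList.flatMap(collectChildrenOnly)

  // Pre-order traversal that also descends into Node values stored directly
  // as constructor fields but not listed in `children`.
  // Nodes nested inside Seq or Option fields are not handled here.
  def collectWithEmbedded(node: Node): List[Node] = {
    val embedded = node.productIterator.collect {
      case n: Node if !node.children.exists(_ eq n) => n
    }.toList
    node :: (node.children.toList ++ embedded).flatMap(collectWithEmbedded)
  }

  def main(args: Array[String]): Unit = {
    val query = Project(
      Seq("email", "phone"),
      Join(Relation("test.datahub_1"), Relation("test.datahub_2"))
    )
    val insert = InsertAsLeaf("test.datahub_3", query)
    println(collectChildrenOnly(insert).map(_.productPrefix))
    println(collectWithEmbedded(insert).map(_.productPrefix))
  }
}
```

With a leaf-style command, the children-only traversal returns just the command node, while the second traversal also reaches the projection, join, and relations of the embedded query. A lineage parser could alternatively pattern-match on the specific command class and read its query field directly, which is more explicit but ties the parser to that class.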