wForget opened a new issue, #10354: URL: https://github.com/apache/incubator-gluten/issues/10354
### Backend

VL (Velox)

### Bug description

Gluten integration with the Kyuubi authz plugin causes a Hive write fallback, because the Kyuubi authz plugin [repeatedly calls the Spark optimizer](https://github.com/apache/kyuubi/blob/9a0c49e79135cd90368986176591a80d29634231/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/rule/RuleEliminatePermanentViewMarker.scala#L56), which changes the Spark plan.

Fallback message:

```
== Fallback Summary ==
(29) Project: Native validation failed: Validation failed due to exception caught at file:SubstraitToVeloxPlanValidator.cc line:1310 function:validate, thrown from file:ExprCompiler.cpp line:462 function:compileRewrittenExpression, reason:Scalar function name not registered: empty2null, called with arguments: (VARCHAR).
```

Spark plan without the Kyuubi authz plugin:

```
(28) SortMergeJoin
Left keys [1]: [a#36]
Right keys [1]: [a#51]
Join type: Inner
Join condition: None

(29) Project
Output [5]: [a#36, b#52, c#53 AS p1#28, p_2 AS p2#29, e#55 AS p3#30]
Input [5]: [a#36, a#51, b#52, c#53, e#55]

(30) Exchange
Input [5]: [a#36, b#52, p1#28, p2#29, p3#30]
Arguments: hashpartitioning(p1#28, p_2, p3#30, 400), REBALANCE_PARTITIONS_BY_COL, [plan_id=76]

(31) Project
Output [5]: [a#36, b#52, empty2null(p1#28) AS p1#66, p2#29, empty2null(p3#30) AS p3#67]
Input [5]: [a#36, b#52, p1#28, p2#29, p3#30]

(32) Sort
Input [5]: [a#36, b#52, p1#66, p2#29, p3#67]
Arguments: [p1#66 ASC NULLS FIRST, p2#29 ASC NULLS FIRST, p3#67 ASC NULLS FIRST], false, 0

(33) WriteFiles
Input [5]: [a#36, b#52, p1#66, p2#29, p3#67]

(34) Execute InsertIntoHiveTable
Input: []
Arguments: `spark_catalog`.`sample`.`wangzhen_20250731_001`, org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, [p1=None, p2=None, p3=None], true, false, [a, b, p1, p2, p3], [p1#66, p2#29, p3#67], org.apache.spark.sql.hive.execution.HiveFileFormat@29679638, org.apache.spark.sql.hive.execution.HiveTempPath@3fdb2711
```

Spark plan with the Kyuubi authz plugin:

```
(33) SortMergeJoin
Left keys [1]: [a#122]
Right keys [1]: [a#132]
Join type: Inner
Join condition: None

(34) Project
Output [4]: [a#122, b#133, c#134 AS p1#114, e#136 AS p3#116]
Input [5]: [a#122, a#132, b#133, c#134, e#136]

(35) Exchange
Input [4]: [a#122, b#133, p1#114, p3#116]
Arguments: hashpartitioning(p1#114, p_2, p3#116, 400), REBALANCE_PARTITIONS_BY_COL, [plan_id=552]

(36) Project
Output [5]: [a#122, b#133, empty2null(p1#114) AS p1#142, p_2 AS p2#115, empty2null(p3#116) AS p3#143]
Input [4]: [a#122, b#133, p1#114, p3#116]

(37) Sort
Input [5]: [a#122, b#133, p1#142, p2#115, p3#143]
Arguments: [p1#142 ASC NULLS FIRST, p3#143 ASC NULLS FIRST], false, 0

(38) WriteFiles
Input [5]: [a#122, b#133, p1#142, p2#115, p3#143]

(39) Execute InsertIntoHiveTable
Input: []
Arguments: `spark_catalog`.`sample`.`wangzhen_20250731_001`, org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, [p1=None, p2=None, p3=None], true, false, [a, b, p1, p2, p3], [p1#142, p2#115, p3#143], org.apache.spark.sql.hive.execution.HiveFileFormat@5727d79a, org.apache.spark.sql.hive.execution.HiveTempPath@5fd6fd77
```

SQLs to reproduce:

```sql
create table test_20250731_001 (a string, b int) partitioned by (p1 string, p2 string, p3 string);
create table test_20250731_002 (a string, b int, p1 string, p2 string, p3 string);
create table test_20250731_003 (a string, b int, c string, d string, e string);

create view test_view_20250731_002 as select * from test_20250731_002 where b > 0;
create view test_view_20250731_003 as select * from test_20250731_003 where b > 0;

set spark.sql.autoBroadcastJoinThreshold=-1;

insert overwrite table test_20250731_001 partition(p1, p2, p3)
select t1.a as a, t2.b as b, t2.c as p1, 'p_2' as p2, t2.e as p3
from test_view_20250731_002 t1
join test_view_20250731_003 t2 on t1.a = t2.a;
```

### Gluten version

Gluten-1.2

### Spark version

Spark-3.5.x

### Spark configurations

_No response_

### System information

_No response_

### Relevant logs

```bash
```
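For context on the operator that fails validation: `empty2null` is an internal Spark expression that the planner inserts in front of dynamic-partition writes so that empty-string partition values become NULL (and therefore land in Hive's default partition) instead of producing an empty partition directory name. A minimal sketch of that semantics in plain Python (the `empty2null` function below is an illustrative stand-in, not Spark code):

```python
def empty2null(value):
    """Sketch of Spark's empty2null semantics for string partition columns:
    an empty string becomes NULL (None); any other value passes through."""
    if value == "":
        return None
    return value

# Partition values as a dynamic-partition Project node might see them;
# the empty string would otherwise yield an invalid partition path.
print([empty2null(v) for v in ["p_1", "", "p_3"]])
# → ['p_1', None, 'p_3']
```

The fallback happens because this expression reaches Gluten's Substrait-to-Velox validator, and Velox has no scalar function registered under the name `empty2null`, as the fallback summary above shows.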
