wForget opened a new issue, #10354:
URL: https://github.com/apache/incubator-gluten/issues/10354

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   Gluten's integration with the Kyuubi authz plugin causes a Hive write to fall back, because the Kyuubi authz plugin [repeatedly calls the Spark optimizer](https://github.com/apache/kyuubi/blob/9a0c49e79135cd90368986176591a80d29634231/extensions/spark/kyuubi-spark-authz/src/main/scala/org/apache/kyuubi/plugin/spark/authz/rule/RuleEliminatePermanentViewMarker.scala#L56), which changes the Spark plan.
   
   Fallback message:
   ```
   == Fallback Summary ==
   (29) Project: Native validation failed:
     Validation failed due to exception caught at 
file:SubstraitToVeloxPlanValidator.cc line:1310 function:validate, thrown from 
file:ExprCompiler.cpp line:462 function:compileRewrittenExpression, 
reason:Scalar function name not registered: empty2null, called with arguments: 
(VARCHAR).
   ```
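
   For context, `empty2null` is an internal Spark expression inserted before dynamic-partition writes so that empty partition-string values are written as NULL (and land in the default partition directory). Velox has no scalar function registered under that name, which is what the validation failure reports. A minimal Python sketch of the semantics (an illustration only, not Spark's actual implementation):

   ```python
   from typing import Optional

   def empty2null(value: Optional[str]) -> Optional[str]:
       """Sketch of Spark's empty2null: empty partition strings become NULL."""
       return None if value == "" else value

   # An empty partition value falls back to the default partition;
   # non-empty values pass through unchanged.
   print(empty2null(""))      # None
   print(empty2null("2025"))  # 2025
   ```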
   
   Spark plan without the Kyuubi authz plugin:
   
   ```
   (28) SortMergeJoin
   Left keys [1]: [a#36]
   Right keys [1]: [a#51]
   Join type: Inner
   Join condition: None
   
   (29) Project
   Output [5]: [a#36, b#52, c#53 AS p1#28, p_2 AS p2#29, e#55 AS p3#30]
   Input [5]: [a#36, a#51, b#52, c#53, e#55]
   
   (30) Exchange
   Input [5]: [a#36, b#52, p1#28, p2#29, p3#30]
   Arguments: hashpartitioning(p1#28, p_2, p3#30, 400), 
REBALANCE_PARTITIONS_BY_COL, [plan_id=76]
   
   (31) Project
   Output [5]: [a#36, b#52, empty2null(p1#28) AS p1#66, p2#29, 
empty2null(p3#30) AS p3#67]
   Input [5]: [a#36, b#52, p1#28, p2#29, p3#30]
   
   (32) Sort
   Input [5]: [a#36, b#52, p1#66, p2#29, p3#67]
   Arguments: [p1#66 ASC NULLS FIRST, p2#29 ASC NULLS FIRST, p3#67 ASC NULLS 
FIRST], false, 0
   
   (33) WriteFiles
   Input [5]: [a#36, b#52, p1#66, p2#29, p3#67]
   
   (34) Execute InsertIntoHiveTable
   Input: []
   Arguments: `spark_catalog`.`sample`.`wangzhen_20250731_001`, 
org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, [p1=None, p2=None, 
p3=None], true, false, [a, b, p1, p2, p3], [p1#66, p2#29, p3#67], 
org.apache.spark.sql.hive.execution.HiveFileFormat@29679638, 
org.apache.spark.sql.hive.execution.HiveTempPath@3fdb2711
   ```
   
   Spark plan with the Kyuubi authz plugin:
   
   ```
   (33) SortMergeJoin
   Left keys [1]: [a#122]
   Right keys [1]: [a#132]
   Join type: Inner
   Join condition: None
   
   (34) Project
   Output [4]: [a#122, b#133, c#134 AS p1#114, e#136 AS p3#116]
   Input [5]: [a#122, a#132, b#133, c#134, e#136]
   
   (35) Exchange
   Input [4]: [a#122, b#133, p1#114, p3#116]
   Arguments: hashpartitioning(p1#114, p_2, p3#116, 400), 
REBALANCE_PARTITIONS_BY_COL, [plan_id=552]
   
   (36) Project
   Output [5]: [a#122, b#133, empty2null(p1#114) AS p1#142, p_2 AS p2#115, 
empty2null(p3#116) AS p3#143]
   Input [4]: [a#122, b#133, p1#114, p3#116]
   
   (37) Sort
   Input [5]: [a#122, b#133, p1#142, p2#115, p3#143]
   Arguments: [p1#142 ASC NULLS FIRST, p3#143 ASC NULLS FIRST], false, 0
   
   (38) WriteFiles
   Input [5]: [a#122, b#133, p1#142, p2#115, p3#143]
   
   (39) Execute InsertIntoHiveTable
   Input: []
   Arguments: `spark_catalog`.`sample`.`wangzhen_20250731_001`, 
org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, [p1=None, p2=None, 
p3=None], true, false, [a, b, p1, p2, p3], [p1#142, p2#115, p3#143], 
org.apache.spark.sql.hive.execution.HiveFileFormat@5727d79a, 
org.apache.spark.sql.hive.execution.HiveTempPath@5fd6fd77
   ```
   
   SQL to reproduce:
   
   ```
   create table test_20250731_001 (a string, b int) partitioned by (p1 string, 
p2 string, p3 string);
   create table test_20250731_002 (a string, b int, p1 string, p2 string, p3 
string);
   create table test_20250731_003 (a string, b int, c string, d string, e 
string);
   
   create view test_view_20250731_002 as select * from test_20250731_002 where 
b > 0;
   create view test_view_20250731_003 as select * from test_20250731_003 where 
b > 0;
   
   set spark.sql.autoBroadcastJoinThreshold=-1;
   insert overwrite table test_20250731_001 partition(p1, p2, p3)
   select t1.a as a, t2.b as b, t2.c as p1, 'p_2' as p2, t2.e as p3 from 
test_view_20250731_002 t1 join test_view_20250731_003 t2 on t1.a = t2.a;
   ```
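
   For completeness, a session with both plugins enabled can be launched roughly as below. The class names are the ones used by Gluten 1.2 (`io.glutenproject.*`) and the Kyuubi authz module; the jar file names are placeholders, so adjust paths, versions, and memory settings to the actual build:

   ```shell
   spark-sql \
     --conf spark.plugins=io.glutenproject.GlutenPlugin \
     --conf spark.sql.extensions=org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension \
     --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager \
     --conf spark.memory.offHeap.enabled=true \
     --conf spark.memory.offHeap.size=4g \
     --jars gluten-velox-bundle.jar,kyuubi-spark-authz.jar
   ```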
   
   ### Gluten version
   
   Gluten-1.2
   
   ### Spark version
   
   Spark-3.5.x
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   ```bash
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

