> I ran into an exception issue when playing around with Spark Connect;
> more details can be found at
> https://issues.apache.org/jira/browse/SPARK-51451
>
> pyspark.errors.exceptions.connect.AnalysisException:
> [UNSUPPORTED_GENERATOR.NESTED_IN_EXPRESSIONS] The generator is not
> supported: nested in expressions
> "unresolvedstarwithcolumns(explode(array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
> 10, 11)))". SQLSTATE: 42K0E

It's a regression: the same code passes on Spark 3.5.5 Connect but fails
on Spark 4.0.0-rc2 and the latest 4.1.0 snapshot.
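A guess at a minimal repro, reconstructed from the plan fragment in the
error message (the literals 0..11 simply mirror the array in the error;
the exact failing code is in the JIRA):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # sc://localhost:15002 is an assumption: any running Connect server works.
    spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

    # A generator inside withColumn: on 4.0.0-rc2 Connect this resolves to
    # unresolvedstarwithcolumns(explode(array(...))) and analysis fails with
    # UNSUPPORTED_GENERATOR.NESTED_IN_EXPRESSIONS; on 3.5.5 Connect it runs.
    df = spark.range(1).withColumn(
        "n", F.explode(F.array(*[F.lit(i) for i in range(12)])))
    df.show()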
On Mon, Mar 10, 2025 at 16:30, Bobby <wbo4...@gmail.com> wrote:

> I ran into an exception issue when playing around with Spark Connect;
> more details can be found at
> https://issues.apache.org/jira/browse/SPARK-51451
>
> pyspark.errors.exceptions.connect.AnalysisException:
> [UNSUPPORTED_GENERATOR.NESTED_IN_EXPRESSIONS] The generator is not
> supported: nested in expressions
> "unresolvedstarwithcolumns(explode(array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
> 10, 11)))". SQLSTATE: 42K0E
>
> On Sat, Mar 8, 2025 at 02:16, Cheng Pan <pan3...@gmail.com> wrote:
>
>> -1
>>
>> SPARK-51029 (GitHub PR [1]) removes `hive-llap-common` from the Spark
>> binary distribution, which technically breaks the documented feature
>> "Spark SQL supports integration of Hive UDFs, UDAFs and UDTFs" [2].
>> More precisely, it changes Hive UDF support from batteries-included
>> to not.
>>
>> In detail: when a user runs a query like CREATE TEMPORARY FUNCTION
>> hello AS 'my.HelloUDF', it triggers
>> o.a.h.hive.ql.exec.FunctionRegistry initialization, which also
>> initializes the Hive built-in UDFs, UDAFs and UDTFs [3]; a
>> NoClassDefFoundError then occurs because some built-in UDTFs depend
>> on classes in hive-llap-common.
>>
>> org.apache.spark.sql.execution.QueryExecutionException:
>> java.lang.NoClassDefFoundError:
>> org/apache/hadoop/hive/llap/security/LlapSigner$Signable
>> at java.base/java.lang.Class.getDeclaredConstructors0(Native Method)
>> at java.base/java.lang.Class.privateGetDeclaredConstructors(Class.java:3373)
>> at java.base/java.lang.Class.getConstructor0(Class.java:3578)
>> at java.base/java.lang.Class.getDeclaredConstructor(Class.java:2754)
>> at org.apache.hive.common.util.ReflectionUtil.newInstance(ReflectionUtil.java:79)
>> at org.apache.hadoop.hive.ql.exec.Registry.registerGenericUDTF(Registry.java:208)
>> at org.apache.hadoop.hive.ql.exec.Registry.registerGenericUDTF(Registry.java:201)
>> at org.apache.hadoop.hive.ql.exec.FunctionRegistry.<clinit>(FunctionRegistry.java:500)
>> at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:160)
>> at org.apache.spark.sql.hive.HiveGenericUDFEvaluator.returnInspector$lzycompute(hiveUDFEvaluators.scala:118)
>> at org.apache.spark.sql.hive.HiveGenericUDFEvaluator.returnInspector(hiveUDFEvaluators.scala:117)
>> at org.apache.spark.sql.hive.HiveGenericUDF.dataType$lzycompute(hiveUDFs.scala:132)
>> at org.apache.spark.sql.hive.HiveGenericUDF.dataType(hiveUDFs.scala:132)
>> at org.apache.spark.sql.hive.HiveUDFExpressionBuilder$.makeHiveFunctionExpression(HiveSessionStateBuilder.scala:197)
>> at org.apache.spark.sql.hive.HiveUDFExpressionBuilder$.$anonfun$makeExpression$1(HiveSessionStateBuilder.scala:177)
>> at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:187)
>> at org.apache.spark.sql.hive.HiveUDFExpressionBuilder$.makeExpression(HiveSessionStateBuilder.scala:171)
>> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.$anonfun$makeFunctionBuilder$1(SessionCatalog.scala:1689)
>> …
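A sketch of the reported sequence, for anyone reproducing it
(my.HelloUDF is the placeholder class name from the report; its jar is
assumed to be on the classpath already and to reference nothing in
hive-llap-common):

    from pyspark.sql import SparkSession

    # Hive UDF support requires a Hive-enabled session.
    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # my.HelloUDF is the placeholder class from the report; any Hive UDF
    # class on the classpath should exercise the same code path.
    spark.sql("CREATE TEMPORARY FUNCTION hello AS 'my.HelloUDF'")

    # Resolving the UDF initializes o.a.h.hive.ql.exec.FunctionRegistry,
    # which registers Hive's built-in UDTFs and, on 4.0.0-rc2, fails with
    # NoClassDefFoundError:
    # org/apache/hadoop/hive/llap/security/LlapSigner$Signable.
    spark.sql("SELECT hello('Spark')").show()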
>> Currently (v4.0.0-rc2), the user must add the hive-llap-common jar
>> explicitly, e.g. by using --packages
>> org.apache.hive:hive-llap-common:2.3.10, to fix the
>> NoClassDefFoundError, even though my.HelloUDF does not depend on any
>> class in hive-llap-common. This is quite confusing.
>>
>> [1] https://github.com/apache/spark/pull/49725
>> [2] https://spark.apache.org/docs/3.5.5/sql-ref-functions-udf-hive.html
>> [3] https://github.com/apache/hive/blob/rel/release-2.3.10/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java#L208
>>
>> Thanks,
>> Cheng Pan
>>
>> On Mar 7, 2025, at 13:15, Wenchen Fan <cloud0...@gmail.com> wrote:
>>
>> RC2 fails and I'll cut RC3 next week. Thanks for the feedback!
>>
>> On Thu, Mar 6, 2025 at 6:44 AM Chris Nauroth <cnaur...@apache.org> wrote:
>>
>>> Here is one more problem I found during RC2 verification:
>>>
>>> https://github.com/apache/spark/pull/50173
>>>
>>> This one is just a test issue.
>>>
>>> Chris Nauroth
>>>
>>> On Tue, Mar 4, 2025 at 2:55 PM Jules Damji <jules.da...@gmail.com> wrote:
>>>
>>>> -1 (non-binding)
>>>>
>>>> I ran into a number of installation and launching problems. Maybe
>>>> it's my environment, even though I removed any old binaries and
>>>> packages.
>>>>
>>>> 1. Pip-installing pyspark 4.0.0 and pyspark-connect 4.0.0 from the
>>>> .tgz files worked, but launching pyspark results in:
>>>>
>>>> 25/03/04 14:00:26 ERROR SparkContext: Error initializing SparkContext.
>>>> java.lang.ClassNotFoundException:
>>>> org.apache.spark.sql.connect.SparkConnectPlugin
>>>>
>>>> 2. Similarly, installing the tarballs of either distribution and
>>>> launching spark-shell goes into a loop that is terminated by the
>>>> shutdown hook.
>>>>
>>>> Thank you, Wenchen, for leading this onerous release-manager effort;
>>>> hopefully we will soon be able to install and launch seamlessly.
>>>>
>>>> Keep up the good work & tireless effort for the Spark community!
>>>>
>>>> cheers
>>>> Jules
>>>>
>>>> WARNING: Using incubator modules: jdk.incubator.vector
>>>> 25/03/04 14:49:35 INFO BaseAllocator: Debug mode disabled. Enable with the VM option -Darrow.memory.debug.allocator=true.
>>>> 25/03/04 14:49:35 INFO DefaultAllocationManagerOption: allocation manager type not specified, using netty as the default type
>>>> 25/03/04 14:49:35 INFO CheckAllocator: Using DefaultAllocationManager at memory/netty/DefaultAllocationManagerFactory.class
>>>> Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
>>>> 25/03/04 14:49:35 WARN GrpcRetryHandler: Non-Fatal error during RPC execution: org.sparkproject.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception, retrying (wait=50 ms, currentRetryNum=1, policy=DefaultPolicy).
>>>> 25/03/04 14:49:35 WARN GrpcRetryHandler: Non-Fatal error during RPC execution: org.sparkproject.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception, retrying (wait=200 ms, currentRetryNum=2, policy=DefaultPolicy).
>>>> 25/03/04 14:49:35 WARN GrpcRetryHandler: Non-Fatal error during RPC execution: org.sparkproject.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception, retrying (wait=800 ms, currentRetryNum=3, policy=DefaultPolicy).
>>>> 25/03/04 14:49:36 WARN GrpcRetryHandler: Non-Fatal error during RPC execution: org.sparkproject.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception, retrying (wait=3275 ms, currentRetryNum=4, policy=DefaultPolicy).
>>>> 25/03/04 14:49:39 WARN GrpcRetryHandler: Non-Fatal error during RPC execution: org.sparkproject.io.grpc.StatusRuntimeException: UNAVAILABLE: io exception, retrying (wait=12995 ms, currentRetryNum=5, policy=DefaultPolicy).
>>>> ^C25/03/04 14:49:40 INFO ShutdownHookManager: Shutdown hook called
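The UNAVAILABLE retries above are what the Connect client typically
prints when nothing is listening on the target port. A minimal sketch
that takes auto-start out of the picture by connecting to an explicitly
started server (assumes sbin/start-connect-server.sh has been run;
15002 is the default Connect port):

    from pyspark.sql import SparkSession

    # Point the client at an already-running Connect server instead of
    # relying on pyspark auto-starting one; this separates client-side
    # problems from server startup problems like the
    # ClassNotFoundException reported above.
    spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
    print(spark.range(5).collect())
    spark.stop()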
>>>> On Mar 4, 2025, at 2:24 PM, Chris Nauroth <cnaur...@apache.org> wrote:
>>>>
>>>> -1 (non-binding)
>>>>
>>>> I think I found some missing license information in the binary
>>>> distribution. We may want to include this in the next RC:
>>>>
>>>> https://github.com/apache/spark/pull/50158
>>>>
>>>> Thank you for putting together this RC, Wenchen.
>>>>
>>>> Chris Nauroth
>>>>
>>>> On Mon, Mar 3, 2025 at 6:10 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>>>>
>>>>> Thanks for bringing up these blockers! I know RC2 isn't fully ready
>>>>> yet, but with over 70 commits since RC1, it's time to have a new RC
>>>>> so people can start testing the latest changes. Please continue
>>>>> testing and keep the feedback coming!
>>>>>
>>>>> On Mon, Mar 3, 2025 at 6:06 PM beliefer <belie...@163.com> wrote:
>>>>>
>>>>>> -1
>>>>>> https://github.com/apache/spark/pull/50112 should be merged before
>>>>>> the release.
>>>>>>
>>>>>> At 2025-03-01 15:25:06, "Wenchen Fan" <cloud0...@gmail.com> wrote:
>>>>>>
>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>> version 4.0.0.
>>>>>>
>>>>>> The vote is open until March 5 (PST) and passes if a majority of
>>>>>> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>>
>>>>>> [ ] +1 Release this package as Apache Spark 4.0.0
>>>>>> [ ] -1 Do not release this package because ...
>>>>>>
>>>>>> To learn more about Apache Spark, please see
>>>>>> https://spark.apache.org/
>>>>>>
>>>>>> The tag to be voted on is v4.0.0-rc2 (commit
>>>>>> 85188c07519ea809012db24421714bb75b45ab1b):
>>>>>> https://github.com/apache/spark/tree/v4.0.0-rc2
>>>>>>
>>>>>> The release files, including signatures, digests, etc. can be
>>>>>> found at:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-rc2-bin/
>>>>>>
>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>
>>>>>> The staging repository for this release can be found at:
>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1478/
>>>>>>
>>>>>> The documentation corresponding to this release can be found at:
>>>>>> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-rc2-docs/
>>>>>>
>>>>>> The list of bug fixes going into 4.0.0 can be found at the
>>>>>> following URL:
>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12353359
>>>>>>
>>>>>> This release is using the release script of the tag v4.0.0-rc2.
>>>>>>
>>>>>> FAQ
>>>>>>
>>>>>> =========================
>>>>>> How can I help test this release?
>>>>>> =========================
>>>>>>
>>>>>> If you are a Spark user, you can help us test this release by
>>>>>> taking an existing Spark workload and running it on this release
>>>>>> candidate, then reporting any regressions.
>>>>>>
>>>>>> If you're working in PySpark, you can set up a virtual env,
>>>>>> install the current RC, and see if anything important breaks. In
>>>>>> Java/Scala, you can add the staging repository to your project's
>>>>>> resolvers and test with the RC (make sure to clean up the artifact
>>>>>> cache before/after so you don't end up building with an
>>>>>> out-of-date RC going forward).
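As an illustration of the PySpark route described above, a minimal
smoke-test sketch (the app name and the pip-install step are
assumptions, not part of the release instructions; it assumes the RC's
pyspark package is installed in a fresh virtual env):

    from pyspark.sql import SparkSession

    # Quick RC sanity check: version string plus a tiny end-to-end query
    # that exercises a shuffle and an aggregation.
    spark = SparkSession.builder.appName("rc2-smoke-test").getOrCreate()
    print(spark.version)  # expect 4.0.0
    df = spark.range(100).selectExpr("id", "id % 7 AS bucket")
    assert df.groupBy("bucket").count().count() == 7
    spark.stop()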