[PR] [SPARK-51136][HISTORYSERVER] Set CallerContext for History Server [spark]
cnauroth opened a new pull request, #49858: URL: https://github.com/apache/spark/pull/49858

### What changes were proposed in this pull request?

Initialize the Hadoop RPC `CallerContext` during History Server startup, before `FileSystem` access. Calls to HDFS will get tagged in the audit log as originating from the History Server.

### Why are the changes needed?

Other Spark processes set the `CallerContext`, so that additional auditing context propagates in Hadoop RPC calls. This PR provides auditing context for calls from the History Server. Other callers provide additional information like app ID, attempt ID, etc. We don't provide that here, because the History Server serves multiple apps/attempts.

### Does this PR introduce _any_ user-facing change?

Yes. In environments that configure `hadoop.caller.context.enabled=true`, users will now see additional information in the HDFS audit logs explicitly stating that calls originated from the History Server.

### How was this patch tested?

A new unit test has been added. All tests pass in the history package.

```
build/mvn -pl core test -Dtest=none -DmembersOnlySuites=org.apache.spark.deploy.history
```

When the changes are deployed to a running cluster, the new caller context is visible in the HDFS audit logs.

```
2025-02-07 23:00:54,657 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0012 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,683 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0011 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,699 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0011 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,715 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0010 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,729 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0010 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,743 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0009 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,755 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0009 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,767 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0008 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:00:54,779 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=open src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history/application_1738779819434_0008 dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
2025-02-07 23:01:04,160 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: allowed=true ugi=spark (auth:SIMPLE) ip=/10.240.5.205 cmd=listStatus src=/133bcb94-52b8-4356-ad9b-7358c78ce7fd/spark-job-history dst=null perm=null proto=rpc callerContext=SPARK_HISTORY
```

### Was this patch authored or co-authored using generative AI tooling?

No.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
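The startup-time pattern the PR describes is straightforward. Below is a minimal, hedged sketch of it, assuming Spark's existing `org.apache.spark.util.CallerContext` helper (which prefixes the supplied string with `SPARK_`, matching the `callerContext=SPARK_HISTORY` entries above); `startHistoryProvider()` is a hypothetical placeholder for the rest of startup, not a method from the PR.

```scala
package org.apache.spark.deploy.history

import org.apache.spark.util.CallerContext

object HistoryServerCallerContextSketch {
  def main(args: Array[String]): Unit = {
    // Install the caller context once, before any FileSystem/HDFS access, so every
    // subsequent NameNode RPC from this process is tagged callerContext=SPARK_HISTORY
    // in the audit log (when hadoop.caller.context.enabled=true on the NameNode).
    new CallerContext("HISTORY").setCurrentContext()

    startHistoryProvider() // hypothetical placeholder for the real startup sequence
  }

  private def startHistoryProvider(): Unit = {
    // In the real History Server, this is where the event-log FileSystem gets opened.
  }
}
```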
Re: [PR] [SPARK-51136][HISTORYSERVER] Set CallerContext for History Server [spark]
cnauroth commented on code in PR #49858: URL: https://github.com/apache/spark/pull/49858#discussion_r1948244266 ## core/src/main/scala/org/apache/spark/util/Utils.scala: ## @@ -3151,9 +3152,31 @@ private[spark] object Utils } } -private[util] object CallerContext extends Logging { Review Comment: I needed to relax visibility on things here to facilitate unit testing with caller context enabled. LMK if a different approach is preferred (reflection?). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51138][PYTHON][CONNECT][TESTS] Skip pyspark.sql.tests.connect.test_parity_frame_plot_plotly.FramePlotPlotlyParityTests.test_area_plot [spark]
HyukjinKwon commented on PR #49859: URL: https://github.com/apache/spark/pull/49859#issuecomment-2646679978 cc @xinrong-meng mind taking a look please? I think the plot tests fail with different dependency versions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-51138][PYTHON][CONNECT][TESTS] Skip pyspark.sql.tests.connect.test_parity_frame_plot_plotly.FramePlotPlotlyParityTests.test_area_plot [spark]
HyukjinKwon opened a new pull request, #49859: URL: https://github.com/apache/spark/pull/49859 ### What changes were proposed in this pull request? This PR proposes to skip pyspark.sql.tests.connect.test_parity_frame_plot_plotly.FramePlotPlotlyParityTests.test_area_plot. ### Why are the changes needed? One failure here stops the Python build so it can't test others. Filed a JIRA to reenable https://issues.apache.org/jira/browse/SPARK-51137 ### Does this PR introduce _any_ user-facing change? No, test-only. ### How was this patch tested? Will monitor the build. ### Was this patch authored or co-authored using generative AI tooling? No. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51138][PYTHON][CONNECT][TESTS] Skip pyspark.sql.tests.connect.test_parity_frame_plot_plotly.FramePlotPlotlyParityTests.test_area_plot [spark]
HyukjinKwon commented on PR #49859: URL: https://github.com/apache/spark/pull/49859#issuecomment-2646680247 cc @zhengruifeng too -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [TYPING] Add type overloads for inplace dataframe operations [spark]
github-actions[bot] commented on PR #48662: URL: https://github.com/apache/spark/pull/48662#issuecomment-2646681728 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-49730][SQL] classify syntax errors for pgsql, mysql, sqlserver and h2 [spark]
github-actions[bot] commented on PR #48368: URL: https://github.com/apache/spark/pull/48368#issuecomment-2646681753 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-49827][SQL] Fetching all partitions from hive metastore in batches [spark]
github-actions[bot] commented on PR #48337: URL: https://github.com/apache/spark/pull/48337#issuecomment-2646681764 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-50101][SQL] Fix collated behavior of StringToMap expression [spark]
github-actions[bot] closed pull request #48642: [SPARK-50101][SQL] Fix collated behavior of StringToMap expression URL: https://github.com/apache/spark/pull/48642 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-35564][SQL] Support subexpression elimination for conditionally evaluated expressions [spark]
cloud-fan commented on code in PR #32987: URL: https://github.com/apache/spark/pull/32987#discussion_r1948345527 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -255,4 +304,26 @@ case class ExpressionEquals(e: Expression) { * Instead of appending to a mutable list/buffer of Expressions, just update the "flattened" * useCount in this wrapper in-place. */ -case class ExpressionStats(expr: Expression)(var useCount: Int) +case class ExpressionStats(expr: Expression)( +var useCount: Int = 1, +var conditionalUseCount: Int = 0) { + def getUseCount(): Int = if (useCount > 0) { +useCount + conditionalUseCount + } else { +0 + } +} + +/** + * A wrapper for the different types of children of expressions. `alwaysChildren` are child + * expressions that will always be evaluated and should be considered for subexpressions. + * `commonChildren` are children such that if there are any common expressions among them, those + * should be considered for subexpressions. `conditionalChildren` are children that are + * conditionally evaluated, such as in If, CaseWhen, or Coalesce expressions, and should only + * be considered for subexpressions if they are evaluated non-conditionally elsewhere. + */ +case class RecurseChildren( +alwaysChildren: Seq[Expression], +commonChildren: Seq[Seq[Expression]] = Nil, Review Comment: do we have an example of this `commonChildren`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
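One illustration of the kind of case `commonChildren` appears intended to cover (a hedged sketch, not taken from the PR): in a `CaseWhen`, each branch value is only conditionally evaluated, but an expression occurring in every branch value runs whichever branch fires, so it is still a safe candidate for subexpression elimination. Grouping the sibling branch values lets the rule look for expressions common to all of them.

```scala
import org.apache.spark.sql.catalyst.expressions._
import org.apache.spark.sql.types.IntegerType

object CommonChildrenIllustration {
  def main(args: Array[String]): Unit = {
    val x = AttributeReference("x", IntegerType)()
    val p = GreaterThan(x, Literal(0))

    // `shared` is a child of both branch values, so it is evaluated no matter
    // which branch of the CaseWhen is taken.
    val shared = Multiply(x, x)
    val caseWhen = CaseWhen(
      Seq(p -> Add(shared, Literal(1))),
      Some(Add(shared, Literal(2))))

    // The branch values form one "commonChildren" group: an expression common to
    // all of them (here `shared`) can be eliminated without evaluating something
    // the original plan might have skipped.
    println(caseWhen.sql)
  }
}
```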
Re: [PR] [SQL][SPARK-51113] Fix correctness with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE [spark]
cloud-fan commented on code in PR #49835: URL: https://github.com/apache/spark/pull/49835#discussion_r1948346938 ## sql/core/src/test/resources/sql-tests/inputs/view-correctness.sql: ## @@ -0,0 +1,421 @@ +-- This test suite checks the correctness of queries over views + +-- SPARK-51113, UNION + +CREATE TEMPORARY VIEW v1 AS + SELECT 1 AS num + UNION + SELECT 2 + UNION + SELECT 3 + UNION + SELECT 4 +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 1 + UNION + SELECT 2 + UNION + SELECT 3 + UNION + SELECT 4 +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 'a' AS a + UNION + SELECT 'b' + UNION + SELECT 'c' + UNION + SELECT 'd' +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 'a' + UNION + SELECT 'b' + UNION + SELECT 'c' + UNION + SELECT 'd' +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 1, 'a' + UNION + SELECT 2, 'b' + UNION + SELECT 3, 'c' + UNION + SELECT 4, 'd' +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 1 AS num, 'a' + UNION + SELECT 2, 'b' + UNION + SELECT 3, 'c' + UNION + SELECT 4, 'd' +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 1, 'a' AS a + UNION + SELECT 2, 'b' + UNION + SELECT 3, 'c' + UNION + SELECT 4, 'd' +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 1 AS num, 'a' AS a + UNION + SELECT 2, 'b' + UNION + SELECT 3, 'c' + UNION + SELECT 4, 'd' +; +SELECT * FROM v1; +DROP VIEW v1; + +-- SPARK-51113, UNION ALL + +CREATE TEMPORARY VIEW v1 AS + SELECT 1 AS num + UNION ALL + SELECT 2 + UNION ALL + SELECT 3 + UNION ALL + SELECT 4 +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 1 + UNION ALL + SELECT 2 + UNION ALL + SELECT 3 + UNION ALL + SELECT 4 +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 'a' AS a + UNION ALL + SELECT 'b' + UNION ALL + SELECT 'c' + UNION ALL + SELECT 'd' +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 'a' + UNION ALL + SELECT 'b' + UNION ALL + SELECT 'c' + UNION ALL + SELECT 'd' +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 1, 'a' + UNION ALL + SELECT 2, 'b' + UNION ALL + SELECT 3, 'c' + UNION ALL + SELECT 4, 'd' +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 1 AS num, 'a' + UNION ALL + SELECT 2, 'b' + UNION ALL + SELECT 3, 'c' + UNION ALL + SELECT 4, 'd' +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 1, 'a' AS a + UNION ALL + SELECT 2, 'b' + UNION ALL + SELECT 3, 'c' + UNION ALL + SELECT 4, 'd' +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 1 AS num, 'a' AS a + UNION ALL + SELECT 2, 'b' + UNION ALL + SELECT 3, 'c' + UNION ALL + SELECT 4, 'd' +; +SELECT * FROM v1; +DROP VIEW v1; + +-- SPARK-51113, EXCEPT + +CREATE TEMPORARY VIEW v1 AS + SELECT 1 AS num + EXCEPT + SELECT 2 + EXCEPT + SELECT 1 + EXCEPT + SELECT 2 +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 1 + EXCEPT + SELECT 2 + EXCEPT + SELECT 1 + EXCEPT + SELECT 2 +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 'a' AS a + EXCEPT + SELECT 'b' + EXCEPT + SELECT 'a' + EXCEPT + SELECT 'b' +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 'a' + EXCEPT + SELECT 'b' + EXCEPT + SELECT 'a' + EXCEPT + SELECT 'b' +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 1, 'a' + EXCEPT + SELECT 2, 'b' + EXCEPT + 
SELECT 1, 'a' + EXCEPT + SELECT 2, 'b' +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 1 AS num, 'a' + EXCEPT + SELECT 2, 'b' + EXCEPT + SELECT 1, 'a' + EXCEPT + SELECT 2, 'b' +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 1, 'a' AS a + EXCEPT + SELECT 2, 'b' + EXCEPT + SELECT 1, 'a' + EXCEPT + SELECT 2, 'b' +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 1 AS num, 'a' AS a + EXCEPT + SELECT 2, 'b' + EXCEPT + SELECT 1, 'a' + EXCEPT + SELECT 2, 'b' +; +SELECT * FROM v1; +DROP VIEW v1; + +-- SPARK-51113, INTERSECT + +CREATE TEMPORARY VIEW v1 AS + SELECT 1 AS num + INTERSECT + SELECT 1 + INTERSECT + SELECT 2 + INTERSECT + SELECT 2 +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 1 + INTERSECT + SELECT 1 + INTERSECT + SELECT 2 + INTERSECT + SELECT 2 +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 'a' AS a + INTERSECT + SELECT 'a' + INTERSECT + SELECT 'b' + INTERSECT + SELECT 'b' +; +SELECT * FROM v1; +DROP VIEW v1; + +CREATE TEMPORARY VIEW v1 AS + SELECT 'a' + INTERSECT + SELECT 'a' + INTERSECT + SELECT 'b' + INTERSECT + SELECT 'b' Review Comment: For the test coverage
Re: [PR] [SPARK-51132][ML][BUILD] Upgrade `JPMML` to 1.7.1 [spark]
LuciferYang commented on PR #49854: URL: https://github.com/apache/spark/pull/49854#issuecomment-2646829396 Just got back from vacation, I'll take a look tonight. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51132][ML][BUILD] Upgrade `JPMML` to 1.7.1 [spark]
wayneguow commented on code in PR #49854: URL: https://github.com/apache/spark/pull/49854#discussion_r1948351826 ## pom.xml: ## @@ -615,16 +615,17 @@ org.jvnet.staxex stax-ex - -jakarta.activation Review Comment: Thanks for pointing this out, I updated it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51132][ML][BUILD] Upgrade `JPMML` to 1.7.1 [spark]
wayneguow commented on PR #49854: URL: https://github.com/apache/spark/pull/49854#issuecomment-2646834860 > Should the change be mentioned in the migration guide? Otherwise LGTM Add a change description in `ml-migration-guide.md`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-51139][ML][CONNECT] Refine error class `MLAttributeNotAllowedException` [spark]
zhengruifeng opened a new pull request, #49860: URL: https://github.com/apache/spark/pull/49860 ### What changes were proposed in this pull request? Refine error class `MLAttributeNotAllowedException` ### Why are the changes needed? this error message should contain the class name ### Does this PR introduce _any_ user-facing change? yes, error change ### How was this patch tested? updated tests ### Was this patch authored or co-authored using generative AI tooling? no -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51132][ML][BUILD] Upgrade `JPMML` to 1.7.1 [spark]
pan3793 commented on code in PR #49854: URL: https://github.com/apache/spark/pull/49854#discussion_r1948320823 ## pom.xml: ## @@ -599,7 +599,7 @@ org.glassfish.jaxb jaxb-runtime -2.3.2 +4.0.5 Review Comment: I mean `com.sun.xml.fastinfoset:FastInfoset` and `org.jvnet.staxex:stax-ex`, they are marked as optional in `org.glassfish.jaxb:jaxb-runtime:4.0.5` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51132][ML][BUILD] Upgrade `JPMML` to 1.7.1 [spark]
pan3793 commented on code in PR #49854: URL: https://github.com/apache/spark/pull/49854#discussion_r1948320823 ## pom.xml: ## @@ -599,7 +599,7 @@ org.glassfish.jaxb jaxb-runtime -2.3.2 +4.0.5 Review Comment: I mean `com.sun.xml.fastinfoset:FastInfoset` and `org.jvnet.staxex:stax-ex`, they are marked as optional in `org.glassfish.jaxb:jaxb-runtime:4.0.5` ## pom.xml: ## @@ -615,16 +615,17 @@ org.jvnet.staxex stax-ex - -jakarta.activation Review Comment: I mean `com.sun.xml.fastinfoset:FastInfoset` and `org.jvnet.staxex:stax-ex`, they are marked as optional in `org.glassfish.jaxb:jaxb-runtime:4.0.5` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-50917][EXAMPLES] Add SparkConnectPi Scala example to work both for Connect and Classic [spark]
yaooqinn commented on PR #49617: URL: https://github.com/apache/spark/pull/49617#issuecomment-2646776731 Thank you @cloud-fan and @dongjoon-hyun, SparkDataFramePi sounds good to me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51132][ML][BUILD] Upgrade `JPMML` to 1.7.1 [spark]
pan3793 commented on PR #49854: URL: https://github.com/apache/spark/pull/49854#issuecomment-2646775721 Should the change be mentioned in the migration guide? Otherwise LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]
cloud-fan commented on code in PR #49840: URL: https://github.com/apache/spark/pull/49840#discussion_r1948322440 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala: ## @@ -265,6 +267,19 @@ object Literal { s"Literal must have a corresponding value to ${dataType.catalogString}, " + s"but class ${Utils.getSimpleName(value.getClass)} found.") } + + def fromSQL(sql: String): Expression = { +CatalystSqlParser.parseExpression(sql).transformUp { Review Comment: Looking at the test failure, the missing piece is that `Literal#sql` can generate CAST which needs to be resolved to attach timezone. So we should add one more case match ``` case c: Cast if c.needsTimeZone => c.withTimeZone(conf.sessionLocalTimeZone) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51119][SQL] Readers on executors resolving EXISTS_DEFAULT should not call catalogs [spark]
cloud-fan commented on code in PR #49840: URL: https://github.com/apache/spark/pull/49840#discussion_r1948322440 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala: ## @@ -265,6 +267,19 @@ object Literal { s"Literal must have a corresponding value to ${dataType.catalogString}, " + s"but class ${Utils.getSimpleName(value.getClass)} found.") } + + def fromSQL(sql: String): Expression = { +CatalystSqlParser.parseExpression(sql).transformUp { Review Comment: Looking at the test failure, the missing piece is that `Literal#sql` can generate CAST which needs to be resolved to attach timezone. So we should add one more case match ``` case c: Cast if c.needsTimeZone => c.withTimeZone(SQLConf.get.sessionLocalTimeZone) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
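For readers following along, the suggestion amounts to a small post-parse fix-up. Below is a minimal, hedged sketch built only from the pieces named in the comment and the diff (`CatalystSqlParser`, `Cast.needsTimeZone`, `withTimeZone`, and the session time zone from `SQLConf`); whatever else the PR's `transformUp` handles is not shown in the diff and is omitted here.

```scala
import org.apache.spark.sql.catalyst.expressions.{Cast, Expression}
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.internal.SQLConf

object ExistsDefaultTimeZoneSketch {
  // Parse a stored EXISTS_DEFAULT SQL string and attach the session time zone to any
  // Cast that needs one, so the resulting expression can be evaluated without running
  // the full analyzer or calling back into a catalog.
  def resolve(sql: String): Expression = {
    CatalystSqlParser.parseExpression(sql).transformUp {
      case c: Cast if c.needsTimeZone =>
        c.withTimeZone(SQLConf.get.sessionLocalTimeZone)
    }
  }
}
```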
Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]
cloud-fan commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1948357356 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/v2/SetVariableExec.scala: ## @@ -17,26 +17,29 @@ package org.apache.spark.sql.execution.command.v2 +import java.util.Locale + import org.apache.spark.SparkException -import org.apache.spark.sql.catalyst.InternalRow -import org.apache.spark.sql.catalyst.catalog.TempVariableManager +import org.apache.spark.sql.catalyst.{InternalRow, SqlScriptingLocalVariableManager} +import org.apache.spark.sql.catalyst.analysis.FakeLocalCatalog import org.apache.spark.sql.catalyst.expressions.{Attribute, Literal, VariableReference} import org.apache.spark.sql.catalyst.trees.UnaryLike +import org.apache.spark.sql.errors.QueryCompilationErrors.unresolvedVariableError import org.apache.spark.sql.execution.SparkPlan import org.apache.spark.sql.execution.datasources.v2.V2CommandExec /** * Physical plan node for setting a variable. */ -case class SetVariableExec(variables: Seq[VariableReference], query: SparkPlan) - extends V2CommandExec with UnaryLike[SparkPlan] { +case class SetVariableExec( +variables: Seq[VariableReference], +query: SparkPlan) extends V2CommandExec with UnaryLike[SparkPlan] { Review Comment: unnecessary code style change -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]
cloud-fan commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1948357547 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/v2/SetVariableExec.scala: ## @@ -47,21 +50,51 @@ case class SetVariableExec(variables: Seq[VariableReference], query: SparkPlan) val row = values(0) variables.zipWithIndex.foreach { case (v, index) => val value = row.get(index, v.dataType) -createVariable(variableManager, v, value) +setVariable(v, value) } } Seq.empty } - private def createVariable( - variableManager: TempVariableManager, + private def setVariable( variable: VariableReference, value: Any): Unit = { -variableManager.create( - variable.identifier.name, +val namePartsCaseAdjusted = if (session.sessionState.conf.caseSensitiveAnalysis) { + variable.originalNameParts +} else { + variable.originalNameParts.map(_.toLowerCase(Locale.ROOT)) +} + +val tempVariableManager = session.sessionState.catalogManager.tempVariableManager +val scriptingVariableManager = SqlScriptingLocalVariableManager.get() + +val variableManager = scriptingVariableManager + .filter(_ => variable.catalog == FakeLocalCatalog) + // If a local variable with nameParts exists, set it using scriptingVariableManager. +// .filter(_.get(namePartsCaseAdjusted).isDefined) Review Comment: why is this commented out? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-51142][ML][CONNECT] ML protobufs clean up [spark]
zhengruifeng opened a new pull request, #49862: URL: https://github.com/apache/spark/pull/49862 ### What changes were proposed in this pull request? ML protobufs clean up ### Why are the changes needed? to follow the guide https://github.com/apache/spark/blob/ece14704cc083f17689d2e0b9ab8e31cf71a7a2d/sql/connect/docs/adding-proto-messages.md ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests ### Was this patch authored or co-authored using generative AI tooling? no -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]
dusantism-db commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1948366905 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/v2/SetVariableExec.scala: ## @@ -47,21 +50,51 @@ case class SetVariableExec(variables: Seq[VariableReference], query: SparkPlan) val row = values(0) variables.zipWithIndex.foreach { case (v, index) => val value = row.get(index, v.dataType) -createVariable(variableManager, v, value) +setVariable(v, value) } } Seq.empty } - private def createVariable( - variableManager: TempVariableManager, + private def setVariable( variable: VariableReference, value: Any): Unit = { -variableManager.create( - variable.identifier.name, +val namePartsCaseAdjusted = if (session.sessionState.conf.caseSensitiveAnalysis) { + variable.originalNameParts +} else { + variable.originalNameParts.map(_.toLowerCase(Locale.ROOT)) +} + +val tempVariableManager = session.sessionState.catalogManager.tempVariableManager +val scriptingVariableManager = SqlScriptingLocalVariableManager.get() + +val variableManager = scriptingVariableManager + .filter(_ => variable.catalog == FakeLocalCatalog) + // If a local variable with nameParts exists, set it using scriptingVariableManager. +// .filter(_.get(namePartsCaseAdjusted).isDefined) Review Comment: Old code, deleted. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]
dusantism-db commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1948367347 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/v2/SetVariableExec.scala: ## @@ -17,26 +17,29 @@ package org.apache.spark.sql.execution.command.v2 +import java.util.Locale + import org.apache.spark.SparkException -import org.apache.spark.sql.catalyst.InternalRow -import org.apache.spark.sql.catalyst.catalog.TempVariableManager +import org.apache.spark.sql.catalyst.{InternalRow, SqlScriptingLocalVariableManager} +import org.apache.spark.sql.catalyst.analysis.FakeLocalCatalog import org.apache.spark.sql.catalyst.expressions.{Attribute, Literal, VariableReference} import org.apache.spark.sql.catalyst.trees.UnaryLike +import org.apache.spark.sql.errors.QueryCompilationErrors.unresolvedVariableError import org.apache.spark.sql.execution.SparkPlan import org.apache.spark.sql.execution.datasources.v2.V2CommandExec /** * Physical plan node for setting a variable. */ -case class SetVariableExec(variables: Seq[VariableReference], query: SparkPlan) - extends V2CommandExec with UnaryLike[SparkPlan] { +case class SetVariableExec( +variables: Seq[VariableReference], +query: SparkPlan) extends V2CommandExec with UnaryLike[SparkPlan] { Review Comment: fixed ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala: ## @@ -266,22 +275,42 @@ trait ColumnResolutionHelper extends Logging with DataTypeErrorsBase { } } -if (maybeTempVariableName(nameParts)) { - val variableName = if (conf.caseSensitiveAnalysis) { -nameParts.last - } else { -nameParts.last.toLowerCase(Locale.ROOT) - } - catalogManager.tempVariableManager.get(variableName).map { varDef => +val namePartsCaseAdjusted = if (conf.caseSensitiveAnalysis) { + nameParts +} else { + nameParts.map(_.toLowerCase(Locale.ROOT)) +} + +SqlScriptingLocalVariableManager.get() + // If we are in EXECUTE IMMEDIATE lookup only session variables. + .filterNot(_ => AnalysisContext.get.isExecuteImmediate) + // If variable name is qualified with system.session. or session. Review Comment: applied -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
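A side note on the case handling that appears in both diffs above: before any lookup, variable name parts are normalized according to `spark.sql.caseSensitive`. As a standalone sketch (the helper and object names are made up; the logic mirrors the `namePartsCaseAdjusted` snippet in the diffs):

```scala
import java.util.Locale

object VariableNameCaseSketch {
  // With case-insensitive analysis (the default), name parts are lower-cased before
  // they are used to look up a variable, so SESSION.myVar and session.myvar resolve
  // to the same variable; with case-sensitive analysis they are used as-is.
  def adjustCase(nameParts: Seq[String], caseSensitiveAnalysis: Boolean): Seq[String] =
    if (caseSensitiveAnalysis) nameParts else nameParts.map(_.toLowerCase(Locale.ROOT))

  def main(args: Array[String]): Unit = {
    println(adjustCase(Seq("SESSION", "myVar"), caseSensitiveAnalysis = false))
    // prints: List(session, myvar)
  }
}
```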
Re: [PR] [SPARK-51142][ML][CONNECT] ML protobufs clean up [spark]
zhengruifeng commented on code in PR #49862: URL: https://github.com/apache/spark/pull/49862#discussion_r1948375370 ## sql/connect/common/src/main/protobuf/spark/connect/ml_common.proto: ## @@ -33,24 +33,32 @@ message MlParams { // MLOperator represents the ML operators like (Estimator, Transformer or Evaluator) message MlOperator { - // The qualified name of the ML operator. + // (Required) The qualified name of the ML operator. string name = 1; - // Unique id of the ML operator + + // (Required) Unique id of the ML operator string uid = 2; - // Represents what the ML operator is + + // (Required) Represents what the ML operator is OperatorType type = 3; + enum OperatorType { -UNSPECIFIED = 0; -ESTIMATOR = 1; -TRANSFORMER = 2; -EVALUATOR = 3; -MODEL = 4; +OPERATOR_TYPE_UNSPECIFIED = 0; Review Comment: rename to follow the protobuf style guide https://protobuf.dev/programming-guides/style/#enums -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51142][ML][CONNECT] ML protobufs clean up [spark]
zhengruifeng commented on code in PR #49862: URL: https://github.com/apache/spark/pull/49862#discussion_r1948375370 ## sql/connect/common/src/main/protobuf/spark/connect/ml_common.proto: ## @@ -33,24 +33,32 @@ message MlParams { // MLOperator represents the ML operators like (Estimator, Transformer or Evaluator) message MlOperator { - // The qualified name of the ML operator. + // (Required) The qualified name of the ML operator. string name = 1; - // Unique id of the ML operator + + // (Required) Unique id of the ML operator string uid = 2; - // Represents what the ML operator is + + // (Required) Represents what the ML operator is OperatorType type = 3; + enum OperatorType { -UNSPECIFIED = 0; -ESTIMATOR = 1; -TRANSFORMER = 2; -EVALUATOR = 3; -MODEL = 4; +OPERATOR_TYPE_UNSPECIFIED = 0; Review Comment: rename to follow the protobuf style guide > Prefix every value with the enum name (converted to UPPER_SNAKE_CASE) https://protobuf.dev/programming-guides/style/#enums -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51142][ML][CONNECT] ML protobufs clean up [spark]
zhengruifeng commented on PR #49862: URL: https://github.com/apache/spark/pull/49862#issuecomment-2646882798 cc @grundprinzip and @wbo4958 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]
cloud-fan commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1948356321 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala: ## @@ -73,28 +95,49 @@ class ResolveCatalogs(val catalogManager: CatalogManager) } } - private def resolveVariableName(nameParts: Seq[String]): ResolvedIdentifier = { -def ident: Identifier = Identifier.of(Array(CatalogManager.SESSION_NAMESPACE), nameParts.last) -if (nameParts.length == 1) { - ResolvedIdentifier(FakeSystemCatalog, ident) -} else if (nameParts.length == 2) { - if (nameParts.head.equalsIgnoreCase(CatalogManager.SESSION_NAMESPACE)) { -ResolvedIdentifier(FakeSystemCatalog, ident) - } else { -throw QueryCompilationErrors.unresolvedVariableError( - nameParts, Seq(CatalogManager.SYSTEM_CATALOG_NAME, CatalogManager.SESSION_NAMESPACE)) - } -} else if (nameParts.length == 3) { - if (nameParts(0).equalsIgnoreCase(CatalogManager.SYSTEM_CATALOG_NAME) && -nameParts(1).equalsIgnoreCase(CatalogManager.SESSION_NAMESPACE)) { -ResolvedIdentifier(FakeSystemCatalog, ident) - } else { -throw QueryCompilationErrors.unresolvedVariableError( - nameParts, Seq(CatalogManager.SYSTEM_CATALOG_NAME, CatalogManager.SESSION_NAMESPACE)) - } -} else { + private def resolveCreateVariableName(nameParts: Seq[String]): ResolvedIdentifier = { +val resolvedIdentifier = SqlScriptingLocalVariableManager.get() + .filterNot(_ => AnalysisContext.get.isExecuteImmediate) + .getOrElse(catalogManager.tempVariableManager) + .resolveIdentifier(nameParts.last) Review Comment: what's wrong with my proposal https://github.com/apache/spark/pull/49445/files#r1946867299 ? ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala: ## @@ -73,28 +95,49 @@ class ResolveCatalogs(val catalogManager: CatalogManager) } } - private def resolveVariableName(nameParts: Seq[String]): ResolvedIdentifier = { -def ident: Identifier = Identifier.of(Array(CatalogManager.SESSION_NAMESPACE), nameParts.last) -if (nameParts.length == 1) { - ResolvedIdentifier(FakeSystemCatalog, ident) -} else if (nameParts.length == 2) { - if (nameParts.head.equalsIgnoreCase(CatalogManager.SESSION_NAMESPACE)) { -ResolvedIdentifier(FakeSystemCatalog, ident) - } else { -throw QueryCompilationErrors.unresolvedVariableError( - nameParts, Seq(CatalogManager.SYSTEM_CATALOG_NAME, CatalogManager.SESSION_NAMESPACE)) - } -} else if (nameParts.length == 3) { - if (nameParts(0).equalsIgnoreCase(CatalogManager.SYSTEM_CATALOG_NAME) && -nameParts(1).equalsIgnoreCase(CatalogManager.SESSION_NAMESPACE)) { -ResolvedIdentifier(FakeSystemCatalog, ident) - } else { -throw QueryCompilationErrors.unresolvedVariableError( - nameParts, Seq(CatalogManager.SYSTEM_CATALOG_NAME, CatalogManager.SESSION_NAMESPACE)) - } -} else { + private def resolveCreateVariableName(nameParts: Seq[String]): ResolvedIdentifier = { +val resolvedIdentifier = SqlScriptingLocalVariableManager.get() + .filterNot(_ => AnalysisContext.get.isExecuteImmediate) + .getOrElse(catalogManager.tempVariableManager) + .resolveIdentifier(nameParts.last) Review Comment: what's wrong with my proposal https://github.com/apache/spark/pull/49445/files#r1946867299 ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]
cloud-fan commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1948356233 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala: ## @@ -73,28 +95,49 @@ class ResolveCatalogs(val catalogManager: CatalogManager) } } - private def resolveVariableName(nameParts: Seq[String]): ResolvedIdentifier = { -def ident: Identifier = Identifier.of(Array(CatalogManager.SESSION_NAMESPACE), nameParts.last) -if (nameParts.length == 1) { - ResolvedIdentifier(FakeSystemCatalog, ident) -} else if (nameParts.length == 2) { - if (nameParts.head.equalsIgnoreCase(CatalogManager.SESSION_NAMESPACE)) { -ResolvedIdentifier(FakeSystemCatalog, ident) - } else { -throw QueryCompilationErrors.unresolvedVariableError( - nameParts, Seq(CatalogManager.SYSTEM_CATALOG_NAME, CatalogManager.SESSION_NAMESPACE)) - } -} else if (nameParts.length == 3) { - if (nameParts(0).equalsIgnoreCase(CatalogManager.SYSTEM_CATALOG_NAME) && -nameParts(1).equalsIgnoreCase(CatalogManager.SESSION_NAMESPACE)) { -ResolvedIdentifier(FakeSystemCatalog, ident) - } else { -throw QueryCompilationErrors.unresolvedVariableError( - nameParts, Seq(CatalogManager.SYSTEM_CATALOG_NAME, CatalogManager.SESSION_NAMESPACE)) - } -} else { + private def resolveCreateVariableName(nameParts: Seq[String]): ResolvedIdentifier = { +val resolvedIdentifier = SqlScriptingLocalVariableManager.get() + .filterNot(_ => AnalysisContext.get.isExecuteImmediate) + .getOrElse(catalogManager.tempVariableManager) + .resolveIdentifier(nameParts.last) Review Comment: we should either use `assert` to call out all the assumptions or make the code general. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]
cloud-fan commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1948357083 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/VariableManager.scala: ## @@ -0,0 +1,159 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.catalog + +import javax.annotation.concurrent.GuardedBy + +import scala.collection.mutable + +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.expressions.Literal +import org.apache.spark.sql.connector.catalog.{CatalogManager, Identifier} +import org.apache.spark.sql.connector.catalog.CatalogManager.{SESSION_NAMESPACE, SYSTEM_CATALOG_NAME} +import org.apache.spark.sql.errors.DataTypeErrorsBase +import org.apache.spark.sql.errors.QueryCompilationErrors.unresolvedVariableError + +/** + * Trait which provides an interface for variable managers. Methods are case sensitive regarding + * the variable name/nameParts/identifier, callers are responsible to + * format them w.r.t. case-sensitive config. + */ +trait VariableManager { + /** + * Create a variable. + * @param identifier Identifier of the variable. + * @param defaultValueSQL SQL text of the variable's DEFAULT expression. + * @param initValue Initial value of the variable. + * @param overrideIfExists If true, the new variable will replace an existing one + * with the same identifier, if it exists. + */ + def create( + identifier: Identifier, + defaultValueSQL: String, + initValue: Literal, + overrideIfExists: Boolean): Unit + + /** + * Set an existing variable to a new value. + * + * @param nameParts Name parts of the variable. + * @param defaultValueSQL SQL text of the variable's DEFAULT expression. + * @param initValue Initial value of the variable. + * @param identifier Identifier of the variable. + */ + def set( + nameParts: Seq[String], Review Comment: why do we keep both Identifier and Seq[String]? what's the point of this duplication? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51133][BUILD] Upgrade Apache `commons-pool2` to 2.12.1 [spark]
LuciferYang commented on PR #49856: URL: https://github.com/apache/spark/pull/49856#issuecomment-2646847348 Merged into master. Thanks @wayneguow -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-48530][SQL] Support for local variables in SQL Scripting [spark]
cloud-fan commented on code in PR #49445: URL: https://github.com/apache/spark/pull/49445#discussion_r1948354168 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala: ## @@ -266,22 +275,42 @@ trait ColumnResolutionHelper extends Logging with DataTypeErrorsBase { } } -if (maybeTempVariableName(nameParts)) { - val variableName = if (conf.caseSensitiveAnalysis) { -nameParts.last - } else { -nameParts.last.toLowerCase(Locale.ROOT) - } - catalogManager.tempVariableManager.get(variableName).map { varDef => +val namePartsCaseAdjusted = if (conf.caseSensitiveAnalysis) { + nameParts +} else { + nameParts.map(_.toLowerCase(Locale.ROOT)) +} + +SqlScriptingLocalVariableManager.get() + // If we are in EXECUTE IMMEDIATE lookup only session variables. + .filterNot(_ => AnalysisContext.get.isExecuteImmediate) + // If variable name is qualified with system.session. or session. Review Comment: ```suggestion // If variable name is qualified with session. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51133][BUILD] Upgrade Apache `commons-pool2` to 2.12.1 [spark]
LuciferYang closed pull request #49856: [SPARK-51133][BUILD] Upgrade Apache `commons-pool2` to 2.12.1 URL: https://github.com/apache/spark/pull/49856 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51138][PYTHON][CONNECT][TESTS] Skip pyspark.sql.tests.connect.test_parity_frame_plot_plotly.FramePlotPlotlyParityTests.test_area_plot [spark]
zhengruifeng commented on PR #49859: URL: https://github.com/apache/spark/pull/49859#issuecomment-2646858360 This failure also happens in https://github.com/apache/spark/actions/workflows/build_python_3.11_macos.yml. After checking the history, I think it is due to the `plotly` upgrade. @LuciferYang and @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[PR] [SPARK-51143][PYTHON] Pin `plotly==5.24.1` [spark]
zhengruifeng opened a new pull request, #49863: URL: https://github.com/apache/spark/pull/49863 ### What changes were proposed in this pull request? Pin `plotly==5.24.1` ### Why are the changes needed? the latest plotlly 6.0 has causes many plot-related test failures ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? manually checked with ``` python/run-tests -k --python-executables python3 --testnames 'pyspark.sql.tests.connect.test_parity_frame_plot_plotly FramePlotPlotlyParityTests.test_pie_plot' ``` before: ``` (spark_312) ➜ spark git:(pin_plotly) python/run-tests -k --python-executables python3 --testnames 'pyspark.sql.tests.connect.test_parity_frame_plot_plotly FramePlotPlotlyParityTests.test_pie_plot' Running PySpark tests. Output is in /Users/ruifeng.zheng/Dev/spark/python/unit-tests.log Will test against the following Python executables: ['python3'] Will test the following Python tests: ['pyspark.sql.tests.connect.test_parity_frame_plot_plotly FramePlotPlotlyParityTests.test_pie_plot'] python3 python_implementation is CPython python3 version is: Python 3.12.9 Starting test(python3): pyspark.sql.tests.connect.test_parity_frame_plot_plotly FramePlotPlotlyParityTests.test_pie_plot (temp output: /Users/ruifeng.zheng/Dev/spark/python/target/4d10075d-bb7b-4d4b-b17d-edbef2f7/python3__pyspark.sql.tests.connect.test_parity_frame_plot_plotly_FramePlotPlotlyParityTests.test_pie_plot__6qxzu16x.log) Running tests... -- WARNING: Using incubator modules: jdk.incubator.vector Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). /Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/conf.py:64: UserWarning: Failed to set spark.connect.execute.reattachable.senderMaxStreamDuration to Some(1s) due to [CANNOT_MODIFY_CONFIG] Cannot modify the value of the Spark config: "spark.connect.execute.reattachable.senderMaxStreamDuration". See also 'https://spark.apache.org/docs/latest/sql-migration-guide.html#ddl-statements'. SQLSTATE: 46110 warnings.warn(warn) /Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/conf.py:64: UserWarning: Failed to set spark.connect.execute.reattachable.senderMaxStreamSize to Some(123) due to [CANNOT_MODIFY_CONFIG] Cannot modify the value of the Spark config: "spark.connect.execute.reattachable.senderMaxStreamSize". See also 'https://spark.apache.org/docs/latest/sql-migration-guide.html#ddl-statements'. SQLSTATE: 46110 warnings.warn(warn) test_pie_plot (pyspark.sql.tests.connect.test_parity_frame_plot_plotly.FramePlotPlotlyParityTests.test_pie_plot) ... 
FAIL (1.760s) == FAIL [1.760s]: test_pie_plot (pyspark.sql.tests.connect.test_parity_frame_plot_plotly.FramePlotPlotlyParityTests.test_pie_plot) -- Traceback (most recent call last): File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/tests/plot/test_frame_plot_plotly.py", line 318, in test_pie_plot self._check_fig_data(fig["data"][0], **expected_fig_data_sales) File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/tests/plot/test_frame_plot_plotly.py", line 81, in _check_fig_data self.assertEqual(converted_values, expected_value) AssertionError: Lists differ: [15173568000, 1519776,[37 chars]] != [datetime.datetime(2018, 1, 31, 0, 0), dat[105 chars], 0)] First differing element 0: datetime.datetime(2018, 1, 31, 0, 0) - [15173568000, - 1519776, - 15224544000, - 15250464000] + [datetime.datetime(2018, 1, 31, 0, 0), + datetime.datetime(2018, 2, 28, 0, 0), + datetime.datetime(2018, 3, 31, 0, 0), + datetime.datetime(2018, 4, 30, 0, 0)] -- Ran 1 test in 5.573s FAILED (failures=1) Generating XML reports... Generated XML report: target/test-reports/TEST-pyspark.sql.tests.connect.test_parity_frame_plot_plotly.FramePlotPlotlyParityTests-20250210120410.xml Had test failures in pyspark.sql.tests.connect.test_parity_frame_plot_plotly FramePlotPlotlyParityTests.test_pie_plot with python3; see logs. ``` after: ``` (spark_312) ➜ spark git:(pin_plotly) python/run-tests -k --python-executables python3 --testnames 'pyspark.sql.tests.connect.test_parity_frame_plot_plotly FramePlotPlotlyParityTests.test_pie_plot' Running PySpark tests. Output is in /Users/ruifeng.zheng/Dev/spark/python/unit-tests.log Will test against the following Python executables: ['python3'] Will test the followi
Re: [PR] [SPARK-51143][PYTHON] Pin `plotly==5.24.1` [spark]
zhengruifeng commented on PR #49863: URL: https://github.com/apache/spark/pull/49863#issuecomment-2646873479 also cc @cloud-fan we probably need to pin `plotly` in the release docker image
Re: [PR] [SPARK-51139][ML][CONNECT] Refine error class `MLAttributeNotAllowedException` [spark]
zhengruifeng closed pull request #49860: [SPARK-51139][ML][CONNECT] Refine error class `MLAttributeNotAllowedException` URL: https://github.com/apache/spark/pull/49860
Re: [PR] [SPARK-51139][ML][CONNECT] Refine error class `MLAttributeNotAllowedException` [spark]
zhengruifeng commented on PR #49860: URL: https://github.com/apache/spark/pull/49860#issuecomment-2646875702 thanks, merged to master/4.0
Re: [PR] [SPARK-51143][PYTHON] Pin `plotly==5.24.1` [spark]
zhengruifeng commented on PR #49863: URL: https://github.com/apache/spark/pull/49863#issuecomment-2647094992 The update in the docker file triggered a refresh of the cache, so `torch` was also upgraded, which caused ``` == ERROR [42.116s]: test_save_load (pyspark.ml.tests.connect.test_connect_classification.ClassificationTestsOnConnect.test_save_load) -- Traceback (most recent call last): File "/__w/spark/spark/python/pyspark/ml/tests/connect/test_legacy_mode_classification.py", line 185, in test_save_load lor_torch_model = torch.load( ^^^ File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load raise pickle.UnpicklingError(_get_wo_message(str(e))) from None _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. WeightsUnpickler error: Unsupported global: GLOBAL torch.nn.modules.container.Sequential was not an allowed global by default. Please use `torch.serialization.add_safe_globals([Sequential])` or the `torch.serialization.safe_globals([Sequential])` context manager to allowlist this global if you trust this class/function. Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. ```
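For reference, the two workarounds spelled out in the PyTorch error message look roughly like this; the `Sequential` class comes from the traceback above, while the checkpoint path is illustrative and the actual call site in the test is not reproduced here.

```python
# Sketch of the two options described in the error message above; both assume the
# checkpoint comes from a trusted source. The file path is a placeholder.
import torch
from torch.nn import Sequential

# Option 1: keep weights_only=True (the PyTorch 2.6 default) but allowlist the
# pickled class that the checkpoint references.
torch.serialization.add_safe_globals([Sequential])
model = torch.load("/tmp/lor_torch_model.pt")

# Option 2: opt out of weights-only loading entirely (arbitrary code execution risk).
model = torch.load("/tmp/lor_torch_model.pt", weights_only=False)
```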
Re: [PR] [SPARK-51132][ML][BUILD] Upgrade `JPMML` to 1.7.1 [spark]
vruusmann commented on PR #49854: URL: https://github.com/apache/spark/pull/49854#issuecomment-2646143491 A quick memory dump on JPMML-Model evolution: - `1.5.X`. Migrating from PMML schema 4.3 to 4.4. JDK 8 compatible. - `1.6.X`. API upgrades. Specifically, replacing `org.dmg.pmml.FieldName` (pseudo-enum for representing field names) with plain `java.lang.String`. Also, migrating from the `javax.xml.bind.*` API to the `jakarta.xml.bind.*` API. JDK 8 compatible. - `1.7.X`. Upgrading from JDK 8 to JDK 11. The codebase is effectively unchanged. Just picking up up-to-date dependencies. The `org.glassfish.jaxb:jaxb-*` dependencies are chosen based on their JDK compatibility. For example, `org.glassfish.jaxb:jaxb-runtime:3.X` versions are targeting JDK 8, whereas `org.glassfish.jaxb:jaxb-runtime:4.X` are targeting JDK 11. Apache Spark 4.X is targeting JDK 17, no? As for encoding PMML schema versions and XML namespaces, there's an `org.dmg.pmml.Version` enum class for this purpose. You could define a class constant `SPARK_PMML_VERSION` somewhere, and then access its `#version` and `#namespaceURI` string attributes: ```java org.dmg.pmml.Version SPARK_PMML_VERSION = org.dmg.pmml.Version.PMML_4_4; System.out.println(SPARK_PMML_VERSION.getVersion()); // Prints "4.4" System.out.println(SPARK_PMML_VERSION.getNamespaceURI()); // Prints "http://www.dmg.org/PMML-4_4" ```
Re: [PR] [SPARK-51132][ML][BUILD] Upgrade `JPMML` to 1.7.1 [spark]
wayneguow commented on PR #49854: URL: https://github.com/apache/spark/pull/49854#issuecomment-2646149933 > Apache Spark 4.X is targeting JDK 17, no? Yes, Spark 4.x is targeting JDK 17. https://github.com/apache/spark/blob/301b666a1fcbd4c59d96c53fe3a547ea1512f397/pom.xml#L117
Re: [PR] [SPARK-51132][ML][BUILD] Upgrade `JPMML` to 1.7.1 [spark]
vruusmann commented on PR #49854: URL: https://github.com/apache/spark/pull/49854#issuecomment-2646164469 > Yes, Spark 4.x is targeting JDK 17. Perhaps it is then possible to be even more aggressive on some transitive dependency updates (than JPMML-Model 1.7.1 proposes). My goal is to maintain source compatibility with legacy platforms. One of the new goals for the `1.7.X` development branch is to achieve cross-compilability with JavaScript/WASM, as implemented by the TeaVM platform. TLDR: The JPMML-Model dependency upgrade is highly unlikely to break anything. It's targeting JDK 11, whereas you are already targeting JDK 17. My unit and integration tests indicate that JPMML-Model runs fine on all JDK 11+ versions (specifically, the 11, 17 and 21 LTS versions).
Re: [PR] [SQL][SPARK-51113] Fix correctness with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE [spark]
srielau commented on PR #49835: URL: https://github.com/apache/spark/pull/49835#issuecomment-2646656062 > Here's the commit that enabled `SELECT 1 UNION SELECT 2` syntax in SQL parser: #40835. This commit enabled view creation with such SQL text (`parsePlan` is used for `CREATE VIEW`). > > However, even before this commit `parseQuery` parsed `UNION` in `SELECT 1 UNION SELECT 2` as an alias, it's just we could not create such a view. I still cannot believe that this is limited to UNION.
Re: [PR] [SQL][SPARK-51113] Fix correctness with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE [spark]
srielau commented on PR #49835: URL: https://github.com/apache/spark/pull/49835#issuecomment-2646655795 > `spark.conf.set("spark.sql.ansi.enforceReservedKeywords", "true")` mitigates the problem, since UNION becomes a reserved keyword. True, a rather hefty mitigation, though. Lots of collateral damage.
Re: [PR] [SPARK-51136][HISTORYSERVER] Set CallerContext for History Server [spark]
cnauroth commented on PR #49858: URL: https://github.com/apache/spark/pull/49858#issuecomment-2646610534 If approved, can this also go into branch-3.5 please? The cherry-pick would need a minor merge conflict resolution in `FsHistoryProvider` import statements, or I can send a separate pull request.
Re: [PR] [SPARK-51111][DSTREAM] Avoid consumer rebalancing stuck when starting a spark streaming job [spark]
yabola commented on PR #49831: URL: https://github.com/apache/spark/pull/49831#issuecomment-2646800578 @tdas Hi~ could you help review this, thanks
Re: [PR] [SPARK-51008][SQL] Add ResultStage for AQE [spark]
cloud-fan commented on code in PR #49715: URL: https://github.com/apache/spark/pull/49715#discussion_r1948335832 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -344,56 +346,53 @@ case class AdaptiveSparkPlanExec( if (errors.nonEmpty) { cleanUpAndThrowException(errors.toSeq, None) } - -// Try re-optimizing and re-planning. Adopt the new plan if its cost is equal to or less -// than that of the current plan; otherwise keep the current physical plan together with -// the current logical plan since the physical plan's logical links point to the logical -// plan it has originated from. -// Meanwhile, we keep a list of the query stages that have been created since last plan -// update, which stands for the "semantic gap" between the current logical and physical -// plans. And each time before re-planning, we replace the corresponding nodes in the -// current logical plan with logical query stages to make it semantically in sync with -// the current physical plan. Once a new plan is adopted and both logical and physical -// plans are updated, we can clear the query stage list because at this point the two plans -// are semantically and physically in sync again. -val logicalPlan = replaceWithQueryStagesInLogicalPlan(currentLogicalPlan, stagesToReplace) -val afterReOptimize = reOptimize(logicalPlan) -if (afterReOptimize.isDefined) { - val (newPhysicalPlan, newLogicalPlan) = afterReOptimize.get - val origCost = costEvaluator.evaluateCost(currentPhysicalPlan) - val newCost = costEvaluator.evaluateCost(newPhysicalPlan) - if (newCost < origCost || -(newCost == origCost && currentPhysicalPlan != newPhysicalPlan)) { -lazy val plans = - sideBySide(currentPhysicalPlan.treeString, newPhysicalPlan.treeString).mkString("\n") -logOnLevel(log"Plan changed:\n${MDC(QUERY_PLAN, plans)}") -cleanUpTempTags(newPhysicalPlan) -currentPhysicalPlan = newPhysicalPlan -currentLogicalPlan = newLogicalPlan -stagesToReplace = Seq.empty[QueryStageExec] +if (!currentPhysicalPlan.isInstanceOf[ResultQueryStageExec]) { + // Try re-optimizing and re-planning. Adopt the new plan if its cost is equal to or less + // than that of the current plan; otherwise keep the current physical plan together with + // the current logical plan since the physical plan's logical links point to the logical + // plan it has originated from. + // Meanwhile, we keep a list of the query stages that have been created since last plan + // update, which stands for the "semantic gap" between the current logical and physical + // plans. And each time before re-planning, we replace the corresponding nodes in the + // current logical plan with logical query stages to make it semantically in sync with + // the current physical plan. Once a new plan is adopted and both logical and physical + // plans are updated, we can clear the query stage list because at this point the two + // plans are semantically and physically in sync again. 
+ val logicalPlan = replaceWithQueryStagesInLogicalPlan(currentLogicalPlan, stagesToReplace) + val afterReOptimize = reOptimize(logicalPlan) + if (afterReOptimize.isDefined) { +val (newPhysicalPlan, newLogicalPlan) = afterReOptimize.get +val origCost = costEvaluator.evaluateCost(currentPhysicalPlan) +val newCost = costEvaluator.evaluateCost(newPhysicalPlan) +if (newCost < origCost || + (newCost == origCost && currentPhysicalPlan != newPhysicalPlan)) { + lazy val plans = sideBySide( +currentPhysicalPlan.treeString, newPhysicalPlan.treeString).mkString("\n") + logOnLevel(log"Plan changed:\n${MDC(QUERY_PLAN, plans)}") + cleanUpTempTags(newPhysicalPlan) + currentPhysicalPlan = newPhysicalPlan + currentLogicalPlan = newLogicalPlan + stagesToReplace = Seq.empty[QueryStageExec] +} } } // Now that some stages have finished, we can try creating new stages. -result = createQueryStages(currentPhysicalPlan) +result = createQueryStages(fun, currentPhysicalPlan, firstRun = false) } - - // Run the final plan when there's no more unfinished stages. - currentPhysicalPlan = applyPhysicalRules( -optimizeQueryStage(result.newPlan, isFinalStage = true), -postStageCreationRules(supportsColumnar), -Some((planChangeLogger, "AQE Post Stage Creation"))) - _isFinalPlan = true - executionId.foreach(onUpdatePlan(_, Seq(curren
Re: [PR] [SPARK-48239][INFRA][FOLLOWUP] Update the release docker image to follow what we use in Github Action jobs [spark]
cloud-fan commented on PR #49851: URL: https://github.com/apache/spark/pull/49851#issuecomment-2646810575 The streaming test failure is unrelated, thanks for the review, merging to master/4.0!
Re: [PR] [SPARK-48239][INFRA][FOLLOWUP] Update the release docker image to follow what we use in Github Action jobs [spark]
cloud-fan closed pull request #49851: [SPARK-48239][INFRA][FOLLOWUP] Update the release docker image to follow what we use in Github Action jobs URL: https://github.com/apache/spark/pull/49851
Re: [PR] [SPARK-51109][SQL] CTE in subquery expression as grouping column [spark]
cloud-fan commented on PR #49829: URL: https://github.com/apache/spark/pull/49829#issuecomment-2646813304 thanks for the review! merging to master/4.0!
Re: [PR] [SPARK-50953][PYTHON][CONNECT] Add support for non-literal paths in VariantGet [spark]
cloud-fan commented on PR #49609: URL: https://github.com/apache/spark/pull/49609#issuecomment-2646812807 The code doesn't compile...
Re: [PR] [SPARK-51109][SQL] CTE in subquery expression as grouping column [spark]
cloud-fan closed pull request #49829: [SPARK-51109][SQL] CTE in subquery expression as grouping column URL: https://github.com/apache/spark/pull/49829
Re: [PR] [SPARK-35564][SQL] Support subexpression elimination for conditionally evaluated expressions [spark]
cloud-fan commented on code in PR #32987: URL: https://github.com/apache/spark/pull/32987#discussion_r1948343729 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -79,7 +86,12 @@ class EquivalentExpressions( } case _ => if (useCount > 0) { -map.put(wrapper, ExpressionStats(expr)(useCount)) +val stats = if (conditional) { + ExpressionStats(expr)(0, useCount) Review Comment: ```suggestion ExpressionStats(expr)(useCount = 0, conditionalUseCount = useCount) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-35564][SQL] Support subexpression elimination for conditionally evaluated expressions [spark]
cloud-fan commented on code in PR #32987: URL: https://github.com/apache/spark/pull/32987#discussion_r1948343729 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -79,7 +86,12 @@ class EquivalentExpressions( } case _ => if (useCount > 0) { -map.put(wrapper, ExpressionStats(expr)(useCount)) +val stats = if (conditional) { + ExpressionStats(expr)(0, useCount) Review Comment: ```suggestion ExpressionStats(expr)(useCount = 0, useCount) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-35564][SQL] Support subexpression elimination for conditionally evaluated expressions [spark]
cloud-fan commented on code in PR #32987: URL: https://github.com/apache/spark/pull/32987#discussion_r1948344607 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala: ## @@ -99,34 +111,37 @@ class EquivalentExpressions( * only the common nodes. * Those common nodes are then removed from the local map and added to the final map of * expressions. + * + * Conditional expressions are not considered because we are simply looking for expressions + * evaluated once in each parent expression. */ - private def updateCommonExprs( - exprs: Seq[Expression], - map: mutable.HashMap[ExpressionEquals, ExpressionStats], Review Comment: shall we update the doc of this method? no equivalenceMap in this method now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-50953][PYTHON][CONNECT] Add support for non-literal paths in VariantGet [spark]
harshmotw-db commented on PR #49609: URL: https://github.com/apache/spark/pull/49609#issuecomment-2646925700 @cloud-fan Thanks, I think we're good now
Re: [PR] [SQL][SPARK-51113] Fix correctness with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE [spark]
vladimirg-db commented on PR #49835: URL: https://github.com/apache/spark/pull/49835#issuecomment-2646625494 Here's the commit that enabled `SELECT 1 UNION SELECT 2` syntax in the SQL parser: https://github.com/apache/spark/pull/40835. This commit enabled view creation with such SQL text (`parsePlan` is used for `CREATE VIEW`). However, even before this commit, `parseQuery` parsed `UNION` in `SELECT 1 UNION SELECT 2` as an alias; it's just that we could not create such a view.
[PR] [SPARK-51140][ML] Sort the params before saving [spark]
zhengruifeng opened a new pull request, #49861: URL: https://github.com/apache/spark/pull/49861 ### What changes were proposed in this pull request? Sort the params before saving. ### Why are the changes needed? To improve debuggability: when developing ML Connect, I sometimes need to manually check the stored models. I noticed that the params are saved unsorted, so it is hard to compare them: before: ``` {"class":"org.apache.spark.ml.clustering.PowerIterationClustering","timestamp":1738926090947,"sparkVersion":"4.1.0-SNAPSHOT","uid":"PowerIterationClustering_a5c66eecaec6","paramMap":{"dstCol":"dst","k":2,"weightCol":"weight","maxIter":40,"srcCol":"src","initMode":"random"},"defaultParamMap":{"dstCol":"dst","k":2,"maxIter":20,"srcCol":"src","initMode":"random"}} ``` ``` {"class":"org.apache.spark.ml.clustering.PowerIterationClustering","timestamp":1738926386839,"sparkVersion":"4.1.0-SNAPSHOT","uid":"PowerIterationClustering_b91c5734d913","paramMap":{"k":2,"initMode":"random","weightCol":"weight","srcCol":"src","maxIter":40,"dstCol":"dst"},"defaultParamMap":{"k":2,"initMode":"random","srcCol":"src","maxIter":20,"dstCol":"dst"}} ``` after: ``` {"class":"org.apache.spark.ml.clustering.PowerIterationClustering","timestamp":1738926422941,"sparkVersion":"4.1.0-SNAPSHOT","uid":"PowerIterationClustering_96355d38f450","paramMap":{"initMode":"random","maxIter":20,"srcCol":"src","dstCol":"dst","k":2},"defaultParamMap":{"initMode":"random","maxIter":20,"srcCol":"src","dstCol":"dst","k":2}} ``` ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests and manual checks ### Was this patch authored or co-authored using generative AI tooling? no
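The actual change is in the Scala ML persistence layer, but the effect is easy to illustrate with a small, purely hypothetical Python analogy: serializing the param map with a stable key order makes the saved metadata comparable across runs.

```python
# Hypothetical analogy of the change (not the actual Scala writer): a stable key
# order makes the stored paramMap JSON diffable between two saved models.
import json

param_map = {"dstCol": "dst", "k": 2, "weightCol": "weight", "maxIter": 40,
             "srcCol": "src", "initMode": "random"}

print(json.dumps(param_map))                  # insertion order; in the real writer this differed between saves
print(json.dumps(param_map, sort_keys=True))  # deterministic order, easy to compare
```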
Re: [PR] [SPARK-50917][EXAMPLES] Add Pi Scala example to work both for Connect and Classic [spark]
cloud-fan commented on code in PR #49617: URL: https://github.com/apache/spark/pull/49617#discussion_r1948330890 ## examples/src/main/scala/org/apache/spark/examples/sql/SparkDataFramePi.scala: ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +// scalastyle:off println +package org.apache.spark.examples.sql + +import org.apache.spark.sql.SparkSession +import org.apache.spark.sql.functions._ + +/** Computes an approximation to pi with SparkSession/DataFrame APIs */ +object SparkDataFramePi { + def main(args: Array[String]): Unit = { +val spark = SparkSession + .builder() + .appName("Spark Connect Pi") Review Comment: ```suggestion .appName("Spark DataFrame Pi") ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]
cloud-fan commented on code in PR #49772: URL: https://github.com/apache/spark/pull/49772#discussion_r1948333282 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultStringTypes.scala: ## @@ -18,15 +18,15 @@ package org.apache.spark.sql.catalyst.analysis import org.apache.spark.sql.catalyst.expressions.{Cast, Expression, Literal} -import org.apache.spark.sql.catalyst.plans.logical.{AddColumns, AlterColumns, AlterColumnSpec, AlterViewAs, ColumnDefinition, CreateView, LogicalPlan, QualifiedColType, ReplaceColumns, V1CreateTablePlan, V2CreateTablePlan} +import org.apache.spark.sql.catalyst.plans.logical.{AddColumns, AlterColumns, AlterColumnSpec, AlterTableCommand, AlterViewAs, ColumnDefinition, CreateTable, CreateView, LogicalPlan, QualifiedColType, ReplaceColumns, V1CreateTablePlan, V2CreateTablePlan} import org.apache.spark.sql.catalyst.rules.{Rule, RuleExecutor} +import org.apache.spark.sql.connector.catalog.TableCatalog import org.apache.spark.sql.types.{DataType, StringType} /** * Resolves default string types in queries and commands. For queries, the default string type is - * determined by the session's default string type. For DDL, the default string type is the - * default type of the object (table -> schema -> catalog). However, this is not implemented yet. - * So, we will just use UTF8_BINARY for now. + * determined by the default string type, which is UTF8_BINARY. For DDL, the default string type Review Comment: This is confusing. For queries, the default string type is handled by the parser: when collation is not specified, we use `object StringType`. What does this rule need to do? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51135][SQL] Fix ViewResolverSuite for ANSI modes [spark]
cloud-fan commented on PR #49857: URL: https://github.com/apache/spark/pull/49857#issuecomment-2646800076 thanks, merging to master/4.0!
Re: [PR] [SPARK-51067][SQL] Revert session level collation for DML queries and apply object level collation for DDL queries [spark]
cloud-fan commented on code in PR #49772: URL: https://github.com/apache/spark/pull/49772#discussion_r1948333844 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultStringTypes.scala: ## @@ -79,18 +79,37 @@ object ResolveDefaultStringTypes extends Rule[LogicalPlan] { expression.exists(e => transformExpression.isDefinedAt(e)) } - private def isDefaultSessionCollationUsed: Boolean = conf.defaultStringType == StringType + /** Default string type is UTF8_BINARY */ + private def defaultStringType: StringType = StringType("UTF8_BINARY") - /** - * Returns the default string type that should be used in a given DDL command (for now always - * UTF8_BINARY). - */ - private def stringTypeForDDLCommand(table: LogicalPlan): StringType = -StringType("UTF8_BINARY") + /** Returns the default string type that should be used in a given DDL command */ + private def stringTypeForDDLCommand(table: LogicalPlan): StringType = { +if (table.isInstanceOf[CreateTable]) { + if (table.asInstanceOf[CreateTable].tableSpec.collation.isDefined) { +return StringType(table.asInstanceOf[CreateTable].tableSpec.collation.get) + } +} +else if (table.isInstanceOf[CreateView]) { + if (table.asInstanceOf[CreateView].collation.isDefined) { +return StringType(table.asInstanceOf[CreateView].collation.get) + } +} +else if (table.isInstanceOf[AlterTableCommand]) { + if (table.asInstanceOf[AlterTableCommand].table.resolved) { +val collation = Option(table.asInstanceOf[AlterTableCommand] + .table.asInstanceOf[ResolvedTable] + .table.properties.get(TableCatalog.PROP_COLLATION)) +if (collation.isDefined) { + return StringType(collation.get) +} + } +} +defaultStringType + } - /** Returns the session default string type */ - private def sessionDefaultStringType: StringType = -StringType(conf.defaultStringType.collationId) + /** Returns the session default string type used for DML queries */ Review Comment: again, there is no session default string type anymore. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51135][SQL] Fix ViewResolverSuite for ANSI modes [spark]
cloud-fan closed pull request #49857: [SPARK-51135][SQL] Fix ViewResolverSuite for ANSI modes URL: https://github.com/apache/spark/pull/49857
Re: [PR] [SPARK-51138][PYTHON][CONNECT][TESTS] Skip pyspark.sql.tests.connect.test_parity_frame_plot_plotly.FramePlotPlotlyParityTests.test_area_plot [spark]
HyukjinKwon closed pull request #49859: [SPARK-51138][PYTHON][CONNECT][TESTS] Skip pyspark.sql.tests.connect.test_parity_frame_plot_plotly.FramePlotPlotlyParityTests.test_area_plot URL: https://github.com/apache/spark/pull/49859
Re: [PR] [SPARK-51138][PYTHON][CONNECT][TESTS] Skip pyspark.sql.tests.connect.test_parity_frame_plot_plotly.FramePlotPlotlyParityTests.test_area_plot [spark]
HyukjinKwon commented on PR #49859: URL: https://github.com/apache/spark/pull/49859#issuecomment-2646721135 Merged to master and branch-4.0.
Re: [PR] [SPARK-51132][ML][BUILD] Upgrade `JPMML` to 1.7.1 [spark]
zhengruifeng commented on PR #49854: URL: https://github.com/apache/spark/pull/49854#issuecomment-2646737700 The JPMML upgrade and related code changes LGTM. also ping @LuciferYang and @dongjoon-hyun
Re: [PR] [SPARK-51132][ML][BUILD] Upgrade `JPMML` to 1.7.1 [spark]
wayneguow commented on code in PR #49854: URL: https://github.com/apache/spark/pull/49854#discussion_r1948060171 ## pom.xml: ## @@ -599,7 +599,7 @@ org.glassfish.jaxb jaxb-runtime -2.3.2 +4.0.5 Review Comment: There is currently no mutual dependency between the `org.glassfish.jersey` series (3.0.16) and the `org.glassfish.jaxb` series (4.0.5). But they both depend on `jakarta.xml.bind:jakarta.xml.bind-api` and `jakarta.activation:jakarta.activation-api`, and the dependency versions of `jaxb` are newer (xml: 4.0.2 vs 3.0.1, activation: 2.1.3 vs 2.0.1).
[PR] [SPARK-51135][SQL] Fix ViewResolverSuite for ANSI modes [spark]
vladimirg-db opened a new pull request, #49857: URL: https://github.com/apache/spark/pull/49857 ### What changes were proposed in this pull request? Fix `ViewResolverSuite` for non-ANSI mode. View column `Cast`s have ANSI evaluation in non-ANSI mode when view schema `COMPENSATION` is the default view schema mode: https://github.com/apache/spark/blob/301b666a1fcbd4c59d96c53fe3a547ea1512f397/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala#L965. Also, use the `dsl` package to simplify test code. ### Why are the changes needed? To fix tests in non-ANSI mode: https://github.com/apache/spark/pull/49658#discussion_r1944266007 ### Does this PR introduce _any_ user-facing change? Just a test suite change. ### How was this patch tested? Fixed the `ViewResolverSuite`. ### Was this patch authored or co-authored using generative AI tooling? copilot.nvim.
Re: [PR] [SPARK-50982][SQL] Support more SQL/DataFrame read path functionality in single-pass Analyzer [spark]
vladimirg-db commented on code in PR #49658: URL: https://github.com/apache/spark/pull/49658#discussion_r1948128138 ## sql/core/src/test/scala/org/apache/spark/sql/analysis/resolver/ViewResolverSuite.scala: ## @@ -0,0 +1,188 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.analysis.resolver + +import org.apache.spark.sql.{AnalysisException, QueryTest} +import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow} +import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation +import org.apache.spark.sql.catalyst.analysis.resolver.{MetadataResolver, Resolver} +import org.apache.spark.sql.catalyst.expressions.{Alias, AttributeReference, Cast} +import org.apache.spark.sql.catalyst.plans.logical.{ + LocalRelation, + LogicalPlan, + OneRowRelation, + Project, + SubqueryAlias, + View +} +import org.apache.spark.sql.test.SharedSparkSession +import org.apache.spark.sql.types.{IntegerType, StringType} + +class ViewResolverSuite extends QueryTest with SharedSparkSession { + private val catalogName = +"spark_catalog" + private val col1Integer = +AttributeReference(name = "col1", dataType = IntegerType, nullable = false)() + private val col2String = +AttributeReference(name = "col2", dataType = StringType, nullable = false)() + + test("Temporary view") { +withView("temporary_view") { + spark.sql("CREATE TEMPORARY VIEW temporary_view AS SELECT col1, col2 FROM VALUES (1, 'a');") + + checkViewResolution( +"SELECT * FROM temporary_view", +expectedChild = Project( + projectList = Seq( +Alias(Cast(col1Integer, IntegerType).withTimeZone(conf.sessionLocalTimeZone), "col1")(), +Alias(Cast(col2String, StringType).withTimeZone(conf.sessionLocalTimeZone), "col2")() + ), + child = Project( +projectList = Seq(col1Integer, col2String), +child = LocalRelation( + output = Seq(col1Integer, col2String), + data = Seq( +InternalRow.fromSeq(Seq(1, "a").map(CatalystTypeConverters.convertToCatalyst)) + ) +) + ) +) + ) +} + } + + test("Persistent view") { Review Comment: Created a PR: https://github.com/apache/spark/pull/49857 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51132][ML][BUILD] Upgrade `JPMML` to 1.7.1 [spark]
wayneguow commented on PR #49854: URL: https://github.com/apache/spark/pull/49854#issuecomment-2646172436 @vruusmann Thank you for sharing these views, much appreciated.
Re: [PR] [SPARK-51132][ML][BUILD] Upgrade `JPMML` to 1.7.1 [spark]
wayneguow commented on code in PR #49854: URL: https://github.com/apache/spark/pull/49854#discussion_r1948060923 ## pom.xml: ## @@ -615,16 +615,17 @@ org.jvnet.staxex stax-ex - -jakarta.activation Review Comment: Hmm, I found that the `jakarta.xml.bind:jakarta.xml.bind-api` exclusion in `jersey-server` can be removed, but the `com.sun.activation:jakarta.activation` exclusion in `jersey-common` can still be retained, because this dependency is currently unused.