[jira] [Created] (SPARK-51608) Better log exception in python udf worker.
Dmitry created SPARK-51608: -- Summary: Better log exception in python udf worker. Key: SPARK-51608 URL: https://issues.apache.org/jira/browse/SPARK-51608 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Dmitry It was possible to see the error in the logs: {{24/12/27 20:25:04 WARN PythonUDFWithNamedArgumentsRunner: Failed to stop worker}} However, it does not reveal what exactly happened. It really makes sense to include exception information in the logs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
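For illustration, a minimal sketch of the kind of change being asked for, assuming the two-argument `logWarning(msg, throwable)` overload of Spark's `Logging` trait; the `stop` callback is a placeholder, not the actual worker code:

{code:scala}
import org.apache.spark.internal.Logging

object StopWorkerExample extends Logging {
  // Wrap a stop action and, on failure, log the exception itself so the
  // stack trace shows up next to "Failed to stop worker".
  def stopQuietly(stop: () => Unit): Unit = {
    try {
      stop()
    } catch {
      case e: Exception =>
        // was: logWarning("Failed to stop worker") -- the cause is lost
        logWarning("Failed to stop worker", e)
    }
  }
}
{code}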
[jira] [Updated] (SPARK-51609) Optimize simple queries
[ https://issues.apache.org/jira/browse/SPARK-51609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavle Martinović updated SPARK-51609: - Description: We would like to speed up the execution of recursive CTEs. It is possible to optimize simple queries to run in-memory, leading to large speed ups. > Optimize simple queries > --- > > Key: SPARK-51609 > URL: https://issues.apache.org/jira/browse/SPARK-51609 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.1.0 >Reporter: Pavle Martinović >Priority: Major > > We would like to speed up the execution of recursive CTEs. It is possible to > optimize simple queries to run in-memory, leading to large speed ups. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51610) Support the TIME data type in the parquet datasource
Max Gekk created SPARK-51610: Summary: Support the TIME data type in the parquet datasource Key: SPARK-51610 URL: https://issues.apache.org/jira/browse/SPARK-51610 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.1.0 Reporter: Max Gekk Assignee: Max Gekk Allow the TIME type in the Parquet datasource which was disabled by SPARK-51590. Support TimeType in vectorized and non-vectorized readers as well as in the writer. Write tests for: - the read path. Create a parquet file using an external library, and read it back by Spark SQL. - the write path: write TIME values to parquet and read it back. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
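As a rough sketch of the write-path round-trip test described above (assumptions on my part: the TIME'...' literal syntax from the parent task is available, and the default session mappings apply; this is not the actual test code):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").getOrCreate()
val path = java.nio.file.Files.createTempDirectory("time-parquet").toString

// Write a TIME value to parquet and read it back.
spark.sql("SELECT TIME'12:34:56.123456' AS t").write.mode("overwrite").parquet(path)
val readBack = spark.read.parquet(path)
readBack.printSchema()                 // should show a TIME-typed column once supported
readBack.show(truncate = false)
{code}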
[jira] [Assigned] (SPARK-51574) Implement filters
[ https://issues.apache.org/jira/browse/SPARK-51574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-51574: --- Assignee: Haoyu Weng > Implement filters > - > > Key: SPARK-51574 > URL: https://issues.apache.org/jira/browse/SPARK-51574 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.1.0 >Reporter: Haoyu Weng >Assignee: Haoyu Weng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51496) CaseInsensitiveStringMap comparison should ignore case
[ https://issues.apache.org/jira/browse/SPARK-51496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-51496. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 50275 [https://github.com/apache/spark/pull/50275] > CaseInsensitiveStringMap comparison should ignore case > -- > > Key: SPARK-51496 > URL: https://issues.apache.org/jira/browse/SPARK-51496 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When merging commandOptions and dsOptions (both of them are > CaseInsensitiveStringMap), we have > {quote}assert(commandOptions == dsOptions || commandOptions.isEmpty || > dsOptions.isEmpty){quote} > If commandOptions has ("KEY", "value"), and dsOption has ("key", "value"), > the assertion fails, but it should pass instead. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
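A small repro of the scenario in the description (illustrative only; `CaseInsensitiveStringMap` lives in `org.apache.spark.sql.util`):

{code:scala}
import scala.jdk.CollectionConverters._
import org.apache.spark.sql.util.CaseInsensitiveStringMap

val commandOptions = new CaseInsensitiveStringMap(Map("KEY" -> "value").asJava)
val dsOptions      = new CaseInsensitiveStringMap(Map("key" -> "value").asJava)

// Lookups already ignore case for both maps...
assert(commandOptions.get("key") == "value")
assert(dsOptions.get("KEY") == "value")

// ...so the equality used by the planner assertion should ignore case too.
// Before the fix this could be false and trip the assertion quoted above.
println(commandOptions == dsOptions)   // expected: true after the fix
{code}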
[jira] [Created] (SPARK-51611) Implement new single-pass Analyzer functionality
Vladimir Golubev created SPARK-51611: Summary: Implement new single-pass Analyzer functionality Key: SPARK-51611 URL: https://issues.apache.org/jira/browse/SPARK-51611 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.1.0 Reporter: Vladimir Golubev - GROUP BY - ORDER BY - JOIN - Correlated subqueries - Other small features and bugfixes -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45900) Expand hash functionalities from to include XXH3
[ https://issues.apache.org/jira/browse/SPARK-45900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938631#comment-17938631 ] Dmitry Kravchuk commented on SPARK-45900: - We also need this feature) > Expand hash functionalities from to include XXH3 > > > Key: SPARK-45900 > URL: https://issues.apache.org/jira/browse/SPARK-45900 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Nathan Holland >Priority: Major > > I often work in projects that require deterministic randomness, especially > when creating surrogate keys. For small volumes of data xxhash64 works well; > however, this functionality doesn't scale well - with a 64-bit hash code, the > chance of collision is one in a million when you hash just six million items, > increasing sharply due to the birthday paradox. > Currently there are a few ways to handle this: > - hash: 32-bit output (>50% chance of at least one collision for tables > larger than 77,000 rows) > - xxhash64: 64-bit output (>50% chance of at least one collision for tables > larger than 5 billion rows) > - shaXXX/md5: single binary column input, string output, quite > computationally expensive. > I'd suggest adding the newest algorithm in the xxHash family, XXH3. The > XXH3 family is a modern 64-bit and 128-bit hash function family that provides > improved strength and performance across the board. > I'd imagine this would be a new function named xxhash3 that takes 64-bit and > 128-bit lengths. For usability I believe the bit length should default to > 128 bits to provide the best experience and reduce accidental collisions, and > leave users to set the bit length to 64 as an override if they need to for > additional performance or interop reasons (given the benchmarks, this would > likely be quite rare). > References: > * [Documentation|https://xxhash.com/] > * xxHash64 Ticket (https://issues.apache.org/jira/browse/SPARK-27099) > * [Existing xxHash64 > logic|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/XXH64.java] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
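For reference, the collision figures quoted in the description follow from the standard birthday-bound approximation; a quick sanity check (not part of the proposal itself):

{code:scala}
// p ≈ 1 - exp(-n^2 / (2 * 2^bits)) for n hashed items and a `bits`-wide hash code.
def collisionProbability(n: Double, bits: Int): Double =
  1.0 - math.exp(-n * n / (2.0 * math.pow(2.0, bits)))

println(collisionProbability(6e6, 64))    // ~1e-6: "one in a million" at six million items
println(collisionProbability(5e9, 64))    // ~0.49: roughly a coin flip around 5 billion rows
println(collisionProbability(77000, 32))  // ~0.50: roughly a coin flip around 77,000 rows
{code}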
[jira] [Updated] (SPARK-51611) Implement new single-pass Analyzer functionality
[ https://issues.apache.org/jira/browse/SPARK-51611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51611: --- Labels: pull-request-available (was: ) > Implement new single-pass Analyzer functionality > > > Key: SPARK-51611 > URL: https://issues.apache.org/jira/browse/SPARK-51611 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.1.0 >Reporter: Vladimir Golubev >Priority: Major > Labels: pull-request-available > > - GROUP BY > - ORDER BY > - JOIN > - Correlated subqueries > - Other small features and bugfixes -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51600) Should prepend `sql/hive`/`sql/hive-thriftserver` when `isTesting || isTestingSql` is true
[ https://issues.apache.org/jira/browse/SPARK-51600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-51600. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 50385 [https://github.com/apache/spark/pull/50385] > Should prepend `sql/hive`/`sql/hive-thriftserver` when `isTesting || > isTestingSql` is true > -- > > Key: SPARK-51600 > URL: https://issues.apache.org/jira/browse/SPARK-51600 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51615) Refactor ShowNamespaces to use RunnableCommand
Szehon Ho created SPARK-51615: - Summary: Refactor ShowNamespaces to use RunnableCommand Key: SPARK-51615 URL: https://issues.apache.org/jira/browse/SPARK-51615 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 4.0.0 Reporter: Szehon Ho RunnableCommand is the latest way to run commands. The advantage over the old way is that we have a single node (no need for a logicalPlan and physical exec node). We should refactor 'SHOW NAMESPACES' to use the new approach. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
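For illustration, the shape such a command could take; this is only a sketch using the `LeafRunnableCommand` variant of `RunnableCommand`, and the class name, output schema, and pattern handling are hypothetical, not the actual patch:

{code:scala}
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference}
import org.apache.spark.sql.execution.command.LeafRunnableCommand
import org.apache.spark.sql.types.StringType

// A single logical node whose run() does the work, so no separate
// physical exec node is needed.
case class ShowNamespacesSketch(pattern: Option[String]) extends LeafRunnableCommand {
  override val output: Seq[Attribute] =
    Seq(AttributeReference("namespace", StringType, nullable = false)())

  override def run(sparkSession: SparkSession): Seq[Row] = {
    // Hypothetical body: list namespaces from the session catalog and apply
    // the optional pattern (simplified to a '*' wildcard here).
    val namespaces = sparkSession.catalog.listDatabases().collect().map(_.name).toSeq
    namespaces
      .filter(ns => pattern.forall(p => ns.matches(p.replace("*", ".*"))))
      .map(Row(_))
  }
}
{code}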
[jira] [Updated] (SPARK-51615) Refactor ShowNamespaces to use RunnableCommand
[ https://issues.apache.org/jira/browse/SPARK-51615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51615: --- Labels: pull-request-available (was: ) > Refactor ShowNamespaces to use RunnableCommand > -- > > Key: SPARK-51615 > URL: https://issues.apache.org/jira/browse/SPARK-51615 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 4.0.0 >Reporter: Szehon Ho >Priority: Minor > Labels: pull-request-available > > RunnableCommand is the latest way to run commands. The advantage over the old > way is that we have a single node (no need for a logicalPlan and physical > exec node). > We should refactor 'SHOW NAMESPACES' to use the new approach. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51614) Fix error message when there is a Generate under an UnresolvedHaving
[ https://issues.apache.org/jira/browse/SPARK-51614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51614: --- Labels: pull-request-available (was: ) > Fix error message when there is a Generate under an UnresolvedHaving > > > Key: SPARK-51614 > URL: https://issues.apache.org/jira/browse/SPARK-51614 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.1.0 >Reporter: Mihailo Aleksic >Priority: Major > Labels: pull-request-available > > Generate under an UnresolvedHaving throws internal error and I propose that > we fix it so we throw a meaningful error from CheckAnalysis. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51609) Optimize Recursive CTE execution for simple queries
[ https://issues.apache.org/jira/browse/SPARK-51609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51609: --- Labels: pull-request-available (was: ) > Optimize Recursive CTE execution for simple queries > --- > > Key: SPARK-51609 > URL: https://issues.apache.org/jira/browse/SPARK-51609 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.1.0 >Reporter: Pavle Martinović >Priority: Major > Labels: pull-request-available > > We would like to speed up the execution of recursive CTEs. It is possible to > optimize simple queries to run in-memory, leading to large speed ups. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51587) [PySpark] Fix an issue where timestamp cannot be used in ListState when multiple state data is involved
[ https://issues.apache.org/jira/browse/SPARK-51587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-51587: - Issue Type: Bug (was: Task) > [PySpark] Fix an issue where timestamp cannot be used in ListState when > multiple state data is involved > --- > > Key: SPARK-51587 > URL: https://issues.apache.org/jira/browse/SPARK-51587 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Bo Gao >Assignee: Bo Gao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Fix an issue where timestamp cannot be used in ListState when multiple state > data is involved. > Right now below error will be thrown > {code:python} > [UNSUPPORTED_ARROWTYPE] Unsupported arrow type Timestamp(NANOSECOND, null). > SQLSTATE: 0A000 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-51606) After exiting the remote local connect shell, the SparkConnectServer will not terminate.
[ https://issues.apache.org/jira/browse/SPARK-51606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938496#comment-17938496 ] Yang Jie commented on SPARK-51606: -- cc [~gurwls223] > After exiting the remote local connect shell, the SparkConnectServer will not > terminate. > > > Key: SPARK-51606 > URL: https://issues.apache.org/jira/browse/SPARK-51606 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0, 4.1.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > bin/spark-shell --remote local > WARNING: Using incubator modules: jdk.incubator.vector > Using Spark's default log4j profile: > org/apache/spark/log4j2-defaults.properties > 25/03/26 15:43:55 INFO SparkSession: Spark Connect server started with the > log file: > /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs/spark-cb51ad74-00e1-4567-9746-3dc9a7888ecb-org.apache.spark.sql.connect.service.SparkConnectServer-1-local.out > 25/03/26 15:43:56 INFO BaseAllocator: Debug mode disabled. Enable with the VM > option -Darrow.memory.debug.allocator=true. > 25/03/26 15:43:56 INFO DefaultAllocationManagerOption: allocation manager > type not specified, using netty as the default type > 25/03/26 15:43:56 INFO CheckAllocator: Using DefaultAllocationManager at > memory/netty/DefaultAllocationManagerFactory.class > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 4.1.0-SNAPSHOT > /_/ > Type in expressions to have them evaluated. > Spark connect server version 4.1.0-SNAPSHOT. > Spark session available as 'spark'. > > scala> exit > Bye! > 25/03/26 15:44:00 INFO ShutdownHookManager: Shutdown hook called > 25/03/26 15:44:00 INFO ShutdownHookManager: Deleting directory > /private/var/folders/j2/cfn7w6795538n_416_27rkqmgn/T/spark-ad8dfdf4-cf2b-413f-a9e3-d6e310dff1ea > bin/spark-shell --remote local > WARNING: Using incubator modules: jdk.incubator.vector > Using Spark's default log4j profile: > org/apache/spark/log4j2-defaults.properties > 25/03/26 15:44:04 INFO SparkSession: Spark Connect server started with the > log file: > /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs/spark-a7b9a1dc-1e16-4e0e-b7c1-8f957d730df3-org.apache.spark.sql.connect.service.SparkConnectServer-1-local.out > 25/03/26 15:44:05 INFO BaseAllocator: Debug mode disabled. Enable with the VM > option -Darrow.memory.debug.allocator=true. 
> 25/03/26 15:44:05 INFO DefaultAllocationManagerOption: allocation manager > type not specified, using netty as the default type > 25/03/26 15:44:05 INFO CheckAllocator: Using DefaultAllocationManager at > memory/netty/DefaultAllocationManagerFactory.class > Exception in thread "main" org.apache.spark.SparkException: > org.sparkproject.io.grpc.StatusRuntimeException: UNAUTHENTICATED: Invalid > authentication token > at > org.apache.spark.sql.connect.client.GrpcExceptionConverter.toThrowable(GrpcExceptionConverter.scala:162) > at > org.apache.spark.sql.connect.client.GrpcExceptionConverter.convert(GrpcExceptionConverter.scala:61) > at > org.apache.spark.sql.connect.client.CustomSparkConnectBlockingStub.analyzePlan(CustomSparkConnectBlockingStub.scala:75) > at > org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:110) > at > org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:256) > at > org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:227) > at > org.apache.spark.sql.connect.SparkSession.version$lzycompute(SparkSession.scala:92) > at > org.apache.spark.sql.connect.SparkSession.version(SparkSession.scala:91) > at > org.apache.spark.sql.application.ConnectRepl$$anon$1.(ConnectRepl.scala:106) > at > org.apache.spark.sql.application.ConnectRepl$.$anonfun$doMain$1(ConnectRepl.scala:105) > at > org.apache.spark.sql.connect.SparkSession$.withLocalConnectServer(SparkSession.scala:824) > at > org.apache.spark.sql.application.ConnectRepl$.doMain(ConnectRepl.scala:67) > at > org.apache.spark.sql.application.ConnectRepl$.main(ConnectRepl.scala:57) > at org.apache.spark.sql.application.ConnectRepl.main(ConnectRepl.scala) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:569) > at > org.apache.spark.deploy.Ja
[jira] [Assigned] (SPARK-51607) the configuration for `maven-shade-plugin` should be set to `combine.self = "override"` In the `connect` modules
[ https://issues.apache.org/jira/browse/SPARK-51607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-51607: Assignee: Yang Jie > the configuration for `maven-shade-plugin` should be set to `combine.self = > "override"` In the `connect` modules > > > Key: SPARK-51607 > URL: https://issues.apache.org/jira/browse/SPARK-51607 > Project: Spark > Issue Type: Bug > Components: Build, Connect >Affects Versions: 4.0.0, 4.1.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51600) Should prepend `sql/hive`/`sql/hive-thriftserver` when `isTesting || isTestingSql` is true
[ https://issues.apache.org/jira/browse/SPARK-51600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-51600: Assignee: Yang Jie > Should prepend `sql/hive`/`sql/hive-thriftserver` when `isTesting || > isTestingSql` is true > -- > > Key: SPARK-51600 > URL: https://issues.apache.org/jira/browse/SPARK-51600 > Project: Spark > Issue Type: Bug > Components: Tests >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51614) Fix error message when there is a Generate under an UnresolvedHaving
Mihailo Aleksic created SPARK-51614: --- Summary: Fix error message when there is a Generate under an UnresolvedHaving Key: SPARK-51614 URL: https://issues.apache.org/jira/browse/SPARK-51614 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.1.0 Reporter: Mihailo Aleksic Generate under an UnresolvedHaving throws internal error and I propose that we fix it so we throw a meaningful error from CheckAnalysis. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
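The ticket does not include a reproducer; one plausible query shape (an assumption on my part) is a generator used directly in the HAVING clause, assuming a `spark` session is in scope (e.g. in spark-shell):

{code:scala}
// Hypothetical reproducer: a generator inside HAVING would yield a Generate
// node under UnresolvedHaving during analysis, which per the description is
// currently reported as an internal error rather than a CheckAnalysis error.
spark.sql(
  """SELECT col1, count(*)
    |FROM VALUES (1), (2) AS t(col1)
    |GROUP BY col1
    |HAVING explode(array(col1)) > 0""".stripMargin
).show()
{code}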
[jira] [Resolved] (SPARK-51577) Make gradle build to automatically append SNAPSHOT suffix to version for non-release builds
[ https://issues.apache.org/jira/browse/SPARK-51577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-51577. --- Fix Version/s: kubernetes-operator-0.1.0 Resolution: Fixed Issue resolved by pull request 170 [https://github.com/apache/spark-kubernetes-operator/pull/170] > Make gradle build to automatically append SNAPSHOT suffix to version for > non-release builds > --- > > Key: SPARK-51577 > URL: https://issues.apache.org/jira/browse/SPARK-51577 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > Fix For: kubernetes-operator-0.1.0 > > > As we already configured `version` in gradle properties, it makes sense to > append `-SNAPSHOT` as needed comparing with hardcoded override -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51577) Make gradle build to automatically append SNAPSHOT suffix to version for non-release builds
[ https://issues.apache.org/jira/browse/SPARK-51577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-51577: - Assignee: Zhou JIANG > Make gradle build to automatically append SNAPSHOT suffix to version for > non-release builds > --- > > Key: SPARK-51577 > URL: https://issues.apache.org/jira/browse/SPARK-51577 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes >Affects Versions: kubernetes-operator-0.1.0 >Reporter: Zhou JIANG >Assignee: Zhou JIANG >Priority: Major > Labels: pull-request-available > > As we already configured `version` in gradle properties, it makes sense to > append `-SNAPSHOT` as needed comparing with hardcoded override -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51612) Display view creation confs in Desc As Json
Amanda Liu created SPARK-51612: -- Summary: Display view creation confs in Desc As Json Key: SPARK-51612 URL: https://issues.apache.org/jira/browse/SPARK-51612 Project: Spark Issue Type: Task Components: SQL Affects Versions: 4.0.0 Reporter: Amanda Liu -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51613) Improve Spark Operator metrics
Damon Cortesi created SPARK-51613: - Summary: Improve Spark Operator metrics Key: SPARK-51613 URL: https://issues.apache.org/jira/browse/SPARK-51613 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: kubernetes-operator-0.1.0 Reporter: Damon Cortesi Today the Spark Operator provides JVM, Kubernetes, and Java Operator SDK metrics, but no metrics specific to the functionality and health of the Spark App or Cluster resources managed by the operator. It would be nice to have metrics like: * Total counts of Apps or Clusters by state (Submitted, Failed, Successful, etc) * Gauges of Apps or Clusters by state (Submitted, Pending, Running, etc) * Timers for Spark submit latency (Submission --> Running for example) * Potentially depth of the reconciliation backlog and how many apps are getting added per interval, although this may already be handled in the operator SDK metrics via reconciliations_queue_size In addition, it would be nice to have Prometheus metrics with labels, but it doesn't look like Dropwizard supports that (nor likely to happen via [https://github.com/dropwizard/metrics/issues/1272] ). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
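To make the ask concrete, a sketch of what these could look like with the Dropwizard API the operator already uses; the metric names and state handling below are illustrative, not an agreed-upon scheme:

{code:scala}
import java.util.concurrent.atomic.AtomicInteger
import com.codahale.metrics.{Gauge, MetricRegistry}

val registry = new MetricRegistry()

// Total counts of apps reaching a given state.
val submittedApps = registry.counter("sparkapp.submitted.total")
val failedApps    = registry.counter("sparkapp.failed.total")

// Gauge of apps currently in the Running state.
val runningApps = new AtomicInteger(0)
registry.register("sparkapp.running", new Gauge[Integer] {
  override def getValue: Integer = Integer.valueOf(runningApps.get())
})

// Timer for submission -> running latency.
val submitLatency = registry.timer("sparkapp.submit.latency")

submittedApps.inc()
val ctx = submitLatency.time()
// ... later, when the reconciler observes the Running state:
runningApps.incrementAndGet()
ctx.stop()
{code}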
[jira] [Resolved] (SPARK-51607) the configuration for `maven-shade-plugin` should be set to `combine.self = "override"` In the `connect` modules
[ https://issues.apache.org/jira/browse/SPARK-51607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-51607. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 50401 [https://github.com/apache/spark/pull/50401] > the configuration for `maven-shade-plugin` should be set to `combine.self = > "override"` In the `connect` modules > > > Key: SPARK-51607 > URL: https://issues.apache.org/jira/browse/SPARK-51607 > Project: Spark > Issue Type: Bug > Components: Build, Connect >Affects Versions: 4.0.0, 4.1.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51612) Display Spark confs set at view creation in Desc As Json
[ https://issues.apache.org/jira/browse/SPARK-51612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amanda Liu updated SPARK-51612: --- Summary: Display Spark confs set at view creation in Desc As Json (was: Display view creation confs in Desc As Json) > Display Spark confs set at view creation in Desc As Json > > > Key: SPARK-51612 > URL: https://issues.apache.org/jira/browse/SPARK-51612 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Amanda Liu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51610) Support the TIME data type in the parquet datasource
[ https://issues.apache.org/jira/browse/SPARK-51610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51610: --- Labels: pull-request-available (was: ) > Support the TIME data type in the parquet datasource > > > Key: SPARK-51610 > URL: https://issues.apache.org/jira/browse/SPARK-51610 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.1.0 >Reporter: Max Gekk >Assignee: Max Gekk >Priority: Major > Labels: pull-request-available > > Allow the TIME type in the Parquet datasource which was disabled by > SPARK-51590. Support TimeType in vectorized and non-vectorized readers as > well as in the writer. > Write tests for: > - the read path. Create a parquet file using an external library, and read it > back by Spark SQL. > - the write path: write TIME values to parquet and read it back. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51615) Refactor ShowNamespaces to use RunnableCommand
[ https://issues.apache.org/jira/browse/SPARK-51615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated SPARK-51615: -- Issue Type: Improvement (was: New Feature) > Refactor ShowNamespaces to use RunnableCommand > -- > > Key: SPARK-51615 > URL: https://issues.apache.org/jira/browse/SPARK-51615 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Szehon Ho >Priority: Minor > Labels: pull-request-available > > RunnableCommand is the latest way to run commands. The advantage over the old > way is that we have a single node (no need for a logicalPlan and physical > exec node). > We should refactor 'SHOW NAMESPACES' to use the new approach. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51612) Display Spark confs set at view creation in Desc As Json
[ https://issues.apache.org/jira/browse/SPARK-51612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51612: --- Labels: pull-request-available (was: ) > Display Spark confs set at view creation in Desc As Json > > > Key: SPARK-51612 > URL: https://issues.apache.org/jira/browse/SPARK-51612 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Amanda Liu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51318) Remove `jar` files from Apache Spark repository and disable affected tests
[ https://issues.apache.org/jira/browse/SPARK-51318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-51318. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 50378 [https://github.com/apache/spark/pull/50378] > Remove `jar` files from Apache Spark repository and disable affected tests > -- > > Key: SPARK-51318 > URL: https://issues.apache.org/jira/browse/SPARK-51318 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 4.1.0 >Reporter: Dongjoon Hyun >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > This issue aims to remove `jar` file from the following branches and disabled > affected tests with IDed TODO comments. > *MASTER* > {code} > $ find . -name '*.jar' > ./connect-examples/server-library-example/resources/spark-daria_2.13-1.2.3.jar > ./core/src/test/resources/TestHelloV2_2.13.jar > ./core/src/test/resources/TestHelloV3_2.13.jar > ./core/src/test/resources/TestUDTF.jar > ./data/artifact-tests/junitLargeJar.jar > ./data/artifact-tests/smallJar.jar > ./sql/core/src/test/resources/SPARK-33084.jar > ./sql/core/src/test/resources/artifact-tests/udf_noA.jar > ./sql/hive/src/test/noclasspath/hive-test-udfs.jar > ./sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.13.jar > ./sql/hive/src/test/resources/SPARK-21101-1.0.jar > ./sql/hive/src/test/resources/data/files/TestSerDe.jar > ./sql/hive/src/test/resources/TestUDTF.jar > ./sql/connect/common/src/test/resources/artifact-tests/junitLargeJar.jar > ./sql/connect/common/src/test/resources/artifact-tests/smallJar.jar > ./sql/connect/client/jvm/src/test/resources/TestHelloV2_2.13.jar > ./sql/connect/client/jvm/src/test/resources/udf2.13.jar > ./sql/hive-thriftserver/src/test/resources/TestUDTF.jar > {code} > *branch-4.0* > {code} > $ find . -name '*.jar' > ./connect-examples/server-library-example/resources/spark-daria_2.13-1.2.3.jar > ./core/src/test/resources/TestHelloV2_2.13.jar > ./core/src/test/resources/TestHelloV3_2.13.jar > ./core/src/test/resources/TestUDTF.jar > ./data/artifact-tests/junitLargeJar.jar > ./data/artifact-tests/smallJar.jar > ./sql/core/src/test/resources/SPARK-33084.jar > ./sql/core/src/test/resources/artifact-tests/udf_noA.jar > ./sql/hive/src/test/noclasspath/hive-test-udfs.jar > ./sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.13.jar > ./sql/hive/src/test/resources/SPARK-21101-1.0.jar > ./sql/hive/src/test/resources/data/files/TestSerDe.jar > ./sql/hive/src/test/resources/TestUDTF.jar > ./sql/connect/common/src/test/resources/artifact-tests/junitLargeJar.jar > ./sql/connect/common/src/test/resources/artifact-tests/smallJar.jar > ./sql/connect/client/jvm/src/test/resources/TestHelloV2_2.13.jar > ./sql/connect/client/jvm/src/test/resources/udf2.13.jar > ./sql/hive-thriftserver/src/test/resources/TestUDTF.jar > {code} > *branch-3.5* > {code} > $ find . 
-name '*.jar' > ./core/src/test/resources/TestHelloV3_2.12.jar > ./core/src/test/resources/TestHelloV2_2.12.jar > ./core/src/test/resources/TestHelloV2_2.13.jar > ./core/src/test/resources/TestHelloV3_2.13.jar > ./core/src/test/resources/TestUDTF.jar > ./connector/connect/server/src/test/resources/udf_noA.jar > ./connector/connect/common/src/test/resources/artifact-tests/junitLargeJar.jar > ./connector/connect/common/src/test/resources/artifact-tests/smallJar.jar > ./connector/connect/client/jvm/src/test/resources/TestHelloV2_2.12.jar > ./connector/connect/client/jvm/src/test/resources/TestHelloV2_2.13.jar > ./connector/connect/client/jvm/src/test/resources/udf2.13.jar > ./connector/connect/client/jvm/src/test/resources/udf2.12.jar > ./data/artifact-tests/junitLargeJar.jar > ./data/artifact-tests/smallJar.jar > ./sql/core/src/test/resources/SPARK-33084.jar > ./sql/hive/src/test/noclasspath/hive-test-udfs.jar > ./sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.12.jar > ./sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.13.jar > ./sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.11.jar > ./sql/hive/src/test/resources/SPARK-21101-1.0.jar > ./sql/hive/src/test/resources/data/files/TestSerDe.jar > ./sql/hive/src/test/resources/TestUDTF.jar > ./sql/hive-thriftserver/src/test/resources/TestUDTF.jar > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51318) Remove `jar` files from Apache Spark repository and disable affected tests
[ https://issues.apache.org/jira/browse/SPARK-51318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-51318: Assignee: Hyukjin Kwon > Remove `jar` files from Apache Spark repository and disable affected tests > -- > > Key: SPARK-51318 > URL: https://issues.apache.org/jira/browse/SPARK-51318 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 4.1.0 >Reporter: Dongjoon Hyun >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > This issue aims to remove `jar` file from the following branches and disabled > affected tests with IDed TODO comments. > *MASTER* > {code} > $ find . -name '*.jar' > ./connect-examples/server-library-example/resources/spark-daria_2.13-1.2.3.jar > ./core/src/test/resources/TestHelloV2_2.13.jar > ./core/src/test/resources/TestHelloV3_2.13.jar > ./core/src/test/resources/TestUDTF.jar > ./data/artifact-tests/junitLargeJar.jar > ./data/artifact-tests/smallJar.jar > ./sql/core/src/test/resources/SPARK-33084.jar > ./sql/core/src/test/resources/artifact-tests/udf_noA.jar > ./sql/hive/src/test/noclasspath/hive-test-udfs.jar > ./sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.13.jar > ./sql/hive/src/test/resources/SPARK-21101-1.0.jar > ./sql/hive/src/test/resources/data/files/TestSerDe.jar > ./sql/hive/src/test/resources/TestUDTF.jar > ./sql/connect/common/src/test/resources/artifact-tests/junitLargeJar.jar > ./sql/connect/common/src/test/resources/artifact-tests/smallJar.jar > ./sql/connect/client/jvm/src/test/resources/TestHelloV2_2.13.jar > ./sql/connect/client/jvm/src/test/resources/udf2.13.jar > ./sql/hive-thriftserver/src/test/resources/TestUDTF.jar > {code} > *branch-4.0* > {code} > $ find . -name '*.jar' > ./connect-examples/server-library-example/resources/spark-daria_2.13-1.2.3.jar > ./core/src/test/resources/TestHelloV2_2.13.jar > ./core/src/test/resources/TestHelloV3_2.13.jar > ./core/src/test/resources/TestUDTF.jar > ./data/artifact-tests/junitLargeJar.jar > ./data/artifact-tests/smallJar.jar > ./sql/core/src/test/resources/SPARK-33084.jar > ./sql/core/src/test/resources/artifact-tests/udf_noA.jar > ./sql/hive/src/test/noclasspath/hive-test-udfs.jar > ./sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.13.jar > ./sql/hive/src/test/resources/SPARK-21101-1.0.jar > ./sql/hive/src/test/resources/data/files/TestSerDe.jar > ./sql/hive/src/test/resources/TestUDTF.jar > ./sql/connect/common/src/test/resources/artifact-tests/junitLargeJar.jar > ./sql/connect/common/src/test/resources/artifact-tests/smallJar.jar > ./sql/connect/client/jvm/src/test/resources/TestHelloV2_2.13.jar > ./sql/connect/client/jvm/src/test/resources/udf2.13.jar > ./sql/hive-thriftserver/src/test/resources/TestUDTF.jar > {code} > *branch-3.5* > {code} > $ find . 
-name '*.jar' > ./core/src/test/resources/TestHelloV3_2.12.jar > ./core/src/test/resources/TestHelloV2_2.12.jar > ./core/src/test/resources/TestHelloV2_2.13.jar > ./core/src/test/resources/TestHelloV3_2.13.jar > ./core/src/test/resources/TestUDTF.jar > ./connector/connect/server/src/test/resources/udf_noA.jar > ./connector/connect/common/src/test/resources/artifact-tests/junitLargeJar.jar > ./connector/connect/common/src/test/resources/artifact-tests/smallJar.jar > ./connector/connect/client/jvm/src/test/resources/TestHelloV2_2.12.jar > ./connector/connect/client/jvm/src/test/resources/TestHelloV2_2.13.jar > ./connector/connect/client/jvm/src/test/resources/udf2.13.jar > ./connector/connect/client/jvm/src/test/resources/udf2.12.jar > ./data/artifact-tests/junitLargeJar.jar > ./data/artifact-tests/smallJar.jar > ./sql/core/src/test/resources/SPARK-33084.jar > ./sql/hive/src/test/noclasspath/hive-test-udfs.jar > ./sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.12.jar > ./sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.13.jar > ./sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.11.jar > ./sql/hive/src/test/resources/SPARK-21101-1.0.jar > ./sql/hive/src/test/resources/data/files/TestSerDe.jar > ./sql/hive/src/test/resources/TestUDTF.jar > ./sql/hive-thriftserver/src/test/resources/TestUDTF.jar > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51620) Support `columns` for `DataFrame`
Dongjoon Hyun created SPARK-51620: - Summary: Support `columns` for `DataFrame` Key: SPARK-51620 URL: https://issues.apache.org/jira/browse/SPARK-51620 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: connect-swift-0.1.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51620) Support `columns` for `DataFrame`
[ https://issues.apache.org/jira/browse/SPARK-51620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51620: --- Labels: pull-request-available (was: ) > Support `columns` for `DataFrame` > - > > Key: SPARK-51620 > URL: https://issues.apache.org/jira/browse/SPARK-51620 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: connect-swift-0.1.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51612) Display Spark confs set at view creation in Desc As Json
[ https://issues.apache.org/jira/browse/SPARK-51612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-51612: -- Assignee: Amanda Liu > Display Spark confs set at view creation in Desc As Json > > > Key: SPARK-51612 > URL: https://issues.apache.org/jira/browse/SPARK-51612 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Amanda Liu >Assignee: Amanda Liu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51612) Display Spark confs set at view creation in Desc As Json
[ https://issues.apache.org/jira/browse/SPARK-51612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-51612. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 50407 [https://github.com/apache/spark/pull/50407] > Display Spark confs set at view creation in Desc As Json > > > Key: SPARK-51612 > URL: https://issues.apache.org/jira/browse/SPARK-51612 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Amanda Liu >Assignee: Amanda Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51588) Validate default values handling in micro-batch streaming writes
[ https://issues.apache.org/jira/browse/SPARK-51588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-51588: Assignee: Anton Okolnychyi > Validate default values handling in micro-batch streaming writes > > > Key: SPARK-51588 > URL: https://issues.apache.org/jira/browse/SPARK-51588 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.1.0 >Reporter: Anton Okolnychyi >Assignee: Anton Okolnychyi >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51588) Validate default values handling in micro-batch streaming writes
[ https://issues.apache.org/jira/browse/SPARK-51588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-51588. -- Fix Version/s: 4.1.0 Resolution: Fixed Issue resolved by pull request 50351 [https://github.com/apache/spark/pull/50351] > Validate default values handling in micro-batch streaming writes > > > Key: SPARK-51588 > URL: https://issues.apache.org/jira/browse/SPARK-51588 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.1.0 >Reporter: Anton Okolnychyi >Assignee: Anton Okolnychyi >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51616) Run CollationTypeCasts explicitly before assigning aliases
Vladimir Golubev created SPARK-51616: Summary: Run CollationTypeCasts explicitly before assigning aliases Key: SPARK-51616 URL: https://issues.apache.org/jira/browse/SPARK-51616 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Vladimir Golubev The solution introduced in https://issues.apache.org/jira/browse/SPARK-51428 has a pitfall - aliases are reassigned only cosmetically, and name resolution is still done by the old names. This is a better solution for the problem. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51617) Explicitly commit/revert jar removals
[ https://issues.apache.org/jira/browse/SPARK-51617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-51617: Assignee: Hyukjin Kwon > Explicitly commit/revert jar removals > - > > Key: SPARK-51617 > URL: https://issues.apache.org/jira/browse/SPARK-51617 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > Address https://github.com/apache/spark/pull/50378#discussion_r2013712753 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51617) Explicitly commit/revert jar removals
[ https://issues.apache.org/jira/browse/SPARK-51617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-51617. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 50415 [https://github.com/apache/spark/pull/50415] > Explicitly commit/revert jar removals > - > > Key: SPARK-51617 > URL: https://issues.apache.org/jira/browse/SPARK-51617 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Address https://github.com/apache/spark/pull/50378#discussion_r2013712753 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51617) Explicitly commit/revert jar removals
[ https://issues.apache.org/jira/browse/SPARK-51617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-51617: - Summary: Explicitly commit/revert jar removals (was: Restore the test jars at the end of release process) > Explicitly commit/revert jar removals > - > > Key: SPARK-51617 > URL: https://issues.apache.org/jira/browse/SPARK-51617 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > Address https://github.com/apache/spark/pull/50378#discussion_r2013712753 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51617) Explicitly commit/revert jar removals
[ https://issues.apache.org/jira/browse/SPARK-51617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51617: --- Labels: pull-request-available (was: ) > Explicitly commit/revert jar removals > - > > Key: SPARK-51617 > URL: https://issues.apache.org/jira/browse/SPARK-51617 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > Address https://github.com/apache/spark/pull/50378#discussion_r2013712753 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51618) Add a check for jars in CI
Hyukjin Kwon created SPARK-51618: Summary: Add a check for jars in CI Key: SPARK-51618 URL: https://issues.apache.org/jira/browse/SPARK-51618 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 4.0.0 Reporter: Hyukjin Kwon We should disallow jars being added in the source -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51585) Oracle dialect supports pushdown datetime functions.
Jiaan Geng created SPARK-51585: -- Summary: Oracle dialect supports pushdown datetime functions. Key: SPARK-51585 URL: https://issues.apache.org/jira/browse/SPARK-51585 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.1.0 Reporter: Jiaan Geng Assignee: Jiaan Geng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51619) Support UDT in Arrow-optimized Python UDF.
[ https://issues.apache.org/jira/browse/SPARK-51619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51619: --- Labels: pull-request-available (was: ) > Support UDT in Arrow-optimized Python UDF. > -- > > Key: SPARK-51619 > URL: https://issues.apache.org/jira/browse/SPARK-51619 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.1.0 >Reporter: Takuya Ueshin >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51544) Add only unique and necessary metadata columns
[ https://issues.apache.org/jira/browse/SPARK-51544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-51544. - Fix Version/s: 4.1.0 Resolution: Fixed Issue resolved by pull request 50304 [https://github.com/apache/spark/pull/50304] > Add only unique and necessary metadata columns > -- > > Key: SPARK-51544 > URL: https://issues.apache.org/jira/browse/SPARK-51544 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Timotic >Assignee: Mihailo Timotic >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > > AddMetadataColumns should add only unique and necessary metadata columns, not > the entire child's metadata output -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51617) Restore the test jars at the end of release process
Hyukjin Kwon created SPARK-51617: Summary: Restore the test jars at the end of release process Key: SPARK-51617 URL: https://issues.apache.org/jira/browse/SPARK-51617 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: Hyukjin Kwon Address https://github.com/apache/spark/pull/50378#discussion_r2013712753 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-51605) If the `logs` directory does not exist, the first launch of `bin/spark-shell --remote local` will fail.
[ https://issues.apache.org/jira/browse/SPARK-51605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938758#comment-17938758 ] Hyukjin Kwon commented on SPARK-51605: -- Ack > If the `logs` directory does not exist, the first launch of `bin/spark-shell > --remote local` will fail. > --- > > Key: SPARK-51605 > URL: https://issues.apache.org/jira/browse/SPARK-51605 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0, 4.1.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > bin/spark-shell --remote local > WARNING: Using incubator modules: jdk.incubator.vector > Exception in thread "main" java.nio.file.NoSuchFileException: > /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs > at > java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) > at > java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55) > at > java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:148) > at java.base/java.nio.file.Files.readAttributes(Files.java:1851) > at > java.base/sun.nio.fs.PollingWatchService.doPrivilegedRegister(PollingWatchService.java:173) > at > java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:154) > at > java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:151) > at > java.base/java.security.AccessController.doPrivileged(AccessController.java:569) > at > java.base/sun.nio.fs.PollingWatchService.register(PollingWatchService.java:150) > at java.base/sun.nio.fs.UnixPath.register(UnixPath.java:885) > at java.base/java.nio.file.Path.register(Path.java:894) > at > org.apache.spark.sql.connect.SparkSession$.waitUntilFileExists(SparkSession.scala:717) > at > org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13(SparkSession.scala:798) > at > org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13$adapted(SparkSession.scala:791) > at scala.Option.foreach(Option.scala:437) > at > org.apache.spark.sql.connect.SparkSession$.withLocalConnectServer(SparkSession.scala:791) > at > org.apache.spark.sql.application.ConnectRepl$.doMain(ConnectRepl.scala:67) > at > org.apache.spark.sql.application.ConnectRepl$.main(ConnectRepl.scala:57) > at org.apache.spark.sql.application.ConnectRepl.main(ConnectRepl.scala) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:569) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1027) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:204) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:227) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:96) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1132) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1141) > at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > 25/03/26 15:39:40 INFO ShutdownHookManager: Shutdown hook called > 25/03/26 15:39:40 INFO ShutdownHookManager: Deleting directory > /private/var/folders/j2/cfn7w6795538n_416_27rkqmgn/T/spark-fe4c9d71-b7d7-437e-b486-514cc538 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51621) Support `sparkSession` for `DataFrame`
[ https://issues.apache.org/jira/browse/SPARK-51621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51621: --- Labels: pull-request-available (was: ) > Support `sparkSession` for `DataFrame` > -- > > Key: SPARK-51621 > URL: https://issues.apache.org/jira/browse/SPARK-51621 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: connect-swift-0.1.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46860) Credentials with https url not working for --jars, --files, --archives & --py-files options on spark-submit command
[ https://issues.apache.org/jira/browse/SPARK-46860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938513#comment-17938513 ] Krzysztof Ruta commented on SPARK-46860: - I did some research and experiments. I identified two places where a URL containing credentials is potentially logged - this applies particularly to pt.2 above. But as soon as I addressed it I found another... E.g. Spark stores its jars location in session properties (spark.jars), what if somebody decides to log full spark config for debugging purposes? Or what if somebody logs full spark-submit command (that includes URL) even before spark app is launched? I don't think this is the way to go, I mean altering Spark logging in order to keep secrets safe. This would give you a false sense of confidence that your password would not leak in any case. You cannot be sure that in some scenarios (network problems, wrong characters in password, debug level logging etc.) the URL would not be logged. So in my opinion the key here is to secure your logging system independently of Spark. Take Apache Airflow or Gitlab CI/CD - either you are explicitly given the option to mask your secrets or you must do it manually; try to go this way. In any scenario I can think of this is a safer approach. To test it, just put obviously incorrect credentials (like you mentioned above) or correct ones that you can quickly change and search for them in logs. When masked, you should never see them. > Credentials with https url not working for --jars, --files, --archives & > --py-files options on spark-submit command > --- > > Key: SPARK-46860 > URL: https://issues.apache.org/jira/browse/SPARK-46860 > Project: Spark > Issue Type: Task > Components: k8s >Affects Versions: 3.3.3, 3.5.0, 3.3.4 > Environment: Spark 3.3.3 deployed on K8s >Reporter: Vikram Janarthanan >Priority: Major > Labels: pull-request-available > > We are trying to run the spark application by pointing the dependent files as > well as the main pyspark script from a secure webserver > We are looking for a solution to pass the dependencies as well as the pyspark > script from the webserver. > we have tried deploying the spark application from webserver to k8s cluster > without username and password and it worked, but when tried with > username/password we are facing "Exception in thread "{*}main" > java.io.IOException: Server returned HTTP response code: 401 for URL: > https://username:passw...@domain.com/application/pysparkjob.py{*}"; > *Working options on spark-submit:* > spark-submit .. > --repositories https://username:passw...@domain.com/repo1/repo > --jars https://domain.com/jars/runtime.jar \ > --files https://domain.com/files/query.sql \ > --py-files [https://domain.com/pythonlib/pythonlib.zip] \ > https://domain.com/app1/pysparkapp.py > Note: only repositories option works with username and password > *Spark-submit using https url with username/password not working:* > spark-submit .. > --jars https://username:passw...@domain.com/jars/runtime.jar \ > --files https://username:passw...@domain.com/files/query.sql \ > --py-files > https://username:passw...@domain.com[/pythonlib/pythonlib.zip|https://domain.com/pythonlib/pythonlib.zip] > \ > https://username:passw...@domain.com/app1/pysparkapp.py > > Error : > 25/01/23 09:19:57 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform...
using builtin-java classes where applicable > Exception in thread "main" java.io.IOException: Server returned HTTP response > code: 401 for URL: > https://username:passw...@domain.com/repository/spark-artifacts/pysparkdemo/1.0/pysparkdemo-1.0.tgz > at > java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:2000) > at > java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589) > at > java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224) > at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:809) > at > org.apache.spark.util.DependencyUtils$.downloadFile(DependencyUtils.scala:264) > at > org.apache.spark.util.DependencyUtils$.$anonfun$downloadFileList$2(DependencyUtils.scala:233) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at > scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.TraversableLike.map(Tr
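As an aside on the masking idea discussed above: if a caller did want to scrub credentials at the exact point where a URL string is about to be logged or copied into configuration, the usual shape is to drop the userinfo part of the URI. A minimal sketch, assuming the helper name and its call sites are hypothetical and not existing Spark code:

{code:scala}
import java.net.URI

// Hypothetical helper, not part of Spark: drop the user:password part of a URL
// before the string is logged or stored in plain-text configuration.
object UrlRedaction {
  def redactUserInfo(url: String): String = {
    val uri = new URI(url)                 // assumes the string parses as a URI
    if (uri.getUserInfo == null) {
      url
    } else {
      // Rebuild the URI without the userinfo section.
      new URI(uri.getScheme, null, uri.getHost, uri.getPort,
        uri.getPath, uri.getQuery, uri.getFragment).toString
    }
  }
}

// redactUserInfo("https://user:secret@example.com/app/pysparkjob.py")
// => "https://example.com/app/pysparkjob.py"
{code}

As the comment above points out, this only covers the log sites you know about; masking in the log aggregation layer is still the safer net.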
[jira] [Comment Edited] (SPARK-46860) Credentials with https url not working for --jars, --files, --archives & --py-files options on spark-submit command
[ https://issues.apache.org/jira/browse/SPARK-46860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938513#comment-17938513 ] Krzysztof Ruta edited comment on SPARK-46860 at 3/26/25 9:37 AM: - I did some research and experiments. I identified two places where a URL containing credentials is potentially logged - this applies particularly to pt.2 above. But as soon as I addressed these I found another... E.g. Spark stores its jars location in session properties (spark.jars); what if somebody decides to log the full spark config for debugging purposes? Or what if somebody logs the full spark-submit command (that includes the URL) even before the spark app is launched? I don't think altering Spark logging is the way to go for keeping secrets safe. This would give you a false sense of confidence that your password would not leak in any case. You cannot be sure that in some scenarios (network problems, wrong characters in password, debug level logging etc.) the URL would not be logged. So in my opinion the key here is to secure your logging system independently of Spark. Take Apache Airflow or Gitlab CI/CD - either you are explicitly given the option to mask your secrets or you must do it manually; try to go this way. In any scenario I can think of this is a safer approach. To test it, just put obviously incorrect credentials (like you mentioned above) or correct ones that you can quickly change and search for them in logs. When masked, you should never see them. was (Author: JIRAUSER309126): I did some research and experiments. I identified two places where a URL containing credentials is potentially logged - this applies particularly to pt.2 above. But as soon as I addressed it I found another... E.g. Spark stores its jars location in session properties (spark.jars); what if somebody decides to log the full spark config for debugging purposes? Or what if somebody logs the full spark-submit command (that includes the URL) even before the spark app is launched? I don't think altering Spark logging is the way to go for keeping secrets safe. This would give you a false sense of confidence that your password would not leak in any case. You cannot be sure that in some scenarios (network problems, wrong characters in password, debug level logging etc.) the URL would not be logged. So in my opinion the key here is to secure your logging system independently of Spark. Take Apache Airflow or Gitlab CI/CD - either you are explicitly given the option to mask your secrets or you must do it manually; try to go this way. In any scenario I can think of this is a safer approach. To test it, just put obviously incorrect credentials (like you mentioned above) or correct ones that you can quickly change and search for them in logs. When masked, you should never see them. > Credentials with https url not working for --jars, --files, --archives & > --py-files options on spark-submit command > --- > > Key: SPARK-46860 > URL: https://issues.apache.org/jira/browse/SPARK-46860 > Project: Spark > Issue Type: Task > Components: k8s >Affects Versions: 3.3.3, 3.5.0, 3.3.4 > Environment: Spark 3.3.3 deployed on K8s >Reporter: Vikram Janarthanan >Priority: Major > Labels: pull-request-available > > We are trying to run the spark application by pointing the dependent files as > well the main pyspark script from secure webserver > We are looking for solution to pass the dependencies as well as pyspark > script from webserver. 
> we have tried deploying the spark application from webserver to k8s cluster > without username and password and it worked, but when tried with > username/password we are facing "Exception in thread "{*}main" > java.io.IOException: Server returned HTTP response code: 401 for URL: > https://username:passw...@domain.com/application/pysparkjob.py{*}"; > *Working options on spark-submit:* > spark-submit .. > --repositories https://username:passw...@domain.com/repo1/repo > --jars https://domain.com/jars/runtime.jar \ > --files https://domain.com/files/query.sql \ > --py-files [https://domain.com/pythonlib/pythonlib.zip] \ > https://domain.com/app1/pysparkapp.py > Note: only repositories option works with username and password > *Spark-submit using https url with username/password not working:* > spark-submit .. > --jars https://username:passw...@domain.com/jars/runtime.jar \ > --files https://username:passw...@domain.com/files/query.sql \ > --py-files > https://username:passw...@domain.com[/pythonlib/pythonlib.zip|https://domain.com/pythonlib/pythonlib.zip] > \ > https://username:passw...@domain.com/app1/pysparkapp.py > > Error : > 25/
[jira] [Commented] (SPARK-46860) Credentials with https url not working for --jars, --files, --archives & --py-files options on spark-submit command
[ https://issues.apache.org/jira/browse/SPARK-46860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938515#comment-17938515 ] Krzysztof Ruta commented on SPARK-46860: Anyway, I am not sure what's gonna happen with this PR, or how long it could take for somebody to review and (hopefully) integrate it. If you wish to play with it or even use it in prod, you could build spark-core yourself and then replace the spark-core jar with yours. That's what I did. I can provide you with some guidance if necessary. > Credentials with https url not working for --jars, --files, --archives & > --py-files options on spark-submit command > --- > > Key: SPARK-46860 > URL: https://issues.apache.org/jira/browse/SPARK-46860 > Project: Spark > Issue Type: Task > Components: k8s >Affects Versions: 3.3.3, 3.5.0, 3.3.4 > Environment: Spark 3.3.3 deployed on K8s >Reporter: Vikram Janarthanan >Priority: Major > Labels: pull-request-available > > We are trying to run the spark application by pointing the dependent files as > well the main pyspark script from secure webserver > We are looking for solution to pass the dependencies as well as pyspark > script from webserver. > we have tried deploying the spark application from webserver to k8s cluster > without username and password and it worked, but when tried with > username/password we are facing "Exception in thread "{*}main" > java.io.IOException: Server returned HTTP response code: 401 for URL: > https://username:passw...@domain.com/application/pysparkjob.py{*}"; > *Working options on spark-submit:* > spark-submit .. > --repositories https://username:passw...@domain.com/repo1/repo > --jars https://domain.com/jars/runtime.jar \ > --files https://domain.com/files/query.sql \ > --py-files [https://domain.com/pythonlib/pythonlib.zip] \ > https://domain.com/app1/pysparkapp.py > Note: only repositories option works with username and password > *Spark-submit using https url with username/password not working:* > spark-submit .. > --jars https://username:passw...@domain.com/jars/runtime.jar \ > --files https://username:passw...@domain.com/files/query.sql \ > --py-files > https://username:passw...@domain.com[/pythonlib/pythonlib.zip|https://domain.com/pythonlib/pythonlib.zip] > \ > https://username:passw...@domain.com/app1/pysparkapp.py > > Error : > 25/01/23 09:19:57 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... 
using builtin-java classes where applicable > Exception in thread "main" java.io.IOException: Server returned HTTP response > code: 401 for URL: > https://username:passw...@domain.com/repository/spark-artifacts/pysparkdemo/1.0/pysparkdemo-1.0.tgz > at > java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:2000) > at > java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589) > at > java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224) > at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:809) > at > org.apache.spark.util.DependencyUtils$.downloadFile(DependencyUtils.scala:264) > at > org.apache.spark.util.DependencyUtils$.$anonfun$downloadFileList$2(DependencyUtils.scala:233) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at > scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.TraversableLike.map(TraversableLike.scala:286) > at scala.collection.TraversableLike.map$(TraversableLike.scala:279) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51609) Optimize Recursive CTE execution forsimple queries
[ https://issues.apache.org/jira/browse/SPARK-51609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavle Martinović updated SPARK-51609: - Summary: Optimize Recursive CTE execution forsimple queries (was: Optimize simple queries) > Optimize Recursive CTE execution forsimple queries > -- > > Key: SPARK-51609 > URL: https://issues.apache.org/jira/browse/SPARK-51609 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.1.0 >Reporter: Pavle Martinović >Priority: Major > > We would like to speed up the execution of RecursiveCTEs. It is possible to > optimize simple queries to run in-memory, leading to large speed ups. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51609) Optimize Recursive CTE execution for simple queries
[ https://issues.apache.org/jira/browse/SPARK-51609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavle Martinović updated SPARK-51609: - Summary: Optimize Recursive CTE execution for simple queries (was: Optimize Recursive CTE execution forsimple queries) > Optimize Recursive CTE execution for simple queries > --- > > Key: SPARK-51609 > URL: https://issues.apache.org/jira/browse/SPARK-51609 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.1.0 >Reporter: Pavle Martinović >Priority: Major > > We would like to speed up the execution of RecursiveCTEs. It is possible to > optimize simple queries to run in-memory, leading to large speed ups. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51574) Implement filters
[ https://issues.apache.org/jira/browse/SPARK-51574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51574: --- Labels: pull-request-available (was: ) > Implement filters > - > > Key: SPARK-51574 > URL: https://issues.apache.org/jira/browse/SPARK-51574 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.1.0 >Reporter: Haoyu Weng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51574) Implement filters
[ https://issues.apache.org/jira/browse/SPARK-51574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-51574. - Fix Version/s: 4.1.0 Resolution: Fixed Issue resolved by pull request 50252 [https://github.com/apache/spark/pull/50252] > Implement filters > - > > Key: SPARK-51574 > URL: https://issues.apache.org/jira/browse/SPARK-51574 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.1.0 >Reporter: Haoyu Weng >Assignee: Haoyu Weng >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51587) [PySpark] Fix an issue where timestamp cannot be used in ListState when multiple state data is involved
[ https://issues.apache.org/jira/browse/SPARK-51587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-51587: - Fix Version/s: 4.0.0 (was: 4.1.0) > [PySpark] Fix an issue where timestamp cannot be used in ListState when > multiple state data is involved > --- > > Key: SPARK-51587 > URL: https://issues.apache.org/jira/browse/SPARK-51587 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Bo Gao >Assignee: Bo Gao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Fix an issue where timestamp cannot be used in ListState when multiple state > data is involved. > Right now below error will be thrown > {code:python} > [UNSUPPORTED_ARROWTYPE] Unsupported arrow type Timestamp(NANOSECOND, null). > SQLSTATE: 0A000 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51608) Better log exception in python udf worker.
[ https://issues.apache.org/jira/browse/SPARK-51608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51608: --- Labels: pull-request-available (was: ) > Better log exception in python udf worker. > -- > > Key: SPARK-51608 > URL: https://issues.apache.org/jira/browse/SPARK-51608 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Dmitry >Priority: Major > Labels: pull-request-available > > It was possible to see the error in the logs: > {{24/12/27 20:25:04 WARN PythonUDFWithNamedArgumentsRunner: Failed to stop > worker}} > However, it does not reveal what exactly happened. It really makes sense to > include the exception information in the logs. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
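For context on the change being proposed here, the conventional pattern is simply to hand the caught throwable to the logger so the message and stack trace land next to the warning. A rough sketch of that pattern; the surrounding class and method are invented for illustration and this is not the actual PythonRunner code:

{code:scala}
import org.apache.spark.internal.Logging

// Illustrative only: the point is to pass the caught exception to the logger
// instead of dropping it, so the log line explains why the worker failed to stop.
class WorkerStopper extends Logging {
  def stopQuietly(stop: () => Unit): Unit = {
    try {
      stop()
    } catch {
      case e: Exception =>
        logWarning("Failed to stop worker", e)   // exception now visible in the logs
    }
  }
}
{code}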
[jira] [Created] (SPARK-51609) Optimize simple queries
Pavle Martinović created SPARK-51609: Summary: Optimize simple queries Key: SPARK-51609 URL: https://issues.apache.org/jira/browse/SPARK-51609 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.1.0 Reporter: Pavle Martinović -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-51605) If the `logs` directory does not exist, the first launch of `bin/spark-shell --remote local` will fail.
[ https://issues.apache.org/jira/browse/SPARK-51605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938497#comment-17938497 ] Yang Jie commented on SPARK-51605: -- cc [~gurwls223] > If the `logs` directory does not exist, the first launch of `bin/spark-shell > --remote local` will fail. > --- > > Key: SPARK-51605 > URL: https://issues.apache.org/jira/browse/SPARK-51605 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0, 4.1.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > bin/spark-shell --remote local > WARNING: Using incubator modules: jdk.incubator.vector > Exception in thread "main" java.nio.file.NoSuchFileException: > /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs > at > java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) > at > java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55) > at > java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:148) > at java.base/java.nio.file.Files.readAttributes(Files.java:1851) > at > java.base/sun.nio.fs.PollingWatchService.doPrivilegedRegister(PollingWatchService.java:173) > at > java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:154) > at > java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:151) > at > java.base/java.security.AccessController.doPrivileged(AccessController.java:569) > at > java.base/sun.nio.fs.PollingWatchService.register(PollingWatchService.java:150) > at java.base/sun.nio.fs.UnixPath.register(UnixPath.java:885) > at java.base/java.nio.file.Path.register(Path.java:894) > at > org.apache.spark.sql.connect.SparkSession$.waitUntilFileExists(SparkSession.scala:717) > at > org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13(SparkSession.scala:798) > at > org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13$adapted(SparkSession.scala:791) > at scala.Option.foreach(Option.scala:437) > at > org.apache.spark.sql.connect.SparkSession$.withLocalConnectServer(SparkSession.scala:791) > at > org.apache.spark.sql.application.ConnectRepl$.doMain(ConnectRepl.scala:67) > at > org.apache.spark.sql.application.ConnectRepl$.main(ConnectRepl.scala:57) > at org.apache.spark.sql.application.ConnectRepl.main(ConnectRepl.scala) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:569) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1027) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:204) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:227) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:96) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1132) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1141) > at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > 25/03/26 15:39:40 INFO ShutdownHookManager: Shutdown hook called > 25/03/26 15:39:40 INFO ShutdownHookManager: Deleting directory > /private/var/folders/j2/cfn7w6795538n_416_27rkqmgn/T/spark-fe4c9d71-b7d7-437e-b486-514cc538 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
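The stack trace above points at PollingWatchService.register, which throws NoSuchFileException when the watched directory is missing. The conventional remedy, sketched below with illustrative paths rather than the exact code in SparkSession.withLocalConnectServer, is to create the directory before registering the watch:

{code:scala}
import java.nio.file.{Files, Paths, StandardWatchEventKinds}

// Sketch of the general pattern; the directory below is illustrative.
// register() fails if the watched directory does not exist yet, so create it first.
val logDir = Paths.get(sys.env.getOrElse("SPARK_LOG_DIR", "logs"))
Files.createDirectories(logDir)          // no-op when the directory already exists

val watcher = logDir.getFileSystem.newWatchService()
logDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE)
{code}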
[jira] [Updated] (SPARK-51606) After exiting the remote local connect shell, the SparkConnectServer will not terminate.
[ https://issues.apache.org/jira/browse/SPARK-51606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-51606: - Affects Version/s: 4.0.0 > After exiting the remote local connect shell, the SparkConnectServer will not > terminate. > > > Key: SPARK-51606 > URL: https://issues.apache.org/jira/browse/SPARK-51606 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0, 4.1.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > bin/spark-shell --remote local > WARNING: Using incubator modules: jdk.incubator.vector > Using Spark's default log4j profile: > org/apache/spark/log4j2-defaults.properties > 25/03/26 15:43:55 INFO SparkSession: Spark Connect server started with the > log file: > /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs/spark-cb51ad74-00e1-4567-9746-3dc9a7888ecb-org.apache.spark.sql.connect.service.SparkConnectServer-1-local.out > 25/03/26 15:43:56 INFO BaseAllocator: Debug mode disabled. Enable with the VM > option -Darrow.memory.debug.allocator=true. > 25/03/26 15:43:56 INFO DefaultAllocationManagerOption: allocation manager > type not specified, using netty as the default type > 25/03/26 15:43:56 INFO CheckAllocator: Using DefaultAllocationManager at > memory/netty/DefaultAllocationManagerFactory.class > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 4.1.0-SNAPSHOT > /_/ > Type in expressions to have them evaluated. > Spark connect server version 4.1.0-SNAPSHOT. > Spark session available as 'spark'. > > scala> exit > Bye! > 25/03/26 15:44:00 INFO ShutdownHookManager: Shutdown hook called > 25/03/26 15:44:00 INFO ShutdownHookManager: Deleting directory > /private/var/folders/j2/cfn7w6795538n_416_27rkqmgn/T/spark-ad8dfdf4-cf2b-413f-a9e3-d6e310dff1ea > bin/spark-shell --remote local > WARNING: Using incubator modules: jdk.incubator.vector > Using Spark's default log4j profile: > org/apache/spark/log4j2-defaults.properties > 25/03/26 15:44:04 INFO SparkSession: Spark Connect server started with the > log file: > /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs/spark-a7b9a1dc-1e16-4e0e-b7c1-8f957d730df3-org.apache.spark.sql.connect.service.SparkConnectServer-1-local.out > 25/03/26 15:44:05 INFO BaseAllocator: Debug mode disabled. Enable with the VM > option -Darrow.memory.debug.allocator=true. 
> 25/03/26 15:44:05 INFO DefaultAllocationManagerOption: allocation manager > type not specified, using netty as the default type > 25/03/26 15:44:05 INFO CheckAllocator: Using DefaultAllocationManager at > memory/netty/DefaultAllocationManagerFactory.class > Exception in thread "main" org.apache.spark.SparkException: > org.sparkproject.io.grpc.StatusRuntimeException: UNAUTHENTICATED: Invalid > authentication token > at > org.apache.spark.sql.connect.client.GrpcExceptionConverter.toThrowable(GrpcExceptionConverter.scala:162) > at > org.apache.spark.sql.connect.client.GrpcExceptionConverter.convert(GrpcExceptionConverter.scala:61) > at > org.apache.spark.sql.connect.client.CustomSparkConnectBlockingStub.analyzePlan(CustomSparkConnectBlockingStub.scala:75) > at > org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:110) > at > org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:256) > at > org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:227) > at > org.apache.spark.sql.connect.SparkSession.version$lzycompute(SparkSession.scala:92) > at > org.apache.spark.sql.connect.SparkSession.version(SparkSession.scala:91) > at > org.apache.spark.sql.application.ConnectRepl$$anon$1.(ConnectRepl.scala:106) > at > org.apache.spark.sql.application.ConnectRepl$.$anonfun$doMain$1(ConnectRepl.scala:105) > at > org.apache.spark.sql.connect.SparkSession$.withLocalConnectServer(SparkSession.scala:824) > at > org.apache.spark.sql.application.ConnectRepl$.doMain(ConnectRepl.scala:67) > at > org.apache.spark.sql.application.ConnectRepl$.main(ConnectRepl.scala:57) > at org.apache.spark.sql.application.ConnectRepl.main(ConnectRepl.scala) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:569) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala
[jira] [Updated] (SPARK-51605) If the `logs` directory does not exist, the first launch of `bin/spark-shell --remote local` will fail.
[ https://issues.apache.org/jira/browse/SPARK-51605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-51605: - Affects Version/s: 4.0.0 > If the `logs` directory does not exist, the first launch of `bin/spark-shell > --remote local` will fail. > --- > > Key: SPARK-51605 > URL: https://issues.apache.org/jira/browse/SPARK-51605 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0, 4.1.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > bin/spark-shell --remote local > WARNING: Using incubator modules: jdk.incubator.vector > Exception in thread "main" java.nio.file.NoSuchFileException: > /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs > at > java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) > at > java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55) > at > java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:148) > at java.base/java.nio.file.Files.readAttributes(Files.java:1851) > at > java.base/sun.nio.fs.PollingWatchService.doPrivilegedRegister(PollingWatchService.java:173) > at > java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:154) > at > java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:151) > at > java.base/java.security.AccessController.doPrivileged(AccessController.java:569) > at > java.base/sun.nio.fs.PollingWatchService.register(PollingWatchService.java:150) > at java.base/sun.nio.fs.UnixPath.register(UnixPath.java:885) > at java.base/java.nio.file.Path.register(Path.java:894) > at > org.apache.spark.sql.connect.SparkSession$.waitUntilFileExists(SparkSession.scala:717) > at > org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13(SparkSession.scala:798) > at > org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13$adapted(SparkSession.scala:791) > at scala.Option.foreach(Option.scala:437) > at > org.apache.spark.sql.connect.SparkSession$.withLocalConnectServer(SparkSession.scala:791) > at > org.apache.spark.sql.application.ConnectRepl$.doMain(ConnectRepl.scala:67) > at > org.apache.spark.sql.application.ConnectRepl$.main(ConnectRepl.scala:57) > at org.apache.spark.sql.application.ConnectRepl.main(ConnectRepl.scala) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:569) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1027) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:204) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:227) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:96) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1132) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1141) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > 25/03/26 15:39:40 INFO 
ShutdownHookManager: Shutdown hook called > 25/03/26 15:39:40 INFO ShutdownHookManager: Deleting directory > /private/var/folders/j2/cfn7w6795538n_416_27rkqmgn/T/spark-fe4c9d71-b7d7-437e-b486-514cc538 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-51606) After exiting the remote local connect shell, the SparkConnectServer will not terminate.
[ https://issues.apache.org/jira/browse/SPARK-51606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938500#comment-17938500 ] Hyukjin Kwon commented on SPARK-51606: -- thanks. Will take a look. > After exiting the remote local connect shell, the SparkConnectServer will not > terminate. > > > Key: SPARK-51606 > URL: https://issues.apache.org/jira/browse/SPARK-51606 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0, 4.1.0 >Reporter: Yang Jie >Priority: Major > > {code:java} > bin/spark-shell --remote local > WARNING: Using incubator modules: jdk.incubator.vector > Using Spark's default log4j profile: > org/apache/spark/log4j2-defaults.properties > 25/03/26 15:43:55 INFO SparkSession: Spark Connect server started with the > log file: > /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs/spark-cb51ad74-00e1-4567-9746-3dc9a7888ecb-org.apache.spark.sql.connect.service.SparkConnectServer-1-local.out > 25/03/26 15:43:56 INFO BaseAllocator: Debug mode disabled. Enable with the VM > option -Darrow.memory.debug.allocator=true. > 25/03/26 15:43:56 INFO DefaultAllocationManagerOption: allocation manager > type not specified, using netty as the default type > 25/03/26 15:43:56 INFO CheckAllocator: Using DefaultAllocationManager at > memory/netty/DefaultAllocationManagerFactory.class > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 4.1.0-SNAPSHOT > /_/ > Type in expressions to have them evaluated. > Spark connect server version 4.1.0-SNAPSHOT. > Spark session available as 'spark'. > > scala> exit > Bye! > 25/03/26 15:44:00 INFO ShutdownHookManager: Shutdown hook called > 25/03/26 15:44:00 INFO ShutdownHookManager: Deleting directory > /private/var/folders/j2/cfn7w6795538n_416_27rkqmgn/T/spark-ad8dfdf4-cf2b-413f-a9e3-d6e310dff1ea > bin/spark-shell --remote local > WARNING: Using incubator modules: jdk.incubator.vector > Using Spark's default log4j profile: > org/apache/spark/log4j2-defaults.properties > 25/03/26 15:44:04 INFO SparkSession: Spark Connect server started with the > log file: > /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs/spark-a7b9a1dc-1e16-4e0e-b7c1-8f957d730df3-org.apache.spark.sql.connect.service.SparkConnectServer-1-local.out > 25/03/26 15:44:05 INFO BaseAllocator: Debug mode disabled. Enable with the VM > option -Darrow.memory.debug.allocator=true. 
> 25/03/26 15:44:05 INFO DefaultAllocationManagerOption: allocation manager > type not specified, using netty as the default type > 25/03/26 15:44:05 INFO CheckAllocator: Using DefaultAllocationManager at > memory/netty/DefaultAllocationManagerFactory.class > Exception in thread "main" org.apache.spark.SparkException: > org.sparkproject.io.grpc.StatusRuntimeException: UNAUTHENTICATED: Invalid > authentication token > at > org.apache.spark.sql.connect.client.GrpcExceptionConverter.toThrowable(GrpcExceptionConverter.scala:162) > at > org.apache.spark.sql.connect.client.GrpcExceptionConverter.convert(GrpcExceptionConverter.scala:61) > at > org.apache.spark.sql.connect.client.CustomSparkConnectBlockingStub.analyzePlan(CustomSparkConnectBlockingStub.scala:75) > at > org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:110) > at > org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:256) > at > org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:227) > at > org.apache.spark.sql.connect.SparkSession.version$lzycompute(SparkSession.scala:92) > at > org.apache.spark.sql.connect.SparkSession.version(SparkSession.scala:91) > at > org.apache.spark.sql.application.ConnectRepl$$anon$1.(ConnectRepl.scala:106) > at > org.apache.spark.sql.application.ConnectRepl$.$anonfun$doMain$1(ConnectRepl.scala:105) > at > org.apache.spark.sql.connect.SparkSession$.withLocalConnectServer(SparkSession.scala:824) > at > org.apache.spark.sql.application.ConnectRepl$.doMain(ConnectRepl.scala:67) > at > org.apache.spark.sql.application.ConnectRepl$.main(ConnectRepl.scala:57) > at org.apache.spark.sql.application.ConnectRepl.main(ConnectRepl.scala) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:569) > at > org.apach
[jira] [Created] (SPARK-51604) split test_connect_session
Ruifeng Zheng created SPARK-51604: - Summary: split test_connect_session Key: SPARK-51604 URL: https://issues.apache.org/jira/browse/SPARK-51604 Project: Spark Issue Type: Improvement Components: PySpark, Tests Affects Versions: 4.1 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-50873) An optimization for SparkOptimizer to prune the column after RewriteSubquery
[ https://issues.apache.org/jira/browse/SPARK-50873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KeKe Zhu updated SPARK-50873: - Summary: An optimization for SparkOptimizer to prune the column after RewriteSubquery (was: An optimization for SparkOptimizer to prune the column in subquery) > An optimization for SparkOptimizer to prune the column after RewriteSubquery > - > > Key: SPARK-50873 > URL: https://issues.apache.org/jira/browse/SPARK-50873 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.3 >Reporter: KeKe Zhu >Priority: Major > Attachments: query16-opt.PNG, query16-org.PNG > > > I used Spark 3.5+iceberg 1.6.1 to run the TPCDS test. When doing performance > analysis, I found that there is a potential optimization for SparkOptimizer. > The optimization is about column pruning of DatasourceV2 (DSV2). > In SparkOptimizer, the column pruning of DSV2 is executed in the > V2ScanRelationPushDown rule. However, there is a series of optimization rules > after V2ScanRelationPushDown; those optimization rules may rewrite subqueries > and generate Project or Filter operators that could be used for column pruning, > but column pruning will not be executed again, resulting in the generated > physical plan reading the entire table instead of only the required > columns. > For example, here is query 16 from TPCDS: > {code:java} > set spark.queryID=query16.tpl; > select > count(distinct cs_order_number) as `order count` > ,sum(cs_ext_ship_cost) as `total shipping cost` > ,sum(cs_net_profit) as `total net profit` > from > catalog_sales cs1 > ,date_dim > ,customer_address > ,call_center > where > d_date between '2002-2-01' and > (cast('2002-2-01' as date) + interval 60 days) > and cs1.cs_ship_date_sk = d_date_sk > and cs1.cs_ship_addr_sk = ca_address_sk > and ca_state = 'KS' > and cs1.cs_call_center_sk = cc_call_center_sk > and cc_county in ('Daviess County','Barrow County','Walker County','San > Miguel County', > 'Mobile County' > ) > and exists (select * > from catalog_sales cs2 > where cs1.cs_order_number = cs2.cs_order_number > and cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk) > and not exists(select * > from catalog_returns cr1 > where cs1.cs_order_number = cr1.cr_order_number) > order by count(distinct cs_order_number) > limit 100; {code} > The final optimized plan of the query is shown in the picture below; we can see > that two tables (catalog_sales & catalog_returns) are read in full and then > projected, which certainly causes low performance for iceberg. > !query16-org.PNG! > > > My current solution: I wrote an optimization rule and added it to the > SparkOptimizer; the rule checks again whether the table's columns still need to > be pruned and does so if needed, otherwise no action is taken. Now I get > the expected optimized plan and a much better performance result. > !query16-opt.PNG! > I want to know whether there is any other solution for this problem; contact me > anytime. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
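A quick way to check whether any extra pruning actually takes effect is to compare the scan schemas in the physical plan with and without the additional rule. A small sketch, assuming `spark` is an active SparkSession and `query16` is a placeholder holding the SQL text quoted above:

{code:scala}
// Both names below are placeholders for this illustration.
val df = spark.sql(query16)
df.explain()   // prints the physical plan

// In the printed plan, inspect the scans for catalog_sales (cs2) and
// catalog_returns (cr1): with effective pruning they should only read the
// join keys (cs_order_number, cs_warehouse_sk, cr_order_number) rather than
// every column of the table.
{code}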
[jira] [Updated] (SPARK-51604) split test_connect_session
[ https://issues.apache.org/jira/browse/SPARK-51604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51604: --- Labels: pull-request-available (was: ) > split test_connect_session > -- > > Key: SPARK-51604 > URL: https://issues.apache.org/jira/browse/SPARK-51604 > Project: Spark > Issue Type: Improvement > Components: PySpark, Tests >Affects Versions: 4.1 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51350) Implement Show Procedures
[ https://issues.apache.org/jira/browse/SPARK-51350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-51350. - Fix Version/s: 4.1.0 Resolution: Fixed Issue resolved by pull request 50109 [https://github.com/apache/spark/pull/50109] > Implement Show Procedures > - > > Key: SPARK-51350 > URL: https://issues.apache.org/jira/browse/SPARK-51350 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Szehon Ho >Assignee: Szehon Ho >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > > As part of https://issues.apache.org/jira/browse/SPARK-44167 , implement Show > Procedures to show all stored procedures in the given catalog. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
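For reference, once this lands the new command should be usable like any other SQL statement. A hedged usage sketch; the catalog name is hypothetical, the catalog must expose DSv2 stored procedures for anything to be listed, and the namespace-qualified form follows the SPIP rather than released documentation:

{code:scala}
// List procedures in the current catalog/namespace.
spark.sql("SHOW PROCEDURES").show(truncate = false)

// The SPIP also describes a namespace-qualified form, roughly:
//   SHOW PROCEDURES IN my_catalog.my_namespace
{code}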
[jira] [Updated] (SPARK-50873) An optimization for SparkOptimizer to prune the column after RewriteSubquery
[ https://issues.apache.org/jira/browse/SPARK-50873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KeKe Zhu updated SPARK-50873: - Summary: An optimization for SparkOptimizer to prune the column after RewriteSubquery (was: An optimization for SparkOptimizer to prune the column after RewriteSubquery) > An optimization for SparkOptimizer to prune the column after RewriteSubquery > > > Key: SPARK-50873 > URL: https://issues.apache.org/jira/browse/SPARK-50873 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.3 >Reporter: KeKe Zhu >Priority: Major > Attachments: query16-opt.PNG, query16-org.PNG > > > I used Spark 3.5+iceberg 1.6.1 to run TPCDS test. When doing performance > analysis, I found that there is a potential optimization for SparkOptimizer. > The optimiztion is about column pruning of DatasourceV2 (DSV2). > In SparkOptimizer, the column pruning of DSV2 is executed in > V2ScanRelationPushDown rule. However, there is a series of optimiztion rules > after V2ScanRelationPushDown, those optimization rule may rewrite subquery > and generate Project or Filter operator that can be used for column pruning, > but column pruning will not be execute again, resulting in the generated > physical plan reading the entire table instead of only reading the required > columns. > For example,there is the query 16 in TPCDS: > {code:java} > set spark.queryID=query16.tpl; > select > count(distinct cs_order_number) as `order count` > ,sum(cs_ext_ship_cost) as `total shipping cost` > ,sum(cs_net_profit) as `total net profit` > from > catalog_sales cs1 > ,date_dim > ,customer_address > ,call_center > where > d_date between '2002-2-01' and > (cast('2002-2-01' as date) + interval 60 days) > and cs1.cs_ship_date_sk = d_date_sk > and cs1.cs_ship_addr_sk = ca_address_sk > and ca_state = 'KS' > and cs1.cs_call_center_sk = cc_call_center_sk > and cc_county in ('Daviess County','Barrow County','Walker County','San > Miguel County', > 'Mobile County' > ) > and exists (select * > from catalog_sales cs2 > where cs1.cs_order_number = cs2.cs_order_number > and cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk) > and not exists(select * > from catalog_returns cr1 > where cs1.cs_order_number = cr1.cr_order_number) > order by count(distinct cs_order_number) > limit 100; {code} > The final Optimized Plan of the query is as below picture, we can see that > there are two talbes (catalog_sale & catalog_returns) are readed all data and > do project,which certainly cause low performance for iceberg. > !query16-org.PNG! > > > My current solution: I write an optimiztion rule and add it to the > SparkOptimizer, the rule will check again whether the table need to be prune > column and do it if it does, otherwise, no action will be taken. Now i get > the expect optimized plan and get a much better performance result. > !query16-opt.PNG! > I want to know is there any other solution for this problem? contact me > anytime. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-50873) An optimization for SparkOptimizer to prune column after RewriteSubquery
[ https://issues.apache.org/jira/browse/SPARK-50873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] KeKe Zhu updated SPARK-50873: - Summary: An optimization for SparkOptimizer to prune column after RewriteSubquery (was: An optimization for SparkOptimizer to prune the column after RewriteSubquery) > An optimization for SparkOptimizer to prune column after RewriteSubquery > > > Key: SPARK-50873 > URL: https://issues.apache.org/jira/browse/SPARK-50873 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.3 >Reporter: KeKe Zhu >Priority: Major > Attachments: query16-opt.PNG, query16-org.PNG > > > I used Spark 3.5+iceberg 1.6.1 to run TPCDS test. When doing performance > analysis, I found that there is a potential optimization for SparkOptimizer. > The optimiztion is about column pruning of DatasourceV2 (DSV2). > In SparkOptimizer, the column pruning of DSV2 is executed in > V2ScanRelationPushDown rule. However, there is a series of optimiztion rules > after V2ScanRelationPushDown, those optimization rule may rewrite subquery > and generate Project or Filter operator that can be used for column pruning, > but column pruning will not be execute again, resulting in the generated > physical plan reading the entire table instead of only reading the required > columns. > For example,there is the query 16 in TPCDS: > {code:java} > set spark.queryID=query16.tpl; > select > count(distinct cs_order_number) as `order count` > ,sum(cs_ext_ship_cost) as `total shipping cost` > ,sum(cs_net_profit) as `total net profit` > from > catalog_sales cs1 > ,date_dim > ,customer_address > ,call_center > where > d_date between '2002-2-01' and > (cast('2002-2-01' as date) + interval 60 days) > and cs1.cs_ship_date_sk = d_date_sk > and cs1.cs_ship_addr_sk = ca_address_sk > and ca_state = 'KS' > and cs1.cs_call_center_sk = cc_call_center_sk > and cc_county in ('Daviess County','Barrow County','Walker County','San > Miguel County', > 'Mobile County' > ) > and exists (select * > from catalog_sales cs2 > where cs1.cs_order_number = cs2.cs_order_number > and cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk) > and not exists(select * > from catalog_returns cr1 > where cs1.cs_order_number = cr1.cr_order_number) > order by count(distinct cs_order_number) > limit 100; {code} > The final Optimized Plan of the query is as below picture, we can see that > there are two talbes (catalog_sale & catalog_returns) are readed all data and > do project,which certainly cause low performance for iceberg. > !query16-org.PNG! > > > My current solution: I write an optimiztion rule and add it to the > SparkOptimizer, the rule will check again whether the table need to be prune > column and do it if it does, otherwise, no action will be taken. Now i get > the expect optimized plan and get a much better performance result. > !query16-opt.PNG! > I want to know is there any other solution for this problem? contact me > anytime. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51605) If the `logs` directory does not exist, the first launch of `bin/spark-shell --remote local` will fail.
Yang Jie created SPARK-51605: Summary: If the `logs` directory does not exist, the first launch of `bin/spark-shell --remote local` will fail. Key: SPARK-51605 URL: https://issues.apache.org/jira/browse/SPARK-51605 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 4.1.0 Reporter: Yang Jie {code:java} bin/spark-shell --remote local WARNING: Using incubator modules: jdk.incubator.vector Exception in thread "main" java.nio.file.NoSuchFileException: /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92) at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) at java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55) at java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:148) at java.base/java.nio.file.Files.readAttributes(Files.java:1851) at java.base/sun.nio.fs.PollingWatchService.doPrivilegedRegister(PollingWatchService.java:173) at java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:154) at java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:151) at java.base/java.security.AccessController.doPrivileged(AccessController.java:569) at java.base/sun.nio.fs.PollingWatchService.register(PollingWatchService.java:150) at java.base/sun.nio.fs.UnixPath.register(UnixPath.java:885) at java.base/java.nio.file.Path.register(Path.java:894) at org.apache.spark.sql.connect.SparkSession$.waitUntilFileExists(SparkSession.scala:717) at org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13(SparkSession.scala:798) at org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13$adapted(SparkSession.scala:791) at scala.Option.foreach(Option.scala:437) at org.apache.spark.sql.connect.SparkSession$.withLocalConnectServer(SparkSession.scala:791) at org.apache.spark.sql.application.ConnectRepl$.doMain(ConnectRepl.scala:67) at org.apache.spark.sql.application.ConnectRepl$.main(ConnectRepl.scala:57) at org.apache.spark.sql.application.ConnectRepl.main(ConnectRepl.scala) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:569) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1027) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:204) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:227) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:96) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1132) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1141) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 25/03/26 15:39:40 INFO ShutdownHookManager: Shutdown hook called 25/03/26 15:39:40 INFO ShutdownHookManager: Deleting directory /private/var/folders/j2/cfn7w6795538n_416_27rkqmgn/T/spark-fe4c9d71-b7d7-437e-b486-514cc538 {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: 
issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51606) After exiting the remote local connect shell, the SparkConnectServer will not terminate.
Yang Jie created SPARK-51606: Summary: After exiting the remote local connect shell, the SparkConnectServer will not terminate. Key: SPARK-51606 URL: https://issues.apache.org/jira/browse/SPARK-51606 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 4.1.0 Reporter: Yang Jie {code:java} bin/spark-shell --remote local WARNING: Using incubator modules: jdk.incubator.vector Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties 25/03/26 15:43:55 INFO SparkSession: Spark Connect server started with the log file: /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs/spark-cb51ad74-00e1-4567-9746-3dc9a7888ecb-org.apache.spark.sql.connect.service.SparkConnectServer-1-local.out 25/03/26 15:43:56 INFO BaseAllocator: Debug mode disabled. Enable with the VM option -Darrow.memory.debug.allocator=true. 25/03/26 15:43:56 INFO DefaultAllocationManagerOption: allocation manager type not specified, using netty as the default type 25/03/26 15:43:56 INFO CheckAllocator: Using DefaultAllocationManager at memory/netty/DefaultAllocationManagerFactory.class Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 4.1.0-SNAPSHOT /_/ Type in expressions to have them evaluated. Spark connect server version 4.1.0-SNAPSHOT. Spark session available as 'spark'. scala> exit Bye! 25/03/26 15:44:00 INFO ShutdownHookManager: Shutdown hook called 25/03/26 15:44:00 INFO ShutdownHookManager: Deleting directory /private/var/folders/j2/cfn7w6795538n_416_27rkqmgn/T/spark-ad8dfdf4-cf2b-413f-a9e3-d6e310dff1ea bin/spark-shell --remote local WARNING: Using incubator modules: jdk.incubator.vector Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties 25/03/26 15:44:04 INFO SparkSession: Spark Connect server started with the log file: /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs/spark-a7b9a1dc-1e16-4e0e-b7c1-8f957d730df3-org.apache.spark.sql.connect.service.SparkConnectServer-1-local.out 25/03/26 15:44:05 INFO BaseAllocator: Debug mode disabled. Enable with the VM option -Darrow.memory.debug.allocator=true. 
25/03/26 15:44:05 INFO DefaultAllocationManagerOption: allocation manager type not specified, using netty as the default type 25/03/26 15:44:05 INFO CheckAllocator: Using DefaultAllocationManager at memory/netty/DefaultAllocationManagerFactory.class Exception in thread "main" org.apache.spark.SparkException: org.sparkproject.io.grpc.StatusRuntimeException: UNAUTHENTICATED: Invalid authentication token at org.apache.spark.sql.connect.client.GrpcExceptionConverter.toThrowable(GrpcExceptionConverter.scala:162) at org.apache.spark.sql.connect.client.GrpcExceptionConverter.convert(GrpcExceptionConverter.scala:61) at org.apache.spark.sql.connect.client.CustomSparkConnectBlockingStub.analyzePlan(CustomSparkConnectBlockingStub.scala:75) at org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:110) at org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:256) at org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:227) at org.apache.spark.sql.connect.SparkSession.version$lzycompute(SparkSession.scala:92) at org.apache.spark.sql.connect.SparkSession.version(SparkSession.scala:91) at org.apache.spark.sql.application.ConnectRepl$$anon$1.(ConnectRepl.scala:106) at org.apache.spark.sql.application.ConnectRepl$.$anonfun$doMain$1(ConnectRepl.scala:105) at org.apache.spark.sql.connect.SparkSession$.withLocalConnectServer(SparkSession.scala:824) at org.apache.spark.sql.application.ConnectRepl$.doMain(ConnectRepl.scala:67) at org.apache.spark.sql.application.ConnectRepl$.main(ConnectRepl.scala:57) at org.apache.spark.sql.application.ConnectRepl.main(ConnectRepl.scala) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:569) at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1027) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:204) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:227) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:96) at org.apache.spark.deploy.
[jira] [Resolved] (SPARK-51604) split test_connect_session
[ https://issues.apache.org/jira/browse/SPARK-51604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-51604. --- Fix Version/s: 4.1.0 Resolution: Fixed Issue resolved by pull request 50397 [https://github.com/apache/spark/pull/50397] > split test_connect_session > -- > > Key: SPARK-51604 > URL: https://issues.apache.org/jira/browse/SPARK-51604 > Project: Spark > Issue Type: Improvement > Components: PySpark, Tests >Affects Versions: 4.1 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51607) the configuration for `maven-shade-plugin` should be set to `combine.self = "override"` In the `connect` modules
[ https://issues.apache.org/jira/browse/SPARK-51607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51607: --- Labels: pull-request-available (was: ) > the configuration for `maven-shade-plugin` should be set to `combine.self = > "override"` In the `connect` modules > > > Key: SPARK-51607 > URL: https://issues.apache.org/jira/browse/SPARK-51607 > Project: Spark > Issue Type: Bug > Components: Build, Connect >Affects Versions: 4.0.0, 4.1.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51607) the configuration for `maven-shade-plugin` should be set to `combine.self = "override"` In the `connect` modules
Yang Jie created SPARK-51607: Summary: the configuration for `maven-shade-plugin` should be set to `combine.self = "override"` In the `connect` modules Key: SPARK-51607 URL: https://issues.apache.org/jira/browse/SPARK-51607 Project: Spark Issue Type: Bug Components: Build, Connect Affects Versions: 4.0.0, 4.1.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51622) Titling sections on ExecutionPage
Kent Yao created SPARK-51622: Summary: Titling sections on ExecutionPage Key: SPARK-51622 URL: https://issues.apache.org/jira/browse/SPARK-51622 Project: Spark Issue Type: Improvement Components: UI Affects Versions: 4.1.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51622) Titling sections on ExecutionPage
[ https://issues.apache.org/jira/browse/SPARK-51622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51622: --- Labels: pull-request-available (was: ) > Titling sections on ExecutionPage > - > > Key: SPARK-51622 > URL: https://issues.apache.org/jira/browse/SPARK-51622 > Project: Spark > Issue Type: Improvement > Components: UI >Affects Versions: 4.1.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51624) Propagate GetStructField metadata in CreateNamedStruct.dataType
Andy Lam created SPARK-51624: Summary: Propagate GetStructField metadata in CreateNamedStruct.dataType Key: SPARK-51624 URL: https://issues.apache.org/jira/browse/SPARK-51624 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.6 Reporter: Andy Lam This matters because dataType comparisons drive optimizer rules such as SimplifyCasts, which can cascade into further expression optimizations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
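A hypothetical sketch of the effect described above, assuming a running SparkSession named `spark` (an illustration only, not the reproduction from the linked change): SimplifyCasts can only drop a cast when the child's dataType, including struct-field metadata, compares equal to the target type.

{code:scala}
// Hypothetical illustration, assuming a running SparkSession named `spark`.
// SimplifyCasts removes a Cast only when the child's dataType equals the target
// type, and that comparison includes struct-field metadata. If
// CreateNamedStruct.dataType dropped the metadata carried over from a
// GetStructField child, the no-op cast below could survive optimization.
import org.apache.spark.sql.functions.col

val df = spark.range(1).selectExpr("named_struct('id', id) AS s")
val sameType = df.schema("s").dataType           // the struct column's own type
df.select(col("s").cast(sameType)).explain(true) // ideally no Cast in the optimized plan
{code}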
[jira] [Assigned] (SPARK-51573) Fix Streaming State Checkpoint v2 checkpointInfo race condition
[ https://issues.apache.org/jira/browse/SPARK-51573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-51573: Assignee: Livia Zhu > Fix Streaming State Checkpoint v2 checkpointInfo race condition > --- > > Key: SPARK-51573 > URL: https://issues.apache.org/jira/browse/SPARK-51573 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Livia Zhu >Assignee: Livia Zhu >Priority: Major > Labels: pull-request-available > > If two tasks are competing for the same RocksDB state store provider, they > could run into the following race condition: > > ||task 1||task 2|| > |load() - load version 0| | > |commit() - committed version 1| | > | |load() - load version 1| > | |commit() - committed version 2| > |getStateStoreCheckpointInfo - get checkpoint info for version 2 :(| | > We need to ensure that checkpoint info is retrieved atomically with the > commit() before the RocksDB instance lock is released. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
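A minimal sketch of the intended fix (names are illustrative, not the actual Spark state store provider API): the commit and the checkpoint-info read happen under one instance lock, so a competing task cannot slip a newer load/commit in between them.

{code:scala}
// Minimal sketch, not the real RocksDB provider API: commit() and the
// checkpoint-info read happen under the same instance lock, so task 2 cannot
// load/commit a newer version between task 1's commit and its checkpoint-info
// retrieval.
import java.util.concurrent.locks.ReentrantLock

class StateStoreProviderSketch {
  private val instanceLock = new ReentrantLock()
  private var committedVersion: Long = 0L

  /** Commits a new version and returns checkpoint info for exactly that version. */
  def commitAndGetCheckpointInfo(): Long = {
    instanceLock.lock()
    try {
      committedVersion += 1
      committedVersion // captured before the lock is released to any other task
    } finally {
      instanceLock.unlock()
    }
  }
}
{code}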
[jira] [Assigned] (SPARK-51621) Support `sparkSession` for `DataFrame`
[ https://issues.apache.org/jira/browse/SPARK-51621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-51621: - Assignee: Dongjoon Hyun > Support `sparkSession` for `DataFrame` > -- > > Key: SPARK-51621 > URL: https://issues.apache.org/jira/browse/SPARK-51621 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: connect-swift-0.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51621) Support `sparkSession` for `DataFrame`
[ https://issues.apache.org/jira/browse/SPARK-51621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-51621. --- Fix Version/s: connect-swift-0.1.0 Resolution: Fixed Issue resolved by pull request 28 [https://github.com/apache/spark-connect-swift/pull/28] > Support `sparkSession` for `DataFrame` > -- > > Key: SPARK-51621 > URL: https://issues.apache.org/jira/browse/SPARK-51621 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: connect-swift-0.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: connect-swift-0.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51623) Remove class files in source releases
[ https://issues.apache.org/jira/browse/SPARK-51623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51623: --- Labels: pull-request-available (was: ) > Remove class files in source releases > - > > Key: SPARK-51623 > URL: https://issues.apache.org/jira/browse/SPARK-51623 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51573) Fix Streaming State Checkpoint v2 checkpointInfo race condition
[ https://issues.apache.org/jira/browse/SPARK-51573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-51573. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 50344 [https://github.com/apache/spark/pull/50344] > Fix Streaming State Checkpoint v2 checkpointInfo race condition > --- > > Key: SPARK-51573 > URL: https://issues.apache.org/jira/browse/SPARK-51573 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Livia Zhu >Assignee: Livia Zhu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > If two tasks are competing for the same RocksDB state store provider, they > could run into the following race condition: > > ||task 1||task 2|| > |load() - load version 0| | > |commit() - committed version 1| | > | |load() - load version 1| > | |commit() - committed version 2| > |getStateStoreCheckpointInfo - get checkpoint info for version 2 :(| | > We need to ensure that checkpoint info is retrieved atomically with the > commit() before the RocksDB instance lock is released. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51605) If the `logs` directory does not exist, the first launch of `bin/spark-shell --remote local` will fail.
[ https://issues.apache.org/jira/browse/SPARK-51605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-51605. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 50421 [https://github.com/apache/spark/pull/50421] > If the `logs` directory does not exist, the first launch of `bin/spark-shell > --remote local` will fail. > --- > > Key: SPARK-51605 > URL: https://issues.apache.org/jira/browse/SPARK-51605 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0, 4.1.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > {code:java} > bin/spark-shell --remote local > WARNING: Using incubator modules: jdk.incubator.vector > Exception in thread "main" java.nio.file.NoSuchFileException: > /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs > at > java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) > at > java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55) > at > java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:148) > at java.base/java.nio.file.Files.readAttributes(Files.java:1851) > at > java.base/sun.nio.fs.PollingWatchService.doPrivilegedRegister(PollingWatchService.java:173) > at > java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:154) > at > java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:151) > at > java.base/java.security.AccessController.doPrivileged(AccessController.java:569) > at > java.base/sun.nio.fs.PollingWatchService.register(PollingWatchService.java:150) > at java.base/sun.nio.fs.UnixPath.register(UnixPath.java:885) > at java.base/java.nio.file.Path.register(Path.java:894) > at > org.apache.spark.sql.connect.SparkSession$.waitUntilFileExists(SparkSession.scala:717) > at > org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13(SparkSession.scala:798) > at > org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13$adapted(SparkSession.scala:791) > at scala.Option.foreach(Option.scala:437) > at > org.apache.spark.sql.connect.SparkSession$.withLocalConnectServer(SparkSession.scala:791) > at > org.apache.spark.sql.application.ConnectRepl$.doMain(ConnectRepl.scala:67) > at > org.apache.spark.sql.application.ConnectRepl$.main(ConnectRepl.scala:57) > at org.apache.spark.sql.application.ConnectRepl.main(ConnectRepl.scala) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:569) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1027) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:204) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:227) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:96) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1132) 
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1141) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > 25/03/26 15:39:40 INFO ShutdownHookManager: Shutdown hook called > 25/03/26 15:39:40 INFO ShutdownHookManager: Deleting directory > /private/var/folders/j2/cfn7w6795538n_416_27rkqmgn/T/spark-fe4c9d71-b7d7-437e-b486-514cc538 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
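A hedged sketch of the underlying problem and workaround, assuming the watched directory is $SPARK_HOME/logs as in the trace above: java.nio's WatchService cannot be registered on a directory that does not exist yet, so creating it first avoids the NoSuchFileException on a fresh install.

{code:scala}
// Hedged sketch, assuming $SPARK_HOME/logs is the directory being watched, as
// in the stack trace above. Creating it up front means Path.register no longer
// throws NoSuchFileException on the first launch.
import java.nio.file.{Files, Paths, StandardWatchEventKinds}

val logDir = Paths.get(sys.env.getOrElse("SPARK_HOME", "."), "logs")
if (!Files.exists(logDir)) Files.createDirectories(logDir)

val watcher = logDir.getFileSystem.newWatchService()
logDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE)
{code}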
[jira] [Resolved] (SPARK-51618) Add a check for jars in CI
[ https://issues.apache.org/jira/browse/SPARK-51618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-51618. -- Fix Version/s: 4.1.0 Resolution: Fixed Issue resolved by pull request 50416 [https://github.com/apache/spark/pull/50416] > Add a check for jars in CI > -- > > Key: SPARK-51618 > URL: https://issues.apache.org/jira/browse/SPARK-51618 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > > We should disallow jars being added in the source -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51618) Add a check for jars in CI
[ https://issues.apache.org/jira/browse/SPARK-51618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-51618: Assignee: Hyukjin Kwon > Add a check for jars in CI > -- > > Key: SPARK-51618 > URL: https://issues.apache.org/jira/browse/SPARK-51618 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > We should disallow jars being added in the source -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32166) Metastore problem on Spark3.0 with Hive3.0
[ https://issues.apache.org/jira/browse/SPARK-32166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938784#comment-17938784 ] jiacai Guo commented on SPARK-32166: I just hit this problem and found two workarounds: # set spark.sql.legacy.createHiveTableByDefault to false; # set hive.metadata.dml.events to false in hive-site.xml, or start Kyuubi with '--hiveconf hive.metadata.dml.events=false'. Could you please tell me why this works, and is this a bug? [~kevinshin] > Metastore problem on Spark3.0 with Hive3.0 > --- > > Key: SPARK-32166 > URL: https://issues.apache.org/jira/browse/SPARK-32166 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: hzk >Priority: Major > > When i use spark-sql to create table ,the problem appear. > {code:java} > create table bigbig as select b.user_id , b.name , b.age , c.address , c.city > , a.position , a.object , a.problem , a.complaint_time from ( select user_id > , position , object , problem , complaint_time from > HIVE_COMBINE_7efde4e2dcb34c218b3fb08872e698d5 ) as a left join > HIVE_ODS_17_TEST_DEMO_ODS_USERS_INFO_20200608141945 as b on b.user_id = > a.user_id left join HIVE_ODS_17_TEST_ADDRESS_CITY_20200608141942 as c on > c.address_id = b.address_id; > {code} > It opened a connection to hive metastore. > my hive version is 3.1.0. > {code:java} > org.apache.thrift.TApplicationException: Required field 'filesAdded' is > unset! > Struct:InsertEventRequestData(filesAdded:null)org.apache.thrift.TApplicationException: > Required field 'filesAdded' is > unset! > Struct:InsertEventRequestData(filesAdded:null) at > org.apache.thrift.TApplicationException.read(TApplicationException.java:111) > at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79) at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_fire_listener_event(ThriftHiveMetastore.java:4182) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.fire_listener_event(ThriftHiveMetastore.java:4169) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.fireListenerEvent(HiveMetaStoreClient.java:1954) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156) > at com.sun.proxy.$Proxy5.fireListenerEvent(Unknown Source) at > org.apache.hadoop.hive.ql.metadata.Hive.fireInsertEvent(Hive.java:1947) at > org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1673) at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > org.apache.spark.sql.hive.client.Shim_v0_14.loadTable(HiveShim.scala:847) at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply$mcV$sp(HiveClientImpl.scala:757) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:757) > at > org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:757) > at > 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:272) > at > org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:210) > at > org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:209) > at > org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:255) > at > org.apache.spark.sql.hive.client.HiveClientImpl.loadTable(HiveClientImpl.scala:756) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply$mcV$sp(HiveExternalCatalog.scala:829) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:827) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:827) > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) > at > org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:827) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.loadTable(SessionCatalog.scala:416) > at > org.apache.spark.sql.execution.command.LoadDataCommand.run(tables.scala:403) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:7
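A hedged illustration of the two workarounds mentioned in the comment above, assuming a running SparkSession named `spark`; whether either is the proper fix is exactly the open question, and the configuration names are taken verbatim from the comment.

{code:scala}
// Workaround 1 (from the comment): let CTAS create a Spark data source table
// instead of a Hive table, so the Hive loadTable -> fireListenerEvent path that
// fails here is not exercised.
spark.conf.set("spark.sql.legacy.createHiveTableByDefault", "false")

// Workaround 2 (from the comment): disable Hive DML event firing. One way to
// pass the Hive setting through Spark (assumption: equivalent to putting it in
// hive-site.xml) is at launch time:
//   spark-sql --conf spark.hadoop.hive.metadata.dml.events=false ...
{code}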
[jira] [Updated] (SPARK-51605) If the `logs` directory does not exist, the first launch of `bin/spark-shell --remote local` will fail.
[ https://issues.apache.org/jira/browse/SPARK-51605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51605: --- Labels: pull-request-available (was: ) > If the `logs` directory does not exist, the first launch of `bin/spark-shell > --remote local` will fail. > --- > > Key: SPARK-51605 > URL: https://issues.apache.org/jira/browse/SPARK-51605 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 4.0.0, 4.1.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > > {code:java} > bin/spark-shell --remote local > WARNING: Using incubator modules: jdk.incubator.vector > Exception in thread "main" java.nio.file.NoSuchFileException: > /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs > at > java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106) > at > java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) > at > java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55) > at > java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:148) > at java.base/java.nio.file.Files.readAttributes(Files.java:1851) > at > java.base/sun.nio.fs.PollingWatchService.doPrivilegedRegister(PollingWatchService.java:173) > at > java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:154) > at > java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:151) > at > java.base/java.security.AccessController.doPrivileged(AccessController.java:569) > at > java.base/sun.nio.fs.PollingWatchService.register(PollingWatchService.java:150) > at java.base/sun.nio.fs.UnixPath.register(UnixPath.java:885) > at java.base/java.nio.file.Path.register(Path.java:894) > at > org.apache.spark.sql.connect.SparkSession$.waitUntilFileExists(SparkSession.scala:717) > at > org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13(SparkSession.scala:798) > at > org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13$adapted(SparkSession.scala:791) > at scala.Option.foreach(Option.scala:437) > at > org.apache.spark.sql.connect.SparkSession$.withLocalConnectServer(SparkSession.scala:791) > at > org.apache.spark.sql.application.ConnectRepl$.doMain(ConnectRepl.scala:67) > at > org.apache.spark.sql.application.ConnectRepl$.main(ConnectRepl.scala:57) > at org.apache.spark.sql.application.ConnectRepl.main(ConnectRepl.scala) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:569) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1027) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:204) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:227) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:96) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1132) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1141) > at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > 25/03/26 15:39:40 INFO ShutdownHookManager: Shutdown hook called > 25/03/26 15:39:40 INFO ShutdownHookManager: Deleting directory > /private/var/folders/j2/cfn7w6795538n_416_27rkqmgn/T/spark-fe4c9d71-b7d7-437e-b486-514cc538 > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51625) command in CTE relations should trigger inline
Wenchen Fan created SPARK-51625: --- Summary: command in CTE relations should trigger inline Key: SPARK-51625 URL: https://issues.apache.org/jira/browse/SPARK-51625 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51624) Propagate GetStructField metadata in CreateNamedStruct.dataType
[ https://issues.apache.org/jira/browse/SPARK-51624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51624: --- Labels: pull-request-available (was: ) > Propagate GetStructField metadata in CreateNamedStruct.dataType > --- > > Key: SPARK-51624 > URL: https://issues.apache.org/jira/browse/SPARK-51624 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.6 >Reporter: Andy Lam >Priority: Major > Labels: pull-request-available > > This matters because dataType comparisons drive optimizer > rules such as SimplifyCasts, which can cascade into further expression > optimizations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51623) Remove class files in source releases
Hyukjin Kwon created SPARK-51623: Summary: Remove class files in source releases Key: SPARK-51623 URL: https://issues.apache.org/jira/browse/SPARK-51623 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: Hyukjin Kwon -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51626) Support `DataFrameReader`
Dongjoon Hyun created SPARK-51626: - Summary: Support `DataFrameReader` Key: SPARK-51626 URL: https://issues.apache.org/jira/browse/SPARK-51626 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: connect-swift-0.1.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51620) Support `columns` for `DataFrame`
[ https://issues.apache.org/jira/browse/SPARK-51620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-51620. --- Fix Version/s: connect-swift-0.1.0 Resolution: Fixed Issue resolved by pull request 27 [https://github.com/apache/spark-connect-swift/pull/27] > Support `columns` for `DataFrame` > - > > Key: SPARK-51620 > URL: https://issues.apache.org/jira/browse/SPARK-51620 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: connect-swift-0.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: connect-swift-0.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51620) Support `columns` for `DataFrame`
[ https://issues.apache.org/jira/browse/SPARK-51620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-51620: - Assignee: Dongjoon Hyun > Support `columns` for `DataFrame` > - > > Key: SPARK-51620 > URL: https://issues.apache.org/jira/browse/SPARK-51620 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: connect-swift-0.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51622) Titling sections on ExecutionPage
[ https://issues.apache.org/jira/browse/SPARK-51622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-51622. -- Fix Version/s: 4.1.0 Resolution: Fixed Issue resolved by pull request 50424 [https://github.com/apache/spark/pull/50424] > Titling sections on ExecutionPage > - > > Key: SPARK-51622 > URL: https://issues.apache.org/jira/browse/SPARK-51622 > Project: Spark > Issue Type: Improvement > Components: UI >Affects Versions: 4.1.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51627) Add a schedule workflow for numpy 2.1.3
[ https://issues.apache.org/jira/browse/SPARK-51627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51627: --- Labels: pull-request-available (was: ) > Add a schedule workflow for numpy 2.1.3 > --- > > Key: SPARK-51627 > URL: https://issues.apache.org/jira/browse/SPARK-51627 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.1 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51628) Clean up the assembly module before maven testing in maven daily test
[ https://issues.apache.org/jira/browse/SPARK-51628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-51628: - Summary: Clean up the assembly module before maven testing in maven daily test (was: Clean up the assembly module before maven testing) > Clean up the assembly module before maven testing in maven daily test > - > > Key: SPARK-51628 > URL: https://issues.apache.org/jira/browse/SPARK-51628 > Project: Spark > Issue Type: Test > Components: Project Infra, Tests >Affects Versions: 4.0.0, 4.1.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51628) Clean up the assembly module before maven testing
Yang Jie created SPARK-51628: Summary: Clean up the assembly module before maven testing Key: SPARK-51628 URL: https://issues.apache.org/jira/browse/SPARK-51628 Project: Spark Issue Type: Test Components: Project Infra, Tests Affects Versions: 4.0.0, 4.1.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51629) Add a download link on ExecutionPage for svg/dot/txt format plans
Kent Yao created SPARK-51629: Summary: Add a download link on ExecutionPage for svg/dot/txt format plans Key: SPARK-51629 URL: https://issues.apache.org/jira/browse/SPARK-51629 Project: Spark Issue Type: Improvement Components: UI Affects Versions: 4.1.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51622) Titling sections on ExecutionPage
[ https://issues.apache.org/jira/browse/SPARK-51622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-51622: Assignee: Kent Yao > Titling sections on ExecutionPage > - > > Key: SPARK-51622 > URL: https://issues.apache.org/jira/browse/SPARK-51622 > Project: Spark > Issue Type: Improvement > Components: UI >Affects Versions: 4.1.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51628) Clean up the assembly module before maven testing in maven daily test
[ https://issues.apache.org/jira/browse/SPARK-51628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51628: --- Labels: pull-request-available (was: ) > Clean up the assembly module before maven testing in maven daily test > - > > Key: SPARK-51628 > URL: https://issues.apache.org/jira/browse/SPARK-51628 > Project: Spark > Issue Type: Bug > Components: Project Infra, Tests >Affects Versions: 4.0.0, 4.1.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > > Before running `mvn test` in the Maven daily test, an `mvn clean -pl assembly` > step should be added so that the issue described in SPARK-51600 can actually be > verified in the Maven daily test. For now, this step should not be applied when > testing the `connect` module, because some tests in the `connect-client-jvm` > module strongly depend on the `assembly` module having been built. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51621) Support `sparkSession` for `DataFrame`
Dongjoon Hyun created SPARK-51621: - Summary: Support `sparkSession` for `DataFrame` Key: SPARK-51621 URL: https://issues.apache.org/jira/browse/SPARK-51621 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: connect-swift-0.1.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51626) Support `DataFrameReader`
[ https://issues.apache.org/jira/browse/SPARK-51626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51626: --- Labels: pull-request-available (was: ) > Support `DataFrameReader` > - > > Key: SPARK-51626 > URL: https://issues.apache.org/jira/browse/SPARK-51626 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: connect-swift-0.1.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org