[jira] [Created] (SPARK-51608) Better log exception in python udf worker.

2025-03-26 Thread Dmitry (Jira)
Dmitry created SPARK-51608:
--

 Summary: Better log exception in python udf worker.
 Key: SPARK-51608
 URL: https://issues.apache.org/jira/browse/SPARK-51608
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Dmitry


It was possible to see the error in the logs: 

{{24/12/27 20:25:04 WARN PythonUDFWithNamedArgumentsRunner: Failed to stop 
worker}}

However, it does not reveal what exactly happened. It would make sense to 
include the exception information in the logs.
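
A minimal sketch of the difference, using plain Python logging rather than the 
actual Spark runner code (the failing call is a made-up stand-in):

{code:python}
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("PythonUDFWithNamedArgumentsRunner")

def stop_worker():
    # Hypothetical stand-in for the real worker shutdown call.
    raise OSError("worker socket already closed")

try:
    stop_worker()
except Exception:
    # Current behavior: the cause is lost.
    logger.warning("Failed to stop worker")
    # Proposed behavior: attach the exception and traceback to the log record.
    logger.warning("Failed to stop worker", exc_info=True)
{code}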






[jira] [Updated] (SPARK-51609) Optimize simple queries

2025-03-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-51609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavle Martinović updated SPARK-51609:
-
Description: We would like to speed up the execution of recursive CTEs. It is 
possible to optimize simple queries to run in-memory, leading to large 
speed-ups.
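
For context, a minimal sketch of the kind of simple recursive query in scope, 
assuming the WITH RECURSIVE syntax for recursive CTEs and an existing `spark` 
session:

{code:python}
# A self-referencing CTE that counts from 1 to 10; simple recursions like
# this are candidates for the in-memory optimization described above.
df = spark.sql("""
    WITH RECURSIVE t(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM t WHERE n < 10
    )
    SELECT * FROM t
""")
df.show()
{code}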

> Optimize simple queries
> ---
>
> Key: SPARK-51609
> URL: https://issues.apache.org/jira/browse/SPARK-51609
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.1.0
>Reporter: Pavle Martinović
>Priority: Major
>
> We would like to speed up the execution of recursive CTEs. It is possible to 
> optimize simple queries to run in-memory, leading to large speed-ups.






[jira] [Created] (SPARK-51610) Support the TIME data type in the parquet datasource

2025-03-26 Thread Max Gekk (Jira)
Max Gekk created SPARK-51610:


 Summary: Support the TIME data type in the parquet datasource
 Key: SPARK-51610
 URL: https://issues.apache.org/jira/browse/SPARK-51610
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.1.0
Reporter: Max Gekk
Assignee: Max Gekk


Allow the TIME type in the Parquet datasource, which was disabled by 
SPARK-51590. Support TimeType in both the vectorized and non-vectorized 
readers as well as in the writer.

Write tests for:
- the read path: create a Parquet file using an external library and read it 
back with Spark SQL (see the sketch below).
- the write path: write TIME values to Parquet and read them back.
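
A sketch of the read-path test, assuming PyArrow as the external library and 
an existing `spark` session (the path and values are placeholders):

{code:python}
import datetime
import pyarrow as pa
import pyarrow.parquet as pq

# Write a parquet file with a TIME column using an external library.
table = pa.table(
    {"t": pa.array([datetime.time(12, 34, 56)], type=pa.time64("us"))})
pq.write_table(table, "/tmp/time_roundtrip.parquet")

# Read it back with Spark SQL; expected to map to TimeType once this lands.
df = spark.read.parquet("/tmp/time_roundtrip.parquet")
df.printSchema()
df.show()
{code}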






[jira] [Assigned] (SPARK-51574) Implement filters

2025-03-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-51574:
---

Assignee: Haoyu Weng

> Implement filters
> -
>
> Key: SPARK-51574
> URL: https://issues.apache.org/jira/browse/SPARK-51574
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.1.0
>Reporter: Haoyu Weng
>Assignee: Haoyu Weng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-51496) CaseInsensitiveStringMap comparison should ignore case

2025-03-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-51496.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 50275
[https://github.com/apache/spark/pull/50275]

> CaseInsensitiveStringMap comparison should ignore case
> --
>
> Key: SPARK-51496
> URL: https://issues.apache.org/jira/browse/SPARK-51496
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> When merging commandOptions and dsOptions (both of which are 
> CaseInsensitiveStringMap), we have
> {quote}assert(commandOptions == dsOptions || commandOptions.isEmpty || 
> dsOptions.isEmpty){quote}
> If commandOptions has ("KEY", "value") and dsOptions has ("key", "value"), 
> the assertion fails, but it should pass instead.
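
The intended semantics can be sketched in plain Python (CaseInsensitiveStringMap 
itself is a Java class; this only illustrates the equality the fix provides):

{code:python}
command_options = {"KEY": "value"}
ds_options = {"key": "value"}

def case_insensitive_equal(a, b):
    # Compare after lower-casing the keys, which is how equality on a
    # case-insensitive map should behave.
    return ({k.lower(): v for k, v in a.items()} ==
            {k.lower(): v for k, v in b.items()})

# Plain equality fails on differently-cased keys - the reported bug...
assert command_options != ds_options
# ...while case-insensitive equality passes, as the assertion intends.
assert case_insensitive_equal(command_options, ds_options)
{code}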






[jira] [Created] (SPARK-51611) Implement new single-pass Analyzer functionality

2025-03-26 Thread Vladimir Golubev (Jira)
Vladimir Golubev created SPARK-51611:


 Summary: Implement new single-pass Analyzer functionality
 Key: SPARK-51611
 URL: https://issues.apache.org/jira/browse/SPARK-51611
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.1.0
Reporter: Vladimir Golubev


- GROUP BY
- ORDER BY
- JOIN
- Correlated subqueries
- Other small features and bugfixes






[jira] [Commented] (SPARK-45900) Expand hash functionalities to include XXH3

2025-03-26 Thread Dmitry Kravchuk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938631#comment-17938631
 ] 

Dmitry Kravchuk commented on SPARK-45900:
-

We also need this feature.

> Expand hash functionalities to include XXH3
> 
>
> Key: SPARK-45900
> URL: https://issues.apache.org/jira/browse/SPARK-45900
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Nathan Holland
>Priority: Major
>
> I often work in projects that require deterministic randomness, especially 
> when creating surrogate keys. For small volumes of data xxhash64 works well; 
> however, this doesn't scale well - with a 64-bit hash code, the chance of a 
> collision is already about one in a million when you hash just six million 
> items, and it rises sharply from there due to the birthday paradox.
> Currently there are a few ways to handle this:
>  - hash: 32-bit output (>50% chance of at least one collision for tables 
> larger than 77,000 rows)
>  - xxhash64: 64-bit output (>50% chance of at least one collision for tables 
> larger than 5 billion rows)
>  - shaXXX/md5: single binary column input, string output, quite 
> computationally expensive.
> I'd suggest adding the newest algorithm in the xxHash family, XXH3. The 
> XXH3 family is a modern 64-bit and 128-bit hash function family that provides 
> improved strength and performance across the board.
> I'd imagine this would be a new function named xxhash3 supporting 64-bit and 
> 128-bit output lengths. For usability, I believe the bit length should 
> default to 128 bits to reduce accidental collisions, leaving users to set it 
> to 64 as an override if they need additional performance or interop. (Given 
> the benchmarks, this would likely be quite rare.)
> References:
>  * [Documentation|https://xxhash.com/]
>  * xxHash64 Ticket (https://issues.apache.org/jira/browse/SPARK-27099)
>  * [Existing xxHash64 
> logic|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/XXH64.java]
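
For reference, the existing 64-bit function is already exposed in PySpark; the 
xxhash3 calls below are only the proposal sketched in code, not an existing API:

{code:python}
from pyspark.sql import functions as F

# Existing: 64-bit xxHash64 over one or more columns.
df = spark.range(5).withColumn("key", F.xxhash64("id"))
df.show()

# Proposed (hypothetical, per the description above): XXH3 with a 128-bit
# default and a 64-bit override.
# df = df.withColumn("key3", F.xxhash3("id"))              # 128-bit default
# df = df.withColumn("key3_64", F.xxhash3("id", bits=64))  # hypothetical
{code}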






[jira] [Updated] (SPARK-51611) Implement new single-pass Analyzer functionality

2025-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-51611:
---
Labels: pull-request-available  (was: )

> Implement new single-pass Analyzer functionality
> 
>
> Key: SPARK-51611
> URL: https://issues.apache.org/jira/browse/SPARK-51611
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.1.0
>Reporter: Vladimir Golubev
>Priority: Major
>  Labels: pull-request-available
>
> - GROUP BY
> - ORDER BY
> - JOIN
> - Correlated subqueries
> - Other small features and bugfixes






[jira] [Resolved] (SPARK-51600) Should prepend `sql/hive`/`sql/hive-thriftserver` when `isTesting || isTestingSql` is true

2025-03-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-51600.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 50385
[https://github.com/apache/spark/pull/50385]

> Should prepend `sql/hive`/`sql/hive-thriftserver` when `isTesting || 
> isTestingSql` is true
> --
>
> Key: SPARK-51600
> URL: https://issues.apache.org/jira/browse/SPARK-51600
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-51615) Refactor ShowNamespaces to use RunnableCommand

2025-03-26 Thread Szehon Ho (Jira)
Szehon Ho created SPARK-51615:
-

 Summary: Refactor ShowNamespaces to use RunnableCommand
 Key: SPARK-51615
 URL: https://issues.apache.org/jira/browse/SPARK-51615
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 4.0.0
Reporter: Szehon Ho


RunnableCommand is the latest way to run commands. The advantage over the old 
way is that we have a single node (no need for separate logical plan and 
physical exec nodes).

We should refactor 'SHOW NAMESPACES' to use the new approach.






[jira] [Updated] (SPARK-51615) Refactor ShowNamespaces to use RunnableCommand

2025-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-51615:
---
Labels: pull-request-available  (was: )

> Refactor ShowNamespaces to use RunnableCommand
> --
>
> Key: SPARK-51615
> URL: https://issues.apache.org/jira/browse/SPARK-51615
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Szehon Ho
>Priority: Minor
>  Labels: pull-request-available
>
> RunnableCommand is the latest way to run commands. The advantage over the old 
> way is that we have a single node (no need for separate logical plan and 
> physical exec nodes).
> We should refactor 'SHOW NAMESPACES' to use the new approach.






[jira] [Updated] (SPARK-51614) Fix error message when there is a Generate under an UnresolvedHaving

2025-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-51614:
---
Labels: pull-request-available  (was: )

> Fix error message when there is a Generate under an UnresolvedHaving
> 
>
> Key: SPARK-51614
> URL: https://issues.apache.org/jira/browse/SPARK-51614
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.1.0
>Reporter: Mihailo Aleksic
>Priority: Major
>  Labels: pull-request-available
>
> A Generate under an UnresolvedHaving throws an internal error; I propose that 
> we fix it so we throw a meaningful error from CheckAnalysis.






[jira] [Updated] (SPARK-51609) Optimize Recursive CTE execution for simple queries

2025-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-51609:
---
Labels: pull-request-available  (was: )

> Optimize Recursive CTE execution for simple queries
> ---
>
> Key: SPARK-51609
> URL: https://issues.apache.org/jira/browse/SPARK-51609
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.1.0
>Reporter: Pavle Martinović
>Priority: Major
>  Labels: pull-request-available
>
> We would like to speed up the execution of recursive CTEs. It is possible to 
> optimize simple queries to run in-memory, leading to large speed-ups.






[jira] [Updated] (SPARK-51587) [PySpark] Fix an issue where timestamp cannot be used in ListState when multiple state data is involved

2025-03-26 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated SPARK-51587:
-
Issue Type: Bug  (was: Task)

> [PySpark] Fix an issue where timestamp cannot be used in ListState when 
> multiple state data is involved
> ---
>
> Key: SPARK-51587
> URL: https://issues.apache.org/jira/browse/SPARK-51587
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Bo Gao
>Assignee: Bo Gao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Fix an issue where timestamp cannot be used in ListState when multiple state 
> data is involved.
> Right now the following error is thrown:
> {code:python}
> [UNSUPPORTED_ARROWTYPE] Unsupported arrow type Timestamp(NANOSECOND, null). 
> SQLSTATE: 0A000
> {code}






[jira] [Commented] (SPARK-51606) After exiting the remote local connect shell, the SparkConnectServer will not terminate.

2025-03-26 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-51606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938496#comment-17938496
 ] 

Yang Jie commented on SPARK-51606:
--

cc [~gurwls223] 

> After exiting the remote local connect shell, the SparkConnectServer will not 
> terminate.
> 
>
> Key: SPARK-51606
> URL: https://issues.apache.org/jira/browse/SPARK-51606
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0, 4.1.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> bin/spark-shell --remote local
> WARNING: Using incubator modules: jdk.incubator.vector
> Using Spark's default log4j profile: 
> org/apache/spark/log4j2-defaults.properties
> 25/03/26 15:43:55 INFO SparkSession: Spark Connect server started with the 
> log file: 
> /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs/spark-cb51ad74-00e1-4567-9746-3dc9a7888ecb-org.apache.spark.sql.connect.service.SparkConnectServer-1-local.out
> 25/03/26 15:43:56 INFO BaseAllocator: Debug mode disabled. Enable with the VM 
> option -Darrow.memory.debug.allocator=true.
> 25/03/26 15:43:56 INFO DefaultAllocationManagerOption: allocation manager 
> type not specified, using netty as the default type
> 25/03/26 15:43:56 INFO CheckAllocator: Using DefaultAllocationManager at 
> memory/netty/DefaultAllocationManagerFactory.class
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 4.1.0-SNAPSHOT
>       /_/
> Type in expressions to have them evaluated.
> Spark connect server version 4.1.0-SNAPSHOT.
> Spark session available as 'spark'.
>    
> scala> exit 
> Bye!
> 25/03/26 15:44:00 INFO ShutdownHookManager: Shutdown hook called
> 25/03/26 15:44:00 INFO ShutdownHookManager: Deleting directory 
> /private/var/folders/j2/cfn7w6795538n_416_27rkqmgn/T/spark-ad8dfdf4-cf2b-413f-a9e3-d6e310dff1ea
> bin/spark-shell --remote local
> WARNING: Using incubator modules: jdk.incubator.vector
> Using Spark's default log4j profile: 
> org/apache/spark/log4j2-defaults.properties
> 25/03/26 15:44:04 INFO SparkSession: Spark Connect server started with the 
> log file: 
> /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs/spark-a7b9a1dc-1e16-4e0e-b7c1-8f957d730df3-org.apache.spark.sql.connect.service.SparkConnectServer-1-local.out
> 25/03/26 15:44:05 INFO BaseAllocator: Debug mode disabled. Enable with the VM 
> option -Darrow.memory.debug.allocator=true.
> 25/03/26 15:44:05 INFO DefaultAllocationManagerOption: allocation manager 
> type not specified, using netty as the default type
> 25/03/26 15:44:05 INFO CheckAllocator: Using DefaultAllocationManager at 
> memory/netty/DefaultAllocationManagerFactory.class
> Exception in thread "main" org.apache.spark.SparkException: 
> org.sparkproject.io.grpc.StatusRuntimeException: UNAUTHENTICATED: Invalid 
> authentication token
>   at 
> org.apache.spark.sql.connect.client.GrpcExceptionConverter.toThrowable(GrpcExceptionConverter.scala:162)
>   at 
> org.apache.spark.sql.connect.client.GrpcExceptionConverter.convert(GrpcExceptionConverter.scala:61)
>   at 
> org.apache.spark.sql.connect.client.CustomSparkConnectBlockingStub.analyzePlan(CustomSparkConnectBlockingStub.scala:75)
>   at 
> org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:110)
>   at 
> org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:256)
>   at 
> org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:227)
>   at 
> org.apache.spark.sql.connect.SparkSession.version$lzycompute(SparkSession.scala:92)
>   at 
> org.apache.spark.sql.connect.SparkSession.version(SparkSession.scala:91)
>   at 
> org.apache.spark.sql.application.ConnectRepl$$anon$1.(ConnectRepl.scala:106)
>   at 
> org.apache.spark.sql.application.ConnectRepl$.$anonfun$doMain$1(ConnectRepl.scala:105)
>   at 
> org.apache.spark.sql.connect.SparkSession$.withLocalConnectServer(SparkSession.scala:824)
>   at 
> org.apache.spark.sql.application.ConnectRepl$.doMain(ConnectRepl.scala:67)
>   at 
> org.apache.spark.sql.application.ConnectRepl$.main(ConnectRepl.scala:57)
>   at org.apache.spark.sql.application.ConnectRepl.main(ConnectRepl.scala)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:569)
>   at 
> org.apache.spark.deploy.Ja

[jira] [Assigned] (SPARK-51607) The configuration for `maven-shade-plugin` should be set to `combine.self = "override"` in the `connect` modules

2025-03-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-51607:


Assignee: Yang Jie

> The configuration for `maven-shade-plugin` should be set to `combine.self = 
> "override"` in the `connect` modules
> 
>
> Key: SPARK-51607
> URL: https://issues.apache.org/jira/browse/SPARK-51607
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Connect
>Affects Versions: 4.0.0, 4.1.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-51600) Should prepend `sql/hive`/`sql/hive-thriftserver` when `isTesting || isTestingSql` is true

2025-03-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-51600:


Assignee: Yang Jie

> Should prepend `sql/hive`/`sql/hive-thriftserver` when `isTesting || 
> isTestingSql` is true
> --
>
> Key: SPARK-51600
> URL: https://issues.apache.org/jira/browse/SPARK-51600
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-51614) Fix error message when there is a Generate under an UnresolvedHaving

2025-03-26 Thread Mihailo Aleksic (Jira)
Mihailo Aleksic created SPARK-51614:
---

 Summary: Fix error message when there is a Generate under an 
UnresolvedHaving
 Key: SPARK-51614
 URL: https://issues.apache.org/jira/browse/SPARK-51614
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.1.0
Reporter: Mihailo Aleksic


A Generate under an UnresolvedHaving throws an internal error; I propose that 
we fix it so we throw a meaningful error from CheckAnalysis.
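
A hedged illustration of the failing shape (table and column names are made 
up): a generator function in the HAVING clause produces a Generate node under 
UnresolvedHaving, which today surfaces as an internal error:

{code:python}
# Currently raises an internal error; after the fix it should raise a
# meaningful error from CheckAnalysis instead.
spark.sql("""
    SELECT dept, count(*) AS cnt
    FROM employees
    GROUP BY dept
    HAVING explode(array(1, 2)) > 0
""").show()
{code}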






[jira] [Resolved] (SPARK-51577) Make the Gradle build automatically append the SNAPSHOT suffix to the version for non-release builds

2025-03-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-51577.
---
Fix Version/s: kubernetes-operator-0.1.0
   Resolution: Fixed

Issue resolved by pull request 170
[https://github.com/apache/spark-kubernetes-operator/pull/170]

> Make the Gradle build automatically append the SNAPSHOT suffix to the version 
> for non-release builds
> ---
>
> Key: SPARK-51577
> URL: https://issues.apache.org/jira/browse/SPARK-51577
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
> Fix For: kubernetes-operator-0.1.0
>
>
> As we already configure `version` in the Gradle properties, it makes sense to 
> append `-SNAPSHOT` as needed rather than relying on a hardcoded override.






[jira] [Assigned] (SPARK-51577) Make the Gradle build automatically append the SNAPSHOT suffix to the version for non-release builds

2025-03-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-51577:
-

Assignee: Zhou JIANG

> Make the Gradle build automatically append the SNAPSHOT suffix to the version 
> for non-release builds
> ---
>
> Key: SPARK-51577
> URL: https://issues.apache.org/jira/browse/SPARK-51577
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Assignee: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
>
> As we already configure `version` in the Gradle properties, it makes sense to 
> append `-SNAPSHOT` as needed rather than relying on a hardcoded override.






[jira] [Created] (SPARK-51612) Display view creation confs in Desc As Json

2025-03-26 Thread Amanda Liu (Jira)
Amanda Liu created SPARK-51612:
--

 Summary: Display view creation confs in Desc As Json
 Key: SPARK-51612
 URL: https://issues.apache.org/jira/browse/SPARK-51612
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Amanda Liu
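
For context, a hedged sketch of the feature area (the view name is a 
placeholder and the `json_metadata` output column is an assumption):

{code:python}
spark.sql("CREATE OR REPLACE VIEW v AS SELECT 1 AS id")
# DESC ... AS JSON returns the catalog metadata as a JSON document; this
# task would extend it with the confs set when the view was created.
row = spark.sql("DESCRIBE EXTENDED v AS JSON").head()
print(row.json_metadata)
{code}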









[jira] [Created] (SPARK-51613) Improve Spark Operator metrics

2025-03-26 Thread Damon Cortesi (Jira)
Damon Cortesi created SPARK-51613:
-

 Summary: Improve Spark Operator metrics
 Key: SPARK-51613
 URL: https://issues.apache.org/jira/browse/SPARK-51613
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: kubernetes-operator-0.1.0
Reporter: Damon Cortesi


Today the Spark Operator provides JVM, Kubernetes, and Java Operator SDK 
metrics, but no metrics specific to the functionality and health of the Spark 
App or Cluster resources managed by the operator. It would be nice to have 
metrics like:
 * Total counts of Apps or Clusters by state (Submitted, Failed, Successful, 
etc)
 * Gauges of Apps or Clusters by state (Submitted, Pending, Running, etc)
 * Timers for Spark submit latency (Submission --> Running for example)
 * Potentially the depth of the reconciliation backlog and how many apps are 
getting added per interval, although this may already be covered by the 
operator SDK metrics via reconciliations_queue_size

In addition, it would be nice to have Prometheus metrics with labels, but it 
doesn't look like Dropwizard supports that (nor is it likely to; see 
[https://github.com/dropwizard/metrics/issues/1272]).






[jira] [Resolved] (SPARK-51607) The configuration for `maven-shade-plugin` should be set to `combine.self = "override"` in the `connect` modules

2025-03-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-51607.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 50401
[https://github.com/apache/spark/pull/50401]

> The configuration for `maven-shade-plugin` should be set to `combine.self = 
> "override"` in the `connect` modules
> 
>
> Key: SPARK-51607
> URL: https://issues.apache.org/jira/browse/SPARK-51607
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Connect
>Affects Versions: 4.0.0, 4.1.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-51612) Display Spark confs set at view creation in Desc As Json

2025-03-26 Thread Amanda Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amanda Liu updated SPARK-51612:
---
Summary: Display Spark confs set at view creation in Desc As Json  (was: 
Display view creation confs in Desc As Json)

> Display Spark confs set at view creation in Desc As Json
> 
>
> Key: SPARK-51612
> URL: https://issues.apache.org/jira/browse/SPARK-51612
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Amanda Liu
>Priority: Major
>







[jira] [Updated] (SPARK-51610) Support the TIME data type in the parquet datasource

2025-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-51610:
---
Labels: pull-request-available  (was: )

> Support the TIME data type in the parquet datasource
> 
>
> Key: SPARK-51610
> URL: https://issues.apache.org/jira/browse/SPARK-51610
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.1.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
>
> Allow the TIME type in the Parquet datasource, which was disabled by 
> SPARK-51590. Support TimeType in both the vectorized and non-vectorized 
> readers as well as in the writer.
> Write tests for:
> - the read path: create a Parquet file using an external library and read it 
> back with Spark SQL.
> - the write path: write TIME values to Parquet and read them back.






[jira] [Updated] (SPARK-51615) Refactor ShowNamespaces to use RunnableCommand

2025-03-26 Thread Szehon Ho (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated SPARK-51615:
--
Issue Type: Improvement  (was: New Feature)

> Refactor ShowNamespaces to use RunnableCommand
> --
>
> Key: SPARK-51615
> URL: https://issues.apache.org/jira/browse/SPARK-51615
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Szehon Ho
>Priority: Minor
>  Labels: pull-request-available
>
> RunnableCommand is the latest way to run commands. The advantage over the old 
> way is that we have a single node (no need for separate logical plan and 
> physical exec nodes).
> We should refactor 'SHOW NAMESPACES' to use the new approach.






[jira] [Updated] (SPARK-51612) Display Spark confs set at view creation in Desc As Json

2025-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-51612:
---
Labels: pull-request-available  (was: )

> Display Spark confs set at view creation in Desc As Json
> 
>
> Key: SPARK-51612
> URL: https://issues.apache.org/jira/browse/SPARK-51612
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Amanda Liu
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-51318) Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-51318.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 50378
[https://github.com/apache/spark/pull/50378]

> Remove `jar` files from Apache Spark repository and disable affected tests
> --
>
> Key: SPARK-51318
> URL: https://issues.apache.org/jira/browse/SPARK-51318
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 4.1.0
>Reporter: Dongjoon Hyun
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> This issue aims to remove the `jar` files from the following branches and 
> disable the affected tests with IDed TODO comments.
> *MASTER*
> {code}
> $ find . -name '*.jar'
> ./connect-examples/server-library-example/resources/spark-daria_2.13-1.2.3.jar
> ./core/src/test/resources/TestHelloV2_2.13.jar
> ./core/src/test/resources/TestHelloV3_2.13.jar
> ./core/src/test/resources/TestUDTF.jar
> ./data/artifact-tests/junitLargeJar.jar
> ./data/artifact-tests/smallJar.jar
> ./sql/core/src/test/resources/SPARK-33084.jar
> ./sql/core/src/test/resources/artifact-tests/udf_noA.jar
> ./sql/hive/src/test/noclasspath/hive-test-udfs.jar
> ./sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.13.jar
> ./sql/hive/src/test/resources/SPARK-21101-1.0.jar
> ./sql/hive/src/test/resources/data/files/TestSerDe.jar
> ./sql/hive/src/test/resources/TestUDTF.jar
> ./sql/connect/common/src/test/resources/artifact-tests/junitLargeJar.jar
> ./sql/connect/common/src/test/resources/artifact-tests/smallJar.jar
> ./sql/connect/client/jvm/src/test/resources/TestHelloV2_2.13.jar
> ./sql/connect/client/jvm/src/test/resources/udf2.13.jar
> ./sql/hive-thriftserver/src/test/resources/TestUDTF.jar
> {code}
> *branch-4.0*
> {code}
> $ find . -name '*.jar'
> ./connect-examples/server-library-example/resources/spark-daria_2.13-1.2.3.jar
> ./core/src/test/resources/TestHelloV2_2.13.jar
> ./core/src/test/resources/TestHelloV3_2.13.jar
> ./core/src/test/resources/TestUDTF.jar
> ./data/artifact-tests/junitLargeJar.jar
> ./data/artifact-tests/smallJar.jar
> ./sql/core/src/test/resources/SPARK-33084.jar
> ./sql/core/src/test/resources/artifact-tests/udf_noA.jar
> ./sql/hive/src/test/noclasspath/hive-test-udfs.jar
> ./sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.13.jar
> ./sql/hive/src/test/resources/SPARK-21101-1.0.jar
> ./sql/hive/src/test/resources/data/files/TestSerDe.jar
> ./sql/hive/src/test/resources/TestUDTF.jar
> ./sql/connect/common/src/test/resources/artifact-tests/junitLargeJar.jar
> ./sql/connect/common/src/test/resources/artifact-tests/smallJar.jar
> ./sql/connect/client/jvm/src/test/resources/TestHelloV2_2.13.jar
> ./sql/connect/client/jvm/src/test/resources/udf2.13.jar
> ./sql/hive-thriftserver/src/test/resources/TestUDTF.jar
> {code}
> *branch-3.5*
> {code}
> $ find . -name '*.jar'
> ./core/src/test/resources/TestHelloV3_2.12.jar
> ./core/src/test/resources/TestHelloV2_2.12.jar
> ./core/src/test/resources/TestHelloV2_2.13.jar
> ./core/src/test/resources/TestHelloV3_2.13.jar
> ./core/src/test/resources/TestUDTF.jar
> ./connector/connect/server/src/test/resources/udf_noA.jar
> ./connector/connect/common/src/test/resources/artifact-tests/junitLargeJar.jar
> ./connector/connect/common/src/test/resources/artifact-tests/smallJar.jar
> ./connector/connect/client/jvm/src/test/resources/TestHelloV2_2.12.jar
> ./connector/connect/client/jvm/src/test/resources/TestHelloV2_2.13.jar
> ./connector/connect/client/jvm/src/test/resources/udf2.13.jar
> ./connector/connect/client/jvm/src/test/resources/udf2.12.jar
> ./data/artifact-tests/junitLargeJar.jar
> ./data/artifact-tests/smallJar.jar
> ./sql/core/src/test/resources/SPARK-33084.jar
> ./sql/hive/src/test/noclasspath/hive-test-udfs.jar
> ./sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.12.jar
> ./sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.13.jar
> ./sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.11.jar
> ./sql/hive/src/test/resources/SPARK-21101-1.0.jar
> ./sql/hive/src/test/resources/data/files/TestSerDe.jar
> ./sql/hive/src/test/resources/TestUDTF.jar
> ./sql/hive-thriftserver/src/test/resources/TestUDTF.jar
> {code}






[jira] [Assigned] (SPARK-51318) Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-51318:


Assignee: Hyukjin Kwon

> Remove `jar` files from Apache Spark repository and disable affected tests
> --
>
> Key: SPARK-51318
> URL: https://issues.apache.org/jira/browse/SPARK-51318
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 4.1.0
>Reporter: Dongjoon Hyun
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> This issue aims to remove the `jar` files from the following branches and 
> disable the affected tests with IDed TODO comments.
> *MASTER*
> {code}
> $ find . -name '*.jar'
> ./connect-examples/server-library-example/resources/spark-daria_2.13-1.2.3.jar
> ./core/src/test/resources/TestHelloV2_2.13.jar
> ./core/src/test/resources/TestHelloV3_2.13.jar
> ./core/src/test/resources/TestUDTF.jar
> ./data/artifact-tests/junitLargeJar.jar
> ./data/artifact-tests/smallJar.jar
> ./sql/core/src/test/resources/SPARK-33084.jar
> ./sql/core/src/test/resources/artifact-tests/udf_noA.jar
> ./sql/hive/src/test/noclasspath/hive-test-udfs.jar
> ./sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.13.jar
> ./sql/hive/src/test/resources/SPARK-21101-1.0.jar
> ./sql/hive/src/test/resources/data/files/TestSerDe.jar
> ./sql/hive/src/test/resources/TestUDTF.jar
> ./sql/connect/common/src/test/resources/artifact-tests/junitLargeJar.jar
> ./sql/connect/common/src/test/resources/artifact-tests/smallJar.jar
> ./sql/connect/client/jvm/src/test/resources/TestHelloV2_2.13.jar
> ./sql/connect/client/jvm/src/test/resources/udf2.13.jar
> ./sql/hive-thriftserver/src/test/resources/TestUDTF.jar
> {code}
> *branch-4.0*
> {code}
> $ find . -name '*.jar'
> ./connect-examples/server-library-example/resources/spark-daria_2.13-1.2.3.jar
> ./core/src/test/resources/TestHelloV2_2.13.jar
> ./core/src/test/resources/TestHelloV3_2.13.jar
> ./core/src/test/resources/TestUDTF.jar
> ./data/artifact-tests/junitLargeJar.jar
> ./data/artifact-tests/smallJar.jar
> ./sql/core/src/test/resources/SPARK-33084.jar
> ./sql/core/src/test/resources/artifact-tests/udf_noA.jar
> ./sql/hive/src/test/noclasspath/hive-test-udfs.jar
> ./sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.13.jar
> ./sql/hive/src/test/resources/SPARK-21101-1.0.jar
> ./sql/hive/src/test/resources/data/files/TestSerDe.jar
> ./sql/hive/src/test/resources/TestUDTF.jar
> ./sql/connect/common/src/test/resources/artifact-tests/junitLargeJar.jar
> ./sql/connect/common/src/test/resources/artifact-tests/smallJar.jar
> ./sql/connect/client/jvm/src/test/resources/TestHelloV2_2.13.jar
> ./sql/connect/client/jvm/src/test/resources/udf2.13.jar
> ./sql/hive-thriftserver/src/test/resources/TestUDTF.jar
> {code}
> *branch-3.5*
> {code}
> $ find . -name '*.jar'
> ./core/src/test/resources/TestHelloV3_2.12.jar
> ./core/src/test/resources/TestHelloV2_2.12.jar
> ./core/src/test/resources/TestHelloV2_2.13.jar
> ./core/src/test/resources/TestHelloV3_2.13.jar
> ./core/src/test/resources/TestUDTF.jar
> ./connector/connect/server/src/test/resources/udf_noA.jar
> ./connector/connect/common/src/test/resources/artifact-tests/junitLargeJar.jar
> ./connector/connect/common/src/test/resources/artifact-tests/smallJar.jar
> ./connector/connect/client/jvm/src/test/resources/TestHelloV2_2.12.jar
> ./connector/connect/client/jvm/src/test/resources/TestHelloV2_2.13.jar
> ./connector/connect/client/jvm/src/test/resources/udf2.13.jar
> ./connector/connect/client/jvm/src/test/resources/udf2.12.jar
> ./data/artifact-tests/junitLargeJar.jar
> ./data/artifact-tests/smallJar.jar
> ./sql/core/src/test/resources/SPARK-33084.jar
> ./sql/hive/src/test/noclasspath/hive-test-udfs.jar
> ./sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.12.jar
> ./sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.13.jar
> ./sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.11.jar
> ./sql/hive/src/test/resources/SPARK-21101-1.0.jar
> ./sql/hive/src/test/resources/data/files/TestSerDe.jar
> ./sql/hive/src/test/resources/TestUDTF.jar
> ./sql/hive-thriftserver/src/test/resources/TestUDTF.jar
> {code}






[jira] [Created] (SPARK-51620) Support `columns` for `DataFrame`

2025-03-26 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-51620:
-

 Summary: Support `columns` for `DataFrame`
 Key: SPARK-51620
 URL: https://issues.apache.org/jira/browse/SPARK-51620
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: connect-swift-0.1.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-51620) Support `columns` for `DataFrame`

2025-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-51620:
---
Labels: pull-request-available  (was: )

> Support `columns` for `DataFrame`
> -
>
> Key: SPARK-51620
> URL: https://issues.apache.org/jira/browse/SPARK-51620
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: connect-swift-0.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-51612) Display Spark confs set at view creation in Desc As Json

2025-03-26 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-51612:
--

Assignee: Amanda Liu

> Display Spark confs set at view creation in Desc As Json
> 
>
> Key: SPARK-51612
> URL: https://issues.apache.org/jira/browse/SPARK-51612
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Amanda Liu
>Assignee: Amanda Liu
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-51612) Display Spark confs set at view creation in Desc As Json

2025-03-26 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-51612.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 50407
[https://github.com/apache/spark/pull/50407]

> Display Spark confs set at view creation in Desc As Json
> 
>
> Key: SPARK-51612
> URL: https://issues.apache.org/jira/browse/SPARK-51612
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Amanda Liu
>Assignee: Amanda Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-51588) Validate default values handling in micro-batch streaming writes

2025-03-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-51588:


Assignee: Anton Okolnychyi

> Validate default values handling in micro-batch streaming writes
> 
>
> Key: SPARK-51588
> URL: https://issues.apache.org/jira/browse/SPARK-51588
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.1.0
>Reporter: Anton Okolnychyi
>Assignee: Anton Okolnychyi
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-51588) Validate default values handling in micro-batch streaming writes

2025-03-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-51588.
--
Fix Version/s: 4.1.0
   Resolution: Fixed

Issue resolved by pull request 50351
[https://github.com/apache/spark/pull/50351]

> Validate default values handling in micro-batch streaming writes
> 
>
> Key: SPARK-51588
> URL: https://issues.apache.org/jira/browse/SPARK-51588
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.1.0
>Reporter: Anton Okolnychyi
>Assignee: Anton Okolnychyi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>







[jira] [Created] (SPARK-51616) Run CollationTypeCasts explicitly before assigning aliases

2025-03-26 Thread Vladimir Golubev (Jira)
Vladimir Golubev created SPARK-51616:


 Summary: Run CollationTypeCasts explicitly before assigning aliases
 Key: SPARK-51616
 URL: https://issues.apache.org/jira/browse/SPARK-51616
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Vladimir Golubev


The solution introduced in https://issues.apache.org/jira/browse/SPARK-51428 
has a pitfall - aliases are reassigned only cosmetically, and name resolution 
is still done using the old names. This issue tracks a better solution to the 
problem.






[jira] [Assigned] (SPARK-51617) Explicitly commit/revert jar removals

2025-03-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-51617:


Assignee: Hyukjin Kwon

> Explicitly commit/revert jar removals
> -
>
> Key: SPARK-51617
> URL: https://issues.apache.org/jira/browse/SPARK-51617
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> Address https://github.com/apache/spark/pull/50378#discussion_r2013712753






[jira] [Resolved] (SPARK-51617) Explicitly commit/revert jar removals

2025-03-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-51617.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 50415
[https://github.com/apache/spark/pull/50415]

> Explicitly commit/revert jar removals
> -
>
> Key: SPARK-51617
> URL: https://issues.apache.org/jira/browse/SPARK-51617
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Address https://github.com/apache/spark/pull/50378#discussion_r2013712753






[jira] [Updated] (SPARK-51617) Explicitly commit/revert jar removals

2025-03-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-51617:
-
Summary: Explicitly commit/revert jar removals  (was: Restore the test jars 
at the end of release process)

> Explicitly commit/revert jar removals
> -
>
> Key: SPARK-51617
> URL: https://issues.apache.org/jira/browse/SPARK-51617
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Address https://github.com/apache/spark/pull/50378#discussion_r2013712753






[jira] [Updated] (SPARK-51617) Explicitly commit/revert jar removals

2025-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-51617:
---
Labels: pull-request-available  (was: )

> Explicitly commit/revert jar removals
> -
>
> Key: SPARK-51617
> URL: https://issues.apache.org/jira/browse/SPARK-51617
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> Address https://github.com/apache/spark/pull/50378#discussion_r2013712753






[jira] [Created] (SPARK-51618) Add a check for jars in CI

2025-03-26 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-51618:


 Summary: Add a check for jars in CI
 Key: SPARK-51618
 URL: https://issues.apache.org/jira/browse/SPARK-51618
 Project: Spark
  Issue Type: Improvement
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


We should disallow jars from being added to the source tree.
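
A minimal sketch of such a check, assuming a plain Python script run in CI 
(the ignore rule is an assumption):

{code:python}
import pathlib
import sys

# Fail the build if any .jar file is present in the source tree.
jars = [p for p in pathlib.Path(".").rglob("*.jar") if ".git" not in p.parts]
if jars:
    print("jar files must not be added to the source tree:")
    for p in jars:
        print(f"  {p}")
    sys.exit(1)
{code}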






[jira] [Created] (SPARK-51585) Oracle dialect supports pushdown of datetime functions.

2025-03-26 Thread Jiaan Geng (Jira)
Jiaan Geng created SPARK-51585:
--

 Summary: Oracle dialect supports pushdown of datetime functions.
 Key: SPARK-51585
 URL: https://issues.apache.org/jira/browse/SPARK-51585
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.1.0
Reporter: Jiaan Geng
Assignee: Jiaan Geng









[jira] [Updated] (SPARK-51619) Support UDT in Arrow-optimized Python UDF.

2025-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-51619:
---
Labels: pull-request-available  (was: )

> Support UDT in Arrow-optimized Python UDF.
> --
>
> Key: SPARK-51619
> URL: https://issues.apache.org/jira/browse/SPARK-51619
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.1.0
>Reporter: Takuya Ueshin
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-51544) Add only unique and necessary metadata columns

2025-03-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-51544.
-
Fix Version/s: 4.1.0
   Resolution: Fixed

Issue resolved by pull request 50304
[https://github.com/apache/spark/pull/50304]

> Add only unique and necessary metadata columns
> --
>
> Key: SPARK-51544
> URL: https://issues.apache.org/jira/browse/SPARK-51544
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Timotic
>Assignee: Mihailo Timotic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> AddMetadataColumns should add only unique and necessary metadata columns, not 
> the entire child's metadata output 






[jira] [Created] (SPARK-51617) Restore the test jars at the end of release process

2025-03-26 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-51617:


 Summary: Restore the test jars at the end of release process
 Key: SPARK-51617
 URL: https://issues.apache.org/jira/browse/SPARK-51617
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Address https://github.com/apache/spark/pull/50378#discussion_r2013712753






[jira] [Commented] (SPARK-51605) If the `logs` directory does not exist, the first launch of `bin/spark-shell --remote local` will fail.

2025-03-26 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-51605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938758#comment-17938758
 ] 

Hyukjin Kwon commented on SPARK-51605:
--

Ack

> If the `logs` directory does not exist, the first launch of `bin/spark-shell 
> --remote local` will fail.
> ---
>
> Key: SPARK-51605
> URL: https://issues.apache.org/jira/browse/SPARK-51605
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0, 4.1.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> bin/spark-shell --remote local
> WARNING: Using incubator modules: jdk.incubator.vector
> Exception in thread "main" java.nio.file.NoSuchFileException: 
> /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs
>   at 
> java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
>   at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
>   at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
>   at 
> java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
>   at 
> java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:148)
>   at java.base/java.nio.file.Files.readAttributes(Files.java:1851)
>   at 
> java.base/sun.nio.fs.PollingWatchService.doPrivilegedRegister(PollingWatchService.java:173)
>   at 
> java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:154)
>   at 
> java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:151)
>   at 
> java.base/java.security.AccessController.doPrivileged(AccessController.java:569)
>   at 
> java.base/sun.nio.fs.PollingWatchService.register(PollingWatchService.java:150)
>   at java.base/sun.nio.fs.UnixPath.register(UnixPath.java:885)
>   at java.base/java.nio.file.Path.register(Path.java:894)
>   at 
> org.apache.spark.sql.connect.SparkSession$.waitUntilFileExists(SparkSession.scala:717)
>   at 
> org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13(SparkSession.scala:798)
>   at 
> org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13$adapted(SparkSession.scala:791)
>   at scala.Option.foreach(Option.scala:437)
>   at 
> org.apache.spark.sql.connect.SparkSession$.withLocalConnectServer(SparkSession.scala:791)
>   at 
> org.apache.spark.sql.application.ConnectRepl$.doMain(ConnectRepl.scala:67)
>   at 
> org.apache.spark.sql.application.ConnectRepl$.main(ConnectRepl.scala:57)
>   at org.apache.spark.sql.application.ConnectRepl.main(ConnectRepl.scala)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:569)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1027)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:204)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:227)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:96)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1132)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1141)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 25/03/26 15:39:40 INFO ShutdownHookManager: Shutdown hook called
> 25/03/26 15:39:40 INFO ShutdownHookManager: Deleting directory 
> /private/var/folders/j2/cfn7w6795538n_416_27rkqmgn/T/spark-fe4c9d71-b7d7-437e-b486-514cc538
>  {code}
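
For what it's worth, the failure comes from registering a file watcher on a 
directory that does not exist yet (see the PollingWatchService frames above). A 
guard of the following shape would avoid it; this is a hedged sketch (the 
SPARK_LOG_DIR lookup and the "logs" default are assumptions), not the actual fix:

{code:scala}
// Sketch: create the log directory before registering a WatchService on it,
// so the first launch does not hit NoSuchFileException.
import java.nio.file.{Files, Paths, StandardWatchEventKinds}

val logDir = Paths.get(sys.env.getOrElse("SPARK_LOG_DIR", "logs"))
if (!Files.exists(logDir)) Files.createDirectories(logDir)

val watcher = logDir.getFileSystem.newWatchService()
logDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE)
{code}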



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-51621) Support `sparkSession` for `DataFrame`

2025-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-51621:
---
Labels: pull-request-available  (was: )

> Support `sparkSession` for `DataFrame`
> --
>
> Key: SPARK-51621
> URL: https://issues.apache.org/jira/browse/SPARK-51621
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: connect-swift-0.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-46860) Credentials with https url not working for --jars, --files, --archives & --py-files options on spark-submit command

2025-03-26 Thread Krzysztof Ruta (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938513#comment-17938513
 ] 

Krzysztof Ruta commented on SPARK-46860:


I did some research and experiments. I identified two places where a URL 
containing credentials is potentially logged - this applies particularly to 
pt. 2 above. But as soon as I addressed it, I found another... E.g. Spark 
stores its jars location in the session properties (spark.jars); what if 
somebody decides to log the full Spark config for debugging purposes? Or what 
if somebody logs the full spark-submit command (which includes the URL) even 
before the Spark app is launched?

I don't think altering Spark logging is the way to go to keep secrets safe. 
It would give you a false sense of confidence that your password could never 
leak. You cannot be sure that in some scenario (network problems, wrong 
characters in the password, debug-level logging, etc.) the URL would not be 
logged.

So in my opinion the key here is to secure your logging system independently 
of Spark. Take Apache Airflow or Gitlab CI/CD - either you are explicitly 
given the option to mask your secrets, or you must do it manually; try to go 
this way. In any scenario I can think of, this is the safer approach.

To test it, just put obviously incorrect credentials (like you mentioned 
above) or correct ones that you can quickly change, and search for them in 
the logs. When masked, you should never see them.
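
As a minimal illustration of the masking idea, assuming a log pipeline that 
can filter strings before they are written (the helper name and the regex are 
mine, not Spark's):

{code:scala}
// Illustrative only: strip the userinfo part of http(s) URLs before logging.
object UrlRedactor {
  private val UserInfo = "(?i)(https?://)[^/@\\s]+@".r

  def redact(s: String): String = UserInfo.replaceAllIn(s, "$1***@")
}

// UrlRedactor.redact("--jars https://user:secret@host.com/runtime.jar")
// => "--jars https://***@host.com/runtime.jar"
{code}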

> Credentials with https url not working for --jars, --files, --archives & 
> --py-files options on spark-submit command
> ---
>
> Key: SPARK-46860
> URL: https://issues.apache.org/jira/browse/SPARK-46860
> Project: Spark
>  Issue Type: Task
>  Components: k8s
>Affects Versions: 3.3.3, 3.5.0, 3.3.4
> Environment: Spark 3.3.3 deployed on K8s 
>Reporter: Vikram Janarthanan
>Priority: Major
>  Labels: pull-request-available
>
> We are trying to run a Spark application that fetches its dependent files as 
> well as the main PySpark script from a secure webserver.
> We are looking for a solution to pass the dependencies as well as the PySpark 
> script from the webserver.
> We have tried deploying the Spark application from the webserver to the k8s 
> cluster without username and password and it worked, but when we tried with 
> username/password we got "Exception in thread "{*}main" 
> java.io.IOException: Server returned HTTP response code: 401 for URL: 
> https://username:passw...@domain.com/application/pysparkjob.py{*}"
> *Working options on spark-submit:*
> spark-submit ..
> --repositories https://username:passw...@domain.com/repo1/repo
> --jars https://domain.com/jars/runtime.jar \
> --files https://domain.com/files/query.sql \
> --py-files [https://domain.com/pythonlib/pythonlib.zip] \
> https://domain.com/app1/pysparkapp.py
> Note: only the --repositories option works with username and password
> *Spark-submit using https url with username/password not working:*
> spark-submit ..
> --jars https://username:passw...@domain.com/jars/runtime.jar \
> --files https://username:passw...@domain.com/files/query.sql \
> --py-files 
> https://username:passw...@domain.com[/pythonlib/pythonlib.zip|https://domain.com/pythonlib/pythonlib.zip]
>  \
> https://username:passw...@domain.com/app1/pysparkapp.py
>  
> Error :
> 25/01/23 09:19:57 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> Exception in thread "main" java.io.IOException: Server returned HTTP response 
> code: 401 for URL: 
> https://username:passw...@domain.com/repository/spark-artifacts/pysparkdemo/1.0/pysparkdemo-1.0.tgz
>         at 
> java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:2000)
>         at 
> java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589)
>         at 
> java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224)
>         at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:809)
>         at 
> org.apache.spark.util.DependencyUtils$.downloadFile(DependencyUtils.scala:264)
>         at 
> org.apache.spark.util.DependencyUtils$.$anonfun$downloadFileList$2(DependencyUtils.scala:233)
>         at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>         at 
> scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>         at 
> scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>         at 
> scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
>         at scala.collection.TraversableLike.map(Tr

[jira] [Comment Edited] (SPARK-46860) Credentials with https url not working for --jars, --files, --archives & --py-files options on spark-submit command

2025-03-26 Thread Krzysztof Ruta (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938513#comment-17938513
 ] 

Krzysztof Ruta edited comment on SPARK-46860 at 3/26/25 9:37 AM:
-

I did some research and experiments. I identified two places where a URL 
containing credentials is potentially logged - this applies particularly to 
pt. 2 above. But as soon as I addressed these, I found another... E.g. Spark 
stores its jars location in the session properties (spark.jars); what if 
somebody decides to log the full Spark config for debugging purposes? Or what 
if somebody logs the full spark-submit command (which includes the URL) even 
before the Spark app is launched?

I don't think altering Spark logging is the way to go to keep secrets safe. 
It would give you a false sense of confidence that your password could never 
leak. You cannot be sure that in some scenario (network problems, wrong 
characters in the password, debug-level logging, etc.) the URL would not be 
logged.

So in my opinion the key here is to secure your logging system independently 
of Spark. Take Apache Airflow or Gitlab CI/CD - either you are explicitly 
given the option to mask your secrets, or you must do it manually; try to go 
this way. In any scenario I can think of, this is the safer approach.

To test it, just put obviously incorrect credentials (like you mentioned 
above) or correct ones that you can quickly change, and search for them in 
the logs. When masked, you should never see them.


was (Author: JIRAUSER309126):
I did some research and experiments. I identified two places where a URL 
containing credentials is potentially logged - this applies particularly to 
pt. 2 above. But as soon as I addressed it, I found another... E.g. Spark 
stores its jars location in the session properties (spark.jars); what if 
somebody decides to log the full Spark config for debugging purposes? Or what 
if somebody logs the full spark-submit command (which includes the URL) even 
before the Spark app is launched?

I don't think altering Spark logging is the way to go to keep secrets safe. 
It would give you a false sense of confidence that your password could never 
leak. You cannot be sure that in some scenario (network problems, wrong 
characters in the password, debug-level logging, etc.) the URL would not be 
logged.

So in my opinion the key here is to secure your logging system independently 
of Spark. Take Apache Airflow or Gitlab CI/CD - either you are explicitly 
given the option to mask your secrets, or you must do it manually; try to go 
this way. In any scenario I can think of, this is the safer approach.

To test it, just put obviously incorrect credentials (like you mentioned 
above) or correct ones that you can quickly change, and search for them in 
the logs. When masked, you should never see them.

> Credentials with https url not working for --jars, --files, --archives & 
> --py-files options on spark-submit command
> ---
>
> Key: SPARK-46860
> URL: https://issues.apache.org/jira/browse/SPARK-46860
> Project: Spark
>  Issue Type: Task
>  Components: k8s
>Affects Versions: 3.3.3, 3.5.0, 3.3.4
> Environment: Spark 3.3.3 deployed on K8s 
>Reporter: Vikram Janarthanan
>Priority: Major
>  Labels: pull-request-available
>
> We are trying to run a Spark application that fetches its dependent files as 
> well as the main PySpark script from a secure webserver.
> We are looking for a solution to pass the dependencies as well as the PySpark 
> script from the webserver.
> We have tried deploying the Spark application from the webserver to the k8s 
> cluster without username and password and it worked, but when we tried with 
> username/password we got "Exception in thread "{*}main" 
> java.io.IOException: Server returned HTTP response code: 401 for URL: 
> https://username:passw...@domain.com/application/pysparkjob.py{*}"
> *Working options on spark-submit:*
> spark-submit ..
> --repositories https://username:passw...@domain.com/repo1/repo
> --jars https://domain.com/jars/runtime.jar \
> --files https://domain.com/files/query.sql \
> --py-files [https://domain.com/pythonlib/pythonlib.zip] \
> https://domain.com/app1/pysparkapp.py
> Note: only the --repositories option works with username and password
> *Spark-submit using https url with username/password not working:*
> spark-submit ..
> --jars https://username:passw...@domain.com/jars/runtime.jar \
> --files https://username:passw...@domain.com/files/query.sql \
> --py-files 
> https://username:passw...@domain.com[/pythonlib/pythonlib.zip|https://domain.com/pythonlib/pythonlib.zip]
>  \
> https://username:passw...@domain.com/app1/pysparkapp.py
>  
> Error :
> 25/

[jira] [Commented] (SPARK-46860) Credentials with https url not working for --jars, --files, --archives & --py-files options on spark-submit command

2025-03-26 Thread Krzysztof Ruta (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938515#comment-17938515
 ] 

Krzysztof Ruta commented on SPARK-46860:


Anyway, I am not sure what is going to happen with this PR, or how long it 
could take for somebody to review and (hopefully) integrate it.

If you wish to play with it or even use it in prod, you could build spark-core 
yourself and then replace the spark-core jar with yours. That's what I did. I 
can provide you with some guidance if necessary.

> Credentials with https url not working for --jars, --files, --archives & 
> --py-files options on spark-submit command
> ---
>
> Key: SPARK-46860
> URL: https://issues.apache.org/jira/browse/SPARK-46860
> Project: Spark
>  Issue Type: Task
>  Components: k8s
>Affects Versions: 3.3.3, 3.5.0, 3.3.4
> Environment: Spark 3.3.3 deployed on K8s 
>Reporter: Vikram Janarthanan
>Priority: Major
>  Labels: pull-request-available
>
> We are trying to run a Spark application that fetches its dependent files as 
> well as the main PySpark script from a secure webserver.
> We are looking for a solution to pass the dependencies as well as the PySpark 
> script from the webserver.
> We have tried deploying the Spark application from the webserver to the k8s 
> cluster without username and password and it worked, but when we tried with 
> username/password we got "Exception in thread "{*}main" 
> java.io.IOException: Server returned HTTP response code: 401 for URL: 
> https://username:passw...@domain.com/application/pysparkjob.py{*}"
> *Working options on spark-submit:*
> spark-submit ..
> --repositories https://username:passw...@domain.com/repo1/repo
> --jars https://domain.com/jars/runtime.jar \
> --files https://domain.com/files/query.sql \
> --py-files [https://domain.com/pythonlib/pythonlib.zip] \
> https://domain.com/app1/pysparkapp.py
> Note: only the --repositories option works with username and password
> *Spark-submit using https url with username/password not working:*
> spark-submit ..
> --jars https://username:passw...@domain.com/jars/runtime.jar \
> --files https://username:passw...@domain.com/files/query.sql \
> --py-files 
> https://username:passw...@domain.com[/pythonlib/pythonlib.zip|https://domain.com/pythonlib/pythonlib.zip]
>  \
> https://username:passw...@domain.com/app1/pysparkapp.py
>  
> Error :
> 25/01/23 09:19:57 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> Exception in thread "main" java.io.IOException: Server returned HTTP response 
> code: 401 for URL: 
> https://username:passw...@domain.com/repository/spark-artifacts/pysparkdemo/1.0/pysparkdemo-1.0.tgz
>         at 
> java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:2000)
>         at 
> java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589)
>         at 
> java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224)
>         at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:809)
>         at 
> org.apache.spark.util.DependencyUtils$.downloadFile(DependencyUtils.scala:264)
>         at 
> org.apache.spark.util.DependencyUtils$.$anonfun$downloadFileList$2(DependencyUtils.scala:233)
>         at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>         at 
> scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>         at 
> scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>         at 
> scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
>         at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>         at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-51609) Optimize Recursive CTE execution forsimple queries

2025-03-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-51609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavle Martinović updated SPARK-51609:
-
Summary: Optimize Recursive CTE execution forsimple queries  (was: Optimize 
simple queries)

> Optimize Recursive CTE execution forsimple queries
> --
>
> Key: SPARK-51609
> URL: https://issues.apache.org/jira/browse/SPARK-51609
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.1.0
>Reporter: Pavle Martinović
>Priority: Major
>
> We would like to speed up the execution of Recursive CTEs. It is possible to 
> optimize simple queries to run in-memory, leading to large speed-ups.
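
For context, a minimal example of the kind of self-referencing query this 
targets; illustrative only (it assumes a spark-shell session with `spark` in 
scope and recursive-CTE support in this build), not taken from the patch:

{code:scala}
// A "simple" recursive CTE: counts n = 1..10 and sums the values.
// In-memory evaluation of such queries is what the speed-up is about.
val df = spark.sql("""
  WITH RECURSIVE t(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM t WHERE n < 10
  )
  SELECT sum(n) AS total FROM t
""")
df.show()
{code}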



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-51609) Optimize Recursive CTE execution for simple queries

2025-03-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-51609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavle Martinović updated SPARK-51609:
-
Summary: Optimize Recursive CTE execution for simple queries  (was: 
Optimize Recursive CTE execution forsimple queries)

> Optimize Recursive CTE execution for simple queries
> ---
>
> Key: SPARK-51609
> URL: https://issues.apache.org/jira/browse/SPARK-51609
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.1.0
>Reporter: Pavle Martinović
>Priority: Major
>
> We would like to speed up the execution of Recursive CTEs. It is possible to 
> optimize simple queries to run in-memory, leading to large speed-ups.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-51574) Implement filters

2025-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-51574:
---
Labels: pull-request-available  (was: )

> Implement filters
> -
>
> Key: SPARK-51574
> URL: https://issues.apache.org/jira/browse/SPARK-51574
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.1.0
>Reporter: Haoyu Weng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-51574) Implement filters

2025-03-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-51574.
-
Fix Version/s: 4.1.0
   Resolution: Fixed

Issue resolved by pull request 50252
[https://github.com/apache/spark/pull/50252]

> Implement filters
> -
>
> Key: SPARK-51574
> URL: https://issues.apache.org/jira/browse/SPARK-51574
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.1.0
>Reporter: Haoyu Weng
>Assignee: Haoyu Weng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-51587) [PySpark] Fix an issue where timestamp cannot be used in ListState when multiple state data is involved

2025-03-26 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated SPARK-51587:
-
Fix Version/s: 4.0.0
   (was: 4.1.0)

> [PySpark] Fix an issue where timestamp cannot be used in ListState when 
> multiple state data is involved
> ---
>
> Key: SPARK-51587
> URL: https://issues.apache.org/jira/browse/SPARK-51587
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Bo Gao
>Assignee: Bo Gao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Fix an issue where timestamp cannot be used in ListState when multiple state 
> data is involved.
> Right now below error will be thrown
> {code:python}
> [UNSUPPORTED_ARROWTYPE] Unsupported arrow type Timestamp(NANOSECOND, null). 
> SQLSTATE: 0A000
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-51608) Better log exception in python udf worker.

2025-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-51608:
---
Labels: pull-request-available  (was: )

> Better log exception in python udf worker.
> --
>
> Key: SPARK-51608
> URL: https://issues.apache.org/jira/browse/SPARK-51608
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Dmitry
>Priority: Major
>  Labels: pull-request-available
>
> It was possible to see the error in the logs: 
> {{24/12/27 20:25:04 WARN PythonUDFWithNamedArgumentsRunner: Failed to stop 
> worker}}
> However, it does not reveal what exactly happened. It makes sense to 
> include the exception information in the logs.
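
A minimal sketch of the direction this suggests - passing the caught 
exception to the logger so its stack trace is kept; the surrounding names are 
illustrative, not the actual patch:

{code:scala}
// Illustrative only: log the exception alongside the warning instead of
// dropping it.
import org.slf4j.LoggerFactory

object WorkerShutdown {
  private val log = LoggerFactory.getLogger(getClass)

  def stopQuietly(stop: () => Unit): Unit =
    try stop()
    catch {
      case e: Exception => log.warn("Failed to stop worker", e)
    }
}
{code}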



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-51609) Optimize simple queries

2025-03-26 Thread Jira
Pavle Martinović created SPARK-51609:


 Summary: Optimize simple queries
 Key: SPARK-51609
 URL: https://issues.apache.org/jira/browse/SPARK-51609
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.1.0
Reporter: Pavle Martinović






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-51605) If the `logs` directory does not exist, the first launch of `bin/spark-shell --remote local` will fail.

2025-03-26 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-51605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938497#comment-17938497
 ] 

Yang Jie commented on SPARK-51605:
--

cc [~gurwls223] 

> If the `logs` directory does not exist, the first launch of `bin/spark-shell 
> --remote local` will fail.
> ---
>
> Key: SPARK-51605
> URL: https://issues.apache.org/jira/browse/SPARK-51605
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0, 4.1.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> bin/spark-shell --remote local
> WARNING: Using incubator modules: jdk.incubator.vector
> Exception in thread "main" java.nio.file.NoSuchFileException: 
> /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs
>   at 
> java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
>   at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
>   at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
>   at 
> java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
>   at 
> java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:148)
>   at java.base/java.nio.file.Files.readAttributes(Files.java:1851)
>   at 
> java.base/sun.nio.fs.PollingWatchService.doPrivilegedRegister(PollingWatchService.java:173)
>   at 
> java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:154)
>   at 
> java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:151)
>   at 
> java.base/java.security.AccessController.doPrivileged(AccessController.java:569)
>   at 
> java.base/sun.nio.fs.PollingWatchService.register(PollingWatchService.java:150)
>   at java.base/sun.nio.fs.UnixPath.register(UnixPath.java:885)
>   at java.base/java.nio.file.Path.register(Path.java:894)
>   at 
> org.apache.spark.sql.connect.SparkSession$.waitUntilFileExists(SparkSession.scala:717)
>   at 
> org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13(SparkSession.scala:798)
>   at 
> org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13$adapted(SparkSession.scala:791)
>   at scala.Option.foreach(Option.scala:437)
>   at 
> org.apache.spark.sql.connect.SparkSession$.withLocalConnectServer(SparkSession.scala:791)
>   at 
> org.apache.spark.sql.application.ConnectRepl$.doMain(ConnectRepl.scala:67)
>   at 
> org.apache.spark.sql.application.ConnectRepl$.main(ConnectRepl.scala:57)
>   at org.apache.spark.sql.application.ConnectRepl.main(ConnectRepl.scala)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:569)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1027)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:204)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:227)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:96)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1132)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1141)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 25/03/26 15:39:40 INFO ShutdownHookManager: Shutdown hook called
> 25/03/26 15:39:40 INFO ShutdownHookManager: Deleting directory 
> /private/var/folders/j2/cfn7w6795538n_416_27rkqmgn/T/spark-fe4c9d71-b7d7-437e-b486-514cc538
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-51606) After exiting the remote local connect shell, the SparkConnectServer will not terminate.

2025-03-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-51606:
-
Affects Version/s: 4.0.0

> After exiting the remote local connect shell, the SparkConnectServer will not 
> terminate.
> 
>
> Key: SPARK-51606
> URL: https://issues.apache.org/jira/browse/SPARK-51606
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0, 4.1.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> bin/spark-shell --remote local
> WARNING: Using incubator modules: jdk.incubator.vector
> Using Spark's default log4j profile: 
> org/apache/spark/log4j2-defaults.properties
> 25/03/26 15:43:55 INFO SparkSession: Spark Connect server started with the 
> log file: 
> /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs/spark-cb51ad74-00e1-4567-9746-3dc9a7888ecb-org.apache.spark.sql.connect.service.SparkConnectServer-1-local.out
> 25/03/26 15:43:56 INFO BaseAllocator: Debug mode disabled. Enable with the VM 
> option -Darrow.memory.debug.allocator=true.
> 25/03/26 15:43:56 INFO DefaultAllocationManagerOption: allocation manager 
> type not specified, using netty as the default type
> 25/03/26 15:43:56 INFO CheckAllocator: Using DefaultAllocationManager at 
> memory/netty/DefaultAllocationManagerFactory.class
> Welcome to
>                     __
>      / __/__  ___ _/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 4.1.0-SNAPSHOT
>       /_/
> Type in expressions to have them evaluated.
> Spark connect server version 4.1.0-SNAPSHOT.
> Spark session available as 'spark'.
>    
> scala> exit 
> Bye!
> 25/03/26 15:44:00 INFO ShutdownHookManager: Shutdown hook called
> 25/03/26 15:44:00 INFO ShutdownHookManager: Deleting directory 
> /private/var/folders/j2/cfn7w6795538n_416_27rkqmgn/T/spark-ad8dfdf4-cf2b-413f-a9e3-d6e310dff1ea
> bin/spark-shell --remote local
> WARNING: Using incubator modules: jdk.incubator.vector
> Using Spark's default log4j profile: 
> org/apache/spark/log4j2-defaults.properties
> 25/03/26 15:44:04 INFO SparkSession: Spark Connect server started with the 
> log file: 
> /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs/spark-a7b9a1dc-1e16-4e0e-b7c1-8f957d730df3-org.apache.spark.sql.connect.service.SparkConnectServer-1-local.out
> 25/03/26 15:44:05 INFO BaseAllocator: Debug mode disabled. Enable with the VM 
> option -Darrow.memory.debug.allocator=true.
> 25/03/26 15:44:05 INFO DefaultAllocationManagerOption: allocation manager 
> type not specified, using netty as the default type
> 25/03/26 15:44:05 INFO CheckAllocator: Using DefaultAllocationManager at 
> memory/netty/DefaultAllocationManagerFactory.class
> Exception in thread "main" org.apache.spark.SparkException: 
> org.sparkproject.io.grpc.StatusRuntimeException: UNAUTHENTICATED: Invalid 
> authentication token
>   at 
> org.apache.spark.sql.connect.client.GrpcExceptionConverter.toThrowable(GrpcExceptionConverter.scala:162)
>   at 
> org.apache.spark.sql.connect.client.GrpcExceptionConverter.convert(GrpcExceptionConverter.scala:61)
>   at 
> org.apache.spark.sql.connect.client.CustomSparkConnectBlockingStub.analyzePlan(CustomSparkConnectBlockingStub.scala:75)
>   at 
> org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:110)
>   at 
> org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:256)
>   at 
> org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:227)
>   at 
> org.apache.spark.sql.connect.SparkSession.version$lzycompute(SparkSession.scala:92)
>   at 
> org.apache.spark.sql.connect.SparkSession.version(SparkSession.scala:91)
>   at 
> org.apache.spark.sql.application.ConnectRepl$$anon$1.(ConnectRepl.scala:106)
>   at 
> org.apache.spark.sql.application.ConnectRepl$.$anonfun$doMain$1(ConnectRepl.scala:105)
>   at 
> org.apache.spark.sql.connect.SparkSession$.withLocalConnectServer(SparkSession.scala:824)
>   at 
> org.apache.spark.sql.application.ConnectRepl$.doMain(ConnectRepl.scala:67)
>   at 
> org.apache.spark.sql.application.ConnectRepl$.main(ConnectRepl.scala:57)
>   at org.apache.spark.sql.application.ConnectRepl.main(ConnectRepl.scala)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:569)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala

[jira] [Updated] (SPARK-51605) If the `logs` directory does not exist, the first launch of `bin/spark-shell --remote local` will fail.

2025-03-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-51605:
-
Affects Version/s: 4.0.0

> If the `logs` directory does not exist, the first launch of `bin/spark-shell 
> --remote local` will fail.
> ---
>
> Key: SPARK-51605
> URL: https://issues.apache.org/jira/browse/SPARK-51605
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0, 4.1.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> bin/spark-shell --remote local
> WARNING: Using incubator modules: jdk.incubator.vector
> Exception in thread "main" java.nio.file.NoSuchFileException: 
> /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs
>   at 
> java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
>   at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
>   at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
>   at 
> java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
>   at 
> java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:148)
>   at java.base/java.nio.file.Files.readAttributes(Files.java:1851)
>   at 
> java.base/sun.nio.fs.PollingWatchService.doPrivilegedRegister(PollingWatchService.java:173)
>   at 
> java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:154)
>   at 
> java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:151)
>   at 
> java.base/java.security.AccessController.doPrivileged(AccessController.java:569)
>   at 
> java.base/sun.nio.fs.PollingWatchService.register(PollingWatchService.java:150)
>   at java.base/sun.nio.fs.UnixPath.register(UnixPath.java:885)
>   at java.base/java.nio.file.Path.register(Path.java:894)
>   at 
> org.apache.spark.sql.connect.SparkSession$.waitUntilFileExists(SparkSession.scala:717)
>   at 
> org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13(SparkSession.scala:798)
>   at 
> org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13$adapted(SparkSession.scala:791)
>   at scala.Option.foreach(Option.scala:437)
>   at 
> org.apache.spark.sql.connect.SparkSession$.withLocalConnectServer(SparkSession.scala:791)
>   at 
> org.apache.spark.sql.application.ConnectRepl$.doMain(ConnectRepl.scala:67)
>   at 
> org.apache.spark.sql.application.ConnectRepl$.main(ConnectRepl.scala:57)
>   at org.apache.spark.sql.application.ConnectRepl.main(ConnectRepl.scala)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:569)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1027)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:204)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:227)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:96)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1132)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1141)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 25/03/26 15:39:40 INFO ShutdownHookManager: Shutdown hook called
> 25/03/26 15:39:40 INFO ShutdownHookManager: Deleting directory 
> /private/var/folders/j2/cfn7w6795538n_416_27rkqmgn/T/spark-fe4c9d71-b7d7-437e-b486-514cc538
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-51606) After exiting the remote local connect shell, the SparkConnectServer will not terminate.

2025-03-26 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-51606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938500#comment-17938500
 ] 

Hyukjin Kwon commented on SPARK-51606:
--

thanks. Will take a look.

> After exiting the remote local connect shell, the SparkConnectServer will not 
> terminate.
> 
>
> Key: SPARK-51606
> URL: https://issues.apache.org/jira/browse/SPARK-51606
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0, 4.1.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> bin/spark-shell --remote local
> WARNING: Using incubator modules: jdk.incubator.vector
> Using Spark's default log4j profile: 
> org/apache/spark/log4j2-defaults.properties
> 25/03/26 15:43:55 INFO SparkSession: Spark Connect server started with the 
> log file: 
> /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs/spark-cb51ad74-00e1-4567-9746-3dc9a7888ecb-org.apache.spark.sql.connect.service.SparkConnectServer-1-local.out
> 25/03/26 15:43:56 INFO BaseAllocator: Debug mode disabled. Enable with the VM 
> option -Darrow.memory.debug.allocator=true.
> 25/03/26 15:43:56 INFO DefaultAllocationManagerOption: allocation manager 
> type not specified, using netty as the default type
> 25/03/26 15:43:56 INFO CheckAllocator: Using DefaultAllocationManager at 
> memory/netty/DefaultAllocationManagerFactory.class
> Welcome to
>                     __
>      / __/__  ___ _/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 4.1.0-SNAPSHOT
>       /_/
> Type in expressions to have them evaluated.
> Spark connect server version 4.1.0-SNAPSHOT.
> Spark session available as 'spark'.
>    
> scala> exit 
> Bye!
> 25/03/26 15:44:00 INFO ShutdownHookManager: Shutdown hook called
> 25/03/26 15:44:00 INFO ShutdownHookManager: Deleting directory 
> /private/var/folders/j2/cfn7w6795538n_416_27rkqmgn/T/spark-ad8dfdf4-cf2b-413f-a9e3-d6e310dff1ea
> bin/spark-shell --remote local
> WARNING: Using incubator modules: jdk.incubator.vector
> Using Spark's default log4j profile: 
> org/apache/spark/log4j2-defaults.properties
> 25/03/26 15:44:04 INFO SparkSession: Spark Connect server started with the 
> log file: 
> /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs/spark-a7b9a1dc-1e16-4e0e-b7c1-8f957d730df3-org.apache.spark.sql.connect.service.SparkConnectServer-1-local.out
> 25/03/26 15:44:05 INFO BaseAllocator: Debug mode disabled. Enable with the VM 
> option -Darrow.memory.debug.allocator=true.
> 25/03/26 15:44:05 INFO DefaultAllocationManagerOption: allocation manager 
> type not specified, using netty as the default type
> 25/03/26 15:44:05 INFO CheckAllocator: Using DefaultAllocationManager at 
> memory/netty/DefaultAllocationManagerFactory.class
> Exception in thread "main" org.apache.spark.SparkException: 
> org.sparkproject.io.grpc.StatusRuntimeException: UNAUTHENTICATED: Invalid 
> authentication token
>   at 
> org.apache.spark.sql.connect.client.GrpcExceptionConverter.toThrowable(GrpcExceptionConverter.scala:162)
>   at 
> org.apache.spark.sql.connect.client.GrpcExceptionConverter.convert(GrpcExceptionConverter.scala:61)
>   at 
> org.apache.spark.sql.connect.client.CustomSparkConnectBlockingStub.analyzePlan(CustomSparkConnectBlockingStub.scala:75)
>   at 
> org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:110)
>   at 
> org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:256)
>   at 
> org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:227)
>   at 
> org.apache.spark.sql.connect.SparkSession.version$lzycompute(SparkSession.scala:92)
>   at 
> org.apache.spark.sql.connect.SparkSession.version(SparkSession.scala:91)
>   at 
> org.apache.spark.sql.application.ConnectRepl$$anon$1.(ConnectRepl.scala:106)
>   at 
> org.apache.spark.sql.application.ConnectRepl$.$anonfun$doMain$1(ConnectRepl.scala:105)
>   at 
> org.apache.spark.sql.connect.SparkSession$.withLocalConnectServer(SparkSession.scala:824)
>   at 
> org.apache.spark.sql.application.ConnectRepl$.doMain(ConnectRepl.scala:67)
>   at 
> org.apache.spark.sql.application.ConnectRepl$.main(ConnectRepl.scala:57)
>   at org.apache.spark.sql.application.ConnectRepl.main(ConnectRepl.scala)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:569)
>   at 
> org.apach

[jira] [Created] (SPARK-51604) split test_connect_session

2025-03-26 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-51604:
-

 Summary: split test_connect_session
 Key: SPARK-51604
 URL: https://issues.apache.org/jira/browse/SPARK-51604
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, Tests
Affects Versions: 4.1
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-50873) An optimization for SparkOptimizer to prune the column after RewriteSubquery

2025-03-26 Thread KeKe Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-50873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KeKe Zhu updated SPARK-50873:
-
Summary: An optimization for SparkOptimizer to prune the column after  
RewriteSubquery  (was: An optimization for SparkOptimizer to prune the column 
in subquery)

> An optimization for SparkOptimizer to prune the column after  RewriteSubquery
> -
>
> Key: SPARK-50873
> URL: https://issues.apache.org/jira/browse/SPARK-50873
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.3
>Reporter: KeKe Zhu
>Priority: Major
> Attachments: query16-opt.PNG, query16-org.PNG
>
>
> I used Spark 3.5 + Iceberg 1.6.1 to run the TPCDS test. While doing 
> performance analysis, I found a potential optimization for the SparkOptimizer.
> The optimization concerns the column pruning of DataSourceV2 (DSV2).
> In the SparkOptimizer, the column pruning of DSV2 is executed in the 
> V2ScanRelationPushDown rule. However, a series of optimization rules runs 
> after V2ScanRelationPushDown; those rules may rewrite subqueries and generate 
> Project or Filter operators that could be used for column pruning, but column 
> pruning will not be executed again, so the generated physical plan reads the 
> entire table instead of only the required columns.
> For example, consider query 16 in TPCDS:
> {code:java}
> set spark.queryID=query16.tpl;
> select
>    count(distinct cs_order_number) as `order count`
>   ,sum(cs_ext_ship_cost) as `total shipping cost`
>   ,sum(cs_net_profit) as `total net profit`
> from
>    catalog_sales cs1
>   ,date_dim
>   ,customer_address
>   ,call_center
> where
>     d_date between '2002-2-01' and
>            (cast('2002-2-01' as date) + interval 60 days)
> and cs1.cs_ship_date_sk = d_date_sk
> and cs1.cs_ship_addr_sk = ca_address_sk
> and ca_state = 'KS'
> and cs1.cs_call_center_sk = cc_call_center_sk
> and cc_county in ('Daviess County','Barrow County','Walker County','San 
> Miguel County',
>                   'Mobile County'
> )
> and exists (select *
>             from catalog_sales cs2
>             where cs1.cs_order_number = cs2.cs_order_number
>               and cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk)
> and not exists(select *
>                from catalog_returns cr1
>                where cs1.cs_order_number = cr1.cr_order_number)
> order by count(distinct cs_order_number)
> limit 100; {code}
> The final optimized plan of the query is shown in the picture below; we can 
> see that two tables (catalog_sales & catalog_returns) are read in full and 
> then projected, which certainly causes low performance for Iceberg.
> !query16-org.PNG!
>  
>  
> My current solution: I wrote an optimization rule and added it to the 
> SparkOptimizer. The rule checks again whether a table's columns need to be 
> pruned and does so if they do; otherwise, no action is taken. Now I get the 
> expected optimized plan and a much better performance result.
> !query16-opt.PNG!
> I want to know whether there is any other solution for this problem - contact 
> me anytime.
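
For readers who want to experiment with the approach described above: 
user-defined rules can be attached to the optimizer through 
SparkSessionExtensions. A hedged sketch follows; PruneColumnsAgain is a 
hypothetical name and its body is a no-op placeholder, not the author's 
actual pruning logic:

{code:scala}
// Sketch only: injecting a custom rule that runs in the optimizer.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

object PruneColumnsAgain extends Rule[LogicalPlan] {
  // A real implementation would re-check DSV2 scans for prunable columns
  // after subquery rewrites; here the plan is returned unchanged.
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

val spark = SparkSession.builder()
  .appName("prune-after-rewrite-demo")
  .withExtensions(_.injectOptimizerRule(_ => PruneColumnsAgain))
  .getOrCreate()
{code}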



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-51604) split test_connect_session

2025-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-51604:
---
Labels: pull-request-available  (was: )

> split test_connect_session
> --
>
> Key: SPARK-51604
> URL: https://issues.apache.org/jira/browse/SPARK-51604
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 4.1
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-51350) Implement Show Procedures

2025-03-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-51350.
-
Fix Version/s: 4.1.0
   Resolution: Fixed

Issue resolved by pull request 50109
[https://github.com/apache/spark/pull/50109]

> Implement Show Procedures
> -
>
> Key: SPARK-51350
> URL: https://issues.apache.org/jira/browse/SPARK-51350
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> As part of https://issues.apache.org/jira/browse/SPARK-44167 , implement Show 
> Procedures to show all stored procedures in the given catalog.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-50873) An optimization for SparkOptimizer to prune the column after RewriteSubquery

2025-03-26 Thread KeKe Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-50873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KeKe Zhu updated SPARK-50873:
-
Summary: An optimization for SparkOptimizer to prune the column after 
RewriteSubquery  (was: An optimization for SparkOptimizer to prune the column 
after  RewriteSubquery)

> An optimization for SparkOptimizer to prune the column after RewriteSubquery
> 
>
> Key: SPARK-50873
> URL: https://issues.apache.org/jira/browse/SPARK-50873
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.3
>Reporter: KeKe Zhu
>Priority: Major
> Attachments: query16-opt.PNG, query16-org.PNG
>
>
> I used Spark 3.5 + Iceberg 1.6.1 to run the TPCDS test. While doing 
> performance analysis, I found a potential optimization for the SparkOptimizer.
> The optimization concerns the column pruning of DataSourceV2 (DSV2).
> In the SparkOptimizer, the column pruning of DSV2 is executed in the 
> V2ScanRelationPushDown rule. However, a series of optimization rules runs 
> after V2ScanRelationPushDown; those rules may rewrite subqueries and generate 
> Project or Filter operators that could be used for column pruning, but column 
> pruning will not be executed again, so the generated physical plan reads the 
> entire table instead of only the required columns.
> For example, consider query 16 in TPCDS:
> {code:java}
> set spark.queryID=query16.tpl;
> select
>    count(distinct cs_order_number) as `order count`
>   ,sum(cs_ext_ship_cost) as `total shipping cost`
>   ,sum(cs_net_profit) as `total net profit`
> from
>    catalog_sales cs1
>   ,date_dim
>   ,customer_address
>   ,call_center
> where
>     d_date between '2002-2-01' and
>            (cast('2002-2-01' as date) + interval 60 days)
> and cs1.cs_ship_date_sk = d_date_sk
> and cs1.cs_ship_addr_sk = ca_address_sk
> and ca_state = 'KS'
> and cs1.cs_call_center_sk = cc_call_center_sk
> and cc_county in ('Daviess County','Barrow County','Walker County','San 
> Miguel County',
>                   'Mobile County'
> )
> and exists (select *
>             from catalog_sales cs2
>             where cs1.cs_order_number = cs2.cs_order_number
>               and cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk)
> and not exists(select *
>                from catalog_returns cr1
>                where cs1.cs_order_number = cr1.cr_order_number)
> order by count(distinct cs_order_number)
> limit 100; {code}
> The final optimized plan of the query is shown in the picture below; we can 
> see that two tables (catalog_sales & catalog_returns) are read in full and 
> then projected, which certainly causes low performance for Iceberg.
> !query16-org.PNG!
>  
>  
> My current solution: I wrote an optimization rule and added it to the 
> SparkOptimizer. The rule checks again whether a table's columns need to be 
> pruned and does so if they do; otherwise, no action is taken. Now I get the 
> expected optimized plan and a much better performance result.
> !query16-opt.PNG!
> I want to know whether there is any other solution for this problem - contact 
> me anytime.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-50873) An optimization for SparkOptimizer to prune column after RewriteSubquery

2025-03-26 Thread KeKe Zhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-50873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KeKe Zhu updated SPARK-50873:
-
Summary: An optimization for SparkOptimizer to prune column after 
RewriteSubquery  (was: An optimization for SparkOptimizer to prune the column 
after RewriteSubquery)

> An optimization for SparkOptimizer to prune column after RewriteSubquery
> 
>
> Key: SPARK-50873
> URL: https://issues.apache.org/jira/browse/SPARK-50873
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.3
>Reporter: KeKe Zhu
>Priority: Major
> Attachments: query16-opt.PNG, query16-org.PNG
>
>
> I used Spark 3.5 + Iceberg 1.6.1 to run the TPCDS test. While doing 
> performance analysis, I found a potential optimization for the SparkOptimizer.
> The optimization concerns the column pruning of DataSourceV2 (DSV2).
> In the SparkOptimizer, the column pruning of DSV2 is executed in the 
> V2ScanRelationPushDown rule. However, a series of optimization rules runs 
> after V2ScanRelationPushDown; those rules may rewrite subqueries and generate 
> Project or Filter operators that could be used for column pruning, but column 
> pruning will not be executed again, so the generated physical plan reads the 
> entire table instead of only the required columns.
> For example, consider query 16 in TPCDS:
> {code:java}
> set spark.queryID=query16.tpl;
> select
>    count(distinct cs_order_number) as `order count`
>   ,sum(cs_ext_ship_cost) as `total shipping cost`
>   ,sum(cs_net_profit) as `total net profit`
> from
>    catalog_sales cs1
>   ,date_dim
>   ,customer_address
>   ,call_center
> where
>     d_date between '2002-2-01' and
>            (cast('2002-2-01' as date) + interval 60 days)
> and cs1.cs_ship_date_sk = d_date_sk
> and cs1.cs_ship_addr_sk = ca_address_sk
> and ca_state = 'KS'
> and cs1.cs_call_center_sk = cc_call_center_sk
> and cc_county in ('Daviess County','Barrow County','Walker County','San 
> Miguel County',
>                   'Mobile County'
> )
> and exists (select *
>             from catalog_sales cs2
>             where cs1.cs_order_number = cs2.cs_order_number
>               and cs1.cs_warehouse_sk <> cs2.cs_warehouse_sk)
> and not exists(select *
>                from catalog_returns cr1
>                where cs1.cs_order_number = cr1.cr_order_number)
> order by count(distinct cs_order_number)
> limit 100; {code}
> The final optimized plan of the query is shown in the picture below; we can 
> see that two tables (catalog_sales & catalog_returns) are read in full and 
> then projected, which certainly causes low performance for Iceberg.
> !query16-org.PNG!
>  
>  
> My current solution: I wrote an optimization rule and added it to the 
> SparkOptimizer. The rule checks again whether a table's columns need to be 
> pruned and does so if they do; otherwise, no action is taken. Now I get the 
> expected optimized plan and a much better performance result.
> !query16-opt.PNG!
> I want to know whether there is any other solution for this problem - contact 
> me anytime.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-51605) If the `logs` directory does not exist, the first launch of `bin/spark-shell --remote local` will fail.

2025-03-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-51605:


 Summary: If the `logs` directory does not exist, the first launch 
of `bin/spark-shell --remote local` will fail.
 Key: SPARK-51605
 URL: https://issues.apache.org/jira/browse/SPARK-51605
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 4.1.0
Reporter: Yang Jie


{code:java}
bin/spark-shell --remote local
WARNING: Using incubator modules: jdk.incubator.vector
Exception in thread "main" java.nio.file.NoSuchFileException: 
/Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs
at 
java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
at 
java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
at 
java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at 
java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
at 
java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:148)
at java.base/java.nio.file.Files.readAttributes(Files.java:1851)
at 
java.base/sun.nio.fs.PollingWatchService.doPrivilegedRegister(PollingWatchService.java:173)
at 
java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:154)
at 
java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:151)
at 
java.base/java.security.AccessController.doPrivileged(AccessController.java:569)
at 
java.base/sun.nio.fs.PollingWatchService.register(PollingWatchService.java:150)
at java.base/sun.nio.fs.UnixPath.register(UnixPath.java:885)
at java.base/java.nio.file.Path.register(Path.java:894)
at 
org.apache.spark.sql.connect.SparkSession$.waitUntilFileExists(SparkSession.scala:717)
at 
org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13(SparkSession.scala:798)
at 
org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13$adapted(SparkSession.scala:791)
at scala.Option.foreach(Option.scala:437)
at 
org.apache.spark.sql.connect.SparkSession$.withLocalConnectServer(SparkSession.scala:791)
at 
org.apache.spark.sql.application.ConnectRepl$.doMain(ConnectRepl.scala:67)
at 
org.apache.spark.sql.application.ConnectRepl$.main(ConnectRepl.scala:57)
at org.apache.spark.sql.application.ConnectRepl.main(ConnectRepl.scala)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1027)
at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:204)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:96)
at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1132)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1141)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
25/03/26 15:39:40 INFO ShutdownHookManager: Shutdown hook called
25/03/26 15:39:40 INFO ShutdownHookManager: Deleting directory 
/private/var/folders/j2/cfn7w6795538n_416_27rkqmgn/T/spark-fe4c9d71-b7d7-437e-b486-514cc538
 {code}
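
The failure comes from registering a file watcher on a directory that does 
not exist yet. A minimal sketch of the obvious guard, assuming the watcher 
only needs the directory to be present (the `logs` path below is 
illustrative, not the code's actual derivation):

{code:scala}
import java.nio.file.{Files, Path, Paths, StandardWatchEventKinds}

// Illustrative path; the real code derives it from the Spark distribution layout.
val logsDir: Path = Paths.get(sys.env.getOrElse("SPARK_HOME", "."), "logs")

// Creating the directory first makes Path.register safe on a fresh install;
// createDirectories is a no-op when the directory already exists.
Files.createDirectories(logsDir)

val watcher = logsDir.getFileSystem.newWatchService()
logsDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE)
{code}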



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-51606) After exiting the remote local connect shell, the SparkConnectServer will not terminate.

2025-03-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-51606:


 Summary: After exiting the remote local connect shell, the 
SparkConnectServer will not terminate.
 Key: SPARK-51606
 URL: https://issues.apache.org/jira/browse/SPARK-51606
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 4.1.0
Reporter: Yang Jie


{code:java}
bin/spark-shell --remote local
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
25/03/26 15:43:55 INFO SparkSession: Spark Connect server started with the log 
file: 
/Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs/spark-cb51ad74-00e1-4567-9746-3dc9a7888ecb-org.apache.spark.sql.connect.service.SparkConnectServer-1-local.out
25/03/26 15:43:56 INFO BaseAllocator: Debug mode disabled. Enable with the VM 
option -Darrow.memory.debug.allocator=true.
25/03/26 15:43:56 INFO DefaultAllocationManagerOption: allocation manager type 
not specified, using netty as the default type
25/03/26 15:43:56 INFO CheckAllocator: Using DefaultAllocationManager at 
memory/netty/DefaultAllocationManagerFactory.class
Welcome to
                    __
     / __/__  ___ _/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.1.0-SNAPSHOT
      /_/


Type in expressions to have them evaluated.
Spark connect server version 4.1.0-SNAPSHOT.
Spark session available as 'spark'.
   
scala> exit 
Bye!
25/03/26 15:44:00 INFO ShutdownHookManager: Shutdown hook called
25/03/26 15:44:00 INFO ShutdownHookManager: Deleting directory 
/private/var/folders/j2/cfn7w6795538n_416_27rkqmgn/T/spark-ad8dfdf4-cf2b-413f-a9e3-d6e310dff1ea


bin/spark-shell --remote local
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
25/03/26 15:44:04 INFO SparkSession: Spark Connect server started with the log 
file: 
/Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs/spark-a7b9a1dc-1e16-4e0e-b7c1-8f957d730df3-org.apache.spark.sql.connect.service.SparkConnectServer-1-local.out
25/03/26 15:44:05 INFO BaseAllocator: Debug mode disabled. Enable with the VM 
option -Darrow.memory.debug.allocator=true.
25/03/26 15:44:05 INFO DefaultAllocationManagerOption: allocation manager type 
not specified, using netty as the default type
25/03/26 15:44:05 INFO CheckAllocator: Using DefaultAllocationManager at 
memory/netty/DefaultAllocationManagerFactory.class
Exception in thread "main" org.apache.spark.SparkException: 
org.sparkproject.io.grpc.StatusRuntimeException: UNAUTHENTICATED: Invalid 
authentication token
at 
org.apache.spark.sql.connect.client.GrpcExceptionConverter.toThrowable(GrpcExceptionConverter.scala:162)
at 
org.apache.spark.sql.connect.client.GrpcExceptionConverter.convert(GrpcExceptionConverter.scala:61)
at 
org.apache.spark.sql.connect.client.CustomSparkConnectBlockingStub.analyzePlan(CustomSparkConnectBlockingStub.scala:75)
at 
org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:110)
at 
org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:256)
at 
org.apache.spark.sql.connect.client.SparkConnectClient.analyze(SparkConnectClient.scala:227)
at 
org.apache.spark.sql.connect.SparkSession.version$lzycompute(SparkSession.scala:92)
at 
org.apache.spark.sql.connect.SparkSession.version(SparkSession.scala:91)
at 
org.apache.spark.sql.application.ConnectRepl$$anon$1.(ConnectRepl.scala:106)
at 
org.apache.spark.sql.application.ConnectRepl$.$anonfun$doMain$1(ConnectRepl.scala:105)
at 
org.apache.spark.sql.connect.SparkSession$.withLocalConnectServer(SparkSession.scala:824)
at 
org.apache.spark.sql.application.ConnectRepl$.doMain(ConnectRepl.scala:67)
at 
org.apache.spark.sql.application.ConnectRepl$.main(ConnectRepl.scala:57)
at org.apache.spark.sql.application.ConnectRepl.main(ConnectRepl.scala)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1027)
at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:204)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:96)
at 
org.apache.spark.deploy.
{code}

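The output above is truncated in the archive, but the symptom is clear: the 
locally launched SparkConnectServer outlives the shell, and the next shell 
then fails to authenticate against the stale instance. A minimal, 
self-contained sketch of the lifecycle pattern a fix needs, with an 
arbitrary long-running command standing in for the server process:

{code:scala}
// Tie a child process's lifetime to the JVM via a shutdown hook.
// "sleep 3600" is an arbitrary stand-in for the local Connect server process.
val server = new ProcessBuilder("sleep", "3600").start()

sys.addShutdownHook {
  server.destroy() // ask the child to terminate when the REPL exits
  server.waitFor() // block until it is gone, so no stale server (or token) remains
}
{code}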
[jira] [Resolved] (SPARK-51604) split test_connect_session

2025-03-26 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-51604.
---
Fix Version/s: 4.1.0
   Resolution: Fixed

Issue resolved by pull request 50397
[https://github.com/apache/spark/pull/50397]

> split test_connect_session
> --
>
> Key: SPARK-51604
> URL: https://issues.apache.org/jira/browse/SPARK-51604
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 4.1
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-51607) the configuration for `maven-shade-plugin` should be set to `combine.self = "override"` in the `connect` modules

2025-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-51607:
---
Labels: pull-request-available  (was: )

> the configuration for `maven-shade-plugin` should be set to `combine.self = 
> "override"` In the `connect` modules
> 
>
> Key: SPARK-51607
> URL: https://issues.apache.org/jira/browse/SPARK-51607
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Connect
>Affects Versions: 4.0.0, 4.1.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-51607) the configuration for `maven-shade-plugin` should be set to `combine.self = "override"` in the `connect` modules

2025-03-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-51607:


 Summary: the configuration for `maven-shade-plugin` should be set 
to `combine.self = "override"` in the `connect` modules
 Key: SPARK-51607
 URL: https://issues.apache.org/jira/browse/SPARK-51607
 Project: Spark
  Issue Type: Bug
  Components: Build, Connect
Affects Versions: 4.0.0, 4.1.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-51622) Titling sections on ExecutionPage

2025-03-26 Thread Kent Yao (Jira)
Kent Yao created SPARK-51622:


 Summary: Titling sections on ExecutionPage
 Key: SPARK-51622
 URL: https://issues.apache.org/jira/browse/SPARK-51622
 Project: Spark
  Issue Type: Improvement
  Components: UI
Affects Versions: 4.1.0
Reporter: Kent Yao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-51622) Titling sections on ExecutionPage

2025-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-51622:
---
Labels: pull-request-available  (was: )

> Titling sections on ExecutionPage
> -
>
> Key: SPARK-51622
> URL: https://issues.apache.org/jira/browse/SPARK-51622
> Project: Spark
>  Issue Type: Improvement
>  Components: UI
>Affects Versions: 4.1.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-51624) Propagate GetStructField metadata in CreateNamedStruct.dataType

2025-03-26 Thread Andy Lam (Jira)
Andy Lam created SPARK-51624:


 Summary: Propagate GetStructField metadata in 
CreateNamedStruct.dataType
 Key: SPARK-51624
 URL: https://issues.apache.org/jira/browse/SPARK-51624
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.6
Reporter: Andy Lam


This matters because dataType comparisons drive optimizer rules such as 
SimplifyCasts, which can cascade into further expression optimizations.
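
A small self-contained illustration of why the metadata matters: 
`StructField` includes metadata in its equality, so two otherwise identical 
struct types stop comparing equal once metadata is dropped on one side, and 
equality-based rewrites no longer fire. The field name and comment below 
are made up:

{code:scala}
import org.apache.spark.sql.types._

val meta = new MetadataBuilder().putString("comment", "carried from GetStructField").build()

val withMeta    = StructType(Seq(StructField("a", IntegerType, nullable = true, meta)))
val withoutMeta = StructType(Seq(StructField("a", IntegerType)))

// StructField is a case class, so metadata participates in equality:
println(withMeta == withoutMeta) // false, even though name/type/nullability all match
{code}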



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-51573) Fix Streaming State Checkpoint v2 checkpointInfo race condition

2025-03-26 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-51573:


Assignee: Livia Zhu

> Fix Streaming State Checkpoint v2 checkpointInfo race condition
> ---
>
> Key: SPARK-51573
> URL: https://issues.apache.org/jira/browse/SPARK-51573
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Livia Zhu
>Assignee: Livia Zhu
>Priority: Major
>  Labels: pull-request-available
>
> If two tasks are competing for the same RocksDB state store provider, they 
> could run into the following race condition:
>  
> ||task 1||task 2||
> |load() - load version 0| |
> |commit() - committed version 1| |
> | |load() - load version 1|
> | |commit() - committed version 2|
> |getStateStoreCheckpointInfo - get checkpoint info for version 2 :(| |
> We need to ensure that checkpoint info is retrieved atomically with the 
> commit() before the RocksDB instance lock is released.
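
A minimal sketch of the shape such a fix takes: have commit() hand back the 
checkpoint info it produced while still holding the instance lock, instead 
of exposing a separate getter that a second task can interleave with. All 
names below (`CheckpointInfo`, `commitAndGetCheckpointInfo`) are 
hypothetical placeholders, not Spark's actual API:

{code:scala}
import java.util.concurrent.locks.ReentrantLock

final case class CheckpointInfo(version: Long)

class StateStoreProviderLike {
  private val instanceLock = new ReentrantLock()
  private var committedVersion: Long = 0L

  // Commit and read back the checkpoint info in one critical section, so a
  // concurrent load()/commit() from another task cannot slip in between.
  def commitAndGetCheckpointInfo(): CheckpointInfo = {
    instanceLock.lock()
    try {
      committedVersion += 1
      CheckpointInfo(committedVersion)
    } finally instanceLock.unlock()
  }
}
{code}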



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-51621) Support `sparkSession` for `DataFrame`

2025-03-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-51621:
-

Assignee: Dongjoon Hyun

> Support `sparkSession` for `DataFrame`
> --
>
> Key: SPARK-51621
> URL: https://issues.apache.org/jira/browse/SPARK-51621
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: connect-swift-0.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-51621) Support `sparkSession` for `DataFrame`

2025-03-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-51621.
---
Fix Version/s: connect-swift-0.1.0
   Resolution: Fixed

Issue resolved by pull request 28
[https://github.com/apache/spark-connect-swift/pull/28]

> Support `sparkSession` for `DataFrame`
> --
>
> Key: SPARK-51621
> URL: https://issues.apache.org/jira/browse/SPARK-51621
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: connect-swift-0.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: connect-swift-0.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-51623) Remove class files in source releases

2025-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-51623:
---
Labels: pull-request-available  (was: )

> Remove class files in source releases
> -
>
> Key: SPARK-51623
> URL: https://issues.apache.org/jira/browse/SPARK-51623
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-51573) Fix Streaming State Checkpoint v2 checkpointInfo race condition

2025-03-26 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-51573.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 50344
[https://github.com/apache/spark/pull/50344]

> Fix Streaming State Checkpoint v2 checkpointInfo race condition
> ---
>
> Key: SPARK-51573
> URL: https://issues.apache.org/jira/browse/SPARK-51573
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Livia Zhu
>Assignee: Livia Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> If two tasks are competing for the same RocksDB state store provider, they 
> could run into the following race condition:
>  
> ||task 1||task 2||
> |load() - load version 0| |
> |commit() - committed version 1| |
> | |load() - load version 1|
> | |commit() - committed version 2|
> |getStateStoreCheckpointInfo - get checkpoint info for version 2 :(| |
> We need to ensure that checkpoint info is retrieved atomically with the 
> commit() before the RocksDB instance lock is released.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-51605) If the `logs` directory does not exist, the first launch of `bin/spark-shell --remote local` will fail.

2025-03-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-51605.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 50421
[https://github.com/apache/spark/pull/50421]

> If the `logs` directory does not exist, the first launch of `bin/spark-shell 
> --remote local` will fail.
> ---
>
> Key: SPARK-51605
> URL: https://issues.apache.org/jira/browse/SPARK-51605
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0, 4.1.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code:java}
> bin/spark-shell --remote local
> WARNING: Using incubator modules: jdk.incubator.vector
> Exception in thread "main" java.nio.file.NoSuchFileException: 
> /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs
>   at 
> java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
>   at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
>   at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
>   at 
> java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
>   at 
> java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:148)
>   at java.base/java.nio.file.Files.readAttributes(Files.java:1851)
>   at 
> java.base/sun.nio.fs.PollingWatchService.doPrivilegedRegister(PollingWatchService.java:173)
>   at 
> java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:154)
>   at 
> java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:151)
>   at 
> java.base/java.security.AccessController.doPrivileged(AccessController.java:569)
>   at 
> java.base/sun.nio.fs.PollingWatchService.register(PollingWatchService.java:150)
>   at java.base/sun.nio.fs.UnixPath.register(UnixPath.java:885)
>   at java.base/java.nio.file.Path.register(Path.java:894)
>   at 
> org.apache.spark.sql.connect.SparkSession$.waitUntilFileExists(SparkSession.scala:717)
>   at 
> org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13(SparkSession.scala:798)
>   at 
> org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13$adapted(SparkSession.scala:791)
>   at scala.Option.foreach(Option.scala:437)
>   at 
> org.apache.spark.sql.connect.SparkSession$.withLocalConnectServer(SparkSession.scala:791)
>   at 
> org.apache.spark.sql.application.ConnectRepl$.doMain(ConnectRepl.scala:67)
>   at 
> org.apache.spark.sql.application.ConnectRepl$.main(ConnectRepl.scala:57)
>   at org.apache.spark.sql.application.ConnectRepl.main(ConnectRepl.scala)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:569)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1027)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:204)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:227)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:96)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1132)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1141)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 25/03/26 15:39:40 INFO ShutdownHookManager: Shutdown hook called
> 25/03/26 15:39:40 INFO ShutdownHookManager: Deleting directory 
> /private/var/folders/j2/cfn7w6795538n_416_27rkqmgn/T/spark-fe4c9d71-b7d7-437e-b486-514cc538
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-51618) Add a check for jars in CI

2025-03-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-51618.
--
Fix Version/s: 4.1.0
   Resolution: Fixed

Issue resolved by pull request 50416
[https://github.com/apache/spark/pull/50416]

> Add a check for jars in CI
> --
>
> Key: SPARK-51618
> URL: https://issues.apache.org/jira/browse/SPARK-51618
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>
> We should disallow jar files from being added to the source tree.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-51618) Add a check for jars in CI

2025-03-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-51618:


Assignee: Hyukjin Kwon

> Add a check for jars in CI
> --
>
> Key: SPARK-51618
> URL: https://issues.apache.org/jira/browse/SPARK-51618
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> We should disallow jar files from being added to the source tree.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32166) Metastore problem on Spark 3.0 with Hive 3.0

2025-03-26 Thread jiacai Guo (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938784#comment-17938784
 ] 

jiacai Guo commented on SPARK-32166:


I just hit this problem and found two solutions:
 # set spark.sql.legacy.createHiveTableByDefault to false;
 # set hive.metadata.dml.events to false in hive-site.xml, or start kyuubi with 
'--hiveconf hive.metadata.dml.events=false'.

Could you please tell me why this works, and is it a bug? [~kevinshin]
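
For anyone who wants to apply the same workarounds from Spark itself, a 
minimal sketch at session build time. Forwarding the Hive property through 
the `spark.hadoop.` prefix is an assumption about the deployment; setting 
it in hive-site.xml as above works just as well:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive3-metastore-workaround")
  // Workaround 1: fall back to non-Hive (data source) tables for plain CREATE TABLE.
  .config("spark.sql.legacy.createHiveTableByDefault", "false")
  // Workaround 2 (assumption): spark.hadoop.* keys are forwarded into the Hadoop/Hive
  // configuration, which is one way to set hive.metadata.dml.events without hive-site.xml.
  .config("spark.hadoop.hive.metadata.dml.events", "false")
  .enableHiveSupport()
  .getOrCreate()
{code}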

>  Metastore problem on Spark 3.0 with Hive 3.0
> ---
>
> Key: SPARK-32166
> URL: https://issues.apache.org/jira/browse/SPARK-32166
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: hzk
>Priority: Major
>
> When I use spark-sql to create a table, the problem appears.
> {code:java}
> create table bigbig as select b.user_id , b.name , b.age , c.address , c.city 
> , a.position , a.object , a.problem , a.complaint_time from ( select user_id 
> , position , object , problem , complaint_time from 
> HIVE_COMBINE_7efde4e2dcb34c218b3fb08872e698d5 ) as a left join 
> HIVE_ODS_17_TEST_DEMO_ODS_USERS_INFO_20200608141945 as b on b.user_id = 
> a.user_id left join HIVE_ODS_17_TEST_ADDRESS_CITY_20200608141942 as c on 
> c.address_id = b.address_id;
> {code}
> It opened a connection to the Hive metastore.
> My Hive version is 3.1.0.
> {code:java}
> org.apache.thrift.TApplicationException: Required field 'filesAdded' is 
> unset! 
> Struct:InsertEventRequestData(filesAdded:null)org.apache.thrift.TApplicationException:
>  Required field 'filesAdded' is unset! 
> Struct:InsertEventRequestData(filesAdded:null) at 
> org.apache.thrift.TApplicationException.read(TApplicationException.java:111) 
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79) at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_fire_listener_event(ThriftHiveMetastore.java:4182)
>  at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.fire_listener_event(ThriftHiveMetastore.java:4169)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.fireListenerEvent(HiveMetaStoreClient.java:1954)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
>  at com.sun.proxy.$Proxy5.fireListenerEvent(Unknown Source) at 
> org.apache.hadoop.hive.ql.metadata.Hive.fireInsertEvent(Hive.java:1947) at 
> org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1673) at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.spark.sql.hive.client.Shim_v0_14.loadTable(HiveShim.scala:847) at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply$mcV$sp(HiveClientImpl.scala:757)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:757)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:757)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:272)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:210)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:209)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:255)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.loadTable(HiveClientImpl.scala:756)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply$mcV$sp(HiveExternalCatalog.scala:829)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:827)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:827)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:827)
>  at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.loadTable(SessionCatalog.scala:416)
>  at 
> org.apache.spark.sql.execution.command.LoadDataCommand.run(tables.scala:403) 
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:7
> {code}

[jira] [Updated] (SPARK-51605) If the `logs` directory does not exist, the first launch of `bin/spark-shell --remote local` will fail.

2025-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-51605:
---
Labels: pull-request-available  (was: )

> If the `logs` directory does not exist, the first launch of `bin/spark-shell 
> --remote local` will fail.
> ---
>
> Key: SPARK-51605
> URL: https://issues.apache.org/jira/browse/SPARK-51605
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0, 4.1.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> bin/spark-shell --remote local
> WARNING: Using incubator modules: jdk.incubator.vector
> Exception in thread "main" java.nio.file.NoSuchFileException: 
> /Users/yangjie01/Tools/spark-4.1.0-SNAPSHOT-bin-3.4.1/logs
>   at 
> java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
>   at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
>   at 
> java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
>   at 
> java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
>   at 
> java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:148)
>   at java.base/java.nio.file.Files.readAttributes(Files.java:1851)
>   at 
> java.base/sun.nio.fs.PollingWatchService.doPrivilegedRegister(PollingWatchService.java:173)
>   at 
> java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:154)
>   at 
> java.base/sun.nio.fs.PollingWatchService$2.run(PollingWatchService.java:151)
>   at 
> java.base/java.security.AccessController.doPrivileged(AccessController.java:569)
>   at 
> java.base/sun.nio.fs.PollingWatchService.register(PollingWatchService.java:150)
>   at java.base/sun.nio.fs.UnixPath.register(UnixPath.java:885)
>   at java.base/java.nio.file.Path.register(Path.java:894)
>   at 
> org.apache.spark.sql.connect.SparkSession$.waitUntilFileExists(SparkSession.scala:717)
>   at 
> org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13(SparkSession.scala:798)
>   at 
> org.apache.spark.sql.connect.SparkSession$.$anonfun$withLocalConnectServer$13$adapted(SparkSession.scala:791)
>   at scala.Option.foreach(Option.scala:437)
>   at 
> org.apache.spark.sql.connect.SparkSession$.withLocalConnectServer(SparkSession.scala:791)
>   at 
> org.apache.spark.sql.application.ConnectRepl$.doMain(ConnectRepl.scala:67)
>   at 
> org.apache.spark.sql.application.ConnectRepl$.main(ConnectRepl.scala:57)
>   at org.apache.spark.sql.application.ConnectRepl.main(ConnectRepl.scala)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:569)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1027)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:204)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:227)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:96)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1132)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1141)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 25/03/26 15:39:40 INFO ShutdownHookManager: Shutdown hook called
> 25/03/26 15:39:40 INFO ShutdownHookManager: Deleting directory 
> /private/var/folders/j2/cfn7w6795538n_416_27rkqmgn/T/spark-fe4c9d71-b7d7-437e-b486-514cc538
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-51625) command in CTE relations should trigger inline

2025-03-26 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-51625:
---

 Summary: command in CTE relations should trigger inline
 Key: SPARK-51625
 URL: https://issues.apache.org/jira/browse/SPARK-51625
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 4.0.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-51624) Propagate GetStructField metadata in CreateNamedStruct.dataType

2025-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-51624:
---
Labels: pull-request-available  (was: )

> Propagate GetStructField metadata in CreateNamedStruct.dataType
> ---
>
> Key: SPARK-51624
> URL: https://issues.apache.org/jira/browse/SPARK-51624
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.6
>Reporter: Andy Lam
>Priority: Major
>  Labels: pull-request-available
>
> This matters because dataType comparisons drive optimizer rules such as 
> SimplifyCasts, which can cascade into further expression optimizations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-51623) Remove class files in source releases

2025-03-26 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-51623:


 Summary: Remove class files in source releases
 Key: SPARK-51623
 URL: https://issues.apache.org/jira/browse/SPARK-51623
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-51626) Support `DataFrameReader`

2025-03-26 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-51626:
-

 Summary: Support `DataFrameReader`
 Key: SPARK-51626
 URL: https://issues.apache.org/jira/browse/SPARK-51626
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: connect-swift-0.1.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-51620) Support `columns` for `DataFrame`

2025-03-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-51620.
---
Fix Version/s: connect-swift-0.1.0
   Resolution: Fixed

Issue resolved by pull request 27
[https://github.com/apache/spark-connect-swift/pull/27]

> Support `columns` for `DataFrame`
> -
>
> Key: SPARK-51620
> URL: https://issues.apache.org/jira/browse/SPARK-51620
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: connect-swift-0.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: connect-swift-0.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-51620) Support `columns` for `DataFrame`

2025-03-26 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-51620:
-

Assignee: Dongjoon Hyun

> Support `columns` for `DataFrame`
> -
>
> Key: SPARK-51620
> URL: https://issues.apache.org/jira/browse/SPARK-51620
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: connect-swift-0.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-51622) Titling sections on ExecutionPage

2025-03-26 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-51622.
--
Fix Version/s: 4.1.0
   Resolution: Fixed

Issue resolved by pull request 50424
[https://github.com/apache/spark/pull/50424]

> Titling sections on ExecutionPage
> -
>
> Key: SPARK-51622
> URL: https://issues.apache.org/jira/browse/SPARK-51622
> Project: Spark
>  Issue Type: Improvement
>  Components: UI
>Affects Versions: 4.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-51627) Add a schedule workflow for numpy 2.1.3

2025-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-51627:
---
Labels: pull-request-available  (was: )

> Add a schedule workflow for numpy 2.1.3
> ---
>
> Key: SPARK-51627
> URL: https://issues.apache.org/jira/browse/SPARK-51627
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.1
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-51628) Clean up the assembly module before maven testing in maven daily test

2025-03-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-51628:
-
Summary: Clean up the assembly module before maven testing in maven daily 
test  (was: Clean up the assembly module before maven testing)

> Clean up the assembly module before maven testing in maven daily test
> -
>
> Key: SPARK-51628
> URL: https://issues.apache.org/jira/browse/SPARK-51628
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra, Tests
>Affects Versions: 4.0.0, 4.1.0
>Reporter: Yang Jie
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-51628) Clean up the assembly module before maven testing

2025-03-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-51628:


 Summary: Clean up the assembly module before maven testing
 Key: SPARK-51628
 URL: https://issues.apache.org/jira/browse/SPARK-51628
 Project: Spark
  Issue Type: Test
  Components: Project Infra, Tests
Affects Versions: 4.0.0, 4.1.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-51629) Add a download link on ExecutionPage for svg/dot/txt format plans

2025-03-26 Thread Kent Yao (Jira)
Kent Yao created SPARK-51629:


 Summary: Add a download link on ExecutionPage for svg/dot/txt 
format plans
 Key: SPARK-51629
 URL: https://issues.apache.org/jira/browse/SPARK-51629
 Project: Spark
  Issue Type: Improvement
  Components: UI
Affects Versions: 4.1.0
Reporter: Kent Yao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-51622) Titling sections on ExecutionPage

2025-03-26 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-51622:


Assignee: Kent Yao

> Titling sections on ExecutionPage
> -
>
> Key: SPARK-51622
> URL: https://issues.apache.org/jira/browse/SPARK-51622
> Project: Spark
>  Issue Type: Improvement
>  Components: UI
>Affects Versions: 4.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-51628) Clean up the assembly module before maven testing in maven daily test

2025-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-51628:
---
Labels: pull-request-available  (was: )

> Clean up the assembly module before maven testing in maven daily test
> -
>
> Key: SPARK-51628
> URL: https://issues.apache.org/jira/browse/SPARK-51628
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra, Tests
>Affects Versions: 4.0.0, 4.1.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> Before actually executing `mvn test` in the Maven daily test, a 
> `mvn clean -pl assembly` step should be added so that the issue described in 
> SPARK-51600 can actually be verified in the Maven daily test. Currently, this 
> step should be skipped when testing the `connect` module, because some tests 
> in the `connect-client-jvm` module strongly depend on the `assembly` module 
> having been built.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-51621) Support `sparkSession` for `DataFrame`

2025-03-26 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-51621:
-

 Summary: Support `sparkSession` for `DataFrame`
 Key: SPARK-51621
 URL: https://issues.apache.org/jira/browse/SPARK-51621
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: connect-swift-0.1.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-51626) Support `DataFrameReader`

2025-03-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-51626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-51626:
---
Labels: pull-request-available  (was: )

> Support `DataFrameReader`
> -
>
> Key: SPARK-51626
> URL: https://issues.apache.org/jira/browse/SPARK-51626
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: connect-swift-0.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


