[jira] [Updated] (SPARK-51162) SPIP: Add the TIME data type
[ https://issues.apache.org/jira/browse/SPARK-51162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-51162: - Description: *Q1. What are you trying to do? Articulate your objectives using absolutely no jargon.* Add a new data type *TIME* to Spark SQL that represents a time value with fields hour, minute, second, up to microseconds. All operations over the type are performed without taking any time zone into account. The new data type should conform to the type *TIME\(n\) WITHOUT TIME ZONE* defined by the SQL standard, where 0 <= n <= 6. *Q2. What problem is this proposal NOT designed to solve?* It does not add the TIME type with time zone defined by the SQL standard: {*}TIME\(n\) WITH TIME ZONE{*}. It also does not add TIME with local time zone. *Q3. How is it done today, and what are the limits of current practice?* The TIME type can be emulated via the TIMESTAMP_NTZ data type by setting the date part to some constant value like 1970-01-01, 0001-01-01 or 0000-00-00 (though the last is outside the supported range of dates). Although the type can be emulated via TIMESTAMP_NTZ, Spark SQL cannot recognize it in data sources and, for instance, cannot load TIME values from Parquet files. *Q4. What is new in your approach and why do you think it will be successful?* The approach is not new, and we have a clear picture of how to split the work into sub-tasks based on our experience adding the new types ANSI intervals and TIMESTAMP_NTZ. *Q5. Who cares? If you are successful, what difference will it make?* The new type simplifies migrations to Spark SQL from other DBMSs like PostgreSQL, Snowflake, Google SQL, Amazon Redshift, Teradata, and DB2. Such users don't have to rewrite their SQL code to emulate the TIME type. The new functionality also benefits existing Spark SQL users who need to load data with TIME values stored by other systems. *Q6. What are the risks?* Handling the new type in operators, expressions, and data sources can cause performance regressions. This risk can be mitigated by developing time benchmarks in parallel with supporting the new type in different places in Spark SQL. *Q7. How long will it take?* In total it might take around {*}9 months{*}. The estimate is based on similar tasks: ANSI intervals (SPARK-27790) and TIMESTAMP_NTZ (SPARK-35662). We can split the work into function blocks: # Base functionality - *3 weeks* Add the new type TimeType, forming/parsing time literals, the type constructor, and external types. # Persistence - *3.5 months* Ability to create tables of the type TIME, read/write from/to Parquet and other built-in data sources, partitioning, stats, predicate pushdown. # Time operators - *2 months* Arithmetic ops, field extraction, sorting, and aggregations. # Clients support - *1 month* JDBC, Hive, Thrift server, Connect. # PySpark integration - *1 month* DataFrame support, pandas API, Python UDFs, Arrow column vectors. # Docs + testing/benchmarking - *1 month* *Q8. What are the mid-term and final “exams” to check for success?* The mid-term exam, at 4 months, is basic functionality: reading/writing the new type in built-in data sources and basic time operations such as arithmetic ops and casting. The final "exam" is to support the same functionality as the other datetime types: TIMESTAMP_NTZ, DATE, and TIMESTAMP. *Appendix A. Proposed API Changes.* Add a new class *TimeType* to {_}org.apache.spark.sql.types{_}:
{code:scala}
/**
 * The time type represents a time value with fields hour, minute, second, up to microseconds.
 * The range of times supported is 00:00:00.000000 to 23:59:59.999999.
 *
 * Please use the singleton `DataTypes.TimeType` to refer to the type.
 */
class TimeType private (precisionField: Byte) extends DatetimeType {
  /**
   * The default size of a value of the TimeType is 8 bytes.
   */
  override def defaultSize: Int = 8

  private[spark] override def asNullable: TimeType = this
}
{code}
*Appendix B:* As the external types for the new TIME type, we propose: - Java/Scala: [java.time.LocalTime|https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/time/LocalTime.html] - PySpark: [time|https://docs.python.org/3/library/datetime.html#time-objects]
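For illustration, here is a minimal usage sketch, assuming the TIME type and the external type mapping from Appendices A and B land as proposed; the encoder for java.time.LocalTime and the printed schema are assumptions, not existing behavior:
{code:scala}
import java.time.LocalTime

import org.apache.spark.sql.SparkSession

object TimeTypeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // java.time.LocalTime is the proposed Java/Scala external type for TIME,
    // so an encoder for it would produce a column of the new time type.
    val df = Seq(
      LocalTime.of(9, 30),               // 09:30:00
      LocalTime.parse("23:59:59.999999") // max value at microsecond precision
    ).toDF("t")

    // Hypothetical output once TIME is supported: t: time(6) (nullable = true)
    df.printSchema()
    spark.stop()
  }
}
{code}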
[jira] [Updated] (SPARK-51179) Refactor SupportsOrderingWithinGroup so that centralized check
[ https://issues.apache.org/jira/browse/SPARK-51179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-51179: --- Summary: Refactor SupportsOrderingWithinGroup so that centralized check (was: Refactor SupportsOrderingWithinGroup so that advances the check) > Refactor SupportsOrderingWithinGroup so that centralized check > -- > > Key: SPARK-51179 > URL: https://issues.apache.org/jira/browse/SPARK-51179 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.1.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51179) Refactor SupportsOrderingWithinGroup so that centralized check
[ https://issues.apache.org/jira/browse/SPARK-51179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-51179: --- Description: Currently, the checks in analysis for ListAgg are scattered across multiple locations. We should improve this with a centralized check. was: Currently, the check in analysis for ListAgg scattered in multiple locations. We should > Refactor SupportsOrderingWithinGroup so that centralized check > -- > > Key: SPARK-51179 > URL: https://issues.apache.org/jira/browse/SPARK-51179 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.1.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > Currently, the checks in analysis for ListAgg are scattered across multiple locations. > We should improve this with a centralized check. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51179) Refactor SupportsOrderingWithinGroup so that centralized check
[ https://issues.apache.org/jira/browse/SPARK-51179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-51179: --- Description: Currently, the check in analysis for ListAgg scattered in multiple locations. We should > Refactor SupportsOrderingWithinGroup so that centralized check > -- > > Key: SPARK-51179 > URL: https://issues.apache.org/jira/browse/SPARK-51179 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.1.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > Currently, the check in analysis for ListAgg scattered in multiple locations. > We should -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51179) Refactor SupportsOrderingWithinGroup so that advances the check
Jiaan Geng created SPARK-51179: -- Summary: Refactor SupportsOrderingWithinGroup so that advances the check Key: SPARK-51179 URL: https://issues.apache.org/jira/browse/SPARK-51179 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.1.0 Reporter: Jiaan Geng Assignee: Jiaan Geng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51179) Refactor SupportsOrderingWithinGroup so that centralized check
[ https://issues.apache.org/jira/browse/SPARK-51179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51179: --- Labels: pull-request-available (was: ) > Refactor SupportsOrderingWithinGroup so that centralized check > -- > > Key: SPARK-51179 > URL: https://issues.apache.org/jira/browse/SPARK-51179 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.1.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > > Currently, the checks in analysis for ListAgg are scattered across multiple locations. > We should improve this with a centralized check. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51180) Upgrade `Arrow` to 19.0.0
[ https://issues.apache.org/jira/browse/SPARK-51180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51180: --- Labels: pull-request-available (was: ) > Upgrade `Arrow` to 19.0.0 > - > > Key: SPARK-51180 > URL: https://issues.apache.org/jira/browse/SPARK-51180 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Aimilios Tsouvelekakis >Priority: Major > Labels: pull-request-available > > Current v4.0.0 planning has Arrow at 18.0.0; it would be good to move it > to version 19.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51160) Refactor literal function resolution
[ https://issues.apache.org/jira/browse/SPARK-51160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-51160. - Fix Version/s: 4.1.0 Resolution: Fixed Issue resolved by pull request 49887 [https://github.com/apache/spark/pull/49887] > Refactor literal function resolution > > > Key: SPARK-51160 > URL: https://issues.apache.org/jira/browse/SPARK-51160 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Timotic >Assignee: Mihailo Timotic >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > > Refactor literal function resolution to a separate object to enable > single-pass analyzer to reuse this logic -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51160) Refactor literal function resolution
[ https://issues.apache.org/jira/browse/SPARK-51160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-51160: --- Assignee: Mihailo Timotic > Refactor literal function resolution > > > Key: SPARK-51160 > URL: https://issues.apache.org/jira/browse/SPARK-51160 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Timotic >Assignee: Mihailo Timotic >Priority: Major > Labels: pull-request-available > > Refactor literal function resolution to a separate object to enable > single-pass analyzer to reuse this logic -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51180) Upgrade `Arrow` to 19.0.0
Aimilios Tsouvelekakis created SPARK-51180: -- Summary: Upgrade `Arrow` to 19.0.0 Key: SPARK-51180 URL: https://issues.apache.org/jira/browse/SPARK-51180 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: Aimilios Tsouvelekakis Current v4.0.0 planning has Arrow at 18.0.0; it would be good to move it to version 19.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-50812) Support pyspark.ml on Connect
[ https://issues.apache.org/jira/browse/SPARK-50812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-50812: -- Affects Version/s: (was: 4.1.0) > Support pyspark.ml on Connect > - > > Key: SPARK-50812 > URL: https://issues.apache.org/jira/browse/SPARK-50812 > Project: Spark > Issue Type: Umbrella > Components: Connect, ML, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Bobby Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Starting from Apache Spark 3.4, Spark has supported Connect, which introduced > a decoupled client-server architecture that allows remote connectivity to > Spark clusters using the DataFrame API and unresolved logical plans as the > protocol. The separation between client and server allows Spark and its open > ecosystem to be leveraged from everywhere. It can be embedded in modern data > applications, in IDEs, Notebooks and programming languages. > However, Spark Connect currently only supports Spark SQL, which means Spark > ML cannot run training/inference via Spark Connect. This will probably > result in losing some ML users. > So I would like to propose a way to support Spark ML on Connect. Users > don't need to change their code to leverage Connect to run Spark ML cases. > Here are some links: > Design doc: [Support spark.ml on > Connect|https://docs.google.com/document/d/1EUvSZuI-so83cxb_fTVMoz0vUfAaFmqXt39yoHI-D9I/edit?usp=sharing] > Draft PR: [https://github.com/wbo4958/spark/pull/5] > Example code: > {code:python} > from pyspark.ml.classification import LogisticRegression, LogisticRegressionModel > from pyspark.ml.linalg import Vectors > from pyspark.sql import SparkSession > > spark = SparkSession.builder.remote("sc://localhost").getOrCreate() > df = spark.createDataFrame([ > (Vectors.dense([1.0, 2.0]), 1), > (Vectors.dense([2.0, -1.0]), 1), > (Vectors.dense([-3.0, -2.0]), 0), > (Vectors.dense([-1.0, -2.0]), 0), > ], schema=['features', 'label']) > lr = LogisticRegression() > lr.setMaxIter(30) > model: LogisticRegressionModel = lr.fit(df) > z = model.summary > x = model.predictRaw(Vectors.dense([1.0, 2.0])) > print(f"predictRaw {x}") > assert model.getMaxIter() == 30 > model.summary.roc.show() > print(model.summary.weightedRecall) > print(model.summary.recallByLabel) > print(model.coefficients) > print(model.intercept) > model.transform(df).show() > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51163) Exclude duplicated jars from connect-repl
[ https://issues.apache.org/jira/browse/SPARK-51163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-51163: - Assignee: Cheng Pan > Exclude duplicated jars from connect-repl > - > > Key: SPARK-51163 > URL: https://issues.apache.org/jira/browse/SPARK-51163 > Project: Spark > Issue Type: Improvement > Components: Build, Connect >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51163) Exclude duplicated jars from connect-repl
[ https://issues.apache.org/jira/browse/SPARK-51163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-51163. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 49892 [https://github.com/apache/spark/pull/49892] > Exclude duplicated jars from connect-repl > - > > Key: SPARK-51163 > URL: https://issues.apache.org/jira/browse/SPARK-51163 > Project: Spark > Issue Type: Improvement > Components: Build, Connect >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51157) Add missing @varargs Scala annotation for scala functon APIs
[ https://issues.apache.org/jira/browse/SPARK-51157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51157: -- Fix Version/s: 3.5.5 > Add missing @varargs Scala annotation for scala functon APIs > > > Key: SPARK-51157 > URL: https://issues.apache.org/jira/browse/SPARK-51157 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.5 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51181) Enforce determinism when pulling out non deterministic expressions from logical plan
Mihailo Aleksic created SPARK-51181: --- Summary: Enforce determinism when pulling out non deterministic expressions from logical plan Key: SPARK-51181 URL: https://issues.apache.org/jira/browse/SPARK-51181 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.1.0 Reporter: Mihailo Aleksic Enforce determinism when pulling out non-deterministic expressions from the logical plan, to avoid plan normalization problems when comparing single-pass and fixed-point analyzer results. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
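For context, a minimal sketch of the rewrite in question (the grouping example below reflects the existing pull-out behavior for non-deterministic expressions; how the single-pass analyzer orders this rewrite is an assumption):
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Grouping by a non-deterministic expression: the analyzer pulls rand(0) out
// into a Project below the Aggregate and groups by the materialized column.
// If the single-pass and fixed-point analyzers pull such expressions out in
// different orders (or with different generated names), the normalized plans
// diverge even though the query semantics are identical.
val df = spark.range(10).groupBy(expr("rand(0)")).count()
println(df.queryExecution.analyzed)
{code}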
[jira] [Updated] (SPARK-51163) Exclude duplicated jars from connect-repl
[ https://issues.apache.org/jira/browse/SPARK-51163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51163: -- Parent: SPARK-44111 Issue Type: Sub-task (was: Improvement) > Exclude duplicated jars from connect-repl > - > > Key: SPARK-51163 > URL: https://issues.apache.org/jira/browse/SPARK-51163 > Project: Spark > Issue Type: Sub-task > Components: Build, Connect >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51008) Implement Result Stage for AQE
[ https://issues.apache.org/jira/browse/SPARK-51008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-51008. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 49715 [https://github.com/apache/spark/pull/49715] > Implement Result Stage for AQE > -- > > Key: SPARK-51008 > URL: https://issues.apache.org/jira/browse/SPARK-51008 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ziqi Liu >Assignee: Ziqi Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > To support > [https://github.com/apache/spark/pull/44013#issuecomment-2421167393] we need > to implement Result Stage for AQE so that all plan segments can fall into a > stage context. This would also improve the AQE flow to a more self-contained > state. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51008) Implement Result Stage for AQE
[ https://issues.apache.org/jira/browse/SPARK-51008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-51008: --- Assignee: Ziqi Liu > Implement Result Stage for AQE > -- > > Key: SPARK-51008 > URL: https://issues.apache.org/jira/browse/SPARK-51008 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ziqi Liu >Assignee: Ziqi Liu >Priority: Major > Labels: pull-request-available > > To support > [https://github.com/apache/spark/pull/44013#issuecomment-2421167393] we need > to implement Result Stage for AQE so that all plan segments can fall into a > stage context. This would also improve the AQE flow to a more self-contained > state. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51113) Correctness issue with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE
[ https://issues.apache.org/jira/browse/SPARK-51113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51113: -- Parent: SPARK-44111 Issue Type: Sub-task (was: Bug) > Correctness issue with UNION/EXCEPT/INTERSECT inside a view or EXECUTE > IMMEDIATE > > > Key: SPARK-51113 > URL: https://issues.apache.org/jira/browse/SPARK-51113 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Priority: Critical > Labels: pull-request-available > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png, screenshot-5.png > > > There's a parser issue where for trivial UNION/EXCEPT/INTERSECT queries > inside views a keyword is considered an alias: > ``` > spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 UNION SELECT 2 UNION > SELECT 3 UNION SELECT 4") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 EXCEPT SELECT 2 > EXCEPT SELECT 1 EXCEPT SELECT 2") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > spark.sql("CREATE OR REPLACE VIEW t1 AS SELECT 1 AS col1 INTERSECT SELECT 1 > INTERSECT SELECT 2 INTERSECT SELECT 2") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > ``` > !screenshot-1.png! > !screenshot-3.png! > !screenshot-4.png! > Same issue for `EXECUTE IMMEDIATE`: > ``` > spark.sql("DECLARE v INT") > spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO v") > spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO > v").queryExecution.analyzed > spark.sql("SELECT v").show() > ``` > !screenshot-5.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-50889) Fix Flaky Test: `SparkSessionE2ESuite.interrupt operation` (Hang)
[ https://issues.apache.org/jira/browse/SPARK-50889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-50889: -- Parent Issue: SPARK-51166 (was: SPARK-44111) > Fix Flaky Test: `SparkSessionE2ESuite.interrupt operation` (Hang) > - > > Key: SPARK-50889 > URL: https://issues.apache.org/jira/browse/SPARK-50889 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > `SparkSessionE2ESuite.interrupt operation` hangs sometimes. > - `branch-4.0`: > https://github.com/apache/spark/actions/runs/12848096505/job/35829436740 > - `branch-4.0`: > https://github.com/apache/spark/actions/runs/12951559619/job/36126910293 > {code} > [info] SparkSessionE2ESuite: > [info] - interrupt all - background queries, foreground interrupt (217 > milliseconds) > [info] - interrupt all - foreground queries, background interrupt (306 > milliseconds) > [info] - interrupt all - streaming queries (381 milliseconds) > [info] - interrupt tag !!! IGNORED !!! > [info] - interrupt tag - streaming query (776 milliseconds) > [info] - progress is available for the spark result (2 seconds, 991 > milliseconds) > [info] *** Test still running after 5 minutes, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 10 minutes, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 15 minutes, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 20 minutes, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 25 minutes, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 30 minutes, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 35 minutes, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 40 minutes, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 45 minutes, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 50 minutes, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 55 minutes, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 1 hour, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 1 hour, 5 minutes, 59 seconds: suite > name: SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 1 hour, 10 minutes, 59 seconds: suite > name: SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 1 hour, 15 minutes, 59 seconds: suite > name: SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 1 hour, 20 minutes, 59 seconds: suite > name: SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 1 hour, 25 minutes, 59 seconds: suite > name: SparkSessionE2ESuite, test name: interrupt operation. 
> [info] *** Test still running after 1 hour, 30 minutes, 59 seconds: suite > name: SparkSessionE2ESuite, test name: interrupt operation. > {code} > - `master: > https://github.com/apache/spark/actions/runs/12804420645/job/35698812313 > {code} > [info] SparkSessionE2ESuite: > [info] - interrupt all - background queries, foreground interrupt (221 > milliseconds) > [info] - interrupt all - foreground queries, background interrupt (307 > milliseconds) > [info] - interrupt all - streaming queries (394 milliseconds) > [info] - interrupt tag !!! IGNORED !!! > [info] - interrupt tag - streaming query (788 milliseconds) > [info] - progress is available for the spark result (3 seconds, 990 > milliseconds) > [info] *** Test still running after 5 minutes, 51 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 10 minutes, 51 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 15 minutes, 51 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 20 minutes, 51 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 25 minutes, 51 seconds: sui
[jira] [Updated] (SPARK-48139) Re-enable `SparkSessionE2ESuite.interrupt tag`
[ https://issues.apache.org/jira/browse/SPARK-48139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48139: -- Priority: Critical (was: Blocker) > Re-enable `SparkSessionE2ESuite.interrupt tag` > -- > > Key: SPARK-48139 > URL: https://issues.apache.org/jira/browse/SPARK-48139 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0, 3.5.2 >Reporter: Dongjoon Hyun >Priority: Critical > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48139) Re-enable `SparkSessionE2ESuite.interrupt tag`
[ https://issues.apache.org/jira/browse/SPARK-48139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48139: -- Target Version/s: (was: 4.0.0) > Re-enable `SparkSessionE2ESuite.interrupt tag` > -- > > Key: SPARK-48139 > URL: https://issues.apache.org/jira/browse/SPARK-48139 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0, 3.5.2 >Reporter: Dongjoon Hyun >Priority: Blocker > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-50205) Re-enable `SparkSessionJobTaggingAndCancellationSuite.Cancellation APIs in SparkSession are isolated`
[ https://issues.apache.org/jira/browse/SPARK-50205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926431#comment-17926431 ] Dongjoon Hyun commented on SPARK-50205: --- I moved this to 4.1.0. > Re-enable `SparkSessionJobTaggingAndCancellationSuite.Cancellation APIs in > SparkSession are isolated` > - > > Key: SPARK-50205 > URL: https://issues.apache.org/jira/browse/SPARK-50205 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0, 3.5.2 >Reporter: Pengfei Xu >Priority: Critical > Labels: pull-request-available > > https://github.com/apache/spark/actions/runs/10915451051/job/30295259985 > This test case needs a refactor to use only 2 threads instead of 3, because > having 3 threads is not guaranteed in CI. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48139) Re-enable `SparkSessionE2ESuite.interrupt tag`
[ https://issues.apache.org/jira/browse/SPARK-48139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48139: -- Parent Issue: SPARK-51166 (was: SPARK-44111) > Re-enable `SparkSessionE2ESuite.interrupt tag` > -- > > Key: SPARK-48139 > URL: https://issues.apache.org/jira/browse/SPARK-48139 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0, 3.5.2 >Reporter: Dongjoon Hyun >Priority: Critical > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-50771) Fix a flaky test: BlockInfoManagerSuite.SPARK-38675 - concurrent unlock and releaseAllLocksForTask calls should not fail
[ https://issues.apache.org/jira/browse/SPARK-50771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-50771: -- Parent Issue: SPARK-51166 (was: SPARK-44111) > Fix a flaky test: BlockInfoManagerSuite.SPARK-38675 - concurrent unlock and > releaseAllLocksForTask calls should not fail > > > Key: SPARK-50771 > URL: https://issues.apache.org/jira/browse/SPARK-50771 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > > https://github.com/apache/spark/actions/runs/12666965730/job/35299446885 > {code} > [info] - SPARK-38675 - concurrent unlock and releaseAllLocksForTask calls > should not fail *** FAILED *** (2 milliseconds) > [info] java.lang.AssertionError: assertion failed > [info] at scala.Predef$.assert(Predef.scala:264) > [info] at > org.apache.spark.storage.BlockInfo.checkInvariants(BlockInfoManager.scala:89) > [info] at > org.apache.spark.storage.BlockInfo.readerCount_$eq(BlockInfoManager.scala:71) > [info] at > org.apache.spark.storage.BlockInfoManager.$anonfun$releaseAllLocksForTask$6(BlockInfoManager.scala:498) > [info] at > org.apache.spark.storage.BlockInfoManager.$anonfun$releaseAllLocksForTask$6$adapted(BlockInfoManager.scala:497) > [info] at > org.apache.spark.storage.BlockInfoWrapper.withLock(BlockInfoManager.scala:105) > [info] at > org.apache.spark.storage.BlockInfoManager.blockInfo(BlockInfoManager.scala:271) > [info] at > org.apache.spark.storage.BlockInfoManager.$anonfun$releaseAllLocksForTask$5(BlockInfoManager.scala:497) > [info] at java.base/java.lang.Iterable.forEach(Iterable.java:75) > [info] at > org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:493) > [info] at > org.apache.spark.storage.BlockInfoManagerSuite.$anonfun$new$82(BlockInfoManagerSuite.scala:399) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) > [info] at > org.apache.spark.storage.BlockInfoManagerSuite.withTaskId(BlockInfoManagerSuite.scala:66) > [info] at > org.apache.spark.storage.BlockInfoManagerSuite.$anonfun$new$81(BlockInfoManagerSuite.scala:385) > [info] at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190) > [info] at > org.apache.spark.storage.BlockInfoManagerSuite.$anonfun$new$80(BlockInfoManagerSuite.scala:384) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) > [info] at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127) > [info] at > org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282) > [info] at > org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231) > [info] at > org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230) > [info] at org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69) > [info] at > org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at > org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) 
> [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > [info] at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:69) > [info] at > org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) > [info] at > org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227) > [info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:69) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > [info] at scala.collection.immutable.List.foreach(List.scala:334) > [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > [info] at org.scalatest.Su
[jira] [Updated] (SPARK-50205) Re-enable `SparkSessionJobTaggingAndCancellationSuite.Cancellation APIs in SparkSession are isolated`
[ https://issues.apache.org/jira/browse/SPARK-50205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-50205: -- Parent Issue: SPARK-51166 (was: SPARK-44111) > Re-enable `SparkSessionJobTaggingAndCancellationSuite.Cancellation APIs in > SparkSession are isolated` > - > > Key: SPARK-50205 > URL: https://issues.apache.org/jira/browse/SPARK-50205 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0, 3.5.2 >Reporter: Pengfei Xu >Priority: Critical > Labels: pull-request-available > > https://github.com/apache/spark/actions/runs/10915451051/job/30295259985 > This test case needs a refactor to use only 2 threads instead of 3, because > having 3 threads is not guaranteed in CI. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51019) Fix Flaky Test: `SPARK-47148: AQE should avoid to submit shuffle job on cancellation`
[ https://issues.apache.org/jira/browse/SPARK-51019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51019: -- Parent Issue: SPARK-51166 (was: SPARK-44111) > Fix Flaky Test: `SPARK-47148: AQE should avoid to submit shuffle job on > cancellation` > - > > Key: SPARK-51019 > URL: https://issues.apache.org/jira/browse/SPARK-51019 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > - https://github.com/apache/spark/actions/runs/13004225714/job/36268222928 > {code} > == Parsed Logical Plan == > 'Join UsingJoin(Inner, [id]) > :- Project [id#133801L, scalarsubquery()#133805] > : +- Join Inner, (id#133801L = id#133806L) > : :- Project [id#133801L, scalar-subquery#133800 [] AS > scalarsubquery()#133805] > : : : +- Project [slow_udf() AS slow_udf()#133804] > : : : +- Range (0, 2, step=1) > : : +- Range (0, 5, step=1) > : +- Repartition 2, false > :+- Project [id#133806L] > : +- Range (0, 10, step=1) > +- Project [id#133808L, scalar-subquery#133807 [] AS scalarsubquery()#133812] >: +- Project [slow_udf() AS slow_udf()#133811] >: +- Range (0, 2, step=1) >+- Filter (id#133808L > cast(2 as bigint)) > +- Range (0, 15, step=1) > == Analyzed Logical Plan == > id: bigint, scalarsubquery(): int, scalarsubquery(): int > Project [id#133801L, scalarsubquery()#133805, scalarsubquery()#133812] > +- Join Inner, (id#133801L = id#133808L) >:- Project [id#133801L, scalarsubquery()#133805] >: +- Join Inner, (id#133801L = id#133806L) >: :- Project [id#133801L, scalar-subquery#133800 [] AS > scalarsubquery()#133805] >: : : +- Project [slow_udf() AS slow_udf()#133804] >: : : +- Range (0, 2, step=1) >: : +- Range (0, 5, step=1) >: +- Repartition 2, false >:+- Project [id#133806L] >: +- Range (0, 10, step=1) >+- Project [id#133808L, scalar-subquery#133807 [] AS > scalarsubquery()#133812] > : +- Project [slow_udf() AS slow_udf()#133811] > : +- Range (0, 2, step=1) > +- Filter (id#133808L > cast(2 as bigint)) > +- Range (0, 15, step=1) > == Optimized Logical Plan == > Project [id#133801L, scalarsubquery()#133805, scalarsubquery()#133812] > +- Join Inner, (id#133801L = id#133808L) >:- Project [id#133801L, scalarsubquery()#133805] >: +- Join Inner, (id#133801L = id#133806L) >: :- Project [id#133801L, scalar-subquery#133800 [] AS > scalarsubquery()#133805] >: : : +- Project [slow_udf() AS slow_udf()#133804] >: : : +- Range (0, 2, step=1) >: : +- Filter (id#133801L > 2) >: : +- Range (0, 5, step=1) >: +- Repartition 2, false >:+- Range (0, 10, step=1) >+- Project [id#133808L, scalar-subquery#133807 [] AS > scalarsubquery()#133812] > : +- Project [slow_udf() AS slow_udf()#133804] > : +- Range (0, 2, step=1) > +- Filter (id#133808L > 2) > +- Range (0, 15, step=1) > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- Project [id#133801L, scalarsubquery()#133805, scalarsubquery()#133812] >+- SortMergeJoin [id#133801L], [id#133808L], Inner > :- Project [id#133801L, scalarsubquery()#133805] > : +- SortMergeJoin [id#133801L], [id#133806L], Inner > : :- Sort [id#133801L ASC NULLS FIRST], false, 0 > : : +- Exchange hashpartitioning(id#133801L, 5), > ENSURE_REQUIREMENTS, [plan_id=423273] > : : +- Project [id#133801L, Subquery subquery#133800, > [id=#423258] AS scalarsubquery()#133805] > : :: +- Subquery subquery#133800, [id=#423258] > : :: +- AdaptiveSparkPlan isFinalPlan=false > : ::+- Project [slow_udf() AS slow_udf()#133804] > : :: +- Range (0, 2, step=1, splits=2) > : :+- Filter (id#133801L > 2) 
> : : +- Range (0, 5, step=1, splits=2) > : +- Sort [id#133806L ASC NULLS FIRST], false, 0 > :+- Exchange hashpartitioning(id#133806L, 5), > ENSURE_REQUIREMENTS, [plan_id=423272] > : +- TestProblematicCoalesce 2 > : +- Range (0, 10, step=1, splits=2) > +- Sort [id#133808L ASC NULLS FIRST], false, 0 > +- Exchange hashpartitioning(id#133808L, 5), ENSURE_REQUIREMENTS, > [plan_id=423284] > +- Project [id#133808L, Subquery subquery#133807, [id=#423262] AS > scalarsubquery()#133812] >: +- Subquery subquery#133807, [id=#423262] >: +- AdaptiveSparkPlan isFinalPlan=false >:+-
[jira] [Updated] (SPARK-51113) Correctness issue with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE
[ https://issues.apache.org/jira/browse/SPARK-51113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51113: -- Priority: Blocker (was: Critical) > Correctness issue with UNION/EXCEPT/INTERSECT inside a view or EXECUTE > IMMEDIATE > > > Key: SPARK-51113 > URL: https://issues.apache.org/jira/browse/SPARK-51113 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Priority: Blocker > Labels: pull-request-available > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png, screenshot-5.png > > > There's a parser issue where for trivial UNION/EXCEPT/INTERSECT queries > inside views a keyword is considered an alias: > ``` > spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 UNION SELECT 2 UNION > SELECT 3 UNION SELECT 4") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 EXCEPT SELECT 2 > EXCEPT SELECT 1 EXCEPT SELECT 2") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > spark.sql("CREATE OR REPLACE VIEW t1 AS SELECT 1 AS col1 INTERSECT SELECT 1 > INTERSECT SELECT 2 INTERSECT SELECT 2") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > ``` > !screenshot-1.png! > !screenshot-3.png! > !screenshot-4.png! > Same issue for `EXECUTE IMMEDIATE`: > ``` > spark.sql("DECLARE v INT") > spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO v") > spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO > v").queryExecution.analyzed > spark.sql("SELECT v").show() > ``` > !screenshot-5.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51113) Correctness issue with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE
[ https://issues.apache.org/jira/browse/SPARK-51113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51113: -- Target Version/s: 4.0.0 > Correctness issue with UNION/EXCEPT/INTERSECT inside a view or EXECUTE > IMMEDIATE > > > Key: SPARK-51113 > URL: https://issues.apache.org/jira/browse/SPARK-51113 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Priority: Blocker > Labels: pull-request-available > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png, screenshot-5.png > > > There's a parser issue where for trivial UNION/EXCEPT/INTERSECT queries > inside views a keyword is considered an alias: > ``` > spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 UNION SELECT 2 UNION > SELECT 3 UNION SELECT 4") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 EXCEPT SELECT 2 > EXCEPT SELECT 1 EXCEPT SELECT 2") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > spark.sql("CREATE OR REPLACE VIEW t1 AS SELECT 1 AS col1 INTERSECT SELECT 1 > INTERSECT SELECT 2 INTERSECT SELECT 2") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > ``` > !screenshot-1.png! > !screenshot-3.png! > !screenshot-4.png! > Same issue for `EXECUTE IMMEDIATE`: > ``` > spark.sql("DECLARE v INT") > spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO v") > spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO > v").queryExecution.analyzed > spark.sql("SELECT v").show() > ``` > !screenshot-5.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51046) `SubExprEliminationBenchmark` fails at `CodeGenerator`
[ https://issues.apache.org/jira/browse/SPARK-51046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51046: -- Parent Issue: SPARK-51166 (was: SPARK-44111) > `SubExprEliminationBenchmark` fails at `CodeGenerator` > -- > > Key: SPARK-51046 > URL: https://issues.apache.org/jira/browse/SPARK-51046 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > {code} > Running org.apache.spark.sql.execution.SubExprEliminationBenchmark: > ... > Preparing data for benchmarking ... > Running benchmark: from_json as subExpr in Filter > Running case: subExprElimination false, codegen: true > 25/01/30 22:24:08 ERROR CodeGenerator: Failed to compile the generated Java > code. > org.codehaus.commons.compiler.InternalCompilerException: Compiling > "GeneratedClass" in File 'generated.java', Line 1, Column 1: File > 'generated.java', Line 24, Column 16: Compiling "processNext()" > ... > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-50205) Re-enable `SparkSessionJobTaggingAndCancellationSuite.Cancellation APIs in SparkSession are isolated`
[ https://issues.apache.org/jira/browse/SPARK-50205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-50205: -- Target Version/s: (was: 4.0.0) > Re-enable `SparkSessionJobTaggingAndCancellationSuite.Cancellation APIs in > SparkSession are isolated` > - > > Key: SPARK-50205 > URL: https://issues.apache.org/jira/browse/SPARK-50205 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0, 3.5.2 >Reporter: Pengfei Xu >Priority: Critical > Labels: pull-request-available > > https://github.com/apache/spark/actions/runs/10915451051/job/30295259985 > This test case needs a refactor to use only 2 threads instead of 3, because > having 3 threads is not guaranteed in CI. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49586) Add addArtifact API to PySpark
[ https://issues.apache.org/jira/browse/SPARK-49586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-49586: -- Parent Issue: SPARK-51166 (was: SPARK-44111) > Add addArtifact API to PySpark > -- > > Key: SPARK-49586 > URL: https://issues.apache.org/jira/browse/SPARK-49586 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Pengfei Xu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-50888) Fix Flaky Test: `SparkConnectServiceSuite.SPARK-44776: LocalTableScanExe`
[ https://issues.apache.org/jira/browse/SPARK-50888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-50888: -- Parent Issue: SPARK-51166 (was: SPARK-44111) > Fix Flaky Test: `SparkConnectServiceSuite.SPARK-44776: LocalTableScanExe` > - > > Key: SPARK-50888 > URL: https://issues.apache.org/jira/browse/SPARK-50888 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > - `branch-4.0`: > https://github.com/apache/spark/actions/runs/12879810930/job/35907876872 > - `branch-4.0`: > https://github.com/apache/spark/actions/runs/12848096505/job/35825096617 > {code} > [info] SparkConnectServiceSuite: > [info] - Test schema in analyze response (92 milliseconds) > [info] - SPARK-41224: collect data using arrow (101 milliseconds) > [info] - SPARK-44776: LocalTableScanExec *** FAILED *** (34 milliseconds) > [info] VerifyEvents.this.executeHolder.eventsManager.hasError.isDefined was > false (SparkConnectServiceSuite.scala:895) > [info] org.scalatest.exceptions.TestFailedException: > [info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > [info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > [info] at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > [info] at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > [info] at > org.apache.spark.sql.connect.planner.SparkConnectServiceSuite$VerifyEvents.onError(SparkConnectServiceSuite.scala:895) > [info] at > org.apache.spark.sql.connect.planner.SparkConnectServiceSuite$$anon$2.onError(SparkConnectServiceSuite.scala:292) > [info] at > org.apache.spark.sql.connect.utils.ErrorUtils$$anonfun$handleError$1.applyOrElse(ErrorUtils.scala:329) > [info] at > org.apache.spark.sql.connect.utils.ErrorUtils$$anonfun$handleError$1.applyOrElse(ErrorUtils.scala:304) > [info] at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35) > [info] at scala.PartialFunction$Combined.apply(PartialFunction.scala:301) > [info] at > org.apache.spark.sql.connect.service.SparkConnectService.executePlan(SparkConnectService.scala:75) > [info] at > org.apache.spark.sql.connect.planner.SparkConnectServiceSuite.$anonfun$new$14(SparkConnectServiceSuite.scala:285) > [info] at > org.apache.spark.sql.connect.planner.SparkConnectServiceSuite.$anonfun$new$14$adapted(SparkConnectServiceSuite.scala:249) > [info] at > org.apache.spark.sql.connect.planner.SparkConnectServiceSuite.$anonfun$withEvents$1(SparkConnectServiceSuite.scala:853) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) > [info] at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:80) > [info] at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:77) > [info] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99) > [info] at > org.apache.spark.sql.connect.planner.SparkConnectServiceSuite.withEvents(SparkConnectServiceSuite.scala:856) > [info] at > org.apache.spark.sql.connect.planner.SparkConnectServiceSuite.$anonfun$new$13(SparkConnectServiceSuite.scala:249) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) > [info] at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127) > [info] at > org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282) > [info] at > 
org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231) > [info] at > org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230) > [info] at org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69) > [info] at > org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at > org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFun
[jira] [Updated] (SPARK-50748) Fix a flaky test: `SparkSessionE2ESuite.interrupt all - background queries, foreground interrupt`
[ https://issues.apache.org/jira/browse/SPARK-50748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-50748: -- Parent Issue: SPARK-51166 (was: SPARK-44111) > Fix a flaky test: `SparkSessionE2ESuite.interrupt all - background queries, > foreground interrupt` > - > > Key: SPARK-50748 > URL: https://issues.apache.org/jira/browse/SPARK-50748 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > https://github.com/apache/spark/actions/runs/12627485924/job/35182190161 > (2025-01-06) > {code} > [info] SparkSessionE2ESuite: > [info] - interrupt all - background queries, foreground interrupt *** FAILED > *** (20 seconds, 63 milliseconds) > [info] The code passed to eventually never returned normally. Attempted 30 > times over 20.057432362 seconds. Last failure message: q1Interrupted was > false. (SparkSessionE2ESuite.scala:71) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51180) Upgrade `Arrow` to 19.0.0
[ https://issues.apache.org/jira/browse/SPARK-51180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aimilios Tsouvelekakis updated SPARK-51180: --- Affects Version/s: 4.1.0 (was: 4.0.0) > Upgrade `Arrow` to 19.0.0 > - > > Key: SPARK-51180 > URL: https://issues.apache.org/jira/browse/SPARK-51180 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.1.0 >Reporter: Aimilios Tsouvelekakis >Priority: Major > Labels: pull-request-available > > Current v4.0.0 planning has Arrow at 18.0.0; it would be good to move it > to version 19.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51113) Correctness issue with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE
[ https://issues.apache.org/jira/browse/SPARK-51113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-51113: --- Assignee: Vladimir Golubev > Correctness issue with UNION/EXCEPT/INTERSECT inside a view or EXECUTE > IMMEDIATE > > > Key: SPARK-51113 > URL: https://issues.apache.org/jira/browse/SPARK-51113 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Blocker > Labels: pull-request-available > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png, screenshot-5.png > > > There's a parser issue where for trivial UNION/EXCEPT/INTERSECT queries > inside views a keyword is considered an alias: > ``` > spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 UNION SELECT 2 UNION > SELECT 3 UNION SELECT 4") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 EXCEPT SELECT 2 > EXCEPT SELECT 1 EXCEPT SELECT 2") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > spark.sql("CREATE OR REPLACE VIEW t1 AS SELECT 1 AS col1 INTERSECT SELECT 1 > INTERSECT SELECT 2 INTERSECT SELECT 2") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > ``` > !screenshot-1.png! > !screenshot-3.png! > !screenshot-4.png! > Same issue for `EXECUTE IMMEDIATE`: > ``` > spark.sql("DECLARE v INT") > spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO v") > spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO > v").queryExecution.analyzed > spark.sql("SELECT v").show() > ``` > !screenshot-5.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51113) Correctness issue with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE
[ https://issues.apache.org/jira/browse/SPARK-51113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-51113. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 49835 [https://github.com/apache/spark/pull/49835] > Correctness issue with UNION/EXCEPT/INTERSECT inside a view or EXECUTE > IMMEDIATE > > > Key: SPARK-51113 > URL: https://issues.apache.org/jira/browse/SPARK-51113 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Blocker > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png, screenshot-5.png > > > There's a parser issue where, for trivial UNION/EXCEPT/INTERSECT queries > inside views, the set-operator keyword is treated as a column alias: > ``` > spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 UNION SELECT 2 UNION > SELECT 3 UNION SELECT 4") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 EXCEPT SELECT 2 > EXCEPT SELECT 1 EXCEPT SELECT 2") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 INTERSECT SELECT 1 > INTERSECT SELECT 2 INTERSECT SELECT 2") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > ``` > !screenshot-1.png! > !screenshot-3.png! > !screenshot-4.png! > Same issue for `EXECUTE IMMEDIATE`: > ``` > spark.sql("DECLARE v INT") > spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO v") > spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO > v").queryExecution.analyzed > spark.sql("SELECT v").show() > ``` > !screenshot-5.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51187) Implement the graceful deprecation of incorrect config introduced in SPARK-49699
Jungtaek Lim created SPARK-51187: Summary: Implement the graceful deprecation of incorrect config introduced in SPARK-49699 Key: SPARK-51187 URL: https://issues.apache.org/jira/browse/SPARK-51187 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 3.5.4, 4.0.0 Reporter: Jungtaek Lim See the comments in PR [https://github.com/apache/spark/pull/49905] for the rationale. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
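For context, a common pattern for gracefully deprecating a config key in Spark is to register the old key as an alternative of the corrected one, so reads of the new key fall back to the old name and existing jobs keep working. A minimal sketch, assuming the internal {{ConfigBuilder}} API; the key names below are hypothetical, not the ones discussed in the PR:

{code:scala}
import org.apache.spark.internal.config.ConfigBuilder

// Hypothetical keys for illustration only. Configs written against the old
// (incorrect) name still take effect via the fallback.
val FEATURE_ENABLED = ConfigBuilder("spark.sql.streaming.someFeature.enabled")
  .withAlternative("spark.sql.streaming.someFeature.enable")
  .booleanConf
  .createWithDefault(true)
{code}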
[jira] [Created] (SPARK-51191) Validate default values handling in DELETE, UPDATE, MERGE
Anton Okolnychyi created SPARK-51191: Summary: Validate default values handling in DELETE, UPDATE, MERGE Key: SPARK-51191 URL: https://issues.apache.org/jira/browse/SPARK-51191 Project: Spark Issue Type: Test Components: SQL Affects Versions: 4.1 Reporter: Anton Okolnychyi -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51190) Fix TreeEnsembleModel.treeWeights
Ruifeng Zheng created SPARK-51190: - Summary: Fix TreeEnsembleModel.treeWeights Key: SPARK-51190 URL: https://issues.apache.org/jira/browse/SPARK-51190 Project: Spark Issue Type: Sub-task Components: Connect, ML Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51190) Fix TreeEnsembleModel.treeWeights
[ https://issues.apache.org/jira/browse/SPARK-51190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-51190: - Assignee: Ruifeng Zheng > Fix TreeEnsembleModel.treeWeights > - > > Key: SPARK-51190 > URL: https://issues.apache.org/jira/browse/SPARK-51190 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-50812) Support pyspark.ml on Connect
[ https://issues.apache.org/jira/browse/SPARK-50812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926653#comment-17926653 ] Dongjoon Hyun commented on SPARK-50812: --- I added the `releasenotes` label so that we don't forget to mention this effort. > Support pyspark.ml on Connect > - > > Key: SPARK-50812 > URL: https://issues.apache.org/jira/browse/SPARK-50812 > Project: Spark > Issue Type: Umbrella > Components: Connect, ML, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Bobby Wang >Priority: Major > Labels: releasenotes > Fix For: 4.0.0 > > > Starting from Apache Spark 3.4, Spark has supported Connect, which introduced > a decoupled client-server architecture that allows remote connectivity to > Spark clusters using the DataFrame API and unresolved logical plans as the > protocol. The separation between client and server allows Spark and its open > ecosystem to be leveraged from everywhere. It can be embedded in modern data > applications, in IDEs, notebooks and programming languages. > However, Spark Connect currently only supports Spark SQL, which means Spark > ML cannot run training/inference via Spark Connect. This will probably > result in losing some ML users. > So I would like to propose a way to support Spark ML on Connect. Users > won't need to change their code to leverage Connect to run Spark ML cases. > Here are some links: > Design doc: [Support spark.ml on > Connect|https://docs.google.com/document/d/1EUvSZuI-so83cxb_fTVMoz0vUfAaFmqXt39yoHI-D9I/edit?usp=sharing] > > Draft PR: [https://github.com/wbo4958/spark/pull/5] > Example code: > {code:python} > spark = SparkSession.builder.remote("sc://localhost").getOrCreate() > df = spark.createDataFrame([ > (Vectors.dense([1.0, 2.0]), 1), > (Vectors.dense([2.0, -1.0]), 1), > (Vectors.dense([-3.0, -2.0]), 0), > (Vectors.dense([-1.0, -2.0]), 0), > ], schema=['features', 'label']) > lr = LogisticRegression() > lr.setMaxIter(30) > model: LogisticRegressionModel = lr.fit(df) > z = model.summary > x = model.predictRaw(Vectors.dense([1.0, 2.0])) > print(f"predictRaw {x}") > assert model.getMaxIter() == 30 > model.summary.roc.show() > print(model.summary.weightedRecall) > print(model.summary.recallByLabel) > print(model.coefficients) > print(model.intercept) > model.transform(df).show() > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-50812) Support pyspark.ml on Connect
[ https://issues.apache.org/jira/browse/SPARK-50812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-50812: -- Labels: releasenotes (was: pull-request-available) > Support pyspark.ml on Connect > - > > Key: SPARK-50812 > URL: https://issues.apache.org/jira/browse/SPARK-50812 > Project: Spark > Issue Type: Umbrella > Components: Connect, ML, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Bobby Wang >Priority: Major > Labels: releasenotes > Fix For: 4.0.0 > > > Starting from Apache Spark 3.4, Spark has supported Connect, which introduced > a decoupled client-server architecture that allows remote connectivity to > Spark clusters using the DataFrame API and unresolved logical plans as the > protocol. The separation between client and server allows Spark and its open > ecosystem to be leveraged from everywhere. It can be embedded in modern data > applications, in IDEs, notebooks and programming languages. > However, Spark Connect currently only supports Spark SQL, which means Spark > ML cannot run training/inference via Spark Connect. This will probably > result in losing some ML users. > So I would like to propose a way to support Spark ML on Connect. Users > won't need to change their code to leverage Connect to run Spark ML cases. > Here are some links: > Design doc: [Support spark.ml on > Connect|https://docs.google.com/document/d/1EUvSZuI-so83cxb_fTVMoz0vUfAaFmqXt39yoHI-D9I/edit?usp=sharing] > > Draft PR: [https://github.com/wbo4958/spark/pull/5] > Example code: > {code:python} > spark = SparkSession.builder.remote("sc://localhost").getOrCreate() > df = spark.createDataFrame([ > (Vectors.dense([1.0, 2.0]), 1), > (Vectors.dense([2.0, -1.0]), 1), > (Vectors.dense([-3.0, -2.0]), 0), > (Vectors.dense([-1.0, -2.0]), 0), > ], schema=['features', 'label']) > lr = LogisticRegression() > lr.setMaxIter(30) > model: LogisticRegressionModel = lr.fit(df) > z = model.summary > x = model.predictRaw(Vectors.dense([1.0, 2.0])) > print(f"predictRaw {x}") > assert model.getMaxIter() == 30 > model.summary.roc.show() > print(model.summary.weightedRecall) > print(model.summary.recallByLabel) > print(model.coefficients) > print(model.intercept) > model.transform(df).show() > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51176) Meet consistency for unexpected errors PySpark Connect <> Classic
[ https://issues.apache.org/jira/browse/SPARK-51176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51176: --- Labels: pull-request-available (was: ) > Meet consistency for unexpected errors PySpark Connect <> Classic > - > > Key: SPARK-51176 > URL: https://issues.apache.org/jira/browse/SPARK-51176 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51197) unit test clean up
Ruifeng Zheng created SPARK-51197: - Summary: unit test clean up Key: SPARK-51197 URL: https://issues.apache.org/jira/browse/SPARK-51197 Project: Spark Issue Type: Sub-task Components: PySpark, Tests Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51197) unit test clean up
[ https://issues.apache.org/jira/browse/SPARK-51197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-51197. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 49927 [https://github.com/apache/spark/pull/49927] > unit test clean up > -- > > Key: SPARK-51197 > URL: https://issues.apache.org/jira/browse/SPARK-51197 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51197) unit test clean up
[ https://issues.apache.org/jira/browse/SPARK-51197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-51197: - Assignee: Ruifeng Zheng > unit test clean up > -- > > Key: SPARK-51197 > URL: https://issues.apache.org/jira/browse/SPARK-51197 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-51182) DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified
[ https://issues.apache.org/jira/browse/SPARK-51182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926679#comment-17926679 ] Vlad Rozov commented on SPARK-51182: This issue is a follow-up to [https://github.com/apache/spark/pull/49654]; I already have the necessary changes implemented and will open a PR shortly. > DataFrameWriter should throw dataPathNotSpecifiedError when path is not > specified > - > > Key: SPARK-51182 > URL: https://issues.apache.org/jira/browse/SPARK-51182 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.1.0 >Reporter: Vlad Rozov >Priority: Minor > Labels: pull-request-available > > When {{path}} is not specified in the call to > {{DataFrame.write().save(path)}}, whether explicitly or via {{option(path, ...)}}, > {{parquet(path)}}, etc., it would be more accurate to raise > {{dataPathNotSpecifiedError}} instead of {{multiplePathsSpecifiedError}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
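For reference, a minimal sketch of the call path the issue describes (assuming a running {{SparkSession}} named {{spark}}; the error names are those referenced in the issue, not verified against the current source):

{code:scala}
val df = spark.range(1).toDF("id")

// No path is given here: not as save(path), not via option("path", ...),
// and not via a shortcut like parquet(path). The issue proposes that this
// raise dataPathNotSpecifiedError rather than multiplePathsSpecifiedError.
df.write.format("parquet").save()
{code}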
[jira] [Updated] (SPARK-51189) Promote JobFailed to DeveloperApi
[ https://issues.apache.org/jira/browse/SPARK-51189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51189: --- Labels: pull-request-available (was: ) > Promote JobFailed to DeveloperApi > - > > Key: SPARK-51189 > URL: https://issues.apache.org/jira/browse/SPARK-51189 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51059) Document how ALLOWED_ATTRIBUTES works
[ https://issues.apache.org/jira/browse/SPARK-51059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-51059. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 49918 [https://github.com/apache/spark/pull/49918] > Document how ALLOWED_ATTRIBUTES works > - > > Key: SPARK-51059 > URL: https://issues.apache.org/jira/browse/SPARK-51059 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-50812) Support pyspark.ml on Connect
[ https://issues.apache.org/jira/browse/SPARK-50812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-50812. --- Resolution: Resolved > Support pyspark.ml on Connect > - > > Key: SPARK-50812 > URL: https://issues.apache.org/jira/browse/SPARK-50812 > Project: Spark > Issue Type: Umbrella > Components: Connect, ML, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Bobby Wang >Priority: Major > Labels: releasenotes > Fix For: 4.0.0 > > > Starting from Apache Spark 3.4, Spark has supported Connect, which introduced > a decoupled client-server architecture that allows remote connectivity to > Spark clusters using the DataFrame API and unresolved logical plans as the > protocol. The separation between client and server allows Spark and its open > ecosystem to be leveraged from everywhere. It can be embedded in modern data > applications, in IDEs, notebooks and programming languages. > However, Spark Connect currently only supports Spark SQL, which means Spark > ML cannot run training/inference via Spark Connect. This will probably > result in losing some ML users. > So I would like to propose a way to support Spark ML on Connect. Users > won't need to change their code to leverage Connect to run Spark ML cases. > Here are some links: > Design doc: [Support spark.ml on > Connect|https://docs.google.com/document/d/1EUvSZuI-so83cxb_fTVMoz0vUfAaFmqXt39yoHI-D9I/edit?usp=sharing] > > Draft PR: [https://github.com/wbo4958/spark/pull/5] > Example code: > {code:python} > spark = SparkSession.builder.remote("sc://localhost").getOrCreate() > df = spark.createDataFrame([ > (Vectors.dense([1.0, 2.0]), 1), > (Vectors.dense([2.0, -1.0]), 1), > (Vectors.dense([-3.0, -2.0]), 0), > (Vectors.dense([-1.0, -2.0]), 0), > ], schema=['features', 'label']) > lr = LogisticRegression() > lr.setMaxIter(30) > model: LogisticRegressionModel = lr.fit(df) > z = model.summary > x = model.predictRaw(Vectors.dense([1.0, 2.0])) > print(f"predictRaw {x}") > assert model.getMaxIter() == 30 > model.summary.roc.show() > print(model.summary.weightedRecall) > print(model.summary.recallByLabel) > print(model.coefficients) > print(model.intercept) > model.transform(df).show() > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51189) Promote JobFailed to DeveloperApi
[ https://issues.apache.org/jira/browse/SPARK-51189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-51189: - Assignee: Cheng Pan > Promote JobFailed to DeveloperApi > - > > Key: SPARK-51189 > URL: https://issues.apache.org/jira/browse/SPARK-51189 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51189) Promote JobFailed to DeveloperApi
[ https://issues.apache.org/jira/browse/SPARK-51189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-51189. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 49920 [https://github.com/apache/spark/pull/49920] > Promote JobFailed to DeveloperApi > - > > Key: SPARK-51189 > URL: https://issues.apache.org/jira/browse/SPARK-51189 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51193) Upgrade Netty to 4.1.118.Final and netty-tcnative to 2.0.70.Final
[ https://issues.apache.org/jira/browse/SPARK-51193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-51193. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 49923 [https://github.com/apache/spark/pull/49923] > Upgrade Netty to 4.1.118.Final and netty-tcnative to 2.0.70.Final > - > > Key: SPARK-51193 > URL: https://issues.apache.org/jira/browse/SPARK-51193 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51197) unit test clean up
[ https://issues.apache.org/jira/browse/SPARK-51197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51197: --- Labels: pull-request-available (was: ) > unit test clean up > -- > > Key: SPARK-51197 > URL: https://issues.apache.org/jira/browse/SPARK-51197 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-51182) DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified
[ https://issues.apache.org/jira/browse/SPARK-51182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926681#comment-17926681 ] Wei Guo commented on SPARK-51182: - Okay, I will just follow up here and close my PR [~vrozov] > DataFrameWriter should throw dataPathNotSpecifiedError when path is not > specified > - > > Key: SPARK-51182 > URL: https://issues.apache.org/jira/browse/SPARK-51182 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.1.0 >Reporter: Vlad Rozov >Priority: Minor > Labels: pull-request-available > > When {{path}} is not specified in the call to > {{DataFrame.write().save(path)}}, whether explicitly or via {{option(path, ...)}}, > {{parquet(path)}}, etc., it would be more accurate to raise > {{dataPathNotSpecifiedError}} instead of {{multiplePathsSpecifiedError}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51198) Revise `defaultMinPartitions` function description
[ https://issues.apache.org/jira/browse/SPARK-51198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51198: --- Labels: pull-request-available (was: ) > Revise `defaultMinPartitions` function description > -- > > Key: SPARK-51198 > URL: https://issues.apache.org/jira/browse/SPARK-51198 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 4.1.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51198) Revise `defaultMinPartitions` function description
Dongjoon Hyun created SPARK-51198: - Summary: Revise `defaultMinPartitions` function description Key: SPARK-51198 URL: https://issues.apache.org/jira/browse/SPARK-51198 Project: Spark Issue Type: Sub-task Components: Documentation Affects Versions: 4.1.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51199) Valid CSV records considered malformed
Andreas Franz created SPARK-51199: - Summary: Valid CSV records considered malformed Key: SPARK-51199 URL: https://issues.apache.org/jira/browse/SPARK-51199 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.5.4 Environment: SparkContext: Running Spark version 3.5.4 SparkContext: OS info Mac OS X, 15.3, aarch64 SparkContext: Java version 17.0.14 2025-01-21 LTS OpenJDK Runtime Environment Corretto-17.0.14.7.1 (build 17.0.14+7-LTS) OpenJDK 64-Bit Server VM Corretto-17.0.14.7.1 (build 17.0.14+7-LTS, mixed mode, sharing) Reporter: Andreas Franz There is an issue parsing CSV files with a combination of escaped double quotes and commas in a field. I've created a small example that demonstrates the issue: {code:java} package com.example import org.apache.spark.sql.SparkSession object Example { def main(args: Array[String]): Unit = { val spark = SparkSession.builder() .appName("CSV Example") .master("local[*]") .config("spark.driver.host", "localhost") .config("spark.ui.enabled", "false") .getOrCreate() val csv = spark .read .option("header", "true") .option("mode", "FAILFAST") .csv("./src/main/scala/com/example/example.csv") csv.show(2, truncate = false) spark.stop() } } {code} {code:java} id,region_name,gp_id,gp_name,gp_group_id,gp_group_name,gp_group_region_name 111234567,east,1122723,"Test 1",,, 001234567,east,1122723,"Foo ""Bar"", New York, US",,, {code} According to [RFC 4180|https://www.ietf.org/rfc/rfc4180.txt], this is a valid CSV record. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
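One observation that may help triage: Spark's CSV reader defaults to backslash as the escape character, while RFC 4180 escapes a quote by doubling it. Setting the {{escape}} option to a double quote is the usual way to read such files; a sketch against the example above:

{code:scala}
val csv = spark
  .read
  .option("header", "true")
  .option("mode", "FAILFAST")
  // RFC 4180 doubles quotes inside quoted fields; Spark's default escape is '\',
  // so declare '"' as the escape character to parse "Foo ""Bar"", New York, US".
  .option("escape", "\"")
  .csv("./src/main/scala/com/example/example.csv")

csv.show(2, truncate = false)
{code}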
[jira] [Updated] (SPARK-51188) Upgrade Arrow to 18.2.0
[ https://issues.apache.org/jira/browse/SPARK-51188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51188: --- Labels: pull-request-available (was: ) > Upgrade Arrow to 18.2.0 > --- > > Key: SPARK-51188 > URL: https://issues.apache.org/jira/browse/SPARK-51188 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.1.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51188) Upgrade Arrow to 18.2.0
Yang Jie created SPARK-51188: Summary: Upgrade Arrow to 18.2.0 Key: SPARK-51188 URL: https://issues.apache.org/jira/browse/SPARK-51188 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.1.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51059) Document how ALLOWED_ATTRIBUTES works
[ https://issues.apache.org/jira/browse/SPARK-51059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51059: --- Labels: pull-request-available (was: ) > Document how ALLOWED_ATTRIBUTES works > - > > Key: SPARK-51059 > URL: https://issues.apache.org/jira/browse/SPARK-51059 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51189) Promote JobFailed to DeveloperApi
Cheng Pan created SPARK-51189: - Summary: Promote JobFailed to DeveloperApi Key: SPARK-51189 URL: https://issues.apache.org/jira/browse/SPARK-51189 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Cheng Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51189) Promote JobFailed to DeveloperApi
[ https://issues.apache.org/jira/browse/SPARK-51189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51189: -- Parent: SPARK-44111 Issue Type: Sub-task (was: Improvement) > Promote JobFailed to DeveloperApi > - > > Key: SPARK-51189 > URL: https://issues.apache.org/jira/browse/SPARK-51189 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51195) Upgrade `kubernetes-client` to 7.1.0
[ https://issues.apache.org/jira/browse/SPARK-51195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-51195. --- Fix Version/s: 4.1.0 Resolution: Fixed Issue resolved by pull request 49925 [https://github.com/apache/spark/pull/49925] > Upgrade `kubernetes-client` to 7.1.0 > > > Key: SPARK-51195 > URL: https://issues.apache.org/jira/browse/SPARK-51195 > Project: Spark > Issue Type: Sub-task > Components: Build, k8s >Affects Versions: 4.1.0 >Reporter: Wei Guo >Assignee: Wei Guo >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51201) Make Partitioning Hints support byte and short values
Kent Yao created SPARK-51201: Summary: Make Partitioning Hints support byte and short values Key: SPARK-51201 URL: https://issues.apache.org/jira/browse/SPARK-51201 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
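For context, partitioning hints ({{REPARTITION}}, {{COALESCE}}, {{REBALANCE}}) take a number of partitions as an argument. A sketch of what the improvement presumably enables, using Spark SQL's Y/S suffixes for TINYINT/SMALLINT literals (a hedged reading of the summary, not confirmed by the ticket):

{code:scala}
// An INT literal is accepted today:
spark.sql("SELECT /*+ REPARTITION(3) */ * FROM range(10)").show()

// A SMALLINT literal (S suffix) would presumably be accepted as well
// once byte and short values are supported:
spark.sql("SELECT /*+ REPARTITION(3S) */ * FROM range(10)").show()
{code}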
[jira] [Updated] (SPARK-51133) Upgrade `commons-pool2` to 2.12.1
[ https://issues.apache.org/jira/browse/SPARK-51133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51133: -- Summary: Upgrade `commons-pool2` to 2.12.1 (was: Upgrade Apache `commons-pool2` to 2.12.1) > Upgrade `commons-pool2` to 2.12.1 > - > > Key: SPARK-51133 > URL: https://issues.apache.org/jira/browse/SPARK-51133 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.1.0 >Reporter: Wei Guo >Assignee: Wei Guo >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51194) Upgrade `scalafmt` to 3.8.6
[ https://issues.apache.org/jira/browse/SPARK-51194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-51194. --- Fix Version/s: 4.1.0 Resolution: Fixed Issue resolved by pull request 49924 [https://github.com/apache/spark/pull/49924] > Upgrade `scalafmt` to 3.8.6 > --- > > Key: SPARK-51194 > URL: https://issues.apache.org/jira/browse/SPARK-51194 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.1.0 >Reporter: Wei Guo >Assignee: Wei Guo >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51200) Add SparkR deprecation info to `README.md` and `make-distribution.sh` help
[ https://issues.apache.org/jira/browse/SPARK-51200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-51200: - Assignee: Dongjoon Hyun > Add SparkR deprecation info to `README.md` and `make-distribution.sh` help > -- > > Key: SPARK-51200 > URL: https://issues.apache.org/jira/browse/SPARK-51200 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51193) Upgrade Netty to 4.1.118.Final
Dongjoon Hyun created SPARK-51193: - Summary: Upgrade Netty to 4.1.118.Final Key: SPARK-51193 URL: https://issues.apache.org/jira/browse/SPARK-51193 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51193) Upgrade Netty to 4.1.118.Final
[ https://issues.apache.org/jira/browse/SPARK-51193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51193: --- Labels: pull-request-available (was: ) > Upgrade Netty to 4.1.118.Final > -- > > Key: SPARK-51193 > URL: https://issues.apache.org/jira/browse/SPARK-51193 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51193) Upgrade Netty to 4.1.118.Final and netty-tcnative to 2.0.70.Final
[ https://issues.apache.org/jira/browse/SPARK-51193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51193: -- Summary: Upgrade Netty to 4.1.118.Final and netty-tcnative to 2.0.70.Final (was: Upgrade Netty to 4.1.118.Final) > Upgrade Netty to 4.1.118.Final and netty-tcnative to 2.0.70.Final > - > > Key: SPARK-51193 > URL: https://issues.apache.org/jira/browse/SPARK-51193 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51192) Expose a ResponseObserver-free version of `process` in SparkConnectPlanner
Venkata Sai Akhil Gudesa created SPARK-51192: Summary: Expose a ResponseObserver-free version of `process` in SparkConnectPlanner Key: SPARK-51192 URL: https://issues.apache.org/jira/browse/SPARK-51192 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 4.0.0 Reporter: Venkata Sai Akhil Gudesa [https://github.com/apache/spark/pull/47816] attempted to move `MockObserver` into the source code to address compilation errors when open-source libraries attempt to test their command plugin extensions via `SparkConnectPlannerUtils`. However, this isn't enough, as the error {*}java.lang.NoSuchMethodError: 'void org.apache.spark.sql.connect.planner.SparkConnectPlanner.process(org.apache.spark.connect.proto.Command, io.grpc.stub.StreamObserver{*} continues to be seen. To address this shading issue, we can move the creation of the `MockObserver` into the source code. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
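For background, the observer parameter in the `process` signature is gRPC's {{StreamObserver}} callback interface, and a mock observer is simply a no-op implementation of its three methods. A minimal sketch; the response type is assumed to be the ExecutePlan response, not confirmed by the ticket:

{code:scala}
import io.grpc.stub.StreamObserver
import org.apache.spark.connect.proto

// A no-op observer: plugin tests typically only need `process` to run,
// not to stream responses anywhere.
val noOpObserver = new StreamObserver[proto.ExecutePlanResponse] {
  override def onNext(response: proto.ExecutePlanResponse): Unit = ()
  override def onError(t: Throwable): Unit = ()
  override def onCompleted(): Unit = ()
}
{code}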
[jira] [Resolved] (SPARK-51190) Fix TreeEnsembleModel.treeWeights
[ https://issues.apache.org/jira/browse/SPARK-51190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-51190. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 49919 [https://github.com/apache/spark/pull/49919] > Fix TreeEnsembleModel.treeWeights > - > > Key: SPARK-51190 > URL: https://issues.apache.org/jira/browse/SPARK-51190 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51192) Expose a ResponseObserver-free version of `process` in SparkConnectPlanner
[ https://issues.apache.org/jira/browse/SPARK-51192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51192: --- Labels: pull-request-available (was: ) > Expose a ResponseObserver-free version of `process` in SparkConnectPlanner > -- > > Key: SPARK-51192 > URL: https://issues.apache.org/jira/browse/SPARK-51192 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Venkata Sai Akhil Gudesa >Priority: Major > Labels: pull-request-available > > [https://github.com/apache/spark/pull/47816] attempted to move `MockObserver` > into the source code to address compilation errors when open-source libraries > attempt to test their command plugin extensions via > `SparkConnectPlannerUtils`. > However, this isn't enough, as the error {*}java.lang.NoSuchMethodError: > 'void > org.apache.spark.sql.connect.planner.SparkConnectPlanner.process(org.apache.spark.connect.proto.Command, > io.grpc.stub.StreamObserver{*} continues to be seen. > To address this shading issue, we can move the creation of the `MockObserver` > into the source code. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51182) DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified
[ https://issues.apache.org/jira/browse/SPARK-51182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51182: --- Labels: pull-request-available (was: ) > DataFrameWriter should throw dataPathNotSpecifiedError when path is not > specified > - > > Key: SPARK-51182 > URL: https://issues.apache.org/jira/browse/SPARK-51182 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.1.0 >Reporter: Vlad Rozov >Priority: Minor > Labels: pull-request-available > > When {{path}} is not specified in the call to > {{DataFrame.write().save(path)}}, whether explicitly or via {{option(path, ...)}}, > {{parquet(path)}}, etc., it would be more accurate to raise > {{dataPathNotSpecifiedError}} instead of {{multiplePathsSpecifiedError}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-51182) DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified
[ https://issues.apache.org/jira/browse/SPARK-51182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926655#comment-17926655 ] Wei Guo commented on SPARK-51182: - I made a PR for this. > DataFrameWriter should throw dataPathNotSpecifiedError when path is not > specified > - > > Key: SPARK-51182 > URL: https://issues.apache.org/jira/browse/SPARK-51182 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.1.0 >Reporter: Vlad Rozov >Priority: Minor > Labels: pull-request-available > > When {{path}} is not specified in the call to > {{DataFrame.write().save(path)}}, whether explicitly or via {{option(path, ...)}}, > {{parquet(path)}}, etc., it would be more accurate to raise > {{dataPathNotSpecifiedError}} instead of {{multiplePathsSpecifiedError}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51059) Document how ALLOWED_ATTRIBUTES works
[ https://issues.apache.org/jira/browse/SPARK-51059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-51059: - Assignee: Ruifeng Zheng > Document how ALLOWED_ATTRIBUTES works > - > > Key: SPARK-51059 > URL: https://issues.apache.org/jira/browse/SPARK-51059 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51198) Revise `defaultMinPartitions` function description
[ https://issues.apache.org/jira/browse/SPARK-51198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-51198: - Assignee: Dongjoon Hyun > Revise `defaultMinPartitions` function description > -- > > Key: SPARK-51198 > URL: https://issues.apache.org/jira/browse/SPARK-51198 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 4.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51200) Add SparkR deprecation info to `README.md` and `make-distribution.sh` help
[ https://issues.apache.org/jira/browse/SPARK-51200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51200: --- Labels: pull-request-available (was: ) > Add SparkR deprecation info to `README.md` and `make-distribution.sh` help > -- > > Key: SPARK-51200 > URL: https://issues.apache.org/jira/browse/SPARK-51200 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51200) Add SparkR deprecation info to `README.md` and `make-distribution.sh` help
Dongjoon Hyun created SPARK-51200: - Summary: Add SparkR deprecation info to `README.md` and `make-distribution.sh` help Key: SPARK-51200 URL: https://issues.apache.org/jira/browse/SPARK-51200 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51190) Fix TreeEnsembleModel.treeWeights
[ https://issues.apache.org/jira/browse/SPARK-51190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51190: --- Labels: pull-request-available (was: ) > Fix TreeEnsembleModel.treeWeights > - > > Key: SPARK-51190 > URL: https://issues.apache.org/jira/browse/SPARK-51190 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51188) Upgrade Arrow to 18.2.0
[ https://issues.apache.org/jira/browse/SPARK-51188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-51188. --- Fix Version/s: 4.1.0 Resolution: Fixed Issue resolved by pull request 49904 [https://github.com/apache/spark/pull/49904] > Upgrade Arrow to 18.2.0 > --- > > Key: SPARK-51188 > URL: https://issues.apache.org/jira/browse/SPARK-51188 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.1.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51188) Upgrade Arrow to 18.2.0
[ https://issues.apache.org/jira/browse/SPARK-51188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-51188: - Assignee: Yang Jie > Upgrade Arrow to 18.2.0 > --- > > Key: SPARK-51188 > URL: https://issues.apache.org/jira/browse/SPARK-51188 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.1.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51188) Upgrade Arrow to 18.2.0
[ https://issues.apache.org/jira/browse/SPARK-51188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51188: -- Parent: SPARK-51166 Issue Type: Sub-task (was: Improvement) > Upgrade Arrow to 18.2.0 > --- > > Key: SPARK-51188 > URL: https://issues.apache.org/jira/browse/SPARK-51188 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.1.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-50812) Support pyspark.ml on Connect
[ https://issues.apache.org/jira/browse/SPARK-50812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926662#comment-17926662 ] Ruifeng Zheng commented on SPARK-50812: --- Thank you, [~dongjoon]! > Support pyspark.ml on Connect > - > > Key: SPARK-50812 > URL: https://issues.apache.org/jira/browse/SPARK-50812 > Project: Spark > Issue Type: Umbrella > Components: Connect, ML, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Bobby Wang >Priority: Major > Labels: releasenotes > Fix For: 4.0.0 > > > Starting from Apache Spark 3.4, Spark has supported Connect, which introduced > a decoupled client-server architecture that allows remote connectivity to > Spark clusters using the DataFrame API and unresolved logical plans as the > protocol. The separation between client and server allows Spark and its open > ecosystem to be leveraged from everywhere. It can be embedded in modern data > applications, in IDEs, notebooks and programming languages. > However, Spark Connect currently only supports Spark SQL, which means Spark > ML cannot run training/inference via Spark Connect. This will probably > result in losing some ML users. > So I would like to propose a way to support Spark ML on Connect. Users > won't need to change their code to leverage Connect to run Spark ML cases. > Here are some links: > Design doc: [Support spark.ml on > Connect|https://docs.google.com/document/d/1EUvSZuI-so83cxb_fTVMoz0vUfAaFmqXt39yoHI-D9I/edit?usp=sharing] > > Draft PR: [https://github.com/wbo4958/spark/pull/5] > Example code: > {code:python} > spark = SparkSession.builder.remote("sc://localhost").getOrCreate() > df = spark.createDataFrame([ > (Vectors.dense([1.0, 2.0]), 1), > (Vectors.dense([2.0, -1.0]), 1), > (Vectors.dense([-3.0, -2.0]), 0), > (Vectors.dense([-1.0, -2.0]), 0), > ], schema=['features', 'label']) > lr = LogisticRegression() > lr.setMaxIter(30) > model: LogisticRegressionModel = lr.fit(df) > z = model.summary > x = model.predictRaw(Vectors.dense([1.0, 2.0])) > print(f"predictRaw {x}") > assert model.getMaxIter() == 30 > model.summary.roc.show() > print(model.summary.weightedRecall) > print(model.summary.recallByLabel) > print(model.coefficients) > print(model.intercept) > model.transform(df).show() > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51195) Upgrade `kubernetes-client` to 7.1.0
[ https://issues.apache.org/jira/browse/SPARK-51195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51195: --- Labels: pull-request-available (was: ) > Upgrade `kubernetes-client` to 7.1.0 > > > Key: SPARK-51195 > URL: https://issues.apache.org/jira/browse/SPARK-51195 > Project: Spark > Issue Type: Improvement > Components: Build, k8s >Affects Versions: 4.1.0 >Reporter: Wei Guo >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51195) Upgrade `kubernetes-client` to 7.1.0
Wei Guo created SPARK-51195: --- Summary: Upgrade `kubernetes-client` to 7.1.0 Key: SPARK-51195 URL: https://issues.apache.org/jira/browse/SPARK-51195 Project: Spark Issue Type: Improvement Components: Build, k8s Affects Versions: 4.1.0 Reporter: Wei Guo -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51195) Upgrade `kubernetes-client` to 7.1.0
[ https://issues.apache.org/jira/browse/SPARK-51195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51195: -- Parent: SPARK-51166 Issue Type: Sub-task (was: Improvement) > Upgrade `kubernetes-client` to 7.1.0 > > > Key: SPARK-51195 > URL: https://issues.apache.org/jira/browse/SPARK-51195 > Project: Spark > Issue Type: Sub-task > Components: Build, k8s >Affects Versions: 4.1.0 >Reporter: Wei Guo >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51196) Assign appropriate error condition for `_LEGACY_ERROR_TEMP_2047` and `_LEGACY_ERROR_TEMP_2050`
Wei Guo created SPARK-51196: --- Summary: Assign appropriate error condition for `_LEGACY_ERROR_TEMP_2047` and `_LEGACY_ERROR_TEMP_2050` Key: SPARK-51196 URL: https://issues.apache.org/jira/browse/SPARK-51196 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.1.0 Reporter: Wei Guo -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48163) Fix Flaky Test: `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command`
[ https://issues.apache.org/jira/browse/SPARK-48163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48163: -- Parent Issue: SPARK-51166 (was: SPARK-44111) > Fix Flaky Test: `SparkConnectServiceSuite.SPARK-43923: commands send events - > get_resources_command` > > > Key: SPARK-48163 > URL: https://issues.apache.org/jira/browse/SPARK-48163 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > > This test has been flaky since early 2024. > - https://github.com/apache/spark/actions/runs/12882534288/job/35914995457 > (2025-01-21) > {code} > - SPARK-43923: commands send events ((get_resources_command { > [info] } > [info] ,None)) *** FAILED *** (35 milliseconds) > [info] VerifyEvents.this.listener.executeHolder.isDefined was false > (SparkConnectServiceSuite.scala:873) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51183) Update spec to point to Parquet
David Cashman created SPARK-51183: - Summary: Update spec to point to Parquet Key: SPARK-51183 URL: https://issues.apache.org/jira/browse/SPARK-51183 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.1 Reporter: David Cashman The shredding spec has moved to Parquet, and the version in Spark is out of date relative to the code. We should update to point to the Parquet spec. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51184) Remove `TaskState.LOST` logic from `TaskSchedulerImpl`
Dongjoon Hyun created SPARK-51184: - Summary: Remove `TaskState.LOST` logic from `TaskSchedulerImpl` Key: SPARK-51184 URL: https://issues.apache.org/jira/browse/SPARK-51184 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.1.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51184) Remove `TaskState.LOST` logic from `TaskSchedulerImpl`
[ https://issues.apache.org/jira/browse/SPARK-51184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51184: --- Labels: pull-request-available (was: ) > Remove `TaskState.LOST` logic from `TaskSchedulerImpl` > -- > > Key: SPARK-51184 > URL: https://issues.apache.org/jira/browse/SPARK-51184 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.1.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51184) Remove `TaskState.LOST` logic from `TaskSchedulerImpl`
[ https://issues.apache.org/jira/browse/SPARK-51184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-51184: - Assignee: Dongjoon Hyun > Remove `TaskState.LOST` logic from `TaskSchedulerImpl` > -- > > Key: SPARK-51184 > URL: https://issues.apache.org/jira/browse/SPARK-51184 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51180) Upgrade `Arrow` to 19.0.0
[ https://issues.apache.org/jira/browse/SPARK-51180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51180: -- Parent: SPARK-51166 Issue Type: Sub-task (was: Improvement) > Upgrade `Arrow` to 19.0.0 > - > > Key: SPARK-51180 > URL: https://issues.apache.org/jira/browse/SPARK-51180 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.1.0 >Reporter: Aimilios Tsouvelekakis >Priority: Major > Labels: pull-request-available > > Current v4.0.0 planning has Arrow at 18.0.0; it would be good to move it > to version 19.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51182) DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified
Vlad Rozov created SPARK-51182: -- Summary: DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified Key: SPARK-51182 URL: https://issues.apache.org/jira/browse/SPARK-51182 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.1.0 Reporter: Vlad Rozov When {{path}} is not specified in the call to {{DataFrame.write().save(path)}}, whether explicitly or via {{option(path, ...)}}, {{parquet(path)}}, etc., it would be more accurate to raise {{dataPathNotSpecifiedError}} instead of {{multiplePathsSpecifiedError}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51183) Update spec to point to Parquet
[ https://issues.apache.org/jira/browse/SPARK-51183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51183: --- Labels: pull-request-available (was: ) > Update spec to point to Parquet > --- > > Key: SPARK-51183 > URL: https://issues.apache.org/jira/browse/SPARK-51183 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.1 >Reporter: David Cashman >Priority: Major > Labels: pull-request-available > > The shredding spec has moved to Parquet, and the version in Spark is out of > date relative to the code. We should update to point to the Parquet spec. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51008) Implement Result Stage for AQE
[ https://issues.apache.org/jira/browse/SPARK-51008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51008: -- Parent: SPARK-44111 Issue Type: Sub-task (was: Improvement) > Implement Result Stage for AQE > -- > > Key: SPARK-51008 > URL: https://issues.apache.org/jira/browse/SPARK-51008 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ziqi Liu >Assignee: Ziqi Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > To support > [https://github.com/apache/spark/pull/44013#issuecomment-2421167393] we need > to implement Result Stage for AQE so that all plan segments can fall into a > stage context. This would also make the AQE flow more self-contained. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51042) Read and write CalendarIntervals using one call to get/putLong consistently
[ https://issues.apache.org/jira/browse/SPARK-51042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51042: -- Fix Version/s: 3.5.5 > Read and write CalendarIntervals using one call to get/putLong consistently > --- > > Key: SPARK-51042 > URL: https://issues.apache.org/jira/browse/SPARK-51042 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.4, 3.5.5, 4.1.0 >Reporter: Jonathan Albrecht >Assignee: Jonathan Albrecht >Priority: Minor > Labels: big-endian, pull-request-available > Fix For: 4.0.0, 3.5.5 > > > In commit ac07cea234f4fb687442aafa8b6d411695110a4e there was a performance > improvement to reading and writing CalendarIntervals in UnsafeRow. This same > change can be applied to UnsafeArrayData and UnsafeWriter. > This would also fix big endian platforms, where the current and proposed > methods of reading and writing CalendarIntervals do not order the bytes in > the same way. Currently, CalendarInterval-related tests in Catalyst and SQL > are failing on big endian platforms. > There would be no effect on little endian platforms (byte order is not > affected) except for the performance improvement. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
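For context on the endianness angle: the optimization reads an interval's two adjacent 4-byte fields with a single 8-byte access, which only round-trips if reader and writer agree on byte order. A generic sketch of the packing arithmetic (illustrative only, not the exact UnsafeRow field layout):

{code:scala}
// Pack two 32-bit ints into one 64-bit word and unpack them again.
def pack(months: Int, days: Int): Long =
  (months.toLong << 32) | (days.toLong & 0xFFFFFFFFL)

def unpack(word: Long): (Int, Int) =
  ((word >>> 32).toInt, word.toInt)

// Round-trips on any platform; the big-endian failures come from one side
// assembling the word byte by byte in the opposite order.
val (m, d) = unpack(pack(-7, 42))
assert(m == -7 && d == 42)
{code}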