[jira] [Updated] (SPARK-51162) SPIP: Add the TIME data type
[ https://issues.apache.org/jira/browse/SPARK-51162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-51162: - Description: *Q1. What are you trying to do? Articulate your objectives using absolutely no jargon.* Add a new data type *TIME* to Spark SQL that represents a time value with fields hour, minute, second, up to microseconds. All operations over the type are performed without taking any time zone into account. The new data type should conform to the type *TIME\(n\) WITHOUT TIME ZONE* defined by the SQL standard, where 0 <= n <= 6. *Q2. What problem is this proposal NOT designed to solve?* It does not add the TIME type with time zone defined by the SQL standard: {*}TIME\(n\) WITH TIME ZONE{*}. It also does not add TIME with local time zone. *Q3. How is it done today, and what are the limits of current practice?* The TIME type can be emulated via the TIMESTAMP_NTZ data type by setting the date part to some constant value like 1970-01-01, 0001-01-01 or 0000-00-00 (though the last is outside the supported range of dates). Although the type can be emulated via TIMESTAMP_NTZ, Spark SQL cannot recognize it in data sources and, for instance, cannot load TIME values from Parquet files. *Q4. What is new in your approach and why do you think it will be successful?* The approach is not new, and we have a clear picture of how to split the work into sub-tasks based on our experience adding the new types ANSI intervals and TIMESTAMP_NTZ. *Q5. Who cares? If you are successful, what difference will it make?* The new type simplifies migrations to Spark SQL from other DBMSs like PostgreSQL, Snowflake, Google SQL, Amazon Redshift, Teradata, and DB2. Such users don't have to rewrite their SQL code to emulate the TIME type. The new functionality also benefits existing Spark SQL users who need to load data with TIME values stored by other systems. *Q6. What are the risks?* Handling the new type in operators, expressions, and data sources can cause performance regressions. This risk can be mitigated by developing time benchmarks in parallel with supporting the new type in different places in Spark SQL. *Q7. How long will it take?* In total it might take around {*}9 months{*}. The estimate is based on similar tasks: ANSI intervals (SPARK-27790) and TIMESTAMP_NTZ (SPARK-35662). We can split the work into function blocks: # Base functionality - *3 weeks* Add the new type TimeType, forming/parsing time literals, the type constructor, and external types. # Persistence - *3.5 months* Ability to create tables of the type TIME, read/write from/to Parquet and other built-in data sources, partitioning, stats, predicate pushdown. # Time operators - *2 months* Arithmetic ops, field extraction, sorting, and aggregations. # Clients support - *1 month* JDBC, Hive, Thrift server, Connect. # PySpark integration - *1 month* DataFrame support, pandas API, Python UDFs, Arrow column vectors. # Docs + testing/benchmarking - *1 month* *Q8. What are the mid-term and final “exams” to check for success?* The mid-term exam, at 4 months, is basic functionality: reading/writing the new type in built-in data sources and basic time operations such as arithmetic ops and casting. The final "exam" is to support the same functionality as the other datetime types: TIMESTAMP_NTZ, DATE, and TIMESTAMP. *Appendix A. Proposed API Changes.* Add a new class *TimeType* to {_}org.apache.spark.sql.types{_}:
{code:scala}
/**
 * The time type represents a time value with fields hour, minute, second, up to microseconds.
 * The range of times supported is 00:00:00.000000 to 23:59:59.999999.
 *
 * Please use the singleton `DataTypes.TimeType` to refer to the type.
 */
class TimeType private (precisionField: Byte) extends DatetimeType {
  /**
   * The default size of a value of the TimeType is 8 bytes.
   */
  override def defaultSize: Int = 8

  private[spark] override def asNullable: TimeType = this
}
{code}
*Appendix B:* As the external types for the new TIME type, we propose: - Java/Scala: [java.time.LocalTime|https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/time/LocalTime.html] - PySpark: [time|https://docs.python.org/3/library/datetime.html#time-objects]
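For illustration, here is a minimal usage sketch, assuming the TIME type and the external type mapping from Appendices A and B land as proposed; the encoder for java.time.LocalTime and the printed schema are assumptions, not existing behavior:
{code:scala}
import java.time.LocalTime

import org.apache.spark.sql.SparkSession

object TimeTypeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // java.time.LocalTime is the proposed Java/Scala external type for TIME,
    // so an encoder for it would produce a column of the new time type.
    val df = Seq(
      LocalTime.of(9, 30),               // 09:30:00
      LocalTime.parse("23:59:59.999999") // max value at microsecond precision
    ).toDF("t")

    // Hypothetical output once TIME is supported: t: time(6) (nullable = true)
    df.printSchema()
    spark.stop()
  }
}
{code}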
[jira] [Updated] (SPARK-51179) Refactor SupportsOrderingWithinGroup so that centralized check
[ https://issues.apache.org/jira/browse/SPARK-51179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-51179: --- Summary: Refactor SupportsOrderingWithinGroup so that centralized check (was: Refactor SupportsOrderingWithinGroup so that advances the check) > Refactor SupportsOrderingWithinGroup so that centralized check > -- > > Key: SPARK-51179 > URL: https://issues.apache.org/jira/browse/SPARK-51179 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.1.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51179) Refactor SupportsOrderingWithinGroup so that centralized check
[ https://issues.apache.org/jira/browse/SPARK-51179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-51179: --- Description: Currently, the checks in analysis for ListAgg are scattered across multiple locations. We should improve this with a centralized check. was: Currently, the check in analysis for ListAgg scattered in multiple locations. We should > Refactor SupportsOrderingWithinGroup so that centralized check > -- > > Key: SPARK-51179 > URL: https://issues.apache.org/jira/browse/SPARK-51179 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.1.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > Currently, the checks in analysis for ListAgg are scattered across multiple locations. > We should improve this with a centralized check. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51179) Refactor SupportsOrderingWithinGroup so that centralized check
[ https://issues.apache.org/jira/browse/SPARK-51179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-51179: --- Description: Currently, the check in analysis for ListAgg scattered in multiple locations. We should > Refactor SupportsOrderingWithinGroup so that centralized check > -- > > Key: SPARK-51179 > URL: https://issues.apache.org/jira/browse/SPARK-51179 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.1.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > Currently, the check in analysis for ListAgg scattered in multiple locations. > We should -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51179) Refactor SupportsOrderingWithinGroup so that advances the check
Jiaan Geng created SPARK-51179: -- Summary: Refactor SupportsOrderingWithinGroup so that advances the check Key: SPARK-51179 URL: https://issues.apache.org/jira/browse/SPARK-51179 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.1.0 Reporter: Jiaan Geng Assignee: Jiaan Geng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51179) Refactor SupportsOrderingWithinGroup so that centralized check
[ https://issues.apache.org/jira/browse/SPARK-51179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51179: --- Labels: pull-request-available (was: ) > Refactor SupportsOrderingWithinGroup so that centralized check > -- > > Key: SPARK-51179 > URL: https://issues.apache.org/jira/browse/SPARK-51179 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.1.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > > Currently, the checks in analysis for ListAgg are scattered across multiple locations. > We should improve this with a centralized check. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51180) Upgrade `Arrow` to 19.0.0
[ https://issues.apache.org/jira/browse/SPARK-51180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51180: --- Labels: pull-request-available (was: ) > Upgrade `Arrow` to 19.0.0 > - > > Key: SPARK-51180 > URL: https://issues.apache.org/jira/browse/SPARK-51180 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Aimilios Tsouvelekakis >Priority: Major > Labels: pull-request-available > > Current v4.0.0 planning has Arrow at 18.0.0; it would be good to move it > to version 19.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51160) Refactor literal function resolution
[ https://issues.apache.org/jira/browse/SPARK-51160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-51160. - Fix Version/s: 4.1.0 Resolution: Fixed Issue resolved by pull request 49887 [https://github.com/apache/spark/pull/49887] > Refactor literal function resolution > > > Key: SPARK-51160 > URL: https://issues.apache.org/jira/browse/SPARK-51160 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Timotic >Assignee: Mihailo Timotic >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > > Refactor literal function resolution to a separate object to enable > single-pass analyzer to reuse this logic -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51160) Refactor literal function resolution
[ https://issues.apache.org/jira/browse/SPARK-51160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-51160: --- Assignee: Mihailo Timotic > Refactor literal function resolution > > > Key: SPARK-51160 > URL: https://issues.apache.org/jira/browse/SPARK-51160 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Timotic >Assignee: Mihailo Timotic >Priority: Major > Labels: pull-request-available > > Refactor literal function resolution to a separate object to enable > single-pass analyzer to reuse this logic -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51180) Upgrade `Arrow` to 19.0.0
Aimilios Tsouvelekakis created SPARK-51180: -- Summary: Upgrade `Arrow` to 19.0.0 Key: SPARK-51180 URL: https://issues.apache.org/jira/browse/SPARK-51180 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: Aimilios Tsouvelekakis Current v4.0.0 planning has Arrow at 18.0.0; it would be good to move it to version 19.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-50812) Support pyspark.ml on Connect
[ https://issues.apache.org/jira/browse/SPARK-50812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-50812: -- Affects Version/s: (was: 4.1.0) > Support pyspark.ml on Connect > - > > Key: SPARK-50812 > URL: https://issues.apache.org/jira/browse/SPARK-50812 > Project: Spark > Issue Type: Umbrella > Components: Connect, ML, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Bobby Wang >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Starting from Apache Spark 3.4, Spark has supported Connect, which introduced > a decoupled client-server architecture that allows remote connectivity to > Spark clusters using the DataFrame API and unresolved logical plans as the > protocol. The separation between client and server allows Spark and its open > ecosystem to be leveraged from everywhere. It can be embedded in modern data > applications, in IDEs, Notebooks and programming languages. > However, Spark Connect currently only supports Spark SQL, which means Spark > ML cannot run training/inference via Spark Connect. This will probably > result in losing some ML users. > So I would like to propose a way to support Spark ML on Connect. Users > don't need to change their code to leverage Connect to run Spark ML cases. > Here are some links: > Design doc: [Support spark.ml on > Connect|https://docs.google.com/document/d/1EUvSZuI-so83cxb_fTVMoz0vUfAaFmqXt39yoHI-D9I/edit?usp=sharing] > Draft PR: [https://github.com/wbo4958/spark/pull/5] > Example code: > {code:python} > from pyspark.ml.classification import LogisticRegression, LogisticRegressionModel > from pyspark.ml.linalg import Vectors > from pyspark.sql import SparkSession > > spark = SparkSession.builder.remote("sc://localhost").getOrCreate() > df = spark.createDataFrame([ > (Vectors.dense([1.0, 2.0]), 1), > (Vectors.dense([2.0, -1.0]), 1), > (Vectors.dense([-3.0, -2.0]), 0), > (Vectors.dense([-1.0, -2.0]), 0), > ], schema=['features', 'label']) > lr = LogisticRegression() > lr.setMaxIter(30) > model: LogisticRegressionModel = lr.fit(df) > z = model.summary > x = model.predictRaw(Vectors.dense([1.0, 2.0])) > print(f"predictRaw {x}") > assert model.getMaxIter() == 30 > model.summary.roc.show() > print(model.summary.weightedRecall) > print(model.summary.recallByLabel) > print(model.coefficients) > print(model.intercept) > model.transform(df).show() > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51163) Exclude duplicated jars from connect-repl
[ https://issues.apache.org/jira/browse/SPARK-51163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-51163: - Assignee: Cheng Pan > Exclude duplicated jars from connect-repl > - > > Key: SPARK-51163 > URL: https://issues.apache.org/jira/browse/SPARK-51163 > Project: Spark > Issue Type: Improvement > Components: Build, Connect >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51163) Exclude duplicated jars from connect-repl
[ https://issues.apache.org/jira/browse/SPARK-51163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-51163. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 49892 [https://github.com/apache/spark/pull/49892] > Exclude duplicated jars from connect-repl > - > > Key: SPARK-51163 > URL: https://issues.apache.org/jira/browse/SPARK-51163 > Project: Spark > Issue Type: Improvement > Components: Build, Connect >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51157) Add missing @varargs Scala annotation for scala functon APIs
[ https://issues.apache.org/jira/browse/SPARK-51157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51157: -- Fix Version/s: 3.5.5 > Add missing @varargs Scala annotation for scala functon APIs > > > Key: SPARK-51157 > URL: https://issues.apache.org/jira/browse/SPARK-51157 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.5 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51181) Enforce determinism when pulling out non deterministic expressions from logical plan
Mihailo Aleksic created SPARK-51181: --- Summary: Enforce determinism when pulling out non deterministic expressions from logical plan Key: SPARK-51181 URL: https://issues.apache.org/jira/browse/SPARK-51181 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.1.0 Reporter: Mihailo Aleksic Enforce determinism when pulling out non-deterministic expressions from the logical plan, to avoid plan normalization problems when comparing single-pass and fixed-point analyzer results. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
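For context, a minimal sketch of the rewrite in question (the grouping example below reflects the existing pull-out behavior for non-deterministic expressions; how the single-pass analyzer orders this rewrite is an assumption):
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Grouping by a non-deterministic expression: the analyzer pulls rand(0) out
// into a Project below the Aggregate and groups by the materialized column.
// If the single-pass and fixed-point analyzers pull such expressions out in
// different orders (or with different generated names), the normalized plans
// diverge even though the query semantics are identical.
val df = spark.range(10).groupBy(expr("rand(0)")).count()
println(df.queryExecution.analyzed)
{code}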
[jira] [Updated] (SPARK-51163) Exclude duplicated jars from connect-repl
[ https://issues.apache.org/jira/browse/SPARK-51163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51163: -- Parent: SPARK-44111 Issue Type: Sub-task (was: Improvement) > Exclude duplicated jars from connect-repl > - > > Key: SPARK-51163 > URL: https://issues.apache.org/jira/browse/SPARK-51163 > Project: Spark > Issue Type: Sub-task > Components: Build, Connect >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51008) Implement Result Stage for AQE
[ https://issues.apache.org/jira/browse/SPARK-51008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-51008. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 49715 [https://github.com/apache/spark/pull/49715] > Implement Result Stage for AQE > -- > > Key: SPARK-51008 > URL: https://issues.apache.org/jira/browse/SPARK-51008 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ziqi Liu >Assignee: Ziqi Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > To support > [https://github.com/apache/spark/pull/44013#issuecomment-2421167393] we need > to implement Result Stage for AQE so that all plan segments can fall into a > stage context. This would also improve the AQE flow to a more self-contained > state. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51008) Implement Result Stage for AQE
[ https://issues.apache.org/jira/browse/SPARK-51008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-51008: --- Assignee: Ziqi Liu > Implement Result Stage for AQE > -- > > Key: SPARK-51008 > URL: https://issues.apache.org/jira/browse/SPARK-51008 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ziqi Liu >Assignee: Ziqi Liu >Priority: Major > Labels: pull-request-available > > To support > [https://github.com/apache/spark/pull/44013#issuecomment-2421167393] we need > to implement Result Stage for AQE so that all plan segments can fall into a > stage context. This would also improve the AQE flow to a more self-contained > state. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51113) Correctness issue with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE
[ https://issues.apache.org/jira/browse/SPARK-51113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51113: -- Parent: SPARK-44111 Issue Type: Sub-task (was: Bug) > Correctness issue with UNION/EXCEPT/INTERSECT inside a view or EXECUTE > IMMEDIATE > > > Key: SPARK-51113 > URL: https://issues.apache.org/jira/browse/SPARK-51113 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Priority: Critical > Labels: pull-request-available > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png, screenshot-5.png > > > There's a parser issue where for trivial UNION/EXCEPT/INTERSECT queries > inside views a keyword is considered an alias: > ``` > spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 UNION SELECT 2 UNION > SELECT 3 UNION SELECT 4") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 EXCEPT SELECT 2 > EXCEPT SELECT 1 EXCEPT SELECT 2") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > spark.sql("CREATE OR REPLACE VIEW t1 AS SELECT 1 AS col1 INTERSECT SELECT 1 > INTERSECT SELECT 2 INTERSECT SELECT 2") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > ``` > !screenshot-1.png! > !screenshot-3.png! > !screenshot-4.png! > Same issue for `EXECUTE IMMEDIATE`: > ``` > spark.sql("DECLARE v INT") > spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO v") > spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO > v").queryExecution.analyzed > spark.sql("SELECT v").show() > ``` > !screenshot-5.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-50889) Fix Flaky Test: `SparkSessionE2ESuite.interrupt operation` (Hang)
[ https://issues.apache.org/jira/browse/SPARK-50889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-50889: -- Parent Issue: SPARK-51166 (was: SPARK-44111) > Fix Flaky Test: `SparkSessionE2ESuite.interrupt operation` (Hang) > - > > Key: SPARK-50889 > URL: https://issues.apache.org/jira/browse/SPARK-50889 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > `SparkSessionE2ESuite.interrupt operation` hangs sometimes. > - `branch-4.0`: > https://github.com/apache/spark/actions/runs/12848096505/job/35829436740 > - `branch-4.0`: > https://github.com/apache/spark/actions/runs/12951559619/job/36126910293 > {code} > [info] SparkSessionE2ESuite: > [info] - interrupt all - background queries, foreground interrupt (217 > milliseconds) > [info] - interrupt all - foreground queries, background interrupt (306 > milliseconds) > [info] - interrupt all - streaming queries (381 milliseconds) > [info] - interrupt tag !!! IGNORED !!! > [info] - interrupt tag - streaming query (776 milliseconds) > [info] - progress is available for the spark result (2 seconds, 991 > milliseconds) > [info] *** Test still running after 5 minutes, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 10 minutes, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 15 minutes, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 20 minutes, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 25 minutes, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 30 minutes, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 35 minutes, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 40 minutes, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 45 minutes, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 50 minutes, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 55 minutes, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 1 hour, 59 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 1 hour, 5 minutes, 59 seconds: suite > name: SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 1 hour, 10 minutes, 59 seconds: suite > name: SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 1 hour, 15 minutes, 59 seconds: suite > name: SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 1 hour, 20 minutes, 59 seconds: suite > name: SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 1 hour, 25 minutes, 59 seconds: suite > name: SparkSessionE2ESuite, test name: interrupt operation. 
> [info] *** Test still running after 1 hour, 30 minutes, 59 seconds: suite > name: SparkSessionE2ESuite, test name: interrupt operation. > {code} > - `master: > https://github.com/apache/spark/actions/runs/12804420645/job/35698812313 > {code} > [info] SparkSessionE2ESuite: > [info] - interrupt all - background queries, foreground interrupt (221 > milliseconds) > [info] - interrupt all - foreground queries, background interrupt (307 > milliseconds) > [info] - interrupt all - streaming queries (394 milliseconds) > [info] - interrupt tag !!! IGNORED !!! > [info] - interrupt tag - streaming query (788 milliseconds) > [info] - progress is available for the spark result (3 seconds, 990 > milliseconds) > [info] *** Test still running after 5 minutes, 51 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 10 minutes, 51 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 15 minutes, 51 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 20 minutes, 51 seconds: suite name: > SparkSessionE2ESuite, test name: interrupt operation. > [info] *** Test still running after 25 minutes, 51 seconds: sui
[jira] [Updated] (SPARK-48139) Re-enable `SparkSessionE2ESuite.interrupt tag`
[ https://issues.apache.org/jira/browse/SPARK-48139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48139: -- Priority: Critical (was: Blocker) > Re-enable `SparkSessionE2ESuite.interrupt tag` > -- > > Key: SPARK-48139 > URL: https://issues.apache.org/jira/browse/SPARK-48139 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0, 3.5.2 >Reporter: Dongjoon Hyun >Priority: Critical > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48139) Re-enable `SparkSessionE2ESuite.interrupt tag`
[ https://issues.apache.org/jira/browse/SPARK-48139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48139: -- Target Version/s: (was: 4.0.0) > Re-enable `SparkSessionE2ESuite.interrupt tag` > -- > > Key: SPARK-48139 > URL: https://issues.apache.org/jira/browse/SPARK-48139 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0, 3.5.2 >Reporter: Dongjoon Hyun >Priority: Blocker > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-50205) Re-enable `SparkSessionJobTaggingAndCancellationSuite.Cancellation APIs in SparkSession are isolated`
[ https://issues.apache.org/jira/browse/SPARK-50205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926431#comment-17926431 ] Dongjoon Hyun commented on SPARK-50205: --- I moved this to 4.1.0. > Re-enable `SparkSessionJobTaggingAndCancellationSuite.Cancellation APIs in > SparkSession are isolated` > - > > Key: SPARK-50205 > URL: https://issues.apache.org/jira/browse/SPARK-50205 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0, 3.5.2 >Reporter: Pengfei Xu >Priority: Critical > Labels: pull-request-available > > https://github.com/apache/spark/actions/runs/10915451051/job/30295259985 > This test case needs a refactor to use only 2 threads instead of 3, because > having 3 threads is not guaranteed in CI. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48139) Re-enable `SparkSessionE2ESuite.interrupt tag`
[ https://issues.apache.org/jira/browse/SPARK-48139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48139: -- Parent Issue: SPARK-51166 (was: SPARK-44111) > Re-enable `SparkSessionE2ESuite.interrupt tag` > -- > > Key: SPARK-48139 > URL: https://issues.apache.org/jira/browse/SPARK-48139 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0, 3.5.2 >Reporter: Dongjoon Hyun >Priority: Critical > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-50771) Fix a flaky test: BlockInfoManagerSuite.SPARK-38675 - concurrent unlock and releaseAllLocksForTask calls should not fail
[ https://issues.apache.org/jira/browse/SPARK-50771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-50771: -- Parent Issue: SPARK-51166 (was: SPARK-44111) > Fix a flaky test: BlockInfoManagerSuite.SPARK-38675 - concurrent unlock and > releaseAllLocksForTask calls should not fail > > > Key: SPARK-50771 > URL: https://issues.apache.org/jira/browse/SPARK-50771 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Minor > > https://github.com/apache/spark/actions/runs/12666965730/job/35299446885 > {code} > [info] - SPARK-38675 - concurrent unlock and releaseAllLocksForTask calls > should not fail *** FAILED *** (2 milliseconds) > [info] java.lang.AssertionError: assertion failed > [info] at scala.Predef$.assert(Predef.scala:264) > [info] at > org.apache.spark.storage.BlockInfo.checkInvariants(BlockInfoManager.scala:89) > [info] at > org.apache.spark.storage.BlockInfo.readerCount_$eq(BlockInfoManager.scala:71) > [info] at > org.apache.spark.storage.BlockInfoManager.$anonfun$releaseAllLocksForTask$6(BlockInfoManager.scala:498) > [info] at > org.apache.spark.storage.BlockInfoManager.$anonfun$releaseAllLocksForTask$6$adapted(BlockInfoManager.scala:497) > [info] at > org.apache.spark.storage.BlockInfoWrapper.withLock(BlockInfoManager.scala:105) > [info] at > org.apache.spark.storage.BlockInfoManager.blockInfo(BlockInfoManager.scala:271) > [info] at > org.apache.spark.storage.BlockInfoManager.$anonfun$releaseAllLocksForTask$5(BlockInfoManager.scala:497) > [info] at java.base/java.lang.Iterable.forEach(Iterable.java:75) > [info] at > org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:493) > [info] at > org.apache.spark.storage.BlockInfoManagerSuite.$anonfun$new$82(BlockInfoManagerSuite.scala:399) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) > [info] at > org.apache.spark.storage.BlockInfoManagerSuite.withTaskId(BlockInfoManagerSuite.scala:66) > [info] at > org.apache.spark.storage.BlockInfoManagerSuite.$anonfun$new$81(BlockInfoManagerSuite.scala:385) > [info] at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190) > [info] at > org.apache.spark.storage.BlockInfoManagerSuite.$anonfun$new$80(BlockInfoManagerSuite.scala:384) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) > [info] at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127) > [info] at > org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282) > [info] at > org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231) > [info] at > org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230) > [info] at org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69) > [info] at > org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at > org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) 
> [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > [info] at > org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:69) > [info] at > org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:234) > [info] at > org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:227) > [info] at org.apache.spark.SparkFunSuite.runTest(SparkFunSuite.scala:69) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > [info] at scala.collection.immutable.List.foreach(List.scala:334) > [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > [info] at org.scalatest.Su
[jira] [Updated] (SPARK-50205) Re-enable `SparkSessionJobTaggingAndCancellationSuite.Cancellation APIs in SparkSession are isolated`
[ https://issues.apache.org/jira/browse/SPARK-50205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-50205: -- Parent Issue: SPARK-51166 (was: SPARK-44111) > Re-enable `SparkSessionJobTaggingAndCancellationSuite.Cancellation APIs in > SparkSession are isolated` > - > > Key: SPARK-50205 > URL: https://issues.apache.org/jira/browse/SPARK-50205 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0, 3.5.2 >Reporter: Pengfei Xu >Priority: Critical > Labels: pull-request-available > > https://github.com/apache/spark/actions/runs/10915451051/job/30295259985 > This test case needs a refactor to use only 2 threads instead of 3, because > having 3 threads is not guaranteed in CI. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51019) Fix Flaky Test: `SPARK-47148: AQE should avoid to submit shuffle job on cancellation`
[ https://issues.apache.org/jira/browse/SPARK-51019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51019: -- Parent Issue: SPARK-51166 (was: SPARK-44111) > Fix Flaky Test: `SPARK-47148: AQE should avoid to submit shuffle job on > cancellation` > - > > Key: SPARK-51019 > URL: https://issues.apache.org/jira/browse/SPARK-51019 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > - https://github.com/apache/spark/actions/runs/13004225714/job/36268222928 > {code} > == Parsed Logical Plan == > 'Join UsingJoin(Inner, [id]) > :- Project [id#133801L, scalarsubquery()#133805] > : +- Join Inner, (id#133801L = id#133806L) > : :- Project [id#133801L, scalar-subquery#133800 [] AS > scalarsubquery()#133805] > : : : +- Project [slow_udf() AS slow_udf()#133804] > : : : +- Range (0, 2, step=1) > : : +- Range (0, 5, step=1) > : +- Repartition 2, false > :+- Project [id#133806L] > : +- Range (0, 10, step=1) > +- Project [id#133808L, scalar-subquery#133807 [] AS scalarsubquery()#133812] >: +- Project [slow_udf() AS slow_udf()#133811] >: +- Range (0, 2, step=1) >+- Filter (id#133808L > cast(2 as bigint)) > +- Range (0, 15, step=1) > == Analyzed Logical Plan == > id: bigint, scalarsubquery(): int, scalarsubquery(): int > Project [id#133801L, scalarsubquery()#133805, scalarsubquery()#133812] > +- Join Inner, (id#133801L = id#133808L) >:- Project [id#133801L, scalarsubquery()#133805] >: +- Join Inner, (id#133801L = id#133806L) >: :- Project [id#133801L, scalar-subquery#133800 [] AS > scalarsubquery()#133805] >: : : +- Project [slow_udf() AS slow_udf()#133804] >: : : +- Range (0, 2, step=1) >: : +- Range (0, 5, step=1) >: +- Repartition 2, false >:+- Project [id#133806L] >: +- Range (0, 10, step=1) >+- Project [id#133808L, scalar-subquery#133807 [] AS > scalarsubquery()#133812] > : +- Project [slow_udf() AS slow_udf()#133811] > : +- Range (0, 2, step=1) > +- Filter (id#133808L > cast(2 as bigint)) > +- Range (0, 15, step=1) > == Optimized Logical Plan == > Project [id#133801L, scalarsubquery()#133805, scalarsubquery()#133812] > +- Join Inner, (id#133801L = id#133808L) >:- Project [id#133801L, scalarsubquery()#133805] >: +- Join Inner, (id#133801L = id#133806L) >: :- Project [id#133801L, scalar-subquery#133800 [] AS > scalarsubquery()#133805] >: : : +- Project [slow_udf() AS slow_udf()#133804] >: : : +- Range (0, 2, step=1) >: : +- Filter (id#133801L > 2) >: : +- Range (0, 5, step=1) >: +- Repartition 2, false >:+- Range (0, 10, step=1) >+- Project [id#133808L, scalar-subquery#133807 [] AS > scalarsubquery()#133812] > : +- Project [slow_udf() AS slow_udf()#133804] > : +- Range (0, 2, step=1) > +- Filter (id#133808L > 2) > +- Range (0, 15, step=1) > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- Project [id#133801L, scalarsubquery()#133805, scalarsubquery()#133812] >+- SortMergeJoin [id#133801L], [id#133808L], Inner > :- Project [id#133801L, scalarsubquery()#133805] > : +- SortMergeJoin [id#133801L], [id#133806L], Inner > : :- Sort [id#133801L ASC NULLS FIRST], false, 0 > : : +- Exchange hashpartitioning(id#133801L, 5), > ENSURE_REQUIREMENTS, [plan_id=423273] > : : +- Project [id#133801L, Subquery subquery#133800, > [id=#423258] AS scalarsubquery()#133805] > : :: +- Subquery subquery#133800, [id=#423258] > : :: +- AdaptiveSparkPlan isFinalPlan=false > : ::+- Project [slow_udf() AS slow_udf()#133804] > : :: +- Range (0, 2, step=1, splits=2) > : :+- Filter (id#133801L > 2) 
> : : +- Range (0, 5, step=1, splits=2) > : +- Sort [id#133806L ASC NULLS FIRST], false, 0 > :+- Exchange hashpartitioning(id#133806L, 5), > ENSURE_REQUIREMENTS, [plan_id=423272] > : +- TestProblematicCoalesce 2 > : +- Range (0, 10, step=1, splits=2) > +- Sort [id#133808L ASC NULLS FIRST], false, 0 > +- Exchange hashpartitioning(id#133808L, 5), ENSURE_REQUIREMENTS, > [plan_id=423284] > +- Project [id#133808L, Subquery subquery#133807, [id=#423262] AS > scalarsubquery()#133812] >: +- Subquery subquery#133807, [id=#423262] >: +- AdaptiveSparkPlan isFinalPlan=false >:+-
[jira] [Updated] (SPARK-51113) Correctness issue with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE
[ https://issues.apache.org/jira/browse/SPARK-51113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51113: -- Priority: Blocker (was: Critical) > Correctness issue with UNION/EXCEPT/INTERSECT inside a view or EXECUTE > IMMEDIATE > > > Key: SPARK-51113 > URL: https://issues.apache.org/jira/browse/SPARK-51113 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Priority: Blocker > Labels: pull-request-available > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png, screenshot-5.png > > > There's a parser issue where for trivial UNION/EXCEPT/INTERSECT queries > inside views a keyword is considered an alias: > ``` > spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 UNION SELECT 2 UNION > SELECT 3 UNION SELECT 4") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 EXCEPT SELECT 2 > EXCEPT SELECT 1 EXCEPT SELECT 2") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > spark.sql("CREATE OR REPLACE VIEW t1 AS SELECT 1 AS col1 INTERSECT SELECT 1 > INTERSECT SELECT 2 INTERSECT SELECT 2") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > ``` > !screenshot-1.png! > !screenshot-3.png! > !screenshot-4.png! > Same issue for `EXECUTE IMMEDIATE`: > ``` > spark.sql("DECLARE v INT") > spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO v") > spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO > v").queryExecution.analyzed > spark.sql("SELECT v").show() > ``` > !screenshot-5.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51113) Correctness issue with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE
[ https://issues.apache.org/jira/browse/SPARK-51113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51113: -- Target Version/s: 4.0.0 > Correctness issue with UNION/EXCEPT/INTERSECT inside a view or EXECUTE > IMMEDIATE > > > Key: SPARK-51113 > URL: https://issues.apache.org/jira/browse/SPARK-51113 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Priority: Blocker > Labels: pull-request-available > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png, screenshot-5.png > > > There's a parser issue where for trivial UNION/EXCEPT/INTERSECT queries > inside views a keyword is considered an alias: > ``` > spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 UNION SELECT 2 UNION > SELECT 3 UNION SELECT 4") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 EXCEPT SELECT 2 > EXCEPT SELECT 1 EXCEPT SELECT 2") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > spark.sql("CREATE OR REPLACE VIEW t1 AS SELECT 1 AS col1 INTERSECT SELECT 1 > INTERSECT SELECT 2 INTERSECT SELECT 2") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > ``` > !screenshot-1.png! > !screenshot-3.png! > !screenshot-4.png! > Same issue for `EXECUTE IMMEDIATE`: > ``` > spark.sql("DECLARE v INT") > spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO v") > spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO > v").queryExecution.analyzed > spark.sql("SELECT v").show() > ``` > !screenshot-5.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51046) `SubExprEliminationBenchmark` fails at `CodeGenerator`
[ https://issues.apache.org/jira/browse/SPARK-51046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51046: -- Parent Issue: SPARK-51166 (was: SPARK-44111) > `SubExprEliminationBenchmark` fails at `CodeGenerator` > -- > > Key: SPARK-51046 > URL: https://issues.apache.org/jira/browse/SPARK-51046 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > {code} > Running org.apache.spark.sql.execution.SubExprEliminationBenchmark: > ... > Preparing data for benchmarking ... > Running benchmark: from_json as subExpr in Filter > Running case: subExprElimination false, codegen: true > 25/01/30 22:24:08 ERROR CodeGenerator: Failed to compile the generated Java > code. > org.codehaus.commons.compiler.InternalCompilerException: Compiling > "GeneratedClass" in File 'generated.java', Line 1, Column 1: File > 'generated.java', Line 24, Column 16: Compiling "processNext()" > ... > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-50205) Re-enable `SparkSessionJobTaggingAndCancellationSuite.Cancellation APIs in SparkSession are isolated`
[ https://issues.apache.org/jira/browse/SPARK-50205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-50205: -- Target Version/s: (was: 4.0.0) > Re-enable `SparkSessionJobTaggingAndCancellationSuite.Cancellation APIs in > SparkSession are isolated` > - > > Key: SPARK-50205 > URL: https://issues.apache.org/jira/browse/SPARK-50205 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0, 3.5.2 >Reporter: Pengfei Xu >Priority: Critical > Labels: pull-request-available > > https://github.com/apache/spark/actions/runs/10915451051/job/30295259985 > This test case needs a refactor to use only 2 threads instead of 3, because > having 3 threads is not guaranteed in CI. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-49586) Add addArtifact API to PySpark
[ https://issues.apache.org/jira/browse/SPARK-49586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-49586: -- Parent Issue: SPARK-51166 (was: SPARK-44111) > Add addArtifact API to PySpark > -- > > Key: SPARK-49586 > URL: https://issues.apache.org/jira/browse/SPARK-49586 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Pengfei Xu >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-50888) Fix Flaky Test: `SparkConnectServiceSuite.SPARK-44776: LocalTableScanExe`
[ https://issues.apache.org/jira/browse/SPARK-50888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-50888: -- Parent Issue: SPARK-51166 (was: SPARK-44111) > Fix Flaky Test: `SparkConnectServiceSuite.SPARK-44776: LocalTableScanExe` > - > > Key: SPARK-50888 > URL: https://issues.apache.org/jira/browse/SPARK-50888 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > - `branch-4.0`: > https://github.com/apache/spark/actions/runs/12879810930/job/35907876872 > - `branch-4.0`: > https://github.com/apache/spark/actions/runs/12848096505/job/35825096617 > {code} > [info] SparkConnectServiceSuite: > [info] - Test schema in analyze response (92 milliseconds) > [info] - SPARK-41224: collect data using arrow (101 milliseconds) > [info] - SPARK-44776: LocalTableScanExec *** FAILED *** (34 milliseconds) > [info] VerifyEvents.this.executeHolder.eventsManager.hasError.isDefined was > false (SparkConnectServiceSuite.scala:895) > [info] org.scalatest.exceptions.TestFailedException: > [info] at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > [info] at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > [info] at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > [info] at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > [info] at > org.apache.spark.sql.connect.planner.SparkConnectServiceSuite$VerifyEvents.onError(SparkConnectServiceSuite.scala:895) > [info] at > org.apache.spark.sql.connect.planner.SparkConnectServiceSuite$$anon$2.onError(SparkConnectServiceSuite.scala:292) > [info] at > org.apache.spark.sql.connect.utils.ErrorUtils$$anonfun$handleError$1.applyOrElse(ErrorUtils.scala:329) > [info] at > org.apache.spark.sql.connect.utils.ErrorUtils$$anonfun$handleError$1.applyOrElse(ErrorUtils.scala:304) > [info] at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35) > [info] at scala.PartialFunction$Combined.apply(PartialFunction.scala:301) > [info] at > org.apache.spark.sql.connect.service.SparkConnectService.executePlan(SparkConnectService.scala:75) > [info] at > org.apache.spark.sql.connect.planner.SparkConnectServiceSuite.$anonfun$new$14(SparkConnectServiceSuite.scala:285) > [info] at > org.apache.spark.sql.connect.planner.SparkConnectServiceSuite.$anonfun$new$14$adapted(SparkConnectServiceSuite.scala:249) > [info] at > org.apache.spark.sql.connect.planner.SparkConnectServiceSuite.$anonfun$withEvents$1(SparkConnectServiceSuite.scala:853) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) > [info] at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:80) > [info] at > org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:77) > [info] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:99) > [info] at > org.apache.spark.sql.connect.planner.SparkConnectServiceSuite.withEvents(SparkConnectServiceSuite.scala:856) > [info] at > org.apache.spark.sql.connect.planner.SparkConnectServiceSuite.$anonfun$new$13(SparkConnectServiceSuite.scala:249) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) > [info] at org.scalatest.enablers.Timed$$anon$1.timeoutAfter(Timed.scala:127) > [info] at > org.scalatest.concurrent.TimeLimits$.failAfterImpl(TimeLimits.scala:282) > [info] at > 
org.scalatest.concurrent.TimeLimits.failAfter(TimeLimits.scala:231) > [info] at > org.scalatest.concurrent.TimeLimits.failAfter$(TimeLimits.scala:230) > [info] at org.apache.spark.SparkFunSuite.failAfter(SparkFunSuite.scala:69) > [info] at > org.apache.spark.SparkFunSuite.$anonfun$test$2(SparkFunSuite.scala:155) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at > org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:227) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFun
[jira] [Updated] (SPARK-50748) Fix a flaky test: `SparkSessionE2ESuite.interrupt all - background queries, foreground interrupt`
[ https://issues.apache.org/jira/browse/SPARK-50748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-50748: -- Parent Issue: SPARK-51166 (was: SPARK-44111) > Fix a flaky test: `SparkSessionE2ESuite.interrupt all - background queries, > foreground interrupt` > - > > Key: SPARK-50748 > URL: https://issues.apache.org/jira/browse/SPARK-50748 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > https://github.com/apache/spark/actions/runs/12627485924/job/35182190161 > (2025-01-06) > {code} > [info] SparkSessionE2ESuite: > [info] - interrupt all - background queries, foreground interrupt *** FAILED > *** (20 seconds, 63 milliseconds) > [info] The code passed to eventually never returned normally. Attempted 30 > times over 20.057432362 seconds. Last failure message: q1Interrupted was > false. (SparkSessionE2ESuite.scala:71) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51180) Upgrade `Arrow` to 19.0.0
[ https://issues.apache.org/jira/browse/SPARK-51180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aimilios Tsouvelekakis updated SPARK-51180: --- Affects Version/s: 4.1.0 (was: 4.0.0) > Upgrade `Arrow` to 19.0.0 > - > > Key: SPARK-51180 > URL: https://issues.apache.org/jira/browse/SPARK-51180 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.1.0 >Reporter: Aimilios Tsouvelekakis >Priority: Major > Labels: pull-request-available > > Current v4.0.0 planning has Arrow at 18.0.0; it would be good to move it > to version 19.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51113) Correctness issue with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE
[ https://issues.apache.org/jira/browse/SPARK-51113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-51113: --- Assignee: Vladimir Golubev > Correctness issue with UNION/EXCEPT/INTERSECT inside a view or EXECUTE > IMMEDIATE > > > Key: SPARK-51113 > URL: https://issues.apache.org/jira/browse/SPARK-51113 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Blocker > Labels: pull-request-available > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png, screenshot-5.png > > > There's a parser issue where for trivial UNION/EXCEPT/INTERSECT queries > inside views a keyword is considered an alias: > ``` > spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 UNION SELECT 2 UNION > SELECT 3 UNION SELECT 4") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 EXCEPT SELECT 2 > EXCEPT SELECT 1 EXCEPT SELECT 2") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > spark.sql("CREATE OR REPLACE VIEW t1 AS SELECT 1 AS col1 INTERSECT SELECT 1 > INTERSECT SELECT 2 INTERSECT SELECT 2") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > ``` > !screenshot-1.png! > !screenshot-3.png! > !screenshot-4.png! > Same issue for `EXECUTE IMMEDIATE`: > ``` > spark.sql("DECLARE v INT") > spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO v") > spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO > v").queryExecution.analyzed > spark.sql("SELECT v").show() > ``` > !screenshot-5.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51113) Correctness issue with UNION/EXCEPT/INTERSECT inside a view or EXECUTE IMMEDIATE
[ https://issues.apache.org/jira/browse/SPARK-51113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-51113. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 49835 [https://github.com/apache/spark/pull/49835] > Correctness issue with UNION/EXCEPT/INTERSECT inside a view or EXECUTE > IMMEDIATE > > > Key: SPARK-51113 > URL: https://issues.apache.org/jira/browse/SPARK-51113 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Vladimir Golubev >Assignee: Vladimir Golubev >Priority: Blocker > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png, screenshot-5.png > > > There's a parser issue where, for trivial UNION/EXCEPT/INTERSECT queries > inside views, the set-operator keyword is treated as a column alias: > ``` > spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 UNION SELECT 2 UNION > SELECT 3 UNION SELECT 4") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 EXCEPT SELECT 2 > EXCEPT SELECT 1 EXCEPT SELECT 2") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS col1 INTERSECT SELECT 1 > INTERSECT SELECT 2 INTERSECT SELECT 2") > spark.sql("SELECT * FROM v1").show() > spark.sql("SELECT * FROM v1").queryExecution.analyzed > ``` > !screenshot-1.png! > !screenshot-3.png! > !screenshot-4.png! > Same issue for `EXECUTE IMMEDIATE`: > ``` > spark.sql("DECLARE v INT") > spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO v") > spark.sql("EXECUTE IMMEDIATE 'SELECT 1 UNION SELECT 2 UNION SELECT 3' INTO > v").queryExecution.analyzed > spark.sql("SELECT v").show() > ``` > !screenshot-5.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51187) Implement the graceful deprecation of incorrect config introduced in SPARK-49699
Jungtaek Lim created SPARK-51187: Summary: Implement the graceful deprecation of incorrect config introduced in SPARK-49699 Key: SPARK-51187 URL: https://issues.apache.org/jira/browse/SPARK-51187 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 3.5.4, 4.0.0 Reporter: Jungtaek Lim See the comments in PR [https://github.com/apache/spark/pull/49905] for the rationale. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
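For context, a common pattern for gracefully deprecating a config key in Spark is to register the old key as an alternative of the corrected one, so reads of the new key fall back to the old name and existing jobs keep working. A minimal sketch, assuming the internal {{ConfigBuilder}} API; the key names below are hypothetical, not the ones discussed in the PR:

{code:scala}
import org.apache.spark.internal.config.ConfigBuilder

// Hypothetical keys for illustration only. Configs written against the old
// (incorrect) name still take effect via the fallback.
val FEATURE_ENABLED = ConfigBuilder("spark.sql.streaming.someFeature.enabled")
  .withAlternative("spark.sql.streaming.someFeature.enable")
  .booleanConf
  .createWithDefault(true)
{code}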
[jira] [Created] (SPARK-51191) Validate default values handling in DELETE, UPDATE, MERGE
Anton Okolnychyi created SPARK-51191: Summary: Validate default values handling in DELETE, UPDATE, MERGE Key: SPARK-51191 URL: https://issues.apache.org/jira/browse/SPARK-51191 Project: Spark Issue Type: Test Components: SQL Affects Versions: 4.1 Reporter: Anton Okolnychyi -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51190) Fix TreeEnsembleModel.treeWeights
Ruifeng Zheng created SPARK-51190: - Summary: Fix TreeEnsembleModel.treeWeights Key: SPARK-51190 URL: https://issues.apache.org/jira/browse/SPARK-51190 Project: Spark Issue Type: Sub-task Components: Connect, ML Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51190) Fix TreeEnsembleModel.treeWeights
[ https://issues.apache.org/jira/browse/SPARK-51190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-51190: - Assignee: Ruifeng Zheng > Fix TreeEnsembleModel.treeWeights > - > > Key: SPARK-51190 > URL: https://issues.apache.org/jira/browse/SPARK-51190 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-50812) Support pyspark.ml on Connect
[ https://issues.apache.org/jira/browse/SPARK-50812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926653#comment-17926653 ] Dongjoon Hyun commented on SPARK-50812: --- I added the `releasenotes` label so that we don't forget to mention this effort. > Support pyspark.ml on Connect > - > > Key: SPARK-50812 > URL: https://issues.apache.org/jira/browse/SPARK-50812 > Project: Spark > Issue Type: Umbrella > Components: Connect, ML, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Bobby Wang >Priority: Major > Labels: releasenotes > Fix For: 4.0.0 > > > Starting from Apache Spark 3.4, Spark has supported Connect, which introduced > a decoupled client-server architecture that allows remote connectivity to > Spark clusters using the DataFrame API and unresolved logical plans as the > protocol. The separation between client and server allows Spark and its open > ecosystem to be leveraged from everywhere. It can be embedded in modern data > applications, in IDEs, notebooks and programming languages. > However, Spark Connect currently only supports Spark SQL, which means Spark > ML cannot run training/inference via Spark Connect. This will probably > result in losing some ML users. > So I would like to propose a way to support Spark ML on Connect. Users > won't need to change their code to leverage Connect to run Spark ML cases. > Here are some links: > Design doc: [Support spark.ml on > Connect|https://docs.google.com/document/d/1EUvSZuI-so83cxb_fTVMoz0vUfAaFmqXt39yoHI-D9I/edit?usp=sharing] > > Draft PR: [https://github.com/wbo4958/spark/pull/5] > Example code: > {code:python} > spark = SparkSession.builder.remote("sc://localhost").getOrCreate() > df = spark.createDataFrame([ > (Vectors.dense([1.0, 2.0]), 1), > (Vectors.dense([2.0, -1.0]), 1), > (Vectors.dense([-3.0, -2.0]), 0), > (Vectors.dense([-1.0, -2.0]), 0), > ], schema=['features', 'label']) > lr = LogisticRegression() > lr.setMaxIter(30) > model: LogisticRegressionModel = lr.fit(df) > z = model.summary > x = model.predictRaw(Vectors.dense([1.0, 2.0])) > print(f"predictRaw {x}") > assert model.getMaxIter() == 30 > model.summary.roc.show() > print(model.summary.weightedRecall) > print(model.summary.recallByLabel) > print(model.coefficients) > print(model.intercept) > model.transform(df).show() > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-50812) Support pyspark.ml on Connect
[ https://issues.apache.org/jira/browse/SPARK-50812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-50812: -- Labels: releasenotes (was: pull-request-available) > Support pyspark.ml on Connect > - > > Key: SPARK-50812 > URL: https://issues.apache.org/jira/browse/SPARK-50812 > Project: Spark > Issue Type: Umbrella > Components: Connect, ML, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Bobby Wang >Priority: Major > Labels: releasenotes > Fix For: 4.0.0 > > > Starting from Apache Spark 3.4, Spark has supported Connect, which introduced > a decoupled client-server architecture that allows remote connectivity to > Spark clusters using the DataFrame API and unresolved logical plans as the > protocol. The separation between client and server allows Spark and its open > ecosystem to be leveraged from everywhere. It can be embedded in modern data > applications, in IDEs, notebooks and programming languages. > However, Spark Connect currently only supports Spark SQL, which means Spark > ML cannot run training/inference via Spark Connect. This will probably > result in losing some ML users. > So I would like to propose a way to support Spark ML on Connect. Users > won't need to change their code to leverage Connect to run Spark ML cases. > Here are some links: > Design doc: [Support spark.ml on > Connect|https://docs.google.com/document/d/1EUvSZuI-so83cxb_fTVMoz0vUfAaFmqXt39yoHI-D9I/edit?usp=sharing] > > Draft PR: [https://github.com/wbo4958/spark/pull/5] > Example code: > {code:python} > spark = SparkSession.builder.remote("sc://localhost").getOrCreate() > df = spark.createDataFrame([ > (Vectors.dense([1.0, 2.0]), 1), > (Vectors.dense([2.0, -1.0]), 1), > (Vectors.dense([-3.0, -2.0]), 0), > (Vectors.dense([-1.0, -2.0]), 0), > ], schema=['features', 'label']) > lr = LogisticRegression() > lr.setMaxIter(30) > model: LogisticRegressionModel = lr.fit(df) > z = model.summary > x = model.predictRaw(Vectors.dense([1.0, 2.0])) > print(f"predictRaw {x}") > assert model.getMaxIter() == 30 > model.summary.roc.show() > print(model.summary.weightedRecall) > print(model.summary.recallByLabel) > print(model.coefficients) > print(model.intercept) > model.transform(df).show() > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51176) Meet consistency for unexpected errors PySpark Connect <> Classic
[ https://issues.apache.org/jira/browse/SPARK-51176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51176: --- Labels: pull-request-available (was: ) > Meet consistency for unexpected errors PySpark Connect <> Classic > - > > Key: SPARK-51176 > URL: https://issues.apache.org/jira/browse/SPARK-51176 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51197) unit test clean up
Ruifeng Zheng created SPARK-51197: - Summary: unit test clean up Key: SPARK-51197 URL: https://issues.apache.org/jira/browse/SPARK-51197 Project: Spark Issue Type: Sub-task Components: PySpark, Tests Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51197) unit test clean up
[ https://issues.apache.org/jira/browse/SPARK-51197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-51197. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 49927 [https://github.com/apache/spark/pull/49927] > unit test clean up > -- > > Key: SPARK-51197 > URL: https://issues.apache.org/jira/browse/SPARK-51197 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51197) unit test clean up
[ https://issues.apache.org/jira/browse/SPARK-51197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-51197: - Assignee: Ruifeng Zheng > unit test clean up > -- > > Key: SPARK-51197 > URL: https://issues.apache.org/jira/browse/SPARK-51197 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-51182) DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified
[ https://issues.apache.org/jira/browse/SPARK-51182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926679#comment-17926679 ] Vlad Rozov commented on SPARK-51182: This issue is a follow-up to [https://github.com/apache/spark/pull/49654]; I already have the necessary changes implemented and will open a PR shortly. > DataFrameWriter should throw dataPathNotSpecifiedError when path is not > specified > - > > Key: SPARK-51182 > URL: https://issues.apache.org/jira/browse/SPARK-51182 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.1.0 >Reporter: Vlad Rozov >Priority: Minor > Labels: pull-request-available > > When {{path}} is not specified in the call to > {{DataFrame.write().save(path)}}, whether explicitly or via {{option(path, ...)}}, > {{parquet(path)}}, etc., it would be more accurate to raise > {{dataPathNotSpecifiedError}} instead of {{multiplePathsSpecifiedError}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
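For reference, a minimal sketch of the call path the issue describes (assuming a running {{SparkSession}} named {{spark}}; the error names are those referenced in the issue, not verified against the current source):

{code:scala}
val df = spark.range(1).toDF("id")

// No path is given here: not as save(path), not via option("path", ...),
// and not via a shortcut like parquet(path). The issue proposes that this
// raise dataPathNotSpecifiedError rather than multiplePathsSpecifiedError.
df.write.format("parquet").save()
{code}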
[jira] [Updated] (SPARK-51189) Promote JobFailed to DeveloperApi
[ https://issues.apache.org/jira/browse/SPARK-51189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51189: --- Labels: pull-request-available (was: ) > Promote JobFailed to DeveloperApi > - > > Key: SPARK-51189 > URL: https://issues.apache.org/jira/browse/SPARK-51189 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51059) Document how ALLOWED_ATTRIBUTES works
[ https://issues.apache.org/jira/browse/SPARK-51059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-51059. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 49918 [https://github.com/apache/spark/pull/49918] > Document how ALLOWED_ATTRIBUTES works > - > > Key: SPARK-51059 > URL: https://issues.apache.org/jira/browse/SPARK-51059 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-50812) Support pyspark.ml on Connect
[ https://issues.apache.org/jira/browse/SPARK-50812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-50812. --- Resolution: Resolved > Support pyspark.ml on Connect > - > > Key: SPARK-50812 > URL: https://issues.apache.org/jira/browse/SPARK-50812 > Project: Spark > Issue Type: Umbrella > Components: Connect, ML, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Bobby Wang >Priority: Major > Labels: releasenotes > Fix For: 4.0.0 > > > Starting from Apache Spark 3.4, Spark has supported Connect, which introduced > a decoupled client-server architecture that allows remote connectivity to > Spark clusters using the DataFrame API and unresolved logical plans as the > protocol. The separation between client and server allows Spark and its open > ecosystem to be leveraged from everywhere. It can be embedded in modern data > applications, in IDEs, notebooks and programming languages. > However, Spark Connect currently only supports Spark SQL, which means Spark > ML cannot run training/inference via Spark Connect. This will probably > result in losing some ML users. > So I would like to propose a way to support Spark ML on Connect. Users > won't need to change their code to leverage Connect to run Spark ML cases. > Here are some links: > Design doc: [Support spark.ml on > Connect|https://docs.google.com/document/d/1EUvSZuI-so83cxb_fTVMoz0vUfAaFmqXt39yoHI-D9I/edit?usp=sharing] > > Draft PR: [https://github.com/wbo4958/spark/pull/5] > Example code: > {code:python} > spark = SparkSession.builder.remote("sc://localhost").getOrCreate() > df = spark.createDataFrame([ > (Vectors.dense([1.0, 2.0]), 1), > (Vectors.dense([2.0, -1.0]), 1), > (Vectors.dense([-3.0, -2.0]), 0), > (Vectors.dense([-1.0, -2.0]), 0), > ], schema=['features', 'label']) > lr = LogisticRegression() > lr.setMaxIter(30) > model: LogisticRegressionModel = lr.fit(df) > z = model.summary > x = model.predictRaw(Vectors.dense([1.0, 2.0])) > print(f"predictRaw {x}") > assert model.getMaxIter() == 30 > model.summary.roc.show() > print(model.summary.weightedRecall) > print(model.summary.recallByLabel) > print(model.coefficients) > print(model.intercept) > model.transform(df).show() > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51189) Promote JobFailed to DeveloperApi
[ https://issues.apache.org/jira/browse/SPARK-51189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-51189: - Assignee: Cheng Pan > Promote JobFailed to DeveloperApi > - > > Key: SPARK-51189 > URL: https://issues.apache.org/jira/browse/SPARK-51189 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51189) Promote JobFailed to DeveloperApi
[ https://issues.apache.org/jira/browse/SPARK-51189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-51189. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 49920 [https://github.com/apache/spark/pull/49920] > Promote JobFailed to DeveloperApi > - > > Key: SPARK-51189 > URL: https://issues.apache.org/jira/browse/SPARK-51189 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51193) Upgrade Netty to 4.1.118.Final and netty-tcnative to 2.0.70.Final
[ https://issues.apache.org/jira/browse/SPARK-51193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-51193. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 49923 [https://github.com/apache/spark/pull/49923] > Upgrade Netty to 4.1.118.Final and netty-tcnative to 2.0.70.Final > - > > Key: SPARK-51193 > URL: https://issues.apache.org/jira/browse/SPARK-51193 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51197) unit test clean up
[ https://issues.apache.org/jira/browse/SPARK-51197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51197: --- Labels: pull-request-available (was: ) > unit test clean up > -- > > Key: SPARK-51197 > URL: https://issues.apache.org/jira/browse/SPARK-51197 > Project: Spark > Issue Type: Sub-task > Components: PySpark, Tests >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-51182) DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified
[ https://issues.apache.org/jira/browse/SPARK-51182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926681#comment-17926681 ] Wei Guo commented on SPARK-51182: - Okay, I will just follow up here and close my PR [~vrozov] > DataFrameWriter should throw dataPathNotSpecifiedError when path is not > specified > - > > Key: SPARK-51182 > URL: https://issues.apache.org/jira/browse/SPARK-51182 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.1.0 >Reporter: Vlad Rozov >Priority: Minor > Labels: pull-request-available > > When {{path}} is not specified in the call to > {{DataFrame.write().save(path)}}, whether explicitly or via {{option(path, ...)}}, > {{parquet(path)}}, etc., it would be more accurate to raise > {{dataPathNotSpecifiedError}} instead of {{multiplePathsSpecifiedError}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51198) Revise `defaultMinPartitions` function description
[ https://issues.apache.org/jira/browse/SPARK-51198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51198: --- Labels: pull-request-available (was: ) > Revise `defaultMinPartitions` function description > -- > > Key: SPARK-51198 > URL: https://issues.apache.org/jira/browse/SPARK-51198 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 4.1.0 >Reporter: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51198) Revise `defaultMinPartitions` function description
Dongjoon Hyun created SPARK-51198: - Summary: Revise `defaultMinPartitions` function description Key: SPARK-51198 URL: https://issues.apache.org/jira/browse/SPARK-51198 Project: Spark Issue Type: Sub-task Components: Documentation Affects Versions: 4.1.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51199) Valid CSV records considered malformed
Andreas Franz created SPARK-51199: - Summary: Valid CSV records considered malformed Key: SPARK-51199 URL: https://issues.apache.org/jira/browse/SPARK-51199 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.5.4 Environment: SparkContext: Running Spark version 3.5.4 SparkContext: OS info Mac OS X, 15.3, aarch64 SparkContext: Java version 17.0.14 2025-01-21 LTS OpenJDK Runtime Environment Corretto-17.0.14.7.1 (build 17.0.14+7-LTS) OpenJDK 64-Bit Server VM Corretto-17.0.14.7.1 (build 17.0.14+7-LTS, mixed mode, sharing) Reporter: Andreas Franz There is an issue parsing CSV files with a combination of escaped double quotes and commas in a field. I've created a small example that demonstrates the issue: {code:java} package com.example import org.apache.spark.sql.SparkSession object Example { def main(args: Array[String]): Unit = { val spark = SparkSession.builder() .appName("CSV Example") .master("local[*]") .config("spark.driver.host", "localhost") .config("spark.ui.enabled", "false") .getOrCreate() val csv = spark .read .option("header", "true") .option("mode", "FAILFAST") .csv("./src/main/scala/com/example/example.csv") csv.show(2, truncate = false) spark.stop() } } {code} {code:java} id,region_name,gp_id,gp_name,gp_group_id,gp_group_name,gp_group_region_name 111234567,east,1122723,"Test 1",,, 001234567,east,1122723,"Foo ""Bar"", New York, US",,, {code} According to [RFC 4180|https://www.ietf.org/rfc/rfc4180.txt], this is a valid CSV record. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
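One observation that may help triage: Spark's CSV reader defaults to backslash as the escape character, while RFC 4180 escapes a quote by doubling it. Setting the {{escape}} option to a double quote is the usual way to read such files; a sketch against the example above:

{code:scala}
val csv = spark
  .read
  .option("header", "true")
  .option("mode", "FAILFAST")
  // RFC 4180 doubles quotes inside quoted fields; Spark's default escape is '\',
  // so declare '"' as the escape character to parse "Foo ""Bar"", New York, US".
  .option("escape", "\"")
  .csv("./src/main/scala/com/example/example.csv")

csv.show(2, truncate = false)
{code}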
[jira] [Updated] (SPARK-51188) Upgrade Arrow to 18.2.0
[ https://issues.apache.org/jira/browse/SPARK-51188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51188: --- Labels: pull-request-available (was: ) > Upgrade Arrow to 18.2.0 > --- > > Key: SPARK-51188 > URL: https://issues.apache.org/jira/browse/SPARK-51188 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.1.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51188) Upgrade Arrow to 18.2.0
Yang Jie created SPARK-51188: Summary: Upgrade Arrow to 18.2.0 Key: SPARK-51188 URL: https://issues.apache.org/jira/browse/SPARK-51188 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.1.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51059) Document how ALLOWED_ATTRIBUTES works
[ https://issues.apache.org/jira/browse/SPARK-51059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51059: --- Labels: pull-request-available (was: ) > Document how ALLOWED_ATTRIBUTES works > - > > Key: SPARK-51059 > URL: https://issues.apache.org/jira/browse/SPARK-51059 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51189) Promote JobFailed to DeveloperApi
Cheng Pan created SPARK-51189: - Summary: Promote JobFailed to DeveloperApi Key: SPARK-51189 URL: https://issues.apache.org/jira/browse/SPARK-51189 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Cheng Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51189) Promote JobFailed to DeveloperApi
[ https://issues.apache.org/jira/browse/SPARK-51189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51189: -- Parent: SPARK-44111 Issue Type: Sub-task (was: Improvement) > Promote JobFailed to DeveloperApi > - > > Key: SPARK-51189 > URL: https://issues.apache.org/jira/browse/SPARK-51189 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51195) Upgrade `kubernetes-client` to 7.1.0
[ https://issues.apache.org/jira/browse/SPARK-51195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-51195. --- Fix Version/s: 4.1.0 Resolution: Fixed Issue resolved by pull request 49925 [https://github.com/apache/spark/pull/49925] > Upgrade `kubernetes-client` to 7.1.0 > > > Key: SPARK-51195 > URL: https://issues.apache.org/jira/browse/SPARK-51195 > Project: Spark > Issue Type: Sub-task > Components: Build, k8s >Affects Versions: 4.1.0 >Reporter: Wei Guo >Assignee: Wei Guo >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51201) Make Partitioning Hints support byte and short values
Kent Yao created SPARK-51201: Summary: Make Partitioning Hints support byte and short values Key: SPARK-51201 URL: https://issues.apache.org/jira/browse/SPARK-51201 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
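For context, partitioning hints ({{REPARTITION}}, {{COALESCE}}, {{REBALANCE}}) take a number of partitions as an argument. A sketch of what the improvement presumably enables, using Spark SQL's Y/S suffixes for TINYINT/SMALLINT literals (a hedged reading of the summary, not confirmed by the ticket):

{code:scala}
// An INT literal is accepted today:
spark.sql("SELECT /*+ REPARTITION(3) */ * FROM range(10)").show()

// A SMALLINT literal (S suffix) would presumably be accepted as well
// once byte and short values are supported:
spark.sql("SELECT /*+ REPARTITION(3S) */ * FROM range(10)").show()
{code}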
[jira] [Updated] (SPARK-51133) Upgrade `commons-pool2` to 2.12.1
[ https://issues.apache.org/jira/browse/SPARK-51133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51133: -- Summary: Upgrade `commons-pool2` to 2.12.1 (was: Upgrade Apache `commons-pool2` to 2.12.1) > Upgrade `commons-pool2` to 2.12.1 > - > > Key: SPARK-51133 > URL: https://issues.apache.org/jira/browse/SPARK-51133 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.1.0 >Reporter: Wei Guo >Assignee: Wei Guo >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51194) Upgrade `scalafmt` to 3.8.6
[ https://issues.apache.org/jira/browse/SPARK-51194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-51194. --- Fix Version/s: 4.1.0 Resolution: Fixed Issue resolved by pull request 49924 [https://github.com/apache/spark/pull/49924] > Upgrade `scalafmt` to 3.8.6 > --- > > Key: SPARK-51194 > URL: https://issues.apache.org/jira/browse/SPARK-51194 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.1.0 >Reporter: Wei Guo >Assignee: Wei Guo >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51200) Add SparkR deprecation info to `README.md` and `make-distribution.sh` help
[ https://issues.apache.org/jira/browse/SPARK-51200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-51200: - Assignee: Dongjoon Hyun > Add SparkR deprecation info to `README.md` and `make-distribution.sh` help > -- > > Key: SPARK-51200 > URL: https://issues.apache.org/jira/browse/SPARK-51200 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51193) Upgrade Netty to 4.1.118.Final
Dongjoon Hyun created SPARK-51193: - Summary: Upgrade Netty to 4.1.118.Final Key: SPARK-51193 URL: https://issues.apache.org/jira/browse/SPARK-51193 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51193) Upgrade Netty to 4.1.118.Final
[ https://issues.apache.org/jira/browse/SPARK-51193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51193: --- Labels: pull-request-available (was: ) > Upgrade Netty to 4.1.118.Final > -- > > Key: SPARK-51193 > URL: https://issues.apache.org/jira/browse/SPARK-51193 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51193) Upgrade Netty to 4.1.118.Final and netty-tcnative to 2.0.70.Final
[ https://issues.apache.org/jira/browse/SPARK-51193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51193: -- Summary: Upgrade Netty to 4.1.118.Final and netty-tcnative to 2.0.70.Final (was: Upgrade Netty to 4.1.118.Final) > Upgrade Netty to 4.1.118.Final and netty-tcnative to 2.0.70.Final > - > > Key: SPARK-51193 > URL: https://issues.apache.org/jira/browse/SPARK-51193 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51192) Expose a ResponseObserver-free version of `process` in SparkConnectPlanner
Venkata Sai Akhil Gudesa created SPARK-51192: Summary: Expose a ResponseObserver-free version of `process` in SparkConnectPlanner Key: SPARK-51192 URL: https://issues.apache.org/jira/browse/SPARK-51192 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 4.0.0 Reporter: Venkata Sai Akhil Gudesa [https://github.com/apache/spark/pull/47816] attempted to move `MockObserver` into the source code to address compilation errors when open-source libraries attempt to test their command plugin extensions via `SparkConnectPlannerUtils`. However, this isn't enough, as the error {*}java.lang.NoSuchMethodError: 'void org.apache.spark.sql.connect.planner.SparkConnectPlanner.process(org.apache.spark.connect.proto.Command, io.grpc.stub.StreamObserver{*} continues to be seen. To address this shading issue, we can move the creation of the `MockObserver` into the source code. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
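For background, the observer parameter in the `process` signature is gRPC's {{StreamObserver}} callback interface, and a mock observer is simply a no-op implementation of its three methods. A minimal sketch; the response type is assumed to be the ExecutePlan response, not confirmed by the ticket:

{code:scala}
import io.grpc.stub.StreamObserver
import org.apache.spark.connect.proto

// A no-op observer: plugin tests typically only need `process` to run,
// not to stream responses anywhere.
val noOpObserver = new StreamObserver[proto.ExecutePlanResponse] {
  override def onNext(response: proto.ExecutePlanResponse): Unit = ()
  override def onError(t: Throwable): Unit = ()
  override def onCompleted(): Unit = ()
}
{code}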
[jira] [Resolved] (SPARK-51190) Fix TreeEnsembleModel.treeWeights
[ https://issues.apache.org/jira/browse/SPARK-51190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-51190. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 49919 [https://github.com/apache/spark/pull/49919] > Fix TreeEnsembleModel.treeWeights > - > > Key: SPARK-51190 > URL: https://issues.apache.org/jira/browse/SPARK-51190 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51192) Expose a ResponseObserver-free version of `process` in SparkConnectPlanner
[ https://issues.apache.org/jira/browse/SPARK-51192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51192: --- Labels: pull-request-available (was: ) > Expose a ResponseObserver-free version of `process` in SparkConnectPlanner > -- > > Key: SPARK-51192 > URL: https://issues.apache.org/jira/browse/SPARK-51192 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Venkata Sai Akhil Gudesa >Priority: Major > Labels: pull-request-available > > [https://github.com/apache/spark/pull/47816] attempted to move `MockObserver` > into the source code to address compilation errors when open-source libraries > attempt to test their command plugin extensions via > `SparkConnectPlannerUtils`. > However, this isn't enough, as the error {*}java.lang.NoSuchMethodError: > 'void > org.apache.spark.sql.connect.planner.SparkConnectPlanner.process(org.apache.spark.connect.proto.Command, > io.grpc.stub.StreamObserver{*} continues to be seen. > To address this shading issue, we can move the creation of the `MockObserver` > into the source code. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51182) DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified
[ https://issues.apache.org/jira/browse/SPARK-51182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51182: --- Labels: pull-request-available (was: ) > DataFrameWriter should throw dataPathNotSpecifiedError when path is not > specified > - > > Key: SPARK-51182 > URL: https://issues.apache.org/jira/browse/SPARK-51182 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.1.0 >Reporter: Vlad Rozov >Priority: Minor > Labels: pull-request-available > > When {{path}} is not specified in the call to > {{DataFrame.write().save(path)}}, whether explicitly or via {{option(path, ...)}}, > {{parquet(path)}}, etc., it would be more accurate to raise > {{dataPathNotSpecifiedError}} instead of {{multiplePathsSpecifiedError}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-51182) DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified
[ https://issues.apache.org/jira/browse/SPARK-51182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926655#comment-17926655 ] Wei Guo commented on SPARK-51182: - I made a PR for this. > DataFrameWriter should throw dataPathNotSpecifiedError when path is not > specified > - > > Key: SPARK-51182 > URL: https://issues.apache.org/jira/browse/SPARK-51182 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.1.0 >Reporter: Vlad Rozov >Priority: Minor > Labels: pull-request-available > > When {{path}} is not specified in the call to > {{DataFrame.write().save(path)}}, whether explicitly or via {{option(path, ...)}}, > {{parquet(path)}}, etc., it would be more accurate to raise > {{dataPathNotSpecifiedError}} instead of {{multiplePathsSpecifiedError}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51059) Document how ALLOWED_ATTRIBUTES works
[ https://issues.apache.org/jira/browse/SPARK-51059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-51059: - Assignee: Ruifeng Zheng > Document how ALLOWED_ATTRIBUTES works > - > > Key: SPARK-51059 > URL: https://issues.apache.org/jira/browse/SPARK-51059 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51198) Revise `defaultMinPartitions` function description
[ https://issues.apache.org/jira/browse/SPARK-51198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-51198: - Assignee: Dongjoon Hyun > Revise `defaultMinPartitions` function description > -- > > Key: SPARK-51198 > URL: https://issues.apache.org/jira/browse/SPARK-51198 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 4.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51200) Add SparkR deprecation info to `README.md` and `make-distribution.sh` help
[ https://issues.apache.org/jira/browse/SPARK-51200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51200: --- Labels: pull-request-available (was: ) > Add SparkR deprecation info to `README.md` and `make-distribution.sh` help > -- > > Key: SPARK-51200 > URL: https://issues.apache.org/jira/browse/SPARK-51200 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51200) Add SparkR deprecation info to `README.md` and `make-distribution.sh` help
Dongjoon Hyun created SPARK-51200: - Summary: Add SparkR deprecation info to `README.md` and `make-distribution.sh` help Key: SPARK-51200 URL: https://issues.apache.org/jira/browse/SPARK-51200 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51190) Fix TreeEnsembleModel.treeWeights
[ https://issues.apache.org/jira/browse/SPARK-51190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51190: --- Labels: pull-request-available (was: ) > Fix TreeEnsembleModel.treeWeights > - > > Key: SPARK-51190 > URL: https://issues.apache.org/jira/browse/SPARK-51190 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-51188) Upgrade Arrow to 18.2.0
[ https://issues.apache.org/jira/browse/SPARK-51188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-51188. --- Fix Version/s: 4.1.0 Resolution: Fixed Issue resolved by pull request 49904 [https://github.com/apache/spark/pull/49904] > Upgrade Arrow to 18.2.0 > --- > > Key: SPARK-51188 > URL: https://issues.apache.org/jira/browse/SPARK-51188 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.1.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51188) Upgrade Arrow to 18.2.0
[ https://issues.apache.org/jira/browse/SPARK-51188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-51188: - Assignee: Yang Jie > Upgrade Arrow to 18.2.0 > --- > > Key: SPARK-51188 > URL: https://issues.apache.org/jira/browse/SPARK-51188 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.1.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51188) Upgrade Arrow to 18.2.0
[ https://issues.apache.org/jira/browse/SPARK-51188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51188: -- Parent: SPARK-51166 Issue Type: Sub-task (was: Improvement) > Upgrade Arrow to 18.2.0 > --- > > Key: SPARK-51188 > URL: https://issues.apache.org/jira/browse/SPARK-51188 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.1.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.1.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-50812) Support pyspark.ml on Connect
[ https://issues.apache.org/jira/browse/SPARK-50812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926662#comment-17926662 ] Ruifeng Zheng commented on SPARK-50812: --- Thank you, [~dongjoon]! > Support pyspark.ml on Connect > - > > Key: SPARK-50812 > URL: https://issues.apache.org/jira/browse/SPARK-50812 > Project: Spark > Issue Type: Umbrella > Components: Connect, ML, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Bobby Wang >Priority: Major > Labels: releasenotes > Fix For: 4.0.0 > > > Starting from Apache Spark 3.4, Spark has supported Connect, which introduced > a decoupled client-server architecture that allows remote connectivity to > Spark clusters using the DataFrame API and unresolved logical plans as the > protocol. The separation between client and server allows Spark and its open > ecosystem to be leveraged from everywhere. It can be embedded in modern data > applications, in IDEs, notebooks and programming languages. > However, Spark Connect currently only supports Spark SQL, which means Spark > ML cannot run training/inference via Spark Connect. This will probably > result in losing some ML users. > So I would like to propose a way to support Spark ML on Connect. Users > won't need to change their code to leverage Connect to run Spark ML cases. > Here are some links: > Design doc: [Support spark.ml on > Connect|https://docs.google.com/document/d/1EUvSZuI-so83cxb_fTVMoz0vUfAaFmqXt39yoHI-D9I/edit?usp=sharing] > > Draft PR: [https://github.com/wbo4958/spark/pull/5] > Example code: > {code:python} > spark = SparkSession.builder.remote("sc://localhost").getOrCreate() > df = spark.createDataFrame([ > (Vectors.dense([1.0, 2.0]), 1), > (Vectors.dense([2.0, -1.0]), 1), > (Vectors.dense([-3.0, -2.0]), 0), > (Vectors.dense([-1.0, -2.0]), 0), > ], schema=['features', 'label']) > lr = LogisticRegression() > lr.setMaxIter(30) > model: LogisticRegressionModel = lr.fit(df) > z = model.summary > x = model.predictRaw(Vectors.dense([1.0, 2.0])) > print(f"predictRaw {x}") > assert model.getMaxIter() == 30 > model.summary.roc.show() > print(model.summary.weightedRecall) > print(model.summary.recallByLabel) > print(model.coefficients) > print(model.intercept) > model.transform(df).show() > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51195) Upgrade `kubernetes-client` to 7.1.0
[ https://issues.apache.org/jira/browse/SPARK-51195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51195: --- Labels: pull-request-available (was: ) > Upgrade `kubernetes-client` to 7.1.0 > > > Key: SPARK-51195 > URL: https://issues.apache.org/jira/browse/SPARK-51195 > Project: Spark > Issue Type: Improvement > Components: Build, k8s >Affects Versions: 4.1.0 >Reporter: Wei Guo >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51195) Upgrade `kubernetes-client` to 7.1.0
Wei Guo created SPARK-51195: --- Summary: Upgrade `kubernetes-client` to 7.1.0 Key: SPARK-51195 URL: https://issues.apache.org/jira/browse/SPARK-51195 Project: Spark Issue Type: Improvement Components: Build, k8s Affects Versions: 4.1.0 Reporter: Wei Guo -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51195) Upgrade `kubernetes-client` to 7.1.0
[ https://issues.apache.org/jira/browse/SPARK-51195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51195: -- Parent: SPARK-51166 Issue Type: Sub-task (was: Improvement) > Upgrade `kubernetes-client` to 7.1.0 > > > Key: SPARK-51195 > URL: https://issues.apache.org/jira/browse/SPARK-51195 > Project: Spark > Issue Type: Sub-task > Components: Build, k8s >Affects Versions: 4.1.0 >Reporter: Wei Guo >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51196) Assign appropriate error condition for `_LEGACY_ERROR_TEMP_2047` and `_LEGACY_ERROR_TEMP_2050`
Wei Guo created SPARK-51196: --- Summary: Assign appropriate error condition for `_LEGACY_ERROR_TEMP_2047` and `_LEGACY_ERROR_TEMP_2050` Key: SPARK-51196 URL: https://issues.apache.org/jira/browse/SPARK-51196 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.1.0 Reporter: Wei Guo -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-48163) Fix Flaky Test: `SparkConnectServiceSuite.SPARK-43923: commands send events - get_resources_command`
[ https://issues.apache.org/jira/browse/SPARK-48163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-48163: -- Parent Issue: SPARK-51166 (was: SPARK-44111) > Fix Flaky Test: `SparkConnectServiceSuite.SPARK-43923: commands send events - > get_resources_command` > > > Key: SPARK-48163 > URL: https://issues.apache.org/jira/browse/SPARK-48163 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > > This test has been flaky since early 2024. > - https://github.com/apache/spark/actions/runs/12882534288/job/35914995457 > (2025-01-21) > {code} > - SPARK-43923: commands send events ((get_resources_command { > [info] } > [info] ,None)) *** FAILED *** (35 milliseconds) > [info] VerifyEvents.this.listener.executeHolder.isDefined was false > (SparkConnectServiceSuite.scala:873) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51183) Update spec to point to Parquet
David Cashman created SPARK-51183: - Summary: Update spec to point to Parquet Key: SPARK-51183 URL: https://issues.apache.org/jira/browse/SPARK-51183 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.1 Reporter: David Cashman The shredding spec has moved to Parquet, and the version in Spark is out of date relative to the code. We should update to point to the Parquet spec. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51184) Remove `TaskState.LOST` logic from `TaskSchedulerImpl`
Dongjoon Hyun created SPARK-51184: - Summary: Remove `TaskState.LOST` logic from `TaskSchedulerImpl` Key: SPARK-51184 URL: https://issues.apache.org/jira/browse/SPARK-51184 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.1.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51184) Remove `TaskState.LOST` logic from `TaskSchedulerImpl`
[ https://issues.apache.org/jira/browse/SPARK-51184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51184: --- Labels: pull-request-available (was: ) > Remove `TaskState.LOST` logic from `TaskSchedulerImpl` > -- > > Key: SPARK-51184 > URL: https://issues.apache.org/jira/browse/SPARK-51184 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.1.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-51184) Remove `TaskState.LOST` logic from `TaskSchedulerImpl`
[ https://issues.apache.org/jira/browse/SPARK-51184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-51184: - Assignee: Dongjoon Hyun > Remove `TaskState.LOST` logic from `TaskSchedulerImpl` > -- > > Key: SPARK-51184 > URL: https://issues.apache.org/jira/browse/SPARK-51184 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51180) Upgrade `Arrow` to 19.0.0
[ https://issues.apache.org/jira/browse/SPARK-51180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51180: -- Parent: SPARK-51166 Issue Type: Sub-task (was: Improvement) > Upgrade `Arrow` to 19.0.0 > - > > Key: SPARK-51180 > URL: https://issues.apache.org/jira/browse/SPARK-51180 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.1.0 >Reporter: Aimilios Tsouvelekakis >Priority: Major > Labels: pull-request-available > > Current v4.0.0 planning has Arrow at 18.0.0; it would be good to move it > to version 19.0.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-51182) DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified
Vlad Rozov created SPARK-51182: -- Summary: DataFrameWriter should throw dataPathNotSpecifiedError when path is not specified Key: SPARK-51182 URL: https://issues.apache.org/jira/browse/SPARK-51182 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.1.0 Reporter: Vlad Rozov When {{path}} is not specified in the call to {{DataFrame.write().save(path)}}, whether explicitly or via {{option(path, ...)}}, {{parquet(path)}}, etc., it would be more accurate to raise {{dataPathNotSpecifiedError}} instead of {{multiplePathsSpecifiedError}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51183) Update spec to point to Parquet
[ https://issues.apache.org/jira/browse/SPARK-51183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-51183: --- Labels: pull-request-available (was: ) > Update spec to point to Parquet > --- > > Key: SPARK-51183 > URL: https://issues.apache.org/jira/browse/SPARK-51183 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.1 >Reporter: David Cashman >Priority: Major > Labels: pull-request-available > > The shredding spec has moved to Parquet, and the version in Spark is out of > date relative to the code. We should update to point to the Parquet spec. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51008) Implement Result Stage for AQE
[ https://issues.apache.org/jira/browse/SPARK-51008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51008: -- Parent: SPARK-44111 Issue Type: Sub-task (was: Improvement) > Implement Result Stage for AQE > -- > > Key: SPARK-51008 > URL: https://issues.apache.org/jira/browse/SPARK-51008 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Ziqi Liu >Assignee: Ziqi Liu >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > To support > [https://github.com/apache/spark/pull/44013#issuecomment-2421167393] we need > to implement Result Stage for AQE so that all plan segments can fall into a > stage context. This would also make the AQE flow more self-contained. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-51042) Read and write CalendarIntervals using one call to get/putLong consistently
[ https://issues.apache.org/jira/browse/SPARK-51042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-51042: -- Fix Version/s: 3.5.5 > Read and write CalendarIntervals using one call to get/putLong consistently > --- > > Key: SPARK-51042 > URL: https://issues.apache.org/jira/browse/SPARK-51042 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0, 3.5.4, 3.5.5, 4.1.0 >Reporter: Jonathan Albrecht >Assignee: Jonathan Albrecht >Priority: Minor > Labels: big-endian, pull-request-available > Fix For: 4.0.0, 3.5.5 > > > In commit ac07cea234f4fb687442aafa8b6d411695110a4e there was a performance > improvement to reading and writing CalendarIntervals in UnsafeRow. This same > change can be applied to UnsafeArrayData and UnsafeWriter. > This would also fix big endian platforms, where the current and proposed > methods of reading and writing CalendarIntervals do not order the bytes in > the same way. Currently, CalendarInterval-related tests in Catalyst and SQL > are failing on big endian platforms. > There would be no effect on little endian platforms (byte order is not > affected) except for the performance improvement. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
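For context on the endianness angle: the optimization reads an interval's two adjacent 4-byte fields with a single 8-byte access, which only round-trips if reader and writer agree on byte order. A generic sketch of the packing arithmetic (illustrative only, not the exact UnsafeRow field layout):

{code:scala}
// Pack two 32-bit ints into one 64-bit word and unpack them again.
def pack(months: Int, days: Int): Long =
  (months.toLong << 32) | (days.toLong & 0xFFFFFFFFL)

def unpack(word: Long): (Int, Int) =
  ((word >>> 32).toInt, word.toInt)

// Round-trips on any platform; the big-endian failures come from one side
// assembling the word byte by byte in the opposite order.
val (m, d) = unpack(pack(-7, 42))
assert(m == -7 && d == 42)
{code}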