Re: Spark Thrift Server - Not Scaling Down Executors 3.4.2+

2024-09-05 Thread Cheng Pan
The default value of spark.dynamicAllocation.shuffleTracking.enabled was changed from false to true in Spark 3.4.0; disabling it might help. [1] https://spark.apache.org/docs/latest/core-migration-guide.html#upgrading-from-core-33-to-34 Thanks, Cheng Pan > On Sep 6, 2024, at 00
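As a rough illustration of the suggestion above, the sketch below renders the setting as spark-submit style `--conf` arguments. The helper name `to_conf_args` is hypothetical, and the assumption that dynamic allocation is already enabled is mine, not something stated in the thread. Note also that with shuffle tracking off, dynamic allocation still needs one of its other prerequisites (for example an external shuffle service), so treat this as a starting point rather than a verified fix.

```python
# Hypothetical helper: renders Spark settings as spark-submit `--conf` arguments.
def to_conf_args(settings):
    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


settings = {
    # Assumption: dynamic allocation is already turned on for the Thrift Server.
    "spark.dynamicAllocation.enabled": "true",
    # Restores the pre-3.4 default described in the migration guide.
    "spark.dynamicAllocation.shuffleTracking.enabled": "false",
}

conf_args = to_conf_args(settings)
print(" ".join(conf_args))
```

The same key/value pairs could equally go into spark-defaults.conf; the argument form is used here only because it is easy to paste into a launch command.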

Re: Hitting SPARK-45858 on Kubernetes - Unavoidable bug or misconfiguration?

2024-08-20 Thread Cheng Pan
, Apache Celeborn [3], a Remote Shuffle Service for Spark. Thanks, Cheng Pan [1] https://spark.apache.org/docs/latest/running-on-kubernetes.html#local-storage [2] https://github.com/apache/spark/blob/v3.5.2/core/src/main/java/org/apache/spark/shuffle/api/ShuffleDriverComponents.java#L65-L72 [3

Re: Hitting SPARK-45858 on Kubernetes - Unavoidable bug or misconfiguration?

2024-08-20 Thread Cheng Pan
org.apache.spark.shuffle.KubernetesLocalDiskShuffleDataIO does NOT support reliable storage, so the condition 4) is false even with this configuration. I’m not sure why you think it does. Thanks, Cheng Pan > On Aug 20, 2024, at 18:27, Aaron Grubb wrote: > >

[ANNOUNCE] Apache Kyuubi released 1.9.1

2024-06-02 Thread Cheng Pan
Hi all, The Apache Kyuubi community is pleased to announce that Apache Kyuubi 1.9.1 has been released! This release brings support for Apache Spark 4.0.0-preview1. Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses. Kyuubi provide

Re: External Spark shuffle service for k8s

2024-04-07 Thread Cheng Pan
-samples/emr-remote-shuffle-service [4] https://github.com/apache/celeborn/issues/2140 Thanks, Cheng Pan > On Apr 6, 2024, at 21:41, Mich Talebzadeh wrote: > > I have seen some older references for shuffle service for k8s, > although it is not clear they are talking about a generic shuff

[DISCUSS] MySQL version support policy

2024-03-24 Thread Cheng Pan
-innovation-and-long-term-support-lts-versions/ [3] https://github.com/apache/spark/pull/45581 [4] https://aws.amazon.com/rds/mysql/ [5] https://learn.microsoft.com/en-us/azure/mysql/concepts-version-policy Thanks, Cheng Pan - To

Reply: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-05 Thread Pan,Bingkun
Okay, let me double-check it carefully. Thank you very much for your help! From: Jungtaek Lim Sent: March 5, 2024 21:56:41 To: Pan,Bingkun Cc: Dongjoon Hyun; dev; user Subject: Re: [ANNOUNCE] Apache Spark 3.5.1 released Yeah the approach seems OK to me - please double

Reply: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-05 Thread Pan,Bingkun
:07 To: Pan,Bingkun Cc: Dongjoon Hyun; dev; user Subject: Re: [ANNOUNCE] Apache Spark 3.5.1 released Let me be more specific. We have two active release version lines, 3.4.x and 3.5.x. We just released Spark 3.5.1, having a dropdown as 3.5.1 and 3.4.2 given the fact the last version of 3.4.x is 3.4.2

Reply: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-05 Thread Pan,Bingkun
time of each new document release. Of course, if we need to keep the latest in every document, I think that is also possible, simply by sharing the same version.json file in each version. From: Jungtaek Lim Sent: March 5, 2024 16:47:30 To: Pan,Bingkun Cc: Dongjoon

Reply: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-05 Thread Pan,Bingkun
According to my understanding, the original intention of this feature is that when a user has entered the pyspark document, if he finds that the version he is currently in is not the version he wants, he can easily jump to the version he wants by clicking on the drop-down box. Additionally, in t

[ANNOUNCE] Apache Kyuubi 1.8.1 is available

2024-02-20 Thread Cheng Pan
to thank all contributors of the Kyuubi community who made this release possible! Thanks, Cheng Pan, on behalf of Apache Kyuubi community - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: [Spark on Kubernetes]: Seeking Guidance on Handling Persistent Executor Failures

2024-02-19 Thread Cheng Pan
Spark has supported the window-based executor failure-tracking mechanism for YARN for a long time; SPARK-41210 [1][2] (included in 3.5.0) extended this feature to K8s. [1] https://issues.apache.org/jira/browse/SPARK-41210 [2] https://github.com/apache/spark/pull/38732 Thanks, Cheng Pan >
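To make "window-based failure tracking" concrete, here is a minimal sketch of the general idea: only failures inside a recent time window count toward the limit, and older ones are forgotten. This is not Spark's implementation; the class `FailureWindow` and its parameters are hypothetical names for illustration. As far as I recall, the Spark 3.5.0+ properties that play these two roles are spark.executor.maxNumFailures and spark.executor.failuresValidityInterval, but please verify them against the configuration docs.

```python
from collections import deque


class FailureWindow:
    """Hypothetical sliding-window failure counter; not Spark's implementation."""

    def __init__(self, max_failures, validity_interval_s):
        self.max_failures = max_failures
        self.validity_interval_s = validity_interval_s
        self._failures = deque()  # failure timestamps in seconds, oldest first

    def record_failure(self, now_s):
        self._failures.append(now_s)

    def recent_failures(self, now_s):
        # Forget failures that are older than the validity interval.
        while self._failures and now_s - self._failures[0] > self.validity_interval_s:
            self._failures.popleft()
        return len(self._failures)

    def should_abort(self, now_s):
        # Abort only when the failures inside the window exceed the limit.
        return self.recent_failures(now_s) > self.max_failures


window = FailureWindow(max_failures=2, validity_interval_s=60)
for t in (0, 10, 20):
    window.record_failure(t)

print(window.should_abort(25))   # all three failures are still inside the window
print(window.should_abort(100))  # by now every failure has aged out
```

The point of the window is that a long-running application is not killed by a handful of failures spread over days, while a burst of failures in a short period still stops it.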

[ANNOUNCE] Apache Kyuubi released 1.8.0

2023-11-06 Thread Cheng Pan
Hi all, The Apache Kyuubi community is pleased to announce that Apache Kyuubi 1.8.0 has been released! Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses. Kyuubi provides a pure SQL gateway through Thrift JDBC/ODBC interface for en

[ANNOUNCE] Apache Celeborn(incubating) 0.3.1 available

2023-10-13 Thread Cheng Pan
: https://celeborn.apache.org/ Celeborn Resources: - Issue Management: https://issues.apache.org/jira/projects/CELEBORN - Mailing List: d...@celeborn.apache.org Thanks, Cheng Pan On behalf of the Apache Celeborn(incubating) community

Re: Spark Vulnerabilities

2023-08-14 Thread Cheng Pan
For the Guava case, you may be interested in https://github.com/apache/spark/pull/42493 Thanks, Cheng Pan > On Aug 14, 2023, at 16:50, Sankavi Nagalingam > wrote: > > Hi Team, > We could see there are many dependent vulnerabilities present in the latest > spark-core:3.4.

Re: Spark Multiple Hive Metastore Catalog Support

2023-04-17 Thread Cheng Pan
] https://github.com/apache/kyuubi/tree/master/extensions/spark/kyuubi-spark-connector-hive Thanks, Cheng Pan On Apr 18, 2023 at 00:38:23, Elliot West wrote: > Hi Ankit, > > While not a part of Spark, there is a project called 'WaggleDance' that > can federate multiple Hive m

Re: spark on k8s daemonset collect log

2023-03-14 Thread Cheng Pan
://github.com/apache/spark/pull/38357 Thanks, Cheng Pan On Mar 14, 2023 at 16:36:45, 404 wrote: > hi, all > > Spark runs on k8s, uses daemonset filebeat to collect logs, and writes > them to elasticsearch. The docker logs are in json format, and each line is > a json string. How to m

[ANNOUNCE] Apache Kyuubi released 1.7.0

2023-03-07 Thread Cheng Pan
...@kyuubi.apache.org We would like to thank all contributors of the Kyuubi community who made this release possible! Thanks, Cheng Pan, on behalf of Apache Kyuubi community

Re: The Dataset unit test is much slower than the RDD unit test (in Scala)

2022-11-01 Thread Cheng Pan
://issues.apache.org/jira/browse/SPARK-38138 Thanks, Cheng Pan On Nov 2, 2022 at 00:14:34, Enrico Minack wrote: > Hi Tanin, > > running your test with option "spark.sql.planChangeLog.level" set to > "info" or "warn" (depending on your Spark log level) will sh

Re: Writing Custom Spark Readers and Writers

2022-04-06 Thread Cheng Pan
There are some projects based on Spark DataSource V2 that I hope will help you. https://github.com/datastax/spark-cassandra-connector https://github.com/housepower/spark-clickhouse-connector https://github.com/oracle/spark-oracle https://github.com/pingcap/tispark Thanks, Cheng Pan On Wed, Apr

Re: spark as data warehouse?

2022-03-26 Thread Cheng Pan
cs/latest/deployment/engine_share_level.html [2] https://github.com/apache/incubator-kyuubi/discussions/925 Thanks, Cheng Pan --- Thanks, I'll check it out. I have a use case where we want to use dbt as data middling tool. Will it take dbt queries and create the resulting model? I see it

[ANNOUNCE] Release Apache Kyuubi(Incubating) 1.3.0-incubating

2021-09-26 Thread Cheng Pan
Hello Spark Community, The Apache Kyuubi(Incubating) community is pleased to announce that Apache Kyuubi(Incubating) 1.3.0-incubating has been released! Apache Kyuubi(Incubating) is a distributed multi-tenant JDBC server for large-scale data processing and analytics, built on top of Apache Spark

does it support by the submission request with client deploy mode to master rest port

2016-10-13 Thread Marc Pan
Hi there, I'm a newbie to Spark and recently began my learning journey with Spark 2.0.1. I hit an issue that may be totally simple. When trying to run the SparkPi example in Scala with the following command, an exception was thrown. Is this the right behavior, or is something wrong in my command? # bin/spark-submit

Cost of converting RDD's to dataframe and back

2016-06-23 Thread pan
Hello, I am trying to understand the cost of converting an RDD to a DataFrame and back. Would converting back and forth very frequently cost performance? I do observe that some operations like join are implemented very differently for RDD (pair) and DataFrame, so I am trying to figure out the cost of