Re: [VOTE] Release Apache Spark Connect Swift Client 0.1.0 (RC1)

2025-05-05 Thread Martin Grund
Not sure this counts as -1, but from a cursory check of the code, I found that the way the TLS connection is set up does not always work: https://github.com/apache/spark-connect-swift/blob/v0.1.0-rc1/Sources/SparkConnect/DataFrame.swift#L276-L288 shows that DataFrame operations explicitly set

Re: [VOTE] Release Apache Spark Connect Swift Client 0.1.0 (RC1)

2025-05-05 Thread Gabor Somogyi
+1 (non-binding) G On Mon, May 5, 2025 at 8:35 AM huaxin gao wrote: > +1 Thanks Dongjoon. > > On Sun, May 4, 2025 at 5:21 PM Dongjoon Hyun wrote: > >> +1 >> >> I checked the checksum and signatures, and tested with Apache Spark 4.0.0 >> RC4 on Swift 6.

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-05 Thread Gabor Somogyi
un wrote: > +1 > I checked the checksum and signatures, and tested with K8s v1.32. > Dongjoon. > On 2025/05/04 23:58:54 Zhou Jiang wrote: >> +1 , thanks

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-05 Thread kazuyuki tanimura
4 23:58:54 Zhou Jiang wrote: >> +1 , thanks for driving this release! >> *Zhou JIANG* >> On Sun, May 4, 2025 at 16:58 Dongjoon Hyun <mailto:dongjoon.h...@gma

Re: [VOTE] Release Apache Spark Connect Swift Client 0.1.0 (RC1)

2025-05-05 Thread kazuyuki tanimura
+1 (non-binding) Kazu > On May 4, 2025, at 11:31 PM, huaxin gao wrote: > > +1 Thanks Dongjoon. > > On Sun, May 4, 2025 at 5:21 PM Dongjoon Hyun <mailto:dongj...@apache.org>> wrote: >> +1 >> >> I checked the checksum and signatures, and tested wi

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-04 Thread huaxin gao
> Dongjoon. > On 2025/05/04 23:58:54 Zhou Jiang wrote: >> +1 , thanks for driving this release! >> *Zhou JIANG* >> On Sun, May 4, 2025 at 16:58 Dongjoon Hyun wrote: >>

Re: [VOTE] Release Apache Spark Connect Swift Client 0.1.0 (RC1)

2025-05-04 Thread huaxin gao
+1 Thanks Dongjoon. On Sun, May 4, 2025 at 5:21 PM Dongjoon Hyun wrote: > +1 > > I checked the checksum and signatures, and tested with Apache Spark 4.0.0 > RC4 on Swift 6.1. > > This is the initial release (v0.1) with 105 patches to provide a tangible > release to the use

Re: [VOTE] Release Apache Spark Connect Swift Client 0.1.0 (RC1)

2025-05-04 Thread L. C. Hsieh
+1 On Sun, May 4, 2025 at 3:15 PM Dongjoon Hyun wrote: > > Please vote on releasing the following candidate as Apache Spark Connect > Swift Client 0.1.0. This vote is open for the next 72 hours and passes if a > majority +1 PMC votes are cast, with a minimum of 3 +1 votes. > &

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-04 Thread L. C. Hsieh
+1 On Sun, May 4, 2025 at 4:58 PM Dongjoon Hyun wrote: > > Please vote on releasing the following candidate as Apache Spark K8s Operator > 0.1.0. This vote is open for the next 72 hours and passes if a majority +1 > PMC votes are cast, with a minimum of 3 +1 votes. > > [

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-04 Thread Rozov, Vlad
> On Sun, May 4, 2025 at 16:58 Dongjoon Hyun wrote: >> >>> Please vote on releasing the following candidate as Apache Spark K8s >>> Operator 0.1.0. This vote is open for the next 72 hours and passes if a >>> majority +1 PMC votes are cast, with a minimum of 3 +1 v

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-04 Thread Dongjoon Hyun
easing the following candidate as Apache Spark K8s > > Operator 0.1.0. This vote is open for the next 72 hours and passes if a > > majority +1 PMC votes are cast, with a minimum of 3 +1 votes. > > > > [ ] +1 Release this package as Apache Spark K8s Operator 0.1.0 > >

Re: [VOTE] Release Apache Spark Connect Swift Client 0.1.0 (RC1)

2025-05-04 Thread Dongjoon Hyun
+1 I checked the checksum and signatures, and tested with Apache Spark 4.0.0 RC4 on Swift 6.1. This is the initial release (v0.1) with 105 patches to provide a tangible release to the users. v0.2 is under planning in SPARK-51999. Dongjoon. On 2025/05/04 22:14:54 Dongjoon Hyun wrote

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-04 Thread Zhou Jiang
+1 , thanks for driving this release! *Zhou JIANG* On Sun, May 4, 2025 at 16:58 Dongjoon Hyun wrote: > Please vote on releasing the following candidate as Apache Spark K8s > Operator 0.1.0. This vote is open for the next 72 hours and passes if a > majority +1 PMC votes are cas

[VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-04 Thread Dongjoon Hyun
Please vote on releasing the following candidate as Apache Spark K8s Operator 0.1.0. This vote is open for the next 72 hours and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark K8s Operator 0.1.0 [ ] -1 Do not release this

[VOTE] Release Apache Spark Connect Swift Client 0.1.0 (RC1)

2025-05-04 Thread Dongjoon Hyun
Please vote on releasing the following candidate as Apache Spark Connect Swift Client 0.1.0. This vote is open for the next 72 hours and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark Connect Swift Client 0.1.0 [ ] -1 Do not

Re: Issue with Spark 4.0.0rc4 and ~/.ivy2.5.2

2025-04-28 Thread Cheng Pan
Do the following options work for you? ./bin/spark-shell --conf spark.jars.ivy=${HOME}/.ivy2 ./bin/spark-shell --conf spark.jars.ivy=/Users/yourname/.ivy2 I think the issue is that ~ is not interpreted by the shell and is just passed through to the Ivy lib. Thanks, Cheng Pan > On Apr 29, 2025,
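The pitfall described above generalizes: a `~` inside a config value is only meaningful to a shell, so it has to be expanded in code before being handed to a library such as Ivy. A minimal sketch of that expansion in plain Python (illustrative only, not Spark's actual resolution logic; the helper name is hypothetical):

```python
import os

def resolve_ivy_dir(conf_value: str) -> str:
    # Libraries like Ivy treat the configured path literally, so expand
    # '~' explicitly and normalize to an absolute path before use.
    return os.path.abspath(os.path.expanduser(conf_value))

# '~/.ivy2' becomes an absolute path under the user's home directory;
# an already-absolute value passes through unchanged.
print(resolve_ivy_dir("~/.ivy2"))
```

This mirrors the workaround in the reply: pass `${HOME}/.ivy2` (already expanded) instead of relying on `~` reaching the JVM intact.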

Re: Issue with Spark 4.0.0rc4 and ~/.ivy2.5.2

2025-04-28 Thread Wenchen Fan
Hi Jacek, Thanks for the confirmation! Let's change the wording first, and open a JIRA ticket for the relative path support. Wenchen On Tue, Apr 29, 2025 at 2:41 AM Jacek Laskowski wrote: > Hi Wenchen, > > Looks like it didn't work in 3.5 either. > > ❯ ./bin/spark-s

Re: Issue with Spark 4.0.0rc4 and ~/.ivy2.5.2

2025-04-28 Thread Jacek Laskowski
Hi Wenchen, Looks like it didn't work in 3.5 either. ❯ ./bin/spark-shell --version 25/04/28 20:37:48 WARN Utils: Your hostname, Jaceks-Mac-mini.local resolves to a loopback address: 127.0.0.1; using 192.168.68.100 instead (on interface en1) 25/04/28 20:37:48 WARN Utils: Set SPARK_LOCAL_IP i

Re: Issue with Spark 4.0.0rc4 and ~/.ivy2.5.2

2025-04-27 Thread Wenchen Fan
Hi Jacek, Thanks for reporting the issue! Did you hit the same problem when you set the `spark.jars.ivy` config with Spark 3.5? If this config never worked with a relative path, we should change the wording in the migration guide. Thanks, Wenchen On Sun, Apr 27, 2025 at 10:27 PM Jacek Laskowski

Issue with Spark 4.0.0rc4 and ~/.ivy2.5.2

2025-04-27 Thread Jacek Laskowski
Hi, I found in docs/core-migration-guide.md: - Since Spark 4.0, Spark uses `~/.ivy2.5.2` as Ivy user directory by default to isolate the existing systems from Apache Ivy's incompatibility. To restore the legacy behavior, you can set `spark.jars.ivy` to `~/.ivy2`. With that, I

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-23 Thread Szehon Ho
One more small fix (on another topic) for the next RC: https://github.com/apache/spark/pull/50685 Thanks! Szehon On Tue, Apr 22, 2025 at 10:07 AM Rozov, Vlad wrote: > Correct, to me it looks like a Spark bug > https://issues.apache.org/jira/browse/SPARK-51821 that may be hard to > tr

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-22 Thread Rozov, Vlad
Correct, to me it looks like a Spark bug https://issues.apache.org/jira/browse/SPARK-51821 that may be hard to trigger and is reproduced using the test case provided in https://github.com/apache/spark/pull/50594: 1. Spark UninterruptibleThread “task” is interrupted by “test” thread while “task

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-22 Thread Wenchen Fan
Correct me if I'm wrong: this is a long-standing Spark bug that is very hard to trigger, but the new Parquet version happens to hit the trigger condition and exposes the bug. If this is the case, I'm +1 to fix the Spark bug instead of downgrading the Parquet version. Let's mov

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-21 Thread Manu Zhang
I don't think PARQUET-2432 has any issue itself. It looks to have triggered a deadlock case like https://github.com/apache/spark/pull/50594. I'd suggest that we fix forward if possible. Thanks, Manu On Mon, Apr 21, 2025 at 11:19 PM Rozov, Vlad wrote: > The deadlock is reprodu

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-21 Thread Rozov, Vlad
The deadlock is reproducible without Parquet. Please see https://github.com/apache/spark/pull/50594. Thank you, Vlad On Apr 21, 2025, at 1:59 AM, Cheng Pan wrote: The deadlock is introduced by PARQUET-2432(1.14.0), if we decide downgrade, the latest workable version is Parquet 1.13.1

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-21 Thread Cheng Pan
The deadlock is introduced by PARQUET-2432 (1.14.0); if we decide to downgrade, the latest workable version is Parquet 1.13.1. Thanks, Cheng Pan > On Apr 21, 2025, at 16:53, Wenchen Fan wrote: > > +1 to downgrade to Parquet 1.15.0 for Spark 4.0. According to > https://github.com/

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-21 Thread Wenchen Fan
+1 to downgrade to Parquet 1.15.0 for Spark 4.0. According to https://github.com/apache/spark/pull/50583#issuecomment-2815243571 , the Parquet CVE does not affect Spark. On Mon, Apr 21, 2025 at 2:45 PM Hyukjin Kwon wrote: > That's nice but we need to wait for them to release, and upgra

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-20 Thread Yuming Wang
It seems this patch(https://github.com/apache/parquet-java/pull/3196) can avoid deadlock issue if using Parquet 1.15.1. On Wed, Apr 16, 2025 at 5:39 PM Niranjan Jayakar wrote: > I found another bug introduced in 4.0 that breaks Spark connect client x > server compatibility: https://gith

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-20 Thread Hyukjin Kwon
uet-java/pull/3196) can > avoid deadlock issue if using Parquet 1.15.1. > > On Wed, Apr 16, 2025 at 5:39 PM Niranjan Jayakar > wrote: > >> I found another bug introduced in 4.0 that breaks Spark connect client x >> server compatibility: https://github.com/apache/spark/

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-16 Thread Niranjan Jayakar
I found another bug introduced in 4.0 that breaks Spark connect client x server compatibility: https://github.com/apache/spark/pull/50604. Once merged, this should be included in the next RC. On Thu, Apr 10, 2025 at 5:21 PM Wenchen Fan wrote: > Please vote on releasing the following candid

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-15 Thread Rozov, Vlad
It may not be the Parquet introduced issue. It looks like a race condition between Spark UninterruptibleThread and Hadoop/HDFS DFSOutputStream. I tried to resolve the deadlock in https://github.com/apache/spark/pull/50594. Can you give it a try? I will see if I can reproduce the deadlock in a

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-15 Thread Yuming Wang
ava.base@17.0.6/Thread.java:833) Found 1 deadlock. On Mon, Apr 14, 2025 at 11:13 AM Hyukjin Kwon wrote: > Made a fix at https://github.com/apache/spark/pull/50575 👍 > > On Mon, 14 Apr 2025 at 11:42, Wenchen Fan wrote: > >> I'm testing the new spark-connect distribution

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-14 Thread Yuming Wang
s working on, or are you still > investigating it? If the issue is confirmed by the Parquet community, we > can probably roll back to the previous Parquet version for Spark 4.0. > > Thanks, > Wenchen > > On Tue, Apr 15, 2025 at 7:24 AM Yuming Wang wrote: > >> This rel

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-14 Thread Wenchen Fan
Hi Yuming, 1.15.1 is the latest release of Apache Parquet for the 1.x line. Is it a known issue the Parquet community is working on, or are you still investigating it? If the issue is confirmed by the Parquet community, we can probably roll back to the previous Parquet version for Spark 4.0

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-13 Thread Hyukjin Kwon
Made a fix at https://github.com/apache/spark/pull/50575 👍 On Mon, 14 Apr 2025 at 11:42, Wenchen Fan wrote: > I'm testing the new spark-connect distribution and here is the result: > > 4 packages are tested: pip install pyspark, pip install pyspark_connect (I > installed

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-13 Thread Wenchen Fan
I'm testing the new spark-connect distribution and here is the result: 4 packages are tested: pip install pyspark, pip install pyspark_connect (I installed them with the RC4 pyspark tarballs), the classic tarball (spark-4.0.0-bin-hadoop3.tgz), the connect tarball (spark-4.0.0-bin-hadoop3-

Re: SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread karan alang
👍

Re: SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread Prem Sahoo
performance needs to be checked. With YARN and the external Spark shuffle service, shuffle is a lot more optimized, so we can experience slowness with Spark on k8s, especially if there is a pod restart. Have you checked Apache Uniffle / Celeborn for enabling Spark shuffle? fyi .. i'm

Re: SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread karan alang
Pls check if there are resource constraints on the pods/nodes, especially if they are shared. MinIO connectivity performance needs to be checked. With YARN and the external Spark shuffle service, shuffle is a lot more optimized, so we can experience slowness with Spark on k8s, especially if there is

Re: SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread Prem Sahoo
Hello Karan, I am using Spark open source in Kubernetes and the Spark MapR bundle in YARN. For launching the job, both approaches take the same 10 secs. For shuffle I am using local in both YARN and Kubernetes. Sent from my iPhone. On Apr 11, 2025, at 11:24 AM, karan alang wrote: Hi Prem, Which distribution of

Re: SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread karan alang
Hi Prem, Which distribution of Spark are you using? How long does it take to launch the job? Regarding Spark shuffle, which approach are you using - storing shuffle data in MinIO or using a host path? regds, Karan On Fri, Apr 11, 2025 at 4:58 AM Prem Sahoo wrote: > Hello Team, > I

SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread Prem Sahoo
Hello Team, I have a peculiar case of Spark slowness. I am using MinIO as object storage, from which Spark reads & writes data. Using YARN as master, a Spark job takes ~5 mins; the same job with Kubernetes as master takes ~8 mins. I checked the Spark DAG in

[VOTE] Release Spark 4.0.0 (RC4)

2025-04-10 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 4.0.0. The vote is open until April 15 (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 4.0.0 [ ] -1 Do not release this package because ... To

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-04-05 Thread Ángel Álvarez Pascua
this proposal now ... 😂 *"If you haven’t encountered this kind of ‘dependency hell’ while working on geospatial projects with Spark, you may have been fortunate to deal with relatively simple cases."* Yes, that was the case for us. We loaded OpenStreetMap data from Spain, calculated some Have

Re: Spark build failed> File line length exceeds 100 characters

2025-04-05 Thread Ángel Álvarez Pascua
I've noticed that the check is enabled in *scalastyle-config.xml*. Given this configuration, how is it possible that some people have been able to commit changes violating this rule? Moreover, how were these changes even merged despite failing this validation? It seems like
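For context, the rule in question in Spark's `scalastyle-config.xml` is a `FileLineLengthChecker` entry; it looks roughly like the fragment below (a sketch from memory of the scalastyle rule format, so exact attributes may differ from the file in the repo):

```xml
<check level="error" class="org.scalastyle.file.FileLineLengthChecker" enabled="true">
  <parameters>
    <parameter name="maxLineLength"><![CDATA[100]]></parameter>
  </parameters>
</check>
```

With `level="error"` and `enabled="true"`, any line over 100 characters should fail `dev/scalastyle`, which is what makes the merged violations surprising.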

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-04-04 Thread Rozov, Vlad
you, Vlad On Mar 26, 2025, at 3:18 PM, Hyukjin Kwon wrote: That only fixes Maven. Both SBT build and Maven build should work in the same or similar way. Let's make sure both work. On Thu, Mar 27, 2025 at 3:18 AM Rozov, Vlad wrote: Please see https://github.com/vrozov/spark/tree/spark-she

Re: Spark 3.5.2 and Hadoop 3.4.1 slow performance

2025-04-04 Thread Steve Loughran
#Options_to_Tune But you need that underlying hosting infra to be the same before making comparisons about the layers above. Why not start by either replicating your previous setup in k8s or running spark 3.5 standalone outside k8s and comparing it to spark 3.2 in the same environment? On Tue, 25 Mar 2025 at

Re: [VOTE] Release Spark 4.0.0 (RC3)

2025-03-31 Thread huaxin gao
Hi Wenchen, Could you please wait for https://github.com/apache/spark/pull/50246 to be merged before you cut the next RC? Thanks, Huaxin On Mon, Mar 31, 2025 at 8:53 PM Wenchen Fan wrote: > Hi all, > > Thanks for your feedback! Regarding > https://github.com/apache/spark/pull/501

Re: [VOTE] Release Spark 4.0.0 (RC3)

2025-03-31 Thread Wenchen Fan
Hi all, Thanks for your feedback! Regarding https://github.com/apache/spark/pull/50187 , I don't think it's a 4.0 blocker as it's a CI issue for the examples module. Other than that, all other issues have been resolved and I'll cut the next RC after https://github.com/apache

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-31 Thread Wenchen Fan
believe it’s important to standardize common data types in Spark and clearly define the boundaries between different layers in the Lakehouse ecosystem. While it makes sense for Apache Sedona to have its own Parquet data source for geospatial types in the absence of a standard, the long-term goal

Re: Spark 3.5.2 and Hadoop 3.4.1 slow performance

2025-03-31 Thread Steve Loughran
. I'd be curious about what those numbers are -though they only measure task/job commit, not all the work (that's not quite true, but...) You can get a log of all S3 IO performed for an entire Spark job across all worker threads, via the S3 auditing, https://hadoop.apache.org/docs/stable/

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-30 Thread Szehon Ho
again for the expertise from Sedona side in these efforts. Thanks! Szehon Sent from my iPhone > On Mar 29, 2025, at 11:42 PM, Jia Yu wrote: > > Hi Reynold and team, > > I’m glad to see that the Spark community is recognizing the importance > of geospatial support. The Se

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-30 Thread Jia Yu
Hey Angel, I am glad that you asked these questions. Please see my answers below. *1. Domain types evolve quickly. - It has taken years for Parquet to include these new types in its format... We could evolve alongside Parquet. Unfortunately, Spark is not known for upgrading its dependencies

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Jia Yu
Hi Reynold and team, I’m glad to see that the Spark community is recognizing the importance of geospatial support. The Sedona community has long been a strong advocate for Spark, and we’ve proudly supported large-scale geospatial workloads on Spark for nearly a decade. We’re absolutely open to

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Ángel Álvarez Pascua
* 1. Domain types evolve quickly.* It has taken years for Parquet to include these new types in its format... We could evolve alongside Parquet. Unfortunately, Spark is not known for upgrading its dependencies quickly. * 2. Geospatial in Java and Python is a dependency hell.* How has

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Reynold Xin
While I don’t think Spark should become a super specialized geospatial processing engine, I don’t think it makes sense to focus *only* on reading and writing from storage. Geospatial is a pretty common and fundamental capability of analytics systems and virtually every mature and popular analytics

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Jia Yu
Sedona community. Since the primary motivation here is Parquet-level support, I suggest shifting the focus of this discussion toward enabling geo support in Spark Parquet DataSource rather than introducing core types. ** Why Spark Should Avoid Hardcoding Domain-Specific Types like geo types

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Menelaos Karavelas
minimal support in Spark, as a common platform, for these types. To be more specific and explicit: The proposal scope is to add support for reading/writing to Parquet, based on the new standard, as well as adding the types as built-in types in Spark to complement the storage support. The few ST

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Szehon Ho
, now that the types are in most common data sources in ecosystem , I think Apache Spark as a common platform needs to have this type definition for inter-op, otherwise users of vanilla Spark cannot work with those data sources with stored geospatial data.  (Imo a similar rationale in adding timestamp

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Menelaos Karavelas
Hello Jia, Wenchen summarized the intent very clearly. The scope of the proposal is primarily the type system and storage, not processing. Let’s work together on the technical details and make sure the work we propose to do in Spark works best with Apache Sedona. Best, Menelaos > On Mar

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Wenchen Fan
Hi Jia, This is a good question. As the shepherd of this SPIP, I'd like to clarify the motivation here: the focus of this project is more about the storage part, not the processing. Apache Sedona is a great library for geo processing, but without native geo type support in Spark, users can

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-28 Thread Szehon Ho
>> /WKB >> <https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary> >> ? >> >> El vie, 28 mar 2025 a las 20:50, Ángel Álvarez Pascua (< >> angel.alvarez.pas...@gmail.com>) escribió: >> >>> +1 (non-bindin

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-28 Thread Ángel Álvarez Pascua
+1 (non-binding) El vie, 28 mar 2025, 18:48, Menelaos Karavelas escribió: > Dear Spark community, > > I would like to propose the addition of new geospatial data types > (GEOMETRY and GEOGRAPHY) which represent geospatial values as recently > added as new logical types

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-28 Thread Jia Yu
framework for processing large-scale geospatial data on Spark, Flink, and other engines. From what I understand, this proposal aims to add native geospatial types and functionality directly into Spark. However, this seems to replicate much of the work already done by the Sedona pro

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-28 Thread Ángel Álvarez Pascua
text_representation_of_geometry#Well-known_binary> > ? > > El vie, 28 mar 2025 a las 20:50, Ángel Álvarez Pascua (< > angel.alvarez.pas...@gmail.com>) escribió: > >> +1 (non-binding) >> >> El vie, 28 mar 2025, 18:48, Menelaos Karavelas < >> menelao

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-28 Thread Menelaos Karavelas
rez Pascua > (mailto:angel.alvarez.pas...@gmail.com>>) > escribió: >> +1 (non-binding) >> >> El vie, 28 mar 2025, 18:48, Menelaos Karavelas > <mailto:menelaos.karave...@gmail.com>> escribió: >>> Dear Spark community, >>> >>> I w

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-28 Thread Ángel Álvarez Pascua
as...@gmail.com>) escribió: > +1 (non-binding) > > El vie, 28 mar 2025, 18:48, Menelaos Karavelas < > menelaos.karave...@gmail.com> escribió: > >> Dear Spark community, >> >> I would like to propose the addition of new geospatial data types >> (GEOMET

[DISCUSS] SPIP: Add geospatial types to Spark

2025-03-28 Thread Menelaos Karavelas
Dear Spark community, I would like to propose the addition of new geospatial data types (GEOMETRY and GEOGRAPHY) which represent geospatial values as recently added as new logical types in the Parquet specification. The new types should improve Spark’s ability to read the new Parquet logical
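The proposed GEOMETRY/GEOGRAPHY logical types in Parquet carry their values as WKB (well-known binary). As a hedged illustration of that wire format only (stdlib Python, not Spark or Parquet APIs), a 2-D point serializes as a byte-order flag, a geometry-type tag, and two doubles:

```python
import struct

def wkb_point(x: float, y: float) -> bytes:
    # 1 = little-endian byte order, geometry type 1 = Point,
    # followed by x and y as IEEE-754 float64 values.
    return struct.pack("<BIdd", 1, 1, x, y)

def parse_wkb_point(buf: bytes) -> tuple:
    order, gtype, x, y = struct.unpack("<BIdd", buf)
    assert order == 1 and gtype == 1, "expected little-endian WKB Point"
    return (x, y)

print(parse_wkb_point(wkb_point(-3.7, 40.4)))  # round-trips exactly
```

Standardizing on this encoding at the storage layer is what lets Spark, Sedona, and other engines exchange geospatial columns without custom conversion.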

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-27 Thread Rozov, Vlad
https://github.com/apache/spark/pull/50437 IMO, it will be better to keep 2 separate commits, one to undo the revert and one for the fix, so the fix for Guava is properly documented. Also, while testing, I see that if I exit the shell and start it again, it fails. Thank you, Vlad On Mar 27, 2025, at 2:33 PM

Spark build failed> File line length exceeds 100 characters

2025-03-27 Thread Ángel Álvarez Pascua
Hi, I'm trying to build the project, but I'm encountering multiple errors due to long lines. Is this expected? I built the project a few weeks ago and don’t recall seeing these errors. Is anyone else experiencing the same issue? [screenshot of the build errors] Thanks in advance.

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-27 Thread Hyukjin Kwon
Vlad, let's open a PR and discuss it there. We have many other commits to review / help with as well. On Fri, Mar 28, 2025 at 6:28 AM Rozov, Vlad wrote: > Hi Hyukjin, > > I open https://issues.apache.org/jira/browse/SPARK-51643 and > https://issues.apache.org/jira/browse/SP

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-27 Thread Rozov, Vlad
Hi Hyukjin, I opened https://issues.apache.org/jira/browse/SPARK-51643 and https://issues.apache.org/jira/browse/SPARK-51644. Please add more details to the first JIRA. As far as I can see https://github.com/vrozov/spark/tree/spark-shell should fix both JIRAs and if not I’d like to understand

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-27 Thread Mark Hamstra
Back in the very early days of Spark (before it was even an Apache Incubator project), Maven was clearly a more mature, capable and stable tool suite for building, testing and publishing JVM code, even Scala code, so some of the earliest commercial adopters of Spark relied upon Maven. It made

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Hyukjin Kwon
Here's a bit of history and context: The project was initially built using SBT ( https://github.com/apache/spark/commit/df29d0ea4c8b7137fdd1844219c7d489e3b0d9c9 ). Later, Maven support was added ( https://github.com/apache/spark/commit/811a32257b1b59b042a2871eede6ee39d9e8a137 ) to provi

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Wenchen Fan
A slightly off-topic but related question: It feels fragile to test with SBT while publishing the release with Maven. How did we end up in this situation? Moreover, since most Spark developers use SBT for their daily work, it becomes even harder to catch issues with the Maven build. On Thu, Mar

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Hyukjin Kwon
Nah, I wasn't clear. Maven and SBT builds are synced for this special code path, e.g., https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0 . If you build Maven and SBT, the results are almost the same. Now, the fix you landed in Maven (and indeed it was a

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Rozov, Vlad
://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0):

```diff
-<relocation>
-  <pattern>com.google.common</pattern>
-  <shadedPattern>${spark.shade.packageName}.connect.guava</shadedPattern>
-  <includes>
-    <include>com.google.common.**</include>
-  </includes>
-</relocation>
```

The companion part of this

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Rozov, Vlad
https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0):

```diff
-<relocation>
-  <pattern>com.google.common</pattern>
-  <shadedPattern>${spark.shade.packageName}.connect.guava</shadedPattern>
-  <includes>
-    <include>com.google.common.**</include>
-  </includes>
-</relocation>
```

The companion p

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Hyukjin Kwon
lines you changed (added in > https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0 > ): > > diff``` > - > - com.google.common > - > ${spark.shade.packageName}.connect.guava > - > -co

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Hyukjin Kwon
It is not broken. The fix you applied would not be applied in SBT. For example, the lines you changed (added in https://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0):

```diff
-<relocation>
-  <pattern>com.google.common</pattern>
-  <shadedPattern>${spark.shade.packageName}.connect.guava</shadedPattern>
```

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Jungtaek Lim
+1 on explanation that it is not happening only to Vlad but always happening as a normal process. Vlad, if we are very strict about ASF voting policy, we have to have three +1s without -1 to merge the code change. I don't think the major projects in ASF follow it - instead, they (including

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Hyukjin Kwon
That only fixes Maven. Both SBT build and Maven build should work in the same or similar way. Let's make sure both work. On Thu, Mar 27, 2025 at 3:18 AM Rozov, Vlad wrote: > Please see https://github.com/vrozov/spark/tree/spark-shell. I tested > only spark-shell —remote local aft

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Rozov, Vlad
filed JIRA, please provide the link, if not, please open one. It took me 2 hours to fix Spark shells, so should you open JIRA instead of spending time to identify the commit and reverting it, you will save time as well. I’ll post fix once JIRA is open and I validate that my understanding of

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Rozov, Vlad
Please see https://github.com/vrozov/spark/tree/spark-shell. I tested only spark-shell —remote local after building with maven and sbt. It may not be a complete fix and there is no PR. I’ll look into SBT build issue (assuming that there is still one after the fix) once you file JIRA. Thank you

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Nicholas Chammas
> would expect you to open JIRA and outline what is broken. If you filed JIRA, > please provide the link, if not, please open one. > > It took me 2 hours to fix Spark shells, so should you open JIRA instead of > spending time to identify the commit and reverting it, you will sa

performance issue Spark 3.5.2 on kubernetes

2025-03-26 Thread Prem Sahoo
Hello Team, I was working with Spark 3.2 and Hadoop 2.7.6 and writing to MinIO object storage. It was slower when compared to writing to MapR FS with the above tech stack. Then moved on to a later upgraded version of Spark 3.5.2 and Hadoop 3.4.1 which started writing to MinIO with V2

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Hyukjin Kwon
line what is broken. If you > filed JIRA, please provide the link, if not, please open one. > > It took me 2 hours to fix Spark shells, so should you open JIRA instead of > spending time to identify the commit and reverting it, you will save time > as well. I’ll post fix once JIRA is open

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Reynold Xin
not sure I follow your question. I’ll open PR with the fix > once JIRA is open. > > While I am new to the Spark community, I am not new to the Apache projects > and open source. Committers are guardians for commits and they keep not > only master branch, but the entire source code in shap

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Rozov, Vlad
Reynold, I am not sure I follow your question. I'll open a PR with the fix once the JIRA is open. While I am new to the Spark community, I am not new to Apache projects and open source. Committers are guardians for commits, and they keep not only the master branch but the entire source code in shape

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-26 Thread Hyukjin Kwon
://github.com/apache/spark/commit/e927a7edad47f449aeb0d5014b6185ac36b344d0 . Should also test Spark shells, and describe how you tested it as well. This is what I expect: - Please show me if there is a simple fix. If that's the case, yes, I will revert this out from the master branch. That works for me.

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-25 Thread Reynold Xin
r can > create quickly? If yes, we can merge the fix. If there isn't, for major > functionality breaking change, we should just revert. That's fairly basic > software engineering practices. > > > On Tue, Mar 25, 2025 at 9:53 PM Hyukjin Kwon wrote: > >> With the c

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-25 Thread Rozov, Vlad
quickly? If yes, we can merge the fix. If there isn't, for major functionality breaking change, we should just revert. That's fairly basic software engineering practices. On Tue, Mar 25, 2025 at 9:53 PM Hyukjin Kwon wrote: With the change, the main ent

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-25 Thread Hyukjin Kwon
With the change, the main entry points, the Spark shells, don't work and developers cannot debug and test. The snapshots become useless. The tests passed because you did not fix SBT. It needs a larger change. Such a change cannot be in the source. I can start a vote if you think this is an issue.

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-25 Thread Reynold Xin
ukjin Kwon wrote: > With the change, the main entry points, the Spark shells, don't work and > developers cannot debug and test. The snapshots become useless. > > The tests passed because you did not fix SBT. It needs a larger change. > > Such a change cannot be in the source. I c

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-03-25 Thread Rozov, Vlad
This does not make any sense. 1. There are no broken tests introduced by https://github.com/apache/spark/pull/49971 2. There is no JIRA filed for “the main entry point” 3. “The main entry point” that does not have any unit test suggests that it is not the main entry point. 4. It is not

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-25 Thread Wenchen Fan
Wenchen On Wed, Mar 26, 2025 at 11:02 AM Hyukjin Kwon wrote: > I am confused. The consensus is made pretty clearly in > https://github.com/apache/spark/pull/50378, CI passed. Now it has 9 +1s > from all different groups. > Why do we need to change the way? I don't think we

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-25 Thread Hyukjin Kwon
I am confused. The consensus is made pretty clearly in https://github.com/apache/spark/pull/50378, CI passed. Now it has 9 +1s from all different groups. Why do we need to change the way? I don't think we should override the community consensus because you think the approach is hacky. On We

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-25 Thread Rozov, Vlad
to be removed. 2. The question is when and how to remove them. My initial assumption was that jars would be removed as part of 4.1.0 and backported to 3.5.x. 3. With the above assumption I voted -0 on 3.5.5 and open https://github.com/apache/spark/pull/50231 WIP PR with the plan to still vote -0

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-25 Thread Jungtaek Lim
Vlad, We are conflicted because you immediately want the project to fix the issue, while Dongjoon stated in the post that he does not want to block the release just because of this. We delayed the release of Apache Spark 4.0.0 a lot already (going to be month"s" now), and I do not want

Re: [DISCUSS] SPARK-51318: Remove `jar` files from Apache Spark repository and disable affected tests

2025-03-25 Thread Hyukjin Kwon
ch, I disagree and commit. Note that I still have an outstanding comment on your PR https://github.com/apache/spark/pull/50378#discussion_r2012935532. My PR does not cause the issue. I keep it AS IS, and fix the issue raised in the thread. Let's not mix other issues orthogonal with my PR. For s
