Re: [VOTE] Release Apache Spark Connect Swift Client 0.1.0 (RC1)

2025-05-06 Thread Rozov, Vlad
+1 (non-binding) On May 6, 2025, at 6:33 AM, Sakthi wrote: +1 (non-binding) On Tue, May 6, 2025 at 2:00 AM Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote: +1 (non-binding) Nice addition on Spark Connect! On Tue, May 6, 2025 at 5:47 PM Peter Toth <peter.t...@gma

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-06 Thread Menelaos Karavelas
Hi Mridul, Just wanted to add that we intend to work with the Apache Sedona community going forward anyway. - Menelaos On May 6, 2025, at 6:45 AM, Wenchen Fan wrote: Hi Mridul, The conclusion is that we will standardize the basic geo data types in Spark,

Re: [VOTE] Release Apache Spark Connect Swift Client 0.1.0 (RC1)

2025-05-06 Thread Dongjoon Hyun
b.com/apache/spark-connect-swift/blob/v0.1.0-rc1/Sources/SparkConnect/DataFrame.swift#L276-L288 shows that DataFrame operations explicitly set plaintext even though the client itself is configured to use TLS. The current implementation works OK for local plaintext connections
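The problem pattern described in this review (per-operation code forcing plaintext even when the client is configured for TLS) can be sketched in a few lines. This is an illustration in Python rather than Swift, and every name in it (`ChannelConfig`, `channel_for_operation_hardcoded`, `channel_for_operation`) is hypothetical and not taken from the spark-connect-swift sources; it assumes only that the client holds one connection configuration that every operation should inherit.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ChannelConfig:
    """Hypothetical connection settings held by a client."""
    host: str
    port: int
    use_tls: bool


def channel_for_operation_hardcoded(client_config: ChannelConfig) -> ChannelConfig:
    # Problem pattern: the operation derives its own settings and forces
    # plaintext, silently discarding the client's TLS choice.
    return replace(client_config, use_tls=False)


def channel_for_operation(client_config: ChannelConfig) -> ChannelConfig:
    # Safer pattern: reuse the client's settings unchanged, so a client
    # configured for TLS keeps using TLS for every operation.
    return client_config
```

Passing the client's own configuration through, instead of rebuilding it per operation, keeps a single source of truth for transport security.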

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-06 Thread Dongjoon Hyun
To Rozov, we use "Apache SkyWalking Eyes" in our GitHub Action. - https://github.com/apache/skywalking-eyes - https://github.com/apache/spark-kubernetes-operator/blob/6116bb08c282911389fe2f5af49794a456111e97/.github/workflows/build_and_test.yml#L24 In addition, you can downlo

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-06 Thread Dongjoon Hyun
ilan Stefanovic <stefanovic.mila...@gmail.com>: +1 (non-binding)

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-06 Thread Rozov, Vlad
..@gmail.com>: +1 (non-binding) Thanks, Milan

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-06 Thread Wenchen Fan
Hi Mridul, The conclusion is that we will standardize the basic geo data types in Spark, which allows third-party data sources and user-defined functions to support geo data types natively when integrating with Spark. The majority of geo processing functions will still be in Apache Sedona (or

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-06 Thread Sakthi
art of the gradle build? If not, how are headers validated to include the correct license? Thank you, Vlad

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-06 Thread Sakthi
+1 On Mon, 5 May 2025 at 21:28, Milan Stefanovic <stefanovic.mila...@gmail.com> wrote:

Re: [VOTE] Release Apache Spark Connect Swift Client 0.1.0 (RC1)

2025-05-06 Thread Sakthi
+1 (non-binding) On Tue, May 6, 2025 at 2:00 AM Jungtaek Lim wrote: +1 (non-binding) Nice addition on Spark Connect! On Tue, May 6, 2025 at 5:47 PM Peter Toth wrote: +1 On Tue, May 6, 2025 at 9:59 AM Yang Jie wrote: +1, A bi

Re: [VOTE] Release Apache Spark Connect Swift Client 0.1.0 (RC1)

2025-05-06 Thread Jungtaek Lim
+1 (non-binding) Nice addition on Spark Connect! On Tue, May 6, 2025 at 5:47 PM Peter Toth wrote: +1 On Tue, May 6, 2025 at 9:59 AM Yang Jie wrote: +1, A big thank you to Dongjoon for all the hard work you've put into this! On 2

Re: [VOTE] Release Apache Spark Connect Swift Client 0.1.0 (RC1)

2025-05-06 Thread Peter Toth
+1 On Tue, May 6, 2025 at 9:59 AM Yang Jie wrote: +1, A big thank you to Dongjoon for all the hard work you've put into this! On 2025/05/05 18:19:33 DB Tsai wrote: +1, it’s exciting to see Spark Connect Swift client, showcasing Spark Connect as a tru

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-06 Thread Max Gekk
+1 (non-binding) Thanks, Milan On Mon, 5 May 2

Re: [VOTE] Release Apache Spark Connect Swift Client 0.1.0 (RC1)

2025-05-06 Thread Yang Jie
+1, A big thank you to Dongjoon for all the hard work you've put into this! On 2025/05/05 18:19:33 DB Tsai wrote: +1, it’s exciting to see Spark Connect Swift client, showcasing Spark Connect as a truly language-agnostic protocol, and also powering Swift users to

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-06 Thread Yang Jie
he gradle build? If not, how are headers validated to include the correct license? Thank you, Vlad

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-05 Thread Mridul Muralidharan
+1 (non-binding) Thanks, Milan On Mon, 5 May 2025 at 21:25, Jia Yu wrote:

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-05 Thread L. C. Hsieh
efanovic: +1 (non-binding) Thanks, Milan

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-05 Thread Gengliang Wang
Thanks, Milan On Mon, 5 May 2025 at 21:25, Jia Yu wrote: Thanks for putting this together.

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-05 Thread Kent Yao
gradle build? If not, how are headers validated to include the correct license? Thank you, Vlad

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-05 Thread Hyukjin Kwon
On May 4, 2025, at 5:38 PM, Dongjoon Hyun wrote: +1

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-05 Thread Ruifeng Zheng
Thank you, Vlad On May 4, 2025, at 5:38 PM, Dongjoon Hyun wrote: +1

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-05 Thread Ruifeng Zheng
Thanks for putting this together. +0 (non-binding) from my side. Happy to see geospatial data is getting attention but we need to make it right.

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-05 Thread Jules Damji
Thanks for putting this together. +0 (non-binding) from my side. Happy to see geospatial data is getting attention but we need to make it r

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-05 Thread Xiao Li
On Mon, 5 May 2025 at 21:25, Jia Yu wrote: Thanks for putting this together. +0 (non-binding) from my side. Happy to see geospatial data is

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-05 Thread Yuming Wang
+0 (non-binding) from my side. Happy to see geospatial data is getting attention but we need to make it right. Jia Yu

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-05 Thread Denny Lee
Thanks for putting this together. +0 (non-binding) from my side. Happy to see geospatial data is getting attention but we need to make it right.

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-05 Thread Wenchen Fan
a is getting attention but we need to make it right. Jia Yu On Mon, May 5, 2025 at 12:15 PM Szehon Ho wrote:

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-05 Thread Wenchen Fan
+1 I checked the checksum and signatures, and tested with K8s v1.32. Dongjoon. On 2025/05/04 23:58:54 Zhou Jiang wrote: +1, thanks

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-05 Thread Reynold Xin
On Mon, May 5, 2025 at 12:15 PM Szehon Ho wrote: +1 (non binding) Thanks Szehon On Mon, May 5, 2025 at 11:17 AM DB Tsai wrote:

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-05 Thread Bjørn Jørgensen
side. Happy to see geospatial data is getting attention but we need to make it right. Jia Yu On Mon, May 5, 2025 at 12:15 PM Szehon Ho wrote: +1 (non binding) Thanks

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-05 Thread Milan Stefanovic
May 5, 2025 at 12:15 PM Szehon Ho wrote: +1 (non binding) Thanks Szehon On Mon, May 5, 2025 at 11:17 AM DB Tsai wrote: +1, geospatial types will be a great feature for Spark. Thanks for working on it.

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-05 Thread Jia Yu
AM DB Tsai wrote: +1, geospatial types will be a great feature for Spark. Thanks for working on it. On May 5, 2025, at 11:04 AM, Menelaos Karavelas <menelaos.karave...@gmail.com> wrote: I started the discussion on addin

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-05 Thread Szehon Ho
+1 (non binding) Thanks Szehon On Mon, May 5, 2025 at 11:17 AM DB Tsai wrote: +1, geospatial types will be a great feature for Spark. Thanks for working on it. On May 5, 2025, at 11:04 AM, Menelaos Karavelas <menelaos.karave...@gmail.com> wrote:

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-05 Thread DB Tsai
.h...@gmail.com> wrote: Please vote on releasing the following candidate as Apache Spark K8s Operator 0.1.0. This vote is open for the next 72 hours and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

Re: [VOTE] Release Apache Spark Connect Swift Client 0.1.0 (RC1)

2025-05-05 Thread DB Tsai
+1, it’s exciting to see Spark Connect Swift client, showcasing Spark Connect as a truly language-agnostic protocol, and also powering Swift users to use Spark! Sent from my iPhone On May 5, 2025, at 1:11 AM, Gabor Somogyi wrote: +1 (non-binding) G On Mon, May 5, 2025 at 8:35 AM huaxin gao <huaxin

Re: [VOTE] SPIP: Add geospatial types to Spark

2025-05-05 Thread DB Tsai
+1, geospatial types will be a great feature for Spark. Thanks for working on it. On May 5, 2025, at 11:04 AM, Menelaos Karavelas wrote: I started the discussion on adding geospatial types to Spark on March 28th. Since then there has been some discussion in the dev m

[VOTE] SPIP: Add geospatial types to Spark

2025-05-05 Thread Menelaos Karavelas
I started the discussion on adding geospatial types to Spark on March 28th. Since then there has been some discussion in the dev mailing list, as well as in the SPIP doc. At this point I would like to move to a formal vote on adding support for geospatial types to Spark. *Discussion thread

Re: [VOTE] Release Apache Spark Connect Swift Client 0.1.0 (RC1)

2025-05-05 Thread Martin Grund
Not sure whether this counts as a -1, but from a cursory check of the code, I found that the way the TLS connection is set up does not always work: https://github.com/apache/spark-connect-swift/blob/v0.1.0-rc1/Sources/SparkConnect/DataFrame.swift#L276-L288 shows that DataFrame operations explicitly set

Re: [VOTE] Release Apache Spark Connect Swift Client 0.1.0 (RC1)

2025-05-05 Thread Gabor Somogyi
+1 (non-binding) G On Mon, May 5, 2025 at 8:35 AM huaxin gao wrote: +1 Thanks Dongjoon. On Sun, May 4, 2025 at 5:21 PM Dongjoon Hyun wrote: +1 I checked the checksum and signatures, and tested with Apache Spark 4.0.0 RC4 on Swift 6.

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-05 Thread Gabor Somogyi
un wrote: +1 I checked the checksum and signatures, and tested with K8s v1.32. Dongjoon. On 2025/05/04 23:58:54 Zhou Jiang wrote: +1, thanks

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-05 Thread kazuyuki tanimura
4 23:58:54 Zhou Jiang wrote: +1, thanks for driving this release! *Zhou JIANG* On Sun, May 4, 2025 at 16:58 Dongjoon Hyun <dongjoon.h...@gma

Re: [VOTE] Release Apache Spark Connect Swift Client 0.1.0 (RC1)

2025-05-05 Thread kazuyuki tanimura
+1 (non-binding) Kazu On May 4, 2025, at 11:31 PM, huaxin gao wrote: +1 Thanks Dongjoon. On Sun, May 4, 2025 at 5:21 PM Dongjoon Hyun <dongj...@apache.org> wrote: +1 I checked the checksum and signatures, and tested wi

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-04 Thread huaxin gao
Dongjoon. On 2025/05/04 23:58:54 Zhou Jiang wrote: +1, thanks for driving this release! *Zhou JIANG* On Sun, May 4, 2025 at 16:58 Dongjoon Hyun wrote:

Re: [VOTE] Release Apache Spark Connect Swift Client 0.1.0 (RC1)

2025-05-04 Thread huaxin gao
+1 Thanks Dongjoon. On Sun, May 4, 2025 at 5:21 PM Dongjoon Hyun wrote: +1 I checked the checksum and signatures, and tested with Apache Spark 4.0.0 RC4 on Swift 6.1. This is the initial release (v0.1) with 105 patches to provide a tangible release to the use

Re: [VOTE] Release Apache Spark Connect Swift Client 0.1.0 (RC1)

2025-05-04 Thread L. C. Hsieh
+1 On Sun, May 4, 2025 at 3:15 PM Dongjoon Hyun wrote: Please vote on releasing the following candidate as Apache Spark Connect Swift Client 0.1.0. This vote is open for the next 72 hours and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-04 Thread L. C. Hsieh
+1 On Sun, May 4, 2025 at 4:58 PM Dongjoon Hyun wrote: Please vote on releasing the following candidate as Apache Spark K8s Operator 0.1.0. This vote is open for the next 72 hours and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-04 Thread Rozov, Vlad
On Sun, May 4, 2025 at 16:58 Dongjoon Hyun wrote: Please vote on releasing the following candidate as Apache Spark K8s Operator 0.1.0. This vote is open for the next 72 hours and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 v

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-04 Thread Dongjoon Hyun
easing the following candidate as Apache Spark K8s Operator 0.1.0. This vote is open for the next 72 hours and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark K8s Operator 0.1.0

Re: [VOTE] Release Apache Spark Connect Swift Client 0.1.0 (RC1)

2025-05-04 Thread Dongjoon Hyun
+1 I checked the checksum and signatures, and tested with Apache Spark 4.0.0 RC4 on Swift 6.1. This is the initial release (v0.1) with 105 patches to provide a tangible release to the users. v0.2 is under planning in SPARK-51999. Dongjoon. On 2025/05/04 22:14:54 Dongjoon Hyun wrote

Re: [VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-04 Thread Zhou Jiang
+1, thanks for driving this release! *Zhou JIANG* On Sun, May 4, 2025 at 16:58 Dongjoon Hyun wrote: Please vote on releasing the following candidate as Apache Spark K8s Operator 0.1.0. This vote is open for the next 72 hours and passes if a majority +1 PMC votes are cas

[VOTE] Release Apache Spark K8s Operator 0.1.0 (RC1)

2025-05-04 Thread Dongjoon Hyun
Please vote on releasing the following candidate as Apache Spark K8s Operator 0.1.0. This vote is open for the next 72 hours and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark K8s Operator 0.1.0 [ ] -1 Do not release this
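The passing condition stated in these vote emails can be written as a small check. This is a sketch, not an official tool: it assumes, following the usual ASF release-vote convention, that both the majority and the minimum of three +1 votes count only binding (PMC) votes, and that a voter's latest vote supersedes any earlier one. The `Vote` type and `release_vote_passes` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1, 0, or -1
    binding: bool   # True if the voter is a PMC member


def release_vote_passes(votes: list[Vote], min_binding_plus_ones: int = 3) -> bool:
    """More binding +1 than binding -1 votes, and at least
    `min_binding_plus_ones` binding +1 votes."""
    # If someone votes more than once, keep only their latest vote
    # (building a dict keeps the last value seen for each key).
    latest = {v.voter: v for v in votes}
    binding = [v for v in latest.values() if v.binding]
    plus = sum(1 for v in binding if v.value > 0)
    minus = sum(1 for v in binding if v.value < 0)
    return plus >= min_binding_plus_ones and plus > minus
```

Under these assumptions, two binding +1 votes plus any number of non-binding +1 votes do not pass; a third binding +1 does.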

[VOTE] Release Apache Spark Connect Swift Client 0.1.0 (RC1)

2025-05-04 Thread Dongjoon Hyun
Please vote on releasing the following candidate as Apache Spark Connect Swift Client 0.1.0. This vote is open for the next 72 hours and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark Connect Swift Client 0.1.0 [ ] -1 Do not

Re: Issue with Spark 4.0.0rc4 and ~/.ivy2.5.2

2025-04-28 Thread Cheng Pan
Do the following options work for you? `./bin/spark-shell --conf spark.jars.ivy=${HOME}/.ivy2` or `./bin/spark-shell --conf spark.jars.ivy=/Users/yourname/.ivy2` I think the issue is that `~` is not expanded by the shell and is passed through as-is to the Ivy library. Thanks, Cheng Pan On Apr 29, 2025,
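Cheng Pan's explanation can be checked outside Spark. The Python sketch below is independent of how Spark actually handles `spark.jars.ivy`; it only shows that an unexpanded `~` is an ordinary path component, and how `pathlib.Path.expanduser()` resolves it against the home directory. `resolve_ivy_dir` is a hypothetical helper, not a Spark API.

```python
from pathlib import Path


def resolve_ivy_dir(value: str) -> Path:
    """Expand a leading ~ in a (hypothetical) spark.jars.ivy value.

    If the shell leaves `~` untouched, the literal string "~/.ivy2" reaches
    the application, where it is just a relative path whose first component
    is a directory named "~". expanduser() replaces it with the home directory.
    """
    return Path(value).expanduser()
```

For example, `Path("~/.ivy2").parts[0]` is the literal `"~"`, while `resolve_ivy_dir("~/.ivy2")` equals `Path.home() / ".ivy2"`; an absolute value is returned unchanged.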

Re: Issue with Spark 4.0.0rc4 and ~/.ivy2.5.2

2025-04-28 Thread Wenchen Fan
Hi Jacek, Thanks for the confirmation! Let's change the wording first, and open a JIRA ticket for the relative path support. Wenchen On Tue, Apr 29, 2025 at 2:41 AM Jacek Laskowski wrote: > Hi Wenchen, > > Looks like it didn't work in 3.5 either. > > ❯ ./bin/spark-s

Re: Issue with Spark 4.0.0rc4 and ~/.ivy2.5.2

2025-04-28 Thread Jacek Laskowski
Hi Wenchen, Looks like it didn't work in 3.5 either. ❯ ./bin/spark-shell --version 25/04/28 20:37:48 WARN Utils: Your hostname, Jaceks-Mac-mini.local resolves to a loopback address: 127.0.0.1; using 192.168.68.100 instead (on interface en1) 25/04/28 20:37:48 WARN Utils: Set SPARK_LOCAL_IP i

Re: Issue with Spark 4.0.0rc4 and ~/.ivy2.5.2

2025-04-27 Thread Wenchen Fan
Hi Jacek, Thanks for reporting the issue! Did you hit the same problem when you set the `spark.jars.ivy` config with Spark 3.5? If this config never worked with a relative path, we should change the wording in the migration guide. Thanks, Wenchen On Sun, Apr 27, 2025 at 10:27 PM Jacek Laskowski

Issue with Spark 4.0.0rc4 and ~/.ivy2.5.2

2025-04-27 Thread Jacek Laskowski
Hi, I found in docs/core-migration-guide.md: - Since Spark 4.0, Spark uses `~/.ivy2.5.2` as Ivy user directory by default to isolate the existing systems from Apache Ivy's incompatibility. To restore the legacy behavior, you can set `spark.jars.ivy` to `~/.ivy2`. With that, I

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-23 Thread Szehon Ho
One more small fix (on another topic) for the next RC: https://github.com/apache/spark/pull/50685 Thanks! Szehon On Tue, Apr 22, 2025 at 10:07 AM Rozov, Vlad wrote: > Correct, to me it looks like a Spark bug > https://issues.apache.org/jira/browse/SPARK-51821 that may be hard to > tr

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-22 Thread Rozov, Vlad
Correct, to me it looks like a Spark bug https://issues.apache.org/jira/browse/SPARK-51821 that may be hard to trigger and is reproducible using the test case provided in https://github.com/apache/spark/pull/50594: 1. Spark UninterruptibleThread “task” is interrupted by “test” thread while “task

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-22 Thread Wenchen Fan
Correct me if I'm wrong: this is a long-standing Spark bug that is very hard to trigger, but the new Parquet version happens to hit the trigger condition and exposes the bug. If this is the case, I'm +1 to fix the Spark bug instead of downgrading the Parquet version. Let's mov

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-21 Thread Manu Zhang
I don't think PARQUET-2432 has any issue itself. It looks to have triggered a deadlock case like https://github.com/apache/spark/pull/50594. I'd suggest that we fix forward if possible. Thanks, Manu On Mon, Apr 21, 2025 at 11:19 PM Rozov, Vlad wrote: > The deadlock is reprodu

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-21 Thread Rozov, Vlad
The deadlock is reproducible without Parquet. Please see https://github.com/apache/spark/pull/50594. Thank you, Vlad On Apr 21, 2025, at 1:59 AM, Cheng Pan wrote: The deadlock was introduced by PARQUET-2432 (1.14.0); if we decide to downgrade, the latest workable version is Parquet 1.13.1

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-21 Thread Cheng Pan
The deadlock was introduced by PARQUET-2432 (1.14.0); if we decide to downgrade, the latest workable version is Parquet 1.13.1. Thanks, Cheng Pan On Apr 21, 2025, at 16:53, Wenchen Fan wrote: +1 to downgrade to Parquet 1.15.0 for Spark 4.0. According to https://github.com/

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-21 Thread Wenchen Fan
+1 to downgrade to Parquet 1.15.0 for Spark 4.0. According to https://github.com/apache/spark/pull/50583#issuecomment-2815243571 , the Parquet CVE does not affect Spark. On Mon, Apr 21, 2025 at 2:45 PM Hyukjin Kwon wrote: > That's nice but we need to wait for them to release, and upgra

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-20 Thread Yuming Wang
It seems this patch (https://github.com/apache/parquet-java/pull/3196) can avoid the deadlock issue when using Parquet 1.15.1. On Wed, Apr 16, 2025 at 5:39 PM Niranjan Jayakar wrote: I found another bug introduced in 4.0 that breaks Spark Connect client-server compatibility: https://gith

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-20 Thread Hyukjin Kwon
uet-java/pull/3196) can avoid the deadlock issue when using Parquet 1.15.1. On Wed, Apr 16, 2025 at 5:39 PM Niranjan Jayakar wrote: I found another bug introduced in 4.0 that breaks Spark Connect client-server compatibility: https://github.com/apache/spark/

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-16 Thread Niranjan Jayakar
I found another bug introduced in 4.0 that breaks Spark Connect client-server compatibility: https://github.com/apache/spark/pull/50604. Once merged, this should be included in the next RC. On Thu, Apr 10, 2025 at 5:21 PM Wenchen Fan wrote: Please vote on releasing the following candid

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-15 Thread Rozov, Vlad
It may not be an issue introduced by Parquet. It looks like a race condition between Spark UninterruptibleThread and Hadoop/HDFS DFSOutputStream. I tried to resolve the deadlock in https://github.com/apache/spark/pull/50594. Can you give it a try? I will see if I can reproduce the deadlock in a

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-15 Thread Yuming Wang
ava.base@17.0.6/Thread.java:833) Found 1 deadlock. On Mon, Apr 14, 2025 at 11:13 AM Hyukjin Kwon wrote: Made a fix at https://github.com/apache/spark/pull/50575 👍 On Mon, 14 Apr 2025 at 11:42, Wenchen Fan wrote: I'm testing the new spark-connect distribution

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-14 Thread Yuming Wang
s working on, or are you still investigating it? If the issue is confirmed by the Parquet community, we can probably roll back to the previous Parquet version for Spark 4.0. Thanks, Wenchen On Tue, Apr 15, 2025 at 7:24 AM Yuming Wang wrote: This rel

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-14 Thread Wenchen Fan
Hi Yuming, 1.15.1 is the latest release of Apache Parquet for the 1.x line. Is it a known issue the Parquet community is working on, or are you still investigating it? If the issue is confirmed by the Parquet community, we can probably roll back to the previous Parquet version for Spark 4.0

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-13 Thread Hyukjin Kwon
Made a fix at https://github.com/apache/spark/pull/50575 👍 On Mon, 14 Apr 2025 at 11:42, Wenchen Fan wrote: I'm testing the new spark-connect distribution and here is the result: 4 packages are tested: pip install pyspark, pip install pyspark_connect (I installed

Re: [VOTE] Release Spark 4.0.0 (RC4)

2025-04-13 Thread Wenchen Fan
I'm testing the new spark-connect distribution and here is the result: 4 packages are tested: pip install pyspark, pip install pyspark_connect (I installed them with the RC4 pyspark tarballs), the classic tarball (spark-4.0.0-bin-hadoop3.tgz), the connect tarball (spark-4.0.0-bin-hadoop3-

Re: SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread karan alang
👍

Re: SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread Prem Sahoo
performance needs to be checked. With YARN and External Spark Shuffle, Spark shuffle is much more optimized, so we can experience slowness with Spark on K8s, especially if there is a pod restart. Have you considered Apache Uniffle / Celeborn as a remote shuffle service? FYI, I'm

Re: SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread karan alang
Please check whether there are resource constraints on the pods/nodes, especially if they are shared. MinIO connectivity performance needs to be checked. With YARN and External Spark Shuffle, Spark shuffle is much more optimized, so we can experience slowness with Spark on K8s, especially if there is

Re: SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread Prem Sahoo
Hello Karan, I am using open-source Spark on Kubernetes and the MapR Spark bundle on YARN. Launching the job takes the same 10 seconds with both approaches. For shuffle, I am using local storage on both YARN and Kubernetes. Sent from my iPhone On Apr 11, 2025, at 11:24 AM, karan alang wrote: Hi Prem, Which distribution of

Re: SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread karan alang
Hi Prem, Which distribution of Spark are you using? How long does it take to launch the job? With respect to Spark shuffle, what approach are you using: storing shuffle data in MinIO or using a host path? Regards, Karan On Fri, Apr 11, 2025 at 4:58 AM Prem Sahoo wrote: Hello Team, I

SPARK ON KUBERNETS IS SLOW

2025-04-11 Thread Prem Sahoo
Hello Team, I have a peculiar case of Spark slowness. I am using MinIO as object storage, from which Spark reads and writes data. With YARN as the master, a Spark job takes ~5 minutes; the same job with Kubernetes as the master takes ~8 minutes. I checked the Spark DAG in

[VOTE] Release Spark 4.0.0 (RC4)

2025-04-10 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 4.0.0. The vote is open until April 15 (PST) and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 4.0.0 [ ] -1 Do not release this package because ... To

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-04-05 Thread Ángel Álvarez Pascua
this proposal now ... 😂 *"If you haven’t encountered this kind of ‘dependency hell’ while working on geospatial projects with Spark, you may have been fortunate to deal with relatively simple cases."* Yes, that was the case for us. We loaded OpenStreetMap data from Spain, calculated some Have

Re: Spark build failed> File line length exceeds 100 characters

2025-04-05 Thread Ángel Álvarez Pascua
I've noticed that the check is set in *scalastyle-config.xml*: true Given this configuration, how is it possible that some people have been able to commit changes violating this rule? Moreover, how were these changes even merged despite failing this validation? It seems like
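To find offending lines locally before pushing, a rough check along these lines can help. It only approximates the scalastyle rule: it assumes the 100-character limit from the thread subject and simply counts characters per line, ignoring any tab-width handling or exemptions the real checker may apply. `long_lines` is a hypothetical helper.

```python
def long_lines(source: str, max_length: int = 100) -> list[tuple[int, int]]:
    """Return (1-based line number, length) for every line longer than max_length."""
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > max_length
    ]
```

For instance, `long_lines(Path("Foo.scala").read_text())` (with `from pathlib import Path`) lists the lines a 100-character limit would flag.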

Re: Revert of [SPARK-51229][BUILD][CONNECT] Fix dependency:analyze goal on connect common

2025-04-04 Thread Rozov, Vlad
you, Vlad On Mar 26, 2025, at 3:18 PM, Hyukjin Kwon wrote: That only fixes Maven. Both the SBT build and the Maven build should work in the same or a similar way. Let's make sure both work. On Thu, Mar 27, 2025 at 3:18 AM Rozov, Vlad wrote: Please see https://github.com/vrozov/spark/tree/spark-she

Re: Spark 3.5.2 and Hadoop 3.4.1 slow performance

2025-04-04 Thread Steve Loughran
#Options_to_Tune But you need that underlying hosting infra to be the same before making comparisons about the layers above. Why not start by either replicating your previous setup in k8s or running spark 3.5 standalone outside k8s and comparing it to spark 3.2 in the same environment? On Tue, 25 Mar 2025 at

Re: [VOTE] Release Spark 4.0.0 (RC3)

2025-03-31 Thread huaxin gao
Hi Wenchen, Could you please wait for https://github.com/apache/spark/pull/50246 to be merged before you cut the next RC? Thanks, Huaxin On Mon, Mar 31, 2025 at 8:53 PM Wenchen Fan wrote: > Hi all, > > Thanks for your feedback! Regarding > https://github.com/apache/spark/pull/501

Re: [VOTE] Release Spark 4.0.0 (RC3)

2025-03-31 Thread Wenchen Fan
Hi all, Thanks for your feedback! Regarding https://github.com/apache/spark/pull/50187 , I don't think it's a 4.0 blocker as it's a CI issue for the examples module. Other than that, all other issues have been resolved and I'll cut the next RC after https://github.com/apache

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-31 Thread Wenchen Fan
believe it’s important to standardize common data types in Spark and clearly define the boundaries between different layers in the Lakehouse ecosystem. While it makes sense for Apache Sedona to have its own Parquet data source for geospatial types in the absence of a standard, the long-term goal

Re: Spark 3.5.2 and Hadoop 3.4.1 slow performance

2025-03-31 Thread Steve Loughran
. I'd be curious about what those numbers are -though they only measure task/job commit, not all the work (that's not quite true, but...) You can get a log of all S3 IO performed for an entire Spark job across all worker threads, via the S3 auditing, https://hadoop.apache.org/docs/stable/

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-30 Thread Szehon Ho
again for the expertise from the Sedona side in these efforts. Thanks! Szehon Sent from my iPhone On Mar 29, 2025, at 11:42 PM, Jia Yu wrote: Hi Reynold and team, I’m glad to see that the Spark community is recognizing the importance of geospatial support. The Se

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-30 Thread Jia Yu
Hey Angel, I am glad that you asked these questions. Please see my answers below. *1. Domain types evolve quickly. - It has taken years for Parquet to include these new types in its format... We could evolve alongside Parquet. Unfortunately, Spark is not known for upgrading its dependencies

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Jia Yu
Hi Reynold and team, I’m glad to see that the Spark community is recognizing the importance of geospatial support. The Sedona community has long been a strong advocate for Spark, and we’ve proudly supported large-scale geospatial workloads on Spark for nearly a decade. We’re absolutely open to

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Ángel Álvarez Pascua
* 1. Domain types evolve quickly.* It has taken years for Parquet to include these new types in its format... We could evolve alongside Parquet. Unfortunately, Spark is not known for upgrading its dependencies quickly. * 2. Geospatial in Java and Python is a dependency hell.* How has

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Reynold Xin
While I don’t think Spark should become a super specialized geospatial processing engine, I don’t think it makes sense to focus *only* on reading and writing from storage. Geospatial is a pretty common and fundamental capability of analytics systems and virtually every mature and popular analytics

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Jia Yu
Sedona community. Since the primary motivation here is Parquet-level support, I suggest shifting the focus of this discussion toward enabling geo support in Spark Parquet DataSource rather than introducing core types. ** Why Spark Should Avoid Hardcoding Domain-Specific Types like geo types

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Menelaos Karavelas
minimal support in Spark, as a common platform, for these types. To be more specific and explicit: The proposal scope is to add support for reading/writing to Parquet, based on the new standard, as well as adding the types as built-in types in Spark to complement the storage support. The few ST

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Szehon Ho
, now that the types are in most common data sources in the ecosystem, I think Apache Spark as a common platform needs to have this type definition for interop; otherwise, users of vanilla Spark cannot work with those data sources with stored geospatial data. (IMO a similar rationale in adding timestamp

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Menelaos Karavelas
Hello Jia, Wenchen summarized the intent very clearly. The scope of the proposal is primarily the type system and storage, not processing. Let’s work together on the technical details and make sure the work we propose to do in Spark works best with Apache Sedona. Best, Menelaos > On Mar

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Wenchen Fan
Hi Jia, This is a good question. As the shepherd of this SPIP, I'd like to clarify the motivation here: the focus of this project is more about the storage part, not the processing. Apache Sedona is a great library for geo processing, but without native geo type support in Spark, users can

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-28 Thread Szehon Ho
/WKB <https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary>? On Fri, 28 Mar 2025 at 20:50, Ángel Álvarez Pascua (<angel.alvarez.pas...@gmail.com>) wrote: +1 (non-bindin
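For readers unfamiliar with the WKB (well-known binary) format referenced here, a 2D point is encoded as one byte-order byte (0 = big-endian, 1 = little-endian), a 32-bit geometry type code (1 = Point), and two IEEE 754 doubles for x and y. The Python sketch below decodes only that simplest case and rejects everything else, including Z/M and extended-WKB variants; `decode_wkb_point` is a hypothetical helper, not part of Spark or Sedona.

```python
import struct


def decode_wkb_point(data: bytes) -> tuple[float, float]:
    """Decode a 2D WKB Point into (x, y)."""
    if len(data) != 21:  # 1 byte order + 4 bytes type + 2 * 8 bytes coordinates
        raise ValueError("expected a 21-byte 2D WKB point")
    byte_order = {0: ">", 1: "<"}.get(data[0])
    if byte_order is None:
        raise ValueError(f"invalid byte-order flag: {data[0]}")
    (geometry_type,) = struct.unpack_from(byte_order + "I", data, 1)
    if geometry_type != 1:
        raise ValueError(f"not a Point (geometry type {geometry_type})")
    x, y = struct.unpack_from(byte_order + "dd", data, 5)
    return x, y
```

For example, `decode_wkb_point(struct.pack("<BIdd", 1, 1, 1.5, -2.0))` returns `(1.5, -2.0)`, since packing and unpacking doubles round-trips exactly.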

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-28 Thread Ángel Álvarez Pascua
+1 (non-binding) On Fri, 28 Mar 2025, 18:48, Menelaos Karavelas wrote: Dear Spark community, I would like to propose the addition of new geospatial data types (GEOMETRY and GEOGRAPHY) which represent geospatial values as recently added as new logical types
