+1 (non-binding)
On May 6, 2025, at 6:33 AM, Sakthi wrote:
+1 (non-binding)
On Tue, May 6, 2025 at 2:00 AM Jungtaek Lim wrote:
+1 (non-binding) Nice addition on Spark Connect!
On Tue, May 6, 2025 at 5:47 PM Peter Toth wrote:
Hi Mridul,
Just wanted to add that we intend to work with the Apache Sedona community
anyway going forward.
- Menelaos
> https://github.com/apache/spark-connect-swift/blob/v0.1.0-rc1/Sources/SparkConnect/DataFrame.swift#L276-L288
>
> Shows that DataFrame operations explicitly set plaintext despite the actual
> client being configured using TLS.
>
> The current implementation works OK for local plaintext connections
To answer Rozov's question: we use "Apache SkyWalking Eyes" in our GitHub Action.
- https://github.com/apache/skywalking-eyes
-
https://github.com/apache/spark-kubernetes-operator/blob/6116bb08c282911389fe2f5af49794a456111e97/.github/workflows/build_and_test.yml#L24
In addition, you can downlo
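For reference, wiring SkyWalking Eyes into a workflow is roughly the following (an illustrative sketch, not the operator repo's exact workflow; pin the action ref and supply a `.licenserc.yaml` as the project does):

```yaml
# Illustrative sketch of a license-header check job using SkyWalking Eyes.
name: License check
on: [pull_request]
jobs:
  license-header:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Runs license-eye's header check against the repo's .licenserc.yaml
      - uses: apache/skywalking-eyes/header@main
```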
Hi Mridul,
The conclusion is that we will standardize the basic geo data types in
Spark, which allows third-party data sources and user-defined functions to
support geo data types natively when integrating with Spark. The majority
of geo processing functions will still be in Apache Sedona (or
part of the gradle build? If not, how are headers
> > >>>>>> validated to include the correct license?
> > >>>>>>
> > >>>>>> Thank you,
> > >>>>>>
> > >>>>>> Vlad
+1 (non-binding)
On Tue, May 6, 2025 at 2:00 AM Jungtaek Lim
wrote:
> +1 (non-binding) Nice addition on Spark Connect!
>
> On Tue, May 6, 2025 at 5:47 PM Peter Toth wrote:
>
>> +1
>>
>> On Tue, May 6, 2025 at 9:59 AM Yang Jie wrote:
>>
>>> +1, A bi
+1 (non-binding) Nice addition on Spark Connect!
On Tue, May 6, 2025 at 5:47 PM Peter Toth wrote:
> +1
>
> On Tue, May 6, 2025 at 9:59 AM Yang Jie wrote:
>
>> +1, A big thank you to Dongjoon for all the hard work you've put into
>> this!
>>
>> On 2
+1
On Tue, May 6, 2025 at 9:59 AM Yang Jie wrote:
> +1, A big thank you to Dongjoon for all the hard work you've put into this!
>
> On 2025/05/05 18:19:33 DB Tsai wrote:
> > +1, it’s exciting to see Spark Connect Swift client, showcasing Spark
> Connect
> > as a tru
+1, A big thank you to Dongjoon for all the hard work you've put into this!
On 2025/05/05 18:19:33 DB Tsai wrote:
> +1, it’s exciting to see Spark Connect Swift client, showcasing Spark Connect
> as a truly language-agnostic protocol, and also powering Swift users to
Milan Stefanovic wrote:
> +1 (non-binding)
>
> Thanks,
> Milan
>
> On Mon, 5 May 2025 at 21:25, Jia Yu wrote:
>
>> Thanks for putting this together.
On Mon, 5 May 2025 at 21:25, Jia Yu wrote:
> Thanks for putting this together.
>
> +0 (non-binding) from my side. Happy to see geospatial data is
> getting attention but we need to make it right.
>
> Jia Yu
> +1
>
> I checked the checksum and signatures, and tested with K8s v1.32.
>
> Dongjoon.
>
> On 2025/05/04 23:58:54 Zhou Jiang wrote:
>> +1 , thanks
+1 (non binding)
Thanks
Szehon
On Mon, May 5, 2025 at 11:17 AM DB Tsai wrote:
> +1, geospatial types will be a great feature for Spark. Thanks for working
> on it.
>
> On May 5, 2025, at 11:04 AM, Menelaos Karavelas wrote:
>
>
+1, it’s exciting to see Spark Connect Swift client, showcasing Spark Connect
as a truly language-agnostic protocol, and also powering Swift users to use
Spark!
Sent from my iPhone
On May 5, 2025, at 1:11 AM, Gabor Somogyi wrote:
+1 (non-binding)
G
On Mon, May 5, 2025 at 8:35 AM huaxin gao wrote:
+1, geospatial types will be a great feature for Spark. Thanks for working on
it.
> On May 5, 2025, at 11:04 AM, Menelaos Karavelas
> wrote:
>
> I started the discussion on adding geospatial types to Spark on March 28th.
> Since then there has been some discussion in the dev m
I started the discussion on adding geospatial types to Spark on March 28th.
Since then there has been some discussion in the dev mailing list, as well as
in the SPIP doc.
At this point I would like to move to a formal vote on adding support for
geospatial types to Spark.
*Discussion thread
Not sure this counts as -1, but from a cursory check of the code, I found that
the way the TLS connection is set up is not always working:
https://github.com/apache/spark-connect-swift/blob/v0.1.0-rc1/Sources/SparkConnect/DataFrame.swift#L276-L288
Shows that DataFrame operations explicitly set plaintext despite the actual
client being configured using TLS.
+1 (non-binding)
G
On Mon, May 5, 2025 at 8:35 AM huaxin gao wrote:
> +1 Thanks Dongjoon.
>
> On Sun, May 4, 2025 at 5:21 PM Dongjoon Hyun wrote:
>
>> +1
>>
>> I checked the checksum and signatures, and tested with Apache Spark 4.0.0
>> RC4 on Swift 6.
+1 (non-binding)
Kazu
> On May 4, 2025, at 11:31 PM, huaxin gao wrote:
>
> +1 Thanks Dongjoon.
>
> On Sun, May 4, 2025 at 5:21 PM Dongjoon Hyun wrote:
>> +1
>>
>> I checked the checksum and signatures, and tested wi
+1 Thanks Dongjoon.
On Sun, May 4, 2025 at 5:21 PM Dongjoon Hyun wrote:
> +1
>
> I checked the checksum and signatures, and tested with Apache Spark 4.0.0
> RC4 on Swift 6.1.
>
> This is the initial release (v0.1) with 105 patches to provide a tangible
> release to the use
+1
On Sun, May 4, 2025 at 3:15 PM Dongjoon Hyun wrote:
>
> Please vote on releasing the following candidate as Apache Spark Connect
> Swift Client 0.1.0. This vote is open for the next 72 hours and passes if a
> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
&
+1
On Sun, May 4, 2025 at 4:58 PM Dongjoon Hyun wrote:
>
> Please vote on releasing the following candidate as Apache Spark K8s Operator
> 0.1.0. This vote is open for the next 72 hours and passes if a majority +1
> PMC votes are cast, with a minimum of 3 +1 votes.
>
> [
+1
I checked the checksum and signatures, and tested with Apache Spark 4.0.0 RC4
on Swift 6.1.
This is the initial release (v0.1) with 105 patches to provide a tangible
release to the users.
v0.2 is under planning in SPARK-51999.
Dongjoon.
On 2025/05/04 22:14:54 Dongjoon Hyun wrote
+1 , thanks for driving this release!
*Zhou JIANG*
On Sun, May 4, 2025 at 16:58 Dongjoon Hyun wrote:
> Please vote on releasing the following candidate as Apache Spark K8s
> Operator 0.1.0. This vote is open for the next 72 hours and passes if a
> majority +1 PMC votes are cas
Please vote on releasing the following candidate as Apache Spark K8s
Operator 0.1.0. This vote is open for the next 72 hours and passes if a
majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
[ ] +1 Release this package as Apache Spark K8s Operator 0.1.0
[ ] -1 Do not release this
Please vote on releasing the following candidate as Apache Spark Connect
Swift Client 0.1.0. This vote is open for the next 72 hours and passes if a
majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
[ ] +1 Release this package as Apache Spark Connect Swift Client 0.1.0
[ ] -1 Do not
Do the following options work for you?
./bin/spark-shell --conf spark.jars.ivy=${HOME}/.ivy2
./bin/spark-shell --conf spark.jars.ivy=/Users/yourname/.ivy2
I think the issue is that `~` is not interpreted by the shell and is just
passed through to the Ivy lib.
Thanks,
Cheng Pan
> On Apr 29, 2025,
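The pitfall Cheng Pan describes can be seen in a minimal Python sketch (illustrative only, not Spark code): a literal `~` in a config value is not a usable path unless some layer explicitly expands it.

```python
import os.path

raw = "~/.ivy2"                     # what the Ivy layer receives if the shell did not expand it
expanded = os.path.expanduser(raw)  # the expansion that would have to happen explicitly

print(raw)       # '~/.ivy2': the tilde is passed through literally
print(expanded)  # an absolute path under the user's home directory
```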
Hi Jacek,
Thanks for the confirmation! Let's change the wording first, and open a
JIRA ticket for the relative path support.
Wenchen
On Tue, Apr 29, 2025 at 2:41 AM Jacek Laskowski wrote:
> Hi Wenchen,
>
> Looks like it didn't work in 3.5 either.
>
> ❯ ./bin/spark-s
Hi Wenchen,
Looks like it didn't work in 3.5 either.
❯ ./bin/spark-shell --version
25/04/28 20:37:48 WARN Utils: Your hostname, Jaceks-Mac-mini.local resolves
to a loopback address: 127.0.0.1; using 192.168.68.100 instead (on
interface en1)
25/04/28 20:37:48 WARN Utils: Set SPARK_LOCAL_IP i
Hi Jacek,
Thanks for reporting the issue! Did you hit the same problem when you set
the `spark.jars.ivy` config with Spark 3.5? If this config never worked
with a relative path, we should change the wording in the migration guide.
Thanks,
Wenchen
On Sun, Apr 27, 2025 at 10:27 PM Jacek Laskowski
Hi,
I found in docs/core-migration-guide.md:
- Since Spark 4.0, Spark uses `~/.ivy2.5.2` as Ivy user directory by
default to isolate the existing systems from Apache Ivy's incompatibility.
To restore the legacy behavior, you can set `spark.jars.ivy` to `~/.ivy2`.
With that, I
One more small fix (on another topic) for the next RC:
https://github.com/apache/spark/pull/50685
Thanks!
Szehon
On Tue, Apr 22, 2025 at 10:07 AM Rozov, Vlad
wrote:
> Correct, to me it looks like a Spark bug
> https://issues.apache.org/jira/browse/SPARK-51821 that may be hard to
> tr
Correct, to me it looks like a Spark bug
https://issues.apache.org/jira/browse/SPARK-51821 that may be hard to trigger
and is reproducible using the test case provided in
https://github.com/apache/spark/pull/50594:
1. Spark UninterruptibleThread “task” is interrupted by “test” thread while
“task
Correct me if I'm wrong: this is a long-standing Spark bug that is very
hard to trigger, but the new Parquet version happens to hit the trigger
condition and exposes the bug. If this is the case, I'm +1 to fix the Spark
bug instead of downgrading the Parquet version.
Let's mov
I don't think PARQUET-2432 has any issue itself. It looks to have triggered
a deadlock case like https://github.com/apache/spark/pull/50594.
I'd suggest that we fix forward if possible.
Thanks,
Manu
On Mon, Apr 21, 2025 at 11:19 PM Rozov, Vlad
wrote:
> The deadlock is reprodu
The deadlock is reproducible without Parquet. Please see
https://github.com/apache/spark/pull/50594.
Thank you,
Vlad
On Apr 21, 2025, at 1:59 AM, Cheng Pan wrote:
The deadlock is introduced by PARQUET-2432(1.14.0), if we decide downgrade, the
latest workable version is Parquet 1.13.1
The deadlock is introduced by PARQUET-2432(1.14.0), if we decide downgrade, the
latest workable version is Parquet 1.13.1.
Thanks,
Cheng Pan
> On Apr 21, 2025, at 16:53, Wenchen Fan wrote:
>
> +1 to downgrade to Parquet 1.15.0 for Spark 4.0. According to
> https://github.com/
+1 to downgrade to Parquet 1.15.0 for Spark 4.0. According to
https://github.com/apache/spark/pull/50583#issuecomment-2815243571 , the
Parquet CVE does not affect Spark.
On Mon, Apr 21, 2025 at 2:45 PM Hyukjin Kwon wrote:
> That's nice but we need to wait for them to release, and upgra
It seems this patch(https://github.com/apache/parquet-java/pull/3196) can
avoid deadlock issue if using Parquet 1.15.1.
On Wed, Apr 16, 2025 at 5:39 PM Niranjan Jayakar
wrote:
> I found another bug introduced in 4.0 that breaks Spark connect client x
> server compatibility: https://gith
I found another bug introduced in 4.0 that breaks Spark connect client x
server compatibility: https://github.com/apache/spark/pull/50604.
Once merged, this should be included in the next RC.
On Thu, Apr 10, 2025 at 5:21 PM Wenchen Fan wrote:
> Please vote on releasing the following candid
It may not be the Parquet introduced issue. It looks like a race condition
between Spark UninterruptibleThread and Hadoop/HDFS DFSOutputStream. I tried to
resolve the deadlock in https://github.com/apache/spark/pull/50594. Can you
give it a try? I will see if I can reproduce the deadlock in a
Hi Yuming,
1.15.1 is the latest release of Apache Parquet for the 1.x line. Is it a
known issue the Parquet community is working on, or are you still
investigating it? If the issue is confirmed by the Parquet community, we
can probably roll back to the previous Parquet version for Spark 4.0
Made a fix at https://github.com/apache/spark/pull/50575 👍
On Mon, 14 Apr 2025 at 11:42, Wenchen Fan wrote:
> I'm testing the new spark-connect distribution and here is the result:
>
> 4 packages are tested: pip install pyspark, pip install pyspark_connect (I
> installed
I'm testing the new spark-connect distribution and here is the result:
4 packages are tested: pip install pyspark, pip install pyspark_connect (I
installed them with the RC4 pyspark tarballs), the classic tarball
(spark-4.0.0-bin-hadoop3.tgz), the connect tarball
(spark-4.0.0-bin-hadoop3-
performance needs to be checked.
With YARN and External Spark Shuffle, the Spark shuffle is a lot more
optimized, so we can experience slowness with Spark on K8s, especially if
there is a pod restart.
Have you checked Apache Uniffle / Celeborn for enabling Spark shuffle?
fyi .. i'm
Pls check if there are resource constraints on the pods/nodes, especially if
they are shared.
MinIO connectivity performance needs to be checked.
With YARN and external Spark shuffle, the Spark shuffle path is a lot more
optimized, so we can experience slowness with Spark on K8s, especially if
there is
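If evaluating Apache Celeborn as a remote shuffle service, the Spark-side wiring is roughly the following (an illustrative sketch; verify the class name, property names, and master endpoint against the Celeborn docs for your version):

```shell
# Illustrative: point Spark's shuffle at a Celeborn cluster.
./bin/spark-submit \
  --conf spark.shuffle.manager=org.apache.spark.shuffle.celeborn.SparkShuffleManager \
  --conf spark.celeborn.master.endpoints=celeborn-master:9097 \
  ...
```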
Hello Karan,
I am using Spark open source in Kubernetes and the Spark MapR bundle in YARN.
For launching the job, both approaches take the same 10 secs.
For shuffle I am using local in both YARN and Kubernetes.
Sent from my iPhone
On Apr 11, 2025, at 11:24 AM, karan alang wrote:
Hi Prem,
Which distribution of
Hi Prem,
Which distribution of Spark are you using ?
how long does it take to launch the job ?
wrt Spark Shuffle, what is the approach you are using - storing shuffle
data in MinIO or using host path ?
regds,
Karan
On Fri, Apr 11, 2025 at 4:58 AM Prem Sahoo wrote:
> Hello Team,
> I
Hello Team,
I have a peculiar case of Spark slowness.
I am using Minio as Object storage from where Spark reads & writes data. I
am using YARN as Master and executing a Spark job which takes ~5mins the
same job when run with Kubernetes as Master it takes ~8 mins .
I checked the Spark DAG in
Please vote on releasing the following candidate as Apache Spark version
4.0.0.
The vote is open until April 15 (PST) and passes if a majority +1 PMC votes
are cast, with a minimum of 3 +1 votes.
[ ] +1 Release this package as Apache Spark 4.0.0
[ ] -1 Do not release this package because ...
To
this proposal now
... 😂
*"If you haven’t encountered this kind of ‘dependency hell’ while working
on geospatial projects with Spark, you may have been fortunate to deal with
relatively simple cases."*
Yes, that was the case for us. We loaded OpenStreetMap data from Spain,
calculated some Have
I've noticed that the check is enabled (`true`) in *scalastyle-config.xml*.
Given this configuration, how is it possible that some people have been
able to commit changes violating this rule? Moreover, how were these
changes even merged despite failing this validation? It seems like
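For context, a Scalastyle header check in scalastyle-config.xml has roughly this shape (an illustrative sketch; the real entry carries the full Apache license header text, abbreviated here):

```xml
<!-- Illustrative sketch: a Scalastyle license-header check entry. -->
<check level="error" class="org.scalastyle.file.HeaderMatchesChecker" enabled="true">
  <parameters>
    <parameter name="header"><![CDATA[/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.
 */]]></parameter>
  </parameters>
</check>
```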
Thank you,
Vlad
On Mar 26, 2025, at 3:18 PM, Hyukjin Kwon wrote:
That only fixes Maven. Both the SBT build and the Maven build should work in
the same or a similar way. Let's make sure both work.
On Thu, Mar 27, 2025 at 3:18 AM Rozov, Vlad wrote:
Please see https://github.com/vrozov/spark/tree/spark-she
#Options_to_Tune
But you need that underlying hosting infra to be the same before making
comparisons about the layers above. Why not start by either replicating
your previous setup in k8s or running spark 3.5 standalone outside k8s and
comparing it to spark 3.2 in the same environment?
On Tue, 25 Mar 2025 at
Hi Wenchen,
Could you please wait for https://github.com/apache/spark/pull/50246 to be
merged before you cut the next RC?
Thanks,
Huaxin
On Mon, Mar 31, 2025 at 8:53 PM Wenchen Fan wrote:
> Hi all,
>
> Thanks for your feedback! Regarding
> https://github.com/apache/spark/pull/501
Hi all,
Thanks for your feedback! Regarding
https://github.com/apache/spark/pull/50187 , I don't think it's a 4.0
blocker as it's a CI issue for the examples module. Other than that, all
other issues have been resolved and I'll cut the next RC after
https://github.com/apache
believe it’s important to standardize common data types in Spark and
clearly define the boundaries between different layers in the Lakehouse
ecosystem.
While it makes sense for Apache Sedona to have its own Parquet data source
for geospatial types in the absence of a standard, the long-term goal
. I'd be curious
about what those numbers are -though they only measure task/job commit, not
all the work (that's not quite true, but...)
You can get a log of all S3 IO performed for an entire Spark job across all
worker threads, via the S3 auditing,
https://hadoop.apache.org/docs/stable/
again for the expertise from Sedona side in these
efforts.
Thanks!
Szehon
Sent from my iPhone
> On Mar 29, 2025, at 11:42 PM, Jia Yu wrote:
>
> Hi Reynold and team,
>
> I’m glad to see that the Spark community is recognizing the importance
> of geospatial support. The Se
Hey Angel,
I am glad that you asked these questions. Please see my answers below.
*1. Domain types evolve quickly. - It has taken years for Parquet to
include these new types in its format... We could evolve alongside Parquet.
Unfortunately, Spark is not known for upgrading its dependencies
Hi Reynold and team,
I’m glad to see that the Spark community is recognizing the importance
of geospatial support. The Sedona community has long been a strong
advocate for Spark, and we’ve proudly supported large-scale geospatial
workloads on Spark for nearly a decade. We’re absolutely open to
* 1. Domain types evolve quickly.*
It has taken years for Parquet to include these new types in its format...
We could evolve alongside Parquet. Unfortunately, Spark is not known for
upgrading its dependencies quickly.
* 2. Geospatial in Java and Python is a dependency hell.*
How has
While I don’t think Spark should become a super specialized geospatial
processing engine, I don’t think it makes sense to focus *only* on reading
and writing from storage. Geospatial is a pretty common and fundamental
capability of analytics systems and virtually every mature and popular
analytics
Sedona community.
Since the primary motivation here is Parquet-level support, I suggest shifting
the focus of this discussion toward enabling geo support in Spark Parquet
DataSource rather than introducing core types.
** Why Spark Should Avoid Hardcoding Domain-Specific Types like geo types
minimal support in Spark, as a common platform, for these types.
To be more specific and explicit: The proposal scope is to add support for
reading/writing to Parquet, based on the new standard, as well as adding the
types as built-in types in Spark to complement the storage support. The few ST
, now that the types are in most common data sources in the ecosystem, I think Apache Spark as a common platform needs to have this type definition for inter-op; otherwise users of vanilla Spark cannot work with those data sources with stored geospatial data. (Imo a similar rationale in adding timestamp
Hello Jia,
Wenchen summarized the intent very clearly. The scope of the proposal is
primarily the type system and storage, not processing. Let’s work together on
the technical details and make sure the work we propose to do in Spark works
best with Apache Sedona.
Best,
Menelaos
> On Mar
Hi Jia,
This is a good question. As the shepherd of this SPIP, I'd like to clarify
the motivation here: the focus of this project is more about the storage
part, not the processing. Apache Sedona is a great library for geo
processing, but without native geo type support in Spark, users can
>> /WKB
>> <https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary>
>> ?
>>
>> On Fri, Mar 28, 2025 at 20:50, Ángel Álvarez Pascua wrote:
>>
>>> +1 (non-bindin
+1 (non-binding)
On Fri, Mar 28, 2025 at 18:48, Menelaos Karavelas
wrote:
> Dear Spark community,
>
> I would like to propose the addition of new geospatial data types
> (GEOMETRY and GEOGRAPHY) which represent geospatial values as recently
> added as new logical types