Re: Any committers interested in reviewing SPARK-52023

2025-06-28 Thread Deepak Sharma
I can review, though I am not a committer yet. On Sat, 28 Jun 2025 at 17:48, Emil Ejbyfeldt wrote: > Created this MR https://github.com/apache/spark/pull/50827 that fixes a > segfault/data corruption issue when using a UDAF returning an Option. > > Any committer willing to have a lo

Any committers interested in reviewing SPARK-52023

2025-06-28 Thread Emil Ejbyfeldt
Created this MR https://github.com/apache/spark/pull/50827 that fixes a segfault/data corruption issue when using a UDAF returning an Option. Any committer willing to have a look? Best, Emil

Introducing Debo – Lightweight Unified Platform with Apache Spark Integration (Feedback Welcome!)

2025-06-26 Thread Surafel Temesgen
Dear Apache Spark Community, My name is *Surafel Temesgen*, and I’m excited to introduce a project I’ve been developing called *Debo* — a lightweight, unified infrastructure management platform designed to monitor and control key components of the Hadoop and big data ecosystem, including *Apache

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-24 Thread Sakthi
>> >>>>>>>> Vlad >> >>>>>>>> >> >>>>>>>> On Jun 1, 2025, at 7:21 PM, Wenchen Fan >> wrote: >> >>>>>>>> >> >>>>>>>> +1 >>

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-24 Thread Kousuke Saruta
> >> >>>>>>>> > >> >>>>>>>> Vlad > >> >>>>>>>> > >> >>>>>>>> On Jun 1, 2025, at 7:21 PM, Wenchen Fan > >> wrote: > >> >>>>>>>>

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-22 Thread Hyukjin Kwon
>>>>>>>> On Mon, Jun 2, 2025 at 9:55 AM Yuanjian Li < > xyliyuanj...@gmail.com > >>>>>>> <mailto:xyliyuanj...@gmail.com>> wrote: > >>>>>>>> +1 > >>>>>>>> > >>>>>>>>

[VOTE][RESULT] Release Apache Spark Connect Go Client 0.1.0

2025-06-14 Thread Martin Grund
>> >>> +1 (non-binding) >>> >>> On Mon, Jun 9, 2025 at 8:28 PM bo yang wrote: >>> >>>> +1 (non-binding), thanks Martin! >>>> >>>> On Mon, Jun 9, 2025 at 7:47 PM Cheng Pan wrote: >>>> >>>>> +1 (non-

Re: [PR] feat: merge `spark-connect-rs` with apache project [spark-connect-rust]

2025-06-12 Thread via GitHub
xuanyuanking commented on PR #1: URL: https://github.com/apache/spark-connect-rust/pull/1#issuecomment-2965713362 Also cc’ing @andygrove for visibility and to provide guidance or suggestions related to compliance. Thank you, Andy! -- This is an automated message from the Apache Git

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-10 Thread Wenchen Fan
I verified: >>>> >>>> 1. LICENSE/NOTICE are present >>>> 2. Signatures is correct >>>> 3. Build source code and run UT (I have to replace sparksrc folder with >>>> the content of spark-4.0.0.tgz to make the source happen) >>>> >

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-10 Thread Herman van Hovell
rtin! >> >> On Mon, Jun 9, 2025 at 7:47 PM Cheng Pan wrote: >> >>> +1 (non-binding) >>> >>> I verified: >>> >>> 1. LICENSE/NOTICE are present >>> 2. Signatures is correct >>> 3. Build source code and run UT (I have

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-10 Thread Rozov, Vlad
ote: +1 (non-binding) I verified: 1. LICENSE/NOTICE are present 2. Signatures is correct 3. Build source code and run UT (I have to replace sparksrc folder with the content of spark-4.0.0.tgz to make the source happen) Thanks, Cheng Pan On Jun 10, 2025, at 00:59, Martin Grund mailto:mar...@da

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-10 Thread Sakthi
is correct >> 3. Build source code and run UT (I have to replace sparksrc folder with >> the content of spark-4.0.0.tgz to make the source happen) >> >> Thanks, >> Cheng Pan >> >> >> >> On Jun 10, 2025, at 00:59, Martin Grund wrote: >> >

[PR] feat: merge `spark-connect-rs` with apache project [spark-connect-rust]

2025-06-10 Thread via GitHub
sjrusso8 opened a new pull request, #1: URL: https://github.com/apache/spark-connect-rust/pull/1 # Description Merge the code from [spark-connect-rs](https://github.com/sjrusso8/spark-connect-rs) Addresses: [SPARK-52429](https://issues.apache.org/jira/browse/SPARK-52429

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-09 Thread bo yang
+1 (non-binding), thanks Martin! On Mon, Jun 9, 2025 at 7:47 PM Cheng Pan wrote: > +1 (non-binding) > > I verified: > > 1. LICENSE/NOTICE are present > 2. Signatures is correct > 3. Build source code and run UT (I have to replace sparksrc folder with > the content of spa

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-09 Thread Cheng Pan
+1 (non-binding) I verified: 1. LICENSE/NOTICE are present 2. Signatures is correct 3. Build source code and run UT (I have to replace sparksrc folder with the content of spark-4.0.0.tgz to make the source happen) Thanks, Cheng Pan > On Jun 10, 2025, at 00:59, Martin Grund wrote: >

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-09 Thread Martin Grund
Hi folks, Please vote on releasing the following candidate as Apache Spark Connect Go Client 0.1.0. The release candidate was tested and built against Spark 4.0.0. The repository contains a sample application for submitting jobs written in Go using a small JVM wrapper <https://github.com/apa

Re: [VOTE] Release Spark 3.5.7 (RC1)

2025-06-09 Thread Hyukjin Kwon
These RC artifacts were dropped properly. On Mon, 9 Jun 2025 at 07:09, Hyukjin Kwon wrote: > This is an automated vote. Please ignore it. > > On Mon, Jun 9, 2025 at 6:46 AM wrote: > >> Please vote on releasing the following candidate as Apache Spark version >> 3.5.7

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-09 Thread Martin Grund
> version as you claim this release candidates was built and tested against > Spark 4.0.0. > > 2. Seems your public key was not added to KEYS, so I can not verify your > signature. > > $ wget https://downloads.apache.org/spark/KEYS > $ gpg --import KEYS > $ gpg --verify spark-conn

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-09 Thread Cheng Pan
Hi Martin, Thanks for addressing it. A few questions/issues I found: 1. The "fun Version"[1] returns "3.5.x", which does not look like a correct version, as you claim this release candidate was built and tested against Spark 4.0.0. 2. It seems your public key was not added to KEY

Re: [VOTE] Release Spark 3.5.7 (RC1)

2025-06-09 Thread Hyukjin Kwon
This is an automated vote. Please ignore it. On Mon, Jun 9, 2025 at 6:46 AM wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.5.7. > > The vote is open until Fri, 13 Jun 2025 06:32:20 PDT and passes if a > majority +1 PMC votes are cast, with

[VOTE] Release Spark 3.5.7 (RC1)

2025-06-09 Thread gurwls223
Please vote on releasing the following candidate as Apache Spark version 3.5.7. The vote is open until Fri, 13 Jun 2025 06:32:20 PDT and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 3.5.7 [ ] -1 Do not release this package

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-09 Thread Martin Grund
I updated the release based on the tag with the source releases and the proper signature. https://github.com/apache/spark-connect-go/releases/tag/v0.1.0-rc1 On Sun, Jun 8, 2025 at 10:44 PM Cheng Pan wrote: > The release artifacts don’t satisfy the ASF release policy[1]. > > > P

Re: [DISCUSS] Dropping LevelDB support in Spark

2025-06-09 Thread Jungtaek Lim
It's a bit different for users leveraging LevelDB - since it requires opt-in, they are willing to use it if they still use it, hence they are likely to retain the config during the upgrade. From the initial post, there is a claim that we deprecated LevelDB in Apache Spark 4.0.0. Shall I

Re: [DISCUSS] Dropping LevelDB support in Spark

2025-06-08 Thread Yang Jie
I would like to provide some new information: 1. Spark 3.4.0 [SPARK-42277] has started using RocksDB as the default option for `spark.history.store.hybridStore.diskBackend`. - Since Spark 3.4, Spark will use RocksDB store if `spark.history.store.hybridStore.enabled` is true. To restore the

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-08 Thread Cheng Pan
eng Pan > On Jun 9, 2025, at 12:21, Martin Grund wrote: > > Please vote on releasing the following candidate as Apache Spark Connect Go > Client 0.1.0. > > The release candidate was tested and built against Spark 4.0.0. The > repository contains a sample application fo

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-08 Thread Yuanjian Li
rote: > >> Please vote on releasing the following candidate as Apache Spark Connect >> Go Client 0.1.0. >> >> The release candidate was tested and built against Spark 4.0.0. The >> repository contains a sample application for submitting jobs written in Go >> using a sma

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-08 Thread Jules Damji
+1 (non-binding) Sent from my iPhone. Pardon the dumb thumb typos :) On Jun 8, 2025, at 9:32 PM, Hyukjin Kwon wrote: +1 On Sun, Jun 8, 2025 at 9:22 PM Martin Grund wrote: Please vote on releasing the following candidate as Apache Spark Connect Go Client 0.1.0. The release candidate was tested and

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-08 Thread Hyukjin Kwon
+1 On Sun, Jun 8, 2025 at 9:22 PM Martin Grund wrote: > Please vote on releasing the following candidate as Apache Spark Connect > Go Client 0.1.0. > > The release candidate was tested and built against Spark 4.0.0. The > repository contains a sample application for submitting

[VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-08 Thread Martin Grund
Please vote on releasing the following candidate as Apache Spark Connect Go Client 0.1.0. The release candidate was tested and built against Spark 4.0.0. The repository contains a sample application for submitting jobs written in Go using a small JVM wrapper <https://github.com/apache/sp

Re: [DISCUSS] Dropping LevelDB support in Spark

2025-06-08 Thread Jungtaek Lim
DB has native memory leak issues, at least for the SHS use case, I > need to reboot the SHS for every two months to recover it, issue gone after > upgrading to Spark 3.3 and switching to RocksDB. > > Scale and Performance: we keep ~800k applications event logs for the event > log HDF

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-07 Thread Jules Damji
>>>>>> <mailto:xyliyuanj...@gmail.com>> wrote: >>>>>>>> +1 >>>>>>>> >>>>>>>> On Sun, Jun 1, 2025 at 18:30 DB Tsai >>>>>> dbt...@dbtsai.com>> wrote: >>>>>>>>

Re: Question Regarding Spark Dependencies in Scala

2025-06-06 Thread Ángel Álvarez Pascua
mpatible) if bugs or vulnerabilities are detected, for example. On Fri, Jun 6, 2025, 10:09, Sem wrote: > > I may not need anything from spark but if I'll declare a dependency in > Jackson or guava with a different version than spark already use and > package- I might break things...

Re: Question Regarding Spark Dependencies in Scala

2025-06-06 Thread Sem
> I may not need anything from spark but if I'll declare a dependency in Jackson or guava with a different version than spark already use and package- I might break things... In that case I would recommend you to use assembly / assemblyShadeRules for sbt-assembly or maven-shade-plugin f

Re: [DISCUSS] Dropping LevelDB support in Spark

2025-06-06 Thread Cheng Pan
SHS for every two months to recover it, issue gone after upgrading to Spark 3.3 and switching to RocksDB. Scale and Performance: we keep ~800k applications event logs for the event log HDFS directory, multiple threads re-parsing to rebuild listing.rdb takes ~15mins. Thanks, Cheng Pan >

Re: [DISCUSS] Dropping LevelDB support in Spark

2025-06-06 Thread Jungtaek Lim
released its last > version 12 years ago, and its code repository was last updated 8 years ago. > Consequently, I believe it's challenging for us to receive ongoing > maintenance and support from this project. > >> > >> On the flip side, when developers implement new f

Re: [DISCUSS] Dropping LevelDB support in Spark

2025-06-05 Thread Jia Fan
version >> 12 years ago, and its code repository was last updated 8 years ago. >> Consequently, I believe it's challenging for us to receive ongoing >> maintenance and support from this project. >> >> On the flip side, when developers implement n

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-04 Thread Boumalhab, Chris
Hi Lee, Thanks for the info. I'm familiar with theta and tuple sketches' implementations under the hood. Will let you know if I have any questions! Thank you for all the work your team does. Chris

[ANNOUNCE] Apache Spark Kubernetes Operator 0.3.0 released

2025-06-04 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache Spark Kubernetes Operator 0.3.0! - Notable Changes * Built and tested with Apache Spark 4.0 and Spark Connect Swift Client * Running on Java 24 * Promoting CRDs to v1beta1 from v1alpha1 - Website * https://s.apache.org/spark

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-04 Thread Lee Rhodes
Hi, This is Lee Rhodes (lee...@apache.org) from the Apache DataSketches team. I am pleased that there is interest in the Spark community for integrating our library more tightly into Spark! I would like to help if I can. Unfortunately, I am not Spark fluent so I'm not going to be very u

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-04 Thread Mich Talebzadeh
t;> On Jun 2, 2025, at 12:50 PM, DB Tsai >> wrote: >> >>>>>>>>>> >> >>>>>>>>>> +1 looking forward to seeing real-time mode. >> >>>>>>>>>> Sent from my iPhone >> >>>>>>&

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-04 Thread Jerry Peng
gt;>> On Jun 2, 2025, at 12:50 PM, DB Tsai wrote: > >>>>>>>>>> > >>>>>>>>>> +1 looking forward to seeing real-time mode. > >>>>>>>>>> Sent from my iPhone > >>>>>>>>

[VOTE][RESULT] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-04 Thread L. C. Hsieh
The vote passes with 24 +1s (12 binding +1s), 1 +0s (1 binding +0s) and no -1s. Thanks to all who helped with the vote! (* = binding) +1: Dongjoon Hyun (*) Yuanjian Li (*) Tathagata Das (*) Huaxin Gao (*) Xiao Li (*) DB Tsai (*) L.C. Hsieh (*) Denny Lee Gengliang Wang (*) Sakthi Anish Shrigondeka

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-04 Thread L. C. Hsieh
>>>>> >>>>>>>>>> +1 looking forward to seeing real-time mode. >>>>>>>>>> Sent from my iPhone >>>>>>>>>> >>>>>>>>>> On Jun 1, 2025, at 9:47 PM, Xiao Li wrote: >>>>

[VOTE][RESULT] Release Apache Spark K8s Operator 0.3.0 (RC1)

2025-06-04 Thread Dongjoon Hyun
The vote passes with 12 +1s (7 binding +1s). Thanks to all who helped with the release! (* = binding) +1: - Zhou Jiang - Huaxin Gao * - Liang-Chi Hsieh * - Peter Toth - Yang Jie * - DB Tsai * - Wenchen Fan * - Liu Cao - Vlad Rozov - Dongjoon Hyun * - Jungtaek Lim - Xinrong Meng * +0: None -1: No

Re: [VOTE] Release Apache Spark K8s Operator 0.3.0 (RC1)

2025-06-04 Thread Dongjoon Hyun
.org>> wrote: > >> > > > >> > > +1 > >> > > > >> > >> On 2025/06/01 08:10:10 Peter Toth wrote: > >> > >> +1 > >> > >> > >> > >>> On Sun, Jun 1, 2025 at 9:01 AM L. C. Hsieh >

[VOTE][RESULT] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-04 Thread Dongjoon Hyun
The vote passes with 15 +1s (9 binding +1s). Thanks to all who helped with the release. (* = binding) +1: - Zhou Jiang - Huaxin Gao * - Liang-Chi Hsieh * - Peter Toth - Yang Jie * - DB Tsai * - Yuanjian Li * - Wenchen Fan * - Vlad Rozov - Dongjoon Hyun * - Sandy Ryza - Denny Lee - Jungtaek Lim - X

[ANNOUNCE] Apache Spark Connect Swift Client 0.3.0 released

2025-06-04 Thread Dongjoon Hyun
Hi All. We are happy to announce the availability of Apache Spark Connect Swift Client 0.3.0! This is the first release tested with the official Apache Spark 4.0.0. Website - https://apache.github.io/spark-connect-swift/ Release Note - https://github.com/apache/spark-connect-swift/releases

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-04 Thread Dongjoon Hyun
<mailto:xyliyuanj...@gmail.com>> wrote: > >>>>> > +1 > >>>>> > > >>>>> > On Sun, Jun 1, 2025 at 18:30 DB Tsai >>>>> dbt...@dbtsai.com>> wrote: > >>>>> > +1 > >>>>> >

Re: Question Regarding Spark Dependencies in Scala

2025-06-04 Thread Nimrod Ofek
creating a library for Delta that helps track the lag in structured streaming delta to delta tables streams - I may not need anything from spark but if I'll declare a dependency in Jackson or guava with a different version than spark already use and package- I might break things... Because I

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Boumalhab, Chris
This looks good to me! I’m considering tuple too if we have theta. Theta can be priority, but given that tuple is just an extension, it doesn’t hurt to add down the line.

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Boumalhab, Chris
bject: RE: [EXTERNAL] [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. Hi Chris, We integrated DataSketches into Spar

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Menelaos Karavelas
Following what Ryan did for HLL sketches, I would also add an aggregate expression for unions as the aggregate version of the binary union expression. The expressions that Ryan added are: hll_sketch_agg hll_union hll_union_agg hll_sketch_estimate Following the same naming convention I would prob

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Boumalhab, Chris
I think something like this could work: theta_sketch_agg(col) to build the sketch theta_sketch_union(sketch1, sketch2) to union the sketches theta_sketch_estimate(sketch) or theta_sketch_estimate_count(sketch) to estimate count … Something similar can be done for tuple support. Let me know what

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Menelaos Karavelas
m: Menelaos Karavelas <mailto:menelaos.karave...@gmail.com>> > Date: Tuesday, June 3, 2025 at 6:15 PM > To: "Boumalhab, Chris" <mailto:cboum...@amazon.com.INVALID>> > Cc: "dev@spark.apache.org <mailto:dev@spark.apache.org>" > mailto:dev@spark.apa

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Sean Owen
Yes, you're just saying that if your app depends on Foo, and Spark depends on Foo, then ideally you depend on the exact same version Spark uses. Otherwise it's up to Maven/SBT to pick one or the other version, which might or might not be suitable. Yes, dependency conflicts are painful to

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Boumalhab, Chris
, 2025 at 6:15 PM To: "Boumalhab, Chris" Cc: "dev@spark.apache.org" Subject: RE: [EXTERNAL] [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you ca

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Nimrod Ofek
You don't add dependencies you don't use, but you do need to declare the dependencies you do use, and if the platform you are running on uses a specific version, you need to use that version; you can't break compatibility. Since Spark uses a lot of dependencies, I don't expect the us

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Menelaos Karavelas
Hello Chris. HLL sketches from the same project (Apache DataSketches) have already been integrated in Spark. How does your proposal fit given what I just mentioned? - Menelaos > On Jun 3, 2025, at 2:52 PM, Boumalhab, Chris > wrote: > > Hi all, > > I’d like to start

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Ryan Berti
Hi Chris, We integrated DataSketches into Spark when we introduced the hll_sketch_* UDFs - see the PR from 2023 <https://github.com/apache/spark/pull/40615> for more info. I'm sure there'd be interest in exposing other types of sketches, and I bet there'd be some potential f

[DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Boumalhab, Chris
Hi all, I’d like to start a discussion about adding support for [Apache DataSketches](https://datasketches.apache.org/) — specifically, Theta and Tuple Sketches — to Spark SQL and DataFrame APIs. ## Motivation These sketches allow scalable approximate set operations (like distinct count

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Nimrod Ofek
I'll give an example: If I have a project that reads Avro messages from a Kafka topic and writes them to Delta tables, I would expect to set only: libraryDependencies ++= Seq( "io.delta" %% "delta-spark" % deltaVersion % Provided, "org.apache.spark"

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Sean Owen
Do you have an example of what you mean? Yes, a deployment of Spark has all the modules. You do not need to (should not in fact) deploy Spark code with your Spark app for this reason. You still need to express dependencies on the Spark code that your app uses at *compile* time however, in order

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Nimrod Ofek
It does not compile if I don't add spark-sql. In usual projects I'd agree with you, but since Spark comes complete with all dependencies, unlike other programs where you deploy only certain dependencies, I see no reason for users to select specific dependencies that are already bund

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Sean Owen
 PM Nimrod Ofek wrote: > Thanks Sean. > There are other dependencies that you need to align with Spark if you need > to use them as well - like Guava, Jackson etc. > I find them more difficult to use - because you need to go to Spark repo > to check the correct version used -

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Sean Owen
I think Spark, like any project, is large enough to decompose into modules, and it has been. A single app almost surely doesn't need all the modules. So yes you have to depend on the modules you actually need, and I think that's normal. See Jackson for example. (spark-sql is not necessa

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Nimrod Ofek
Thanks Sean. There are other dependencies that you need to align with Spark if you need to use them as well - like Guava, Jackson etc. I find them more difficult to use - because you need to go to Spark repo to check the correct version used - and if there are upgrades between versions you need to

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Sean Owen
I think this is already how it works. Most apps would depend on just spark-sql (which depends on spark-core, IIRC). Maybe some optionally pull in streaming or mllib. I don't think it's intended that you pull in all submodules for any one app, although you could. I don't know if ther

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Nimrod Ofek
Hi all, Sorry for bumping this again - just trying to understand if it's worth adding a small feature for this - I think it can help Spark users and Spark libraries upgrade and support Spark versions a lot easier :) If instead of adding many provided dependencies we'll have one that wi

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Cheng Pan
+1 (non-binding) Thanks, Cheng Pan > On Jun 2, 2025, at 03:00, L. C. Hsieh wrote: > > Hi all, > > I would like to start a vote on the new real-time mode in Apache Spark > Structured Streaming. > > Discussion thread: > https://lists.apache.org/thread/ovmfbzfkc3t9

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Wenchen Fan
>> >>>>>>>>> >>>>>>>>> +1 >>>>>>>>> >>>>>>>>> huaxin gao wrote on Sun, Jun 1, 2025 at 20:00: >>>>>>>>> >>>>>>>>>> +1 >>>>>>>>>>

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Reynold Xin
>>>>>>> +1 >>>>>>>> >>>>>>>> On Sun, Jun 1, 2025 at 7:50 PM Tathagata Das < >>>>>>>> tathagata.das1...@gmail.com> wrote: >>>>>>>> >>>>>>>>> +1

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Kent Yao
>>>> >>>>>>> +1 >>>>>>> >>>>>>> On Sun, Jun 1, 2025 at 7:50 PM Tathagata Das < >>>>>>> tathagata.das1...@gmail.com> wrote: >>>>>>> >>>>>>>> +1 (binding) >>>

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread bo yang
1, 2025, at 9:47 PM, Xiao Li wrote: >>>>>>>> >>>>>>>> >>>>>>>> +1 >>>>>>>> >>>>>>>> huaxin gao wrote on Sun, Jun 1, 2025 at 20:00: >>>>>>>> >>>>>>>>> +1 >

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-02 Thread Reynold Xin
.@gmail.com>> wrote: >>>>> > +1 >>>>> > >>>>> > On Sun, Jun 1, 2025 at 18:30 DB Tsai >>>> dbt...@dbtsai.com>> wrote: >>>>> > +1 >>>>> > >>>>> > Sent from my iPhone >>&

Re: [VOTE] Release Apache Spark K8s Operator 0.3.0 (RC1)

2025-06-02 Thread Xinrong Meng
> wrote: >> > > >> > > +1 >> > > >> > >> On 2025/06/01 08:10:10 Peter Toth wrote: >> > >> +1 >> > >> >> > >>> On Sun, Jun 1, 2025 at 9:01 AM L. C. Hsieh > <mailto:vii...@gmail.com>> wrot

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-02 Thread Xinrong Meng
> >>>> > Sent from my iPhone >>>> > >>>> > > On Jun 1, 2025, at 2:32 AM, Yang Jie >>> yangji...@apache.org>> wrote: >>>> > > >>>> > > +1 >>>> > > >>>> > >> On

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-02 Thread Jungtaek Lim
sai.com>> wrote: >>> > +1 >>> > >>> > Sent from my iPhone >>> > >>> > > On Jun 1, 2025, at 2:32 AM, Yang Jie >> yangji...@apache.org>> wrote: >>> > > >>> > > +1 >>> > > >

Re: [VOTE] Release Apache Spark K8s Operator 0.3.0 (RC1)

2025-06-02 Thread Jungtaek Lim
at 9:01 AM L. C. Hsieh vii...@gmail.com>> wrote: > > >>> > > >>> +1 > > >>> > > >>> On Sat, May 31, 2025 at 10:38 PM Dongjoon Hyun < > dongjoon.h...@gmail.com<mailto:dongjoon.h...@gmail.com>> > > >>> wro

RE: Inquiry: Best Practices for Replacing Snappy with LZ4/LZF Compression Across Spark Codebase (including test cases)

2025-06-02 Thread Balaji Sudharsanam V
I forgot to mention that we are exploring this in Spark 4.0, whether as a configuration change or as explicit code changes throughout. We are keen to adopt the recommended, future-proof approach. Any guidance, insights, or pointers to relevant documentation, JIRAs, or previous

Inquiry: Best Practices for Replacing Snappy with LZ4/LZF Compression Across Spark Codebase (including test cases)

2025-06-02 Thread Balaji Sudharsanam V
Dear Spark Developer Community, I hope this email finds you well. My name is Balaji, and I am a Software Engineer working with Apache Spark in IBM Z Systems (z/OS). We are exploring a scenario where we would like to move away from using the Snappy compression library within our Spark

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-02 Thread Denny Lee
ote: >> > > >> > > +1 >> > > >> > >> On 2025/06/01 08:09:38 Peter Toth wrote: >> > >> +1 >> > >> >> > >>> On Sun, Jun 1, 2025 at 9:00 AM L. C. Hsieh > <mailto:vii...@gmail.com>> wrote: &g

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-02 Thread Rozov, Vlad
>>> +1 >>> >>> On Sat, May 31, 2025 at 9:18 PM Dongjoon Hyun >>> mailto:dongjoon.h...@gmail.com>> >>> wrote: >>>> >>>> Please vote on releasing the following candidate as Apache Spark Connect >>> Swift Client 0.3.0. This

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Sandy Ryza
: >>>>>> >>>>>>> +1 (binding) >>>>>>> super excited about this! >>>>>>> >>>>>>> On Sun, Jun 1, 2025 at 10:45 PM Yuanjian Li >>>>>>> wrote: >>>>>>

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-02 Thread Sandy Ryza
5 at 9:00 AM L. C. Hsieh vii...@gmail.com>> wrote: > > >>> > > >>> +1 > > >>> > > >>> On Sat, May 31, 2025 at 9:18 PM Dongjoon Hyun < > dongjoon.h...@gmail.com<mailto:dongjoon.h...@gmail.com>> > > >>>

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Chao Sun
2025 at 7:50 PM Tathagata Das < >>>>> tathagata.das1...@gmail.com> wrote: >>>>> >>>>>> +1 (binding) >>>>>> super excited about this! >>>>>> >>>>>> On Sun, Jun 1, 2025 at 10:45

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Rozov, Vlad
.com>> wrote: +1 Dongjoon On Sun, Jun 1, 2025 at 12:02 L. C. Hsieh mailto:vii...@gmail.com>> wrote: Hi all, I would like to start a vote on the new real-time mode in Apache Spark Structured Streaming. Discussion thread: https://lists.apache.org/thread/ovmfbzfkc3t9odvv5gs7

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Jungtaek Lim
ut this! >>>>> >>>>> On Sun, Jun 1, 2025 at 10:45 PM Yuanjian Li >>>>> wrote: >>>>> >>>>>> +1 >>>>>> >>>>>> On Sun, Jun 1, 2025 at 19:00 Dongjoon Hyun >>>>>> wr

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-02 Thread Dongjoon Hyun
>> On 2025/06/01 08:09:38 Peter Toth wrote: > >> +1 > >> > >>> On Sun, Jun 1, 2025 at 9:00 AM L. C. Hsieh > >>> mailto:vii...@gmail.com>> wrote: > >>> > >>> +1 > >>> > >>> On Sat, May 31, 2025 at 9:18 P

Re: [VOTE] Release Apache Spark K8s Operator 0.3.0 (RC1)

2025-06-02 Thread Dongjoon Hyun
> > > >> On 2025/06/01 08:10:10 Peter Toth wrote: > >> +1 > >> > >>> On Sun, Jun 1, 2025 at 9:01 AM L. C. Hsieh > >>> mailto:vii...@gmail.com>> wrote: > >>> > >>> +1 > >>> > >>> On Sat, May 31,

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Wenchen Fan
On Sun, Jun 1, 2025 at 19:00 Dongjoon Hyun >>>>> wrote: >>>>> >>>>>> +1 >>>>>> >>>>>> Dongjoon >>>>>> >>>>>> >>>>>> On Sun, Jun 1, 2025 at 12:02 L. C. Hsieh wrote

Re: [VOTE] Release Apache Spark K8s Operator 0.3.0 (RC1)

2025-06-02 Thread Rozov, Vlad
>>> +1 >>> >>> On Sat, May 31, 2025 at 10:38 PM Dongjoon Hyun >>> mailto:dongjoon.h...@gmail.com>> >>> wrote: >>>> >>>> Please vote on releasing the following candidate as Apache Spark K8s >>> Operator 0.3.0. T

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Peter Toth
wrote: >>> >>>> +1 >>>> >>>> On Sun, Jun 1, 2025 at 19:00 Dongjoon Hyun >>>> wrote: >>>> >>>>> +1 >>>>> >>>>> Dongjoon >>>>> >>>>>

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread xianjin
com> wrote:Hi all, I would like to start a vote on the new real-time mode in Apache Spark Structured Streaming. Discussion thread: https://lists.apache.org/thread/ovmfbzfkc3t9odvv5gs75fhpvdckn90f SPIP: https://docs.google.com/document/d/1CvJvtlTGP6TwQIT4kW6GFT1JbdziAYOBvt60ybb7Dw8/edit?tab=t

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Mich Talebzadeh
I am OK with +1. Having said that, there is merit IMO in adding a matrix highlighting the differences between real-time mode and Continuous Processing (Continuous Mode) to the SPIP, unless the assumption is that Spark has abandoned Continuous Mode altogether. *Feature Real-time Processing

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Mark Hamstra
> >> >>>>>> >> +1 >>>>>> >> >>>>>> >> On Sun, Jun 1, 2025 at 7:50 PM Tathagata Das >>>>>> >> wrote: >>>>>> >>> >>>>>> >>> +1 (binding

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Liu Cao
>>> >> >>>>> >> On Sun, Jun 1, 2025 at 7:50 PM Tathagata Das < >>>>> tathagata.das1...@gmail.com> wrote: >>>>> >>> >>>>> >>> +1 (binding) >>>>> >>> super excite

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-01 Thread Anish Shrigondekar
+1 (binding) >>>> >>> super excited about this! >>>> >>> >>>> >>> On Sun, Jun 1, 2025 at 10:45 PM Yuanjian Li >>>> wrote: >>>> >>>> >>>> >>>> +1 >>>> >>>>

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-01 Thread Sakthi
> >>> >>> +1 (binding) >>> >>> super excited about this! >>> >>> >>> >>> On Sun, Jun 1, 2025 at 10:45 PM Yuanjian Li >>> wrote: >>> >>>> >>> >>>> +1 >>> >>>> >>>

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-01 Thread Gengliang Wang
> >>>> >> >>>> +1 >> >>>> >> >>>> On Sun, Jun 1, 2025 at 19:00 Dongjoon Hyun >> wrote: >> >>>>> >> >>>>> +1 >> >>>>> >> >

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-01 Thread L. C. Hsieh
25 at 19:00 Dongjoon Hyun wrote: >>>>> >>>>> +1 >>>>> >>>>> Dongjoon >>>>> >>>>> >>>>> On Sun, Jun 1, 2025 at 12:02 L. C. Hsieh wrote: >>>>>> >>>
