Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-24 Thread Sakthi
+1 (non-binding) On Sun, Jun 22, 2025 at 4:52 PM Hyukjin Kwon wrote: > +1 > > On Sun, 8 Jun 2025 at 02:12, Jules Damji wrote: > >> + 1 (non-binding) >> — >> Sent from my iPhone >> Pardon the dumb thumb typos :) >> >> > On Jun 4, 2025, at 8:10 AM, Dongjoon Hyun wrote: >> > >> > Thank you all.

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-24 Thread Kousuke Saruta
+1 On 2025/06/24 06:56:57 Sakthi wrote: > +1 (non-binding) > > On Sun, Jun 22, 2025 at 4:52 PM Hyukjin Kwon wrote: > > > +1 > > > > On Sun, 8 Jun 2025 at 02:12, Jules Damji wrote: > > > >> + 1 (non-binding) > >> — > >> Sent from my iPhone > >> Pardon the dumb thumb typos :) > >> > >> > On Jun

Re: [DISCUSS] Automation of RC email

2025-06-22 Thread Jules Damji
+ 1 (non-binding) Excuse the thumb typos On Fri, 20 Jun 2025 at 1:56 AM, Hyukjin Kwon wrote: > The email will be sent as normal just like the regular release vote emails > we have sent in the past. I just wanted to make sure we're fine with > automating it. > > On Thu, Jun 19, 2025 at 7:3

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-22 Thread Hyukjin Kwon
+1 On Sun, 8 Jun 2025 at 02:12, Jules Damji wrote: > + 1 (non-binding) > — > Sent from my iPhone > Pardon the dumb thumb typos :) > > > On Jun 4, 2025, at 8:10 AM, Dongjoon Hyun wrote: > > > > Thank you all. > > > > This vote passed. I'll conclude this. > > > > Dongjoon. > > > >> On 2025/06/03

Re: [DISCUSS] Automation of RC email

2025-06-20 Thread Hyukjin Kwon
The email will be sent as normal just like the regular release vote emails we have sent in the past. I just wanted to make sure we're fine with automating it. On Thu, Jun 19, 2025 at 7:38 PM Steve Loughran wrote: > email private@spark and let whoever is releasing forward it? > > On Thu, 5 J

Re: [DISCUSS] Automation of RC email

2025-06-19 Thread Steve Loughran
email private@spark and let whoever is releasing forward it? On Thu, 5 Jun 2025 at 00:53, Hyukjin Kwon wrote: > Hi all, > > As some of you may know, I’ve been working on automating the Spark release > process (release.yml > ). The >

Re: [PR] feat: merge `spark-connect-rs` with apache project [spark-connect-rust]

2025-06-12 Thread via GitHub
xuanyuanking commented on PR #1: URL: https://github.com/apache/spark-connect-rust/pull/1#issuecomment-2965713362 Also cc’ing @andygrove for visibility and to provide guidance or suggestions related to compliance. Thank you, Andy! -- This is an automated message from the Apache Git Servic

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-10 Thread Wenchen Fan
+1 On Tue, Jun 10, 2025 at 11:40 AM Herman van Hovell wrote: > +1 > > On Tue, Jun 10, 2025 at 2:04 PM Rozov, Vlad > wrote: > >> +1 (non-binding) >> >> Thank you, >> >> Vlad >> >> On Jun 10, 2025, at 10:44 AM, Sakthi wrote: >> >> +1 (non-binding) >> >> On Mon, Jun 9, 2025 at 8:28 PM bo yang wr

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-10 Thread Herman van Hovell
+1 On Tue, Jun 10, 2025 at 2:04 PM Rozov, Vlad wrote: > +1 (non-binding) > > Thank you, > > Vlad > > On Jun 10, 2025, at 10:44 AM, Sakthi wrote: > > +1 (non-binding) > > On Mon, Jun 9, 2025 at 8:28 PM bo yang wrote: > >> +1 (non-binding), thanks Martin! >> >> On Mon, Jun 9, 2025 at 7:47 PM Che

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-10 Thread Rozov, Vlad
+1 (non-binding) Thank you, Vlad On Jun 10, 2025, at 10:44 AM, Sakthi wrote: +1 (non-binding) On Mon, Jun 9, 2025 at 8:28 PM bo yang <bobyan...@gmail.com> wrote: +1 (non-binding), thanks Martin! On Mon, Jun 9, 2025 at 7:47 PM Cheng Pan <pan3...@gmail.com> wrote: +1 (non-bindi

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-10 Thread Sakthi
+1 (non-binding) On Mon, Jun 9, 2025 at 8:28 PM bo yang wrote: > +1 (non-binding), thanks Martin! > > On Mon, Jun 9, 2025 at 7:47 PM Cheng Pan wrote: > >> +1 (non-binding) >> >> I verified: >> >> 1. LICENSE/NOTICE are present >> 2. Signatures are correct >> 3. Build source code and run UT (I hav

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-09 Thread bo yang
+1 (non-binding), thanks Martin! On Mon, Jun 9, 2025 at 7:47 PM Cheng Pan wrote: > +1 (non-binding) > > I verified: > > 1. LICENSE/NOTICE are present > 2. Signatures are correct > 3. Build source code and run UT (I had to replace sparksrc folder with > the content of spark-4.0.0.tgz to make the

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-09 Thread Cheng Pan
+1 (non-binding) I verified: 1. LICENSE/NOTICE are present 2. Signatures are correct 3. Build source code and run UT (I had to replace sparksrc folder with the content of spark-4.0.0.tgz to make the source happen) Thanks, Cheng Pan > On Jun 10, 2025, at 00:59, Martin Grund wrote: > > Hi fo

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-09 Thread Martin Grund
Hi folks, Please vote on releasing the following candidate as Apache Spark Connect Go Client 0.1.0. The release candidate was tested and built against Spark 4.0.0. The repository contains a sample application for submitting jobs written in Go using a small JVM wrapper

Re: [DISCUSS] Automation of RC email

2025-06-09 Thread Hyukjin Kwon
The PR is ready for a look 👍 On Sun, 8 Jun 2025 at 17:41, Hyukjin Kwon wrote: > I am working on it at https://github.com/apache/spark/pull/51122. > Some emails might be sent for RC 3.5.7 for testing purposes. Please ignore > them :-). I will reply to individual email as well to avoid confusion.

Re: [VOTE] Release Spark 3.5.7 (RC1)

2025-06-09 Thread Hyukjin Kwon
These RC artifacts were dropped properly. On Mon, 9 Jun 2025 at 07:09, Hyukjin Kwon wrote: > This is an automated vote. Please ignore it. > > On Mon, Jun 9, 2025 at 6:46 AM wrote: > >> Please vote on releasing the following candidate as Apache Spark version >> 3.5.7. >> >> The vote is open unti

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-09 Thread Martin Grund
Thanks for the feedback, I'll address it shortly. On Mon, Jun 9, 2025 at 08:31 Cheng Pan wrote: > Hi Martin, > > Thanks for addressing it, a few questions/issues I found: > > 1. The "fun Version"[1] returns "3.5.x", this does not look like a correct > version as you claim this release candidate

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-09 Thread Cheng Pan
Hi Martin, Thanks for addressing it, a few questions/issues I found: 1. The "fun Version"[1] returns "3.5.x", this does not look like a correct version as you claim this release candidate was built and tested against Spark 4.0.0. 2. Seems your public key was not added to KEYS, so I can not ve

Re: [VOTE] Release Spark 3.5.7 (RC1)

2025-06-09 Thread Hyukjin Kwon
This is an automated vote. Please ignore it. On Mon, Jun 9, 2025 at 6:46 AM wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.5.7. > > The vote is open until Fri, 13 Jun 2025 06:32:20 PDT and passes if a > majority +1 PMC votes are cast, with > a minimum of 3

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-09 Thread Martin Grund
I updated the release based on the tag with the source releases and the proper signature. https://github.com/apache/spark-connect-go/releases/tag/v0.1.0-rc1 On Sun, Jun 8, 2025 at 10:44 PM Cheng Pan wrote: > The release artifacts don’t satisfy the ASF release policy[1]. > > > Projects MUST dire

Re: [DISCUSS] Dropping LevelDB support in Spark

2025-06-09 Thread Jungtaek Lim
which means Spark will use RocksDB store for shuffle service. To > restore the behavior before Spark 4.0, you can set > `spark.shuffle.service.db.backend` to `LEVELDB`. > > So for users who hadn't explicitly configured the aforementioned options > to be `LEVELDB` before, the situa

Re: [DISCUSS] SPIP: Upgrade Apache Hive to 4.x

2025-06-09 Thread Mich Talebzadeh
Thanks, Angel, for the offer of help. I added some comments to this thread: https://github.com/apache/iceberg/issues/2387 The problem was observed with a Postgres DB, so I am not sure whether the cause is the metastore being on a transactional DB. This error may not be relevant to other metastore for Hive b

Re: [DISCUSS] Dropping LevelDB support in Spark

2025-06-08 Thread Yang Jie
data reconstruction or re-parsing have already existed. On 2025/06/09 01:08:05 Jungtaek Lim wrote: > Thanks for the valuable input. > > I think it's more about the case where upgrading would surprise the end > users. If we simply remove LevelDB from the next release, we will be > re

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-08 Thread Cheng Pan
The release artifacts don’t satisfy the ASF release policy[1]. > Projects MUST direct outsiders towards official releases rather than raw > source repositories, nightly builds, snapshots, release candidates, or any > other similar packages. > Every ASF release MUST contain one or more source pa

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-08 Thread Yuanjian Li
+1 On Sun, Jun 8, 2025 at 21:40 Jules Damji wrote: > +1 (non-binding) > — > Sent from my iPhone > Pardon the dumb thumb typos :) > > On Jun 8, 2025, at 9:32 PM, Hyukjin Kwon wrote: > > +1 > > On Sun, Jun 8, 2025 at 9:22 PM Martin Grund > wrote: > >> Please vote on releasing the following

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-08 Thread Jules Damji
+1 (non-binding) — Sent from my iPhone. Pardon the dumb thumb typos :) On Jun 8, 2025, at 9:32 PM, Hyukjin Kwon wrote: +1 On Sun, Jun 8, 2025 at 9:22 PM Martin Grund wrote: Please vote on releasing the following candidate as Apache Spark Connect Go Client 0.1.0. The release candidate was tested and bui

Re: [VOTE] Release Apache Spark Connect Go Client 0.1.0

2025-06-08 Thread Hyukjin Kwon
+1 On Sun, Jun 8, 2025 at 9:22 PM Martin Grund wrote: > Please vote on releasing the following candidate as Apache Spark Connect > Go Client 0.1.0. > > The release candidate was tested and built against Spark 4.0.0. The > repository contains a sample application for submitting jobs written in Go

Re: [DISCUSS] Dropping LevelDB support in Spark

2025-06-08 Thread Jungtaek Lim
't use the hybrid store. On Fri, Jun 6, 2025 at 5:08 PM, Cheng Pan wrote: > I think SHS only uses LevelDB/RocksDB to store intermediate data, > supporting re-parsing to rebuild the cache should be fine enough. > > Also share my experience about using LevelDB/RocksDB for SHS, it seems > Level

Re: [DISCUSS] Automation of RC email

2025-06-08 Thread Hyukjin Kwon
I am working on it at https://github.com/apache/spark/pull/51122. Some emails might be sent for RC 3.5.7 for testing purposes. Please ignore them :-). I will reply to individual email as well to avoid confusion. On Thu, 5 Jun 2025 at 20:07, Yang Jie wrote: > Option 1 +1, thank you, Hyukjin, for

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-07 Thread Jules Damji
+ 1 (non-binding) — Sent from my iPhone Pardon the dumb thumb typos :) > On Jun 4, 2025, at 8:10 AM, Dongjoon Hyun wrote: > > Thank you all. > > This vote passed. I'll conclude this. > > Dongjoon. > >> On 2025/06/03 00:45:20 Reynold Xin wrote: >> +1 >> >>> On Mon, Jun 2, 2025 at 5:45 PM Xin

Re: [DISCUSS] SPIP: Upgrade Apache Hive to 4.x

2025-06-07 Thread Ángel Álvarez Pascua
I'm also interested in this SPIP. There was someone else also working on this, if I remember correctly. @Mich Talebzadeh, if you need any help with that issue, let me know. On Fri, Jun 6, 2025, 1:07, Mich Talebzadeh wrote: > I started working on this by upgrading my hadoop to > > Hadoop 3.4

Re: Question Regarding Spark Dependencies in Scala

2025-06-06 Thread Ángel Álvarez Pascua
"6.4.0", > "org.apache.avro" % "avro" % apacheAvro, > "io.confluent" % "kafka-schema-registry-client" % "7.5.1", > "com.github.pureconfig" %% "pureconfig" % "0.17.5" > ) > > And not to a

Re: Question Regarding Spark Dependencies in Scala

2025-06-06 Thread Sem
s, I would > > > > > > > expect to set only: > > > > > > > > > > > > > > libraryDependencies ++= Seq( > > > > > > > > > > > > > > "io.delta" %% "delta-spark" % deltaVersion %

Re: [DISCUSS] Dropping LevelDB support in Spark

2025-06-06 Thread Cheng Pan
I think SHS only uses LevelDB/RocksDB to store intermediate data, supporting re-parsing to rebuild the cache should be fine enough. Also share my experience about using LevelDB/RocksDB for SHS, it seems LevelDB has native memory leak issues, at least for the SHS use case, I need to reboot the

Re: [DISCUSS] Dropping LevelDB support in Spark

2025-06-06 Thread Jungtaek Lim
> > > Thanks for initiating this. > > > > I wonder if we don't have any compatibility issue on every component - > SS area does not have an issue, but I don't quite remember if the history > server would be OK with this. What is the story of the migration if they

Re: [DISCUSS] Automation of RC email

2025-06-05 Thread Jungtaek Lim
Thanks for the confirmation. That sounds great as long as the ASF account information is required per run and is never stored anywhere after the run. On Fri, Jun 6, 2025 at 11:39 AM, Hyukjin Kwon wrote: > When you run the GitHub Actions to release, it requires you to specify an > ASF account and passwo

Re: [DISCUSS] Automation of RC email

2025-06-05 Thread Yang Jie
Option 1 +1, thank you, Hyukjin, for the efforts you've put into this. On 2025/06/06 02:59:32 Jungtaek Lim wrote: > Thanks for the confirmation. That sounds great as long as the ASF account > information is required per run and is never stored anywhere after the run. > > On Fri, Jun 6, 2025 at 11:39 AM

Re: [DISCUSS] Automation of RC email

2025-06-05 Thread Hyukjin Kwon
When you run the GitHub Actions to release, it requires you to specify an ASF account and password in GitHub Secrets. So I plan to use that to send an email. I will probably add a note that the email was auto-generated. On Fri, 6 Jun 2025 at 11:37, Jungtaek Lim wrote: > One question: is it pos

Re: [DISCUSS] Automation of RC email

2025-06-05 Thread Jungtaek Lim
One question: is it possible for the automation to send the mail on behalf of the release manager? Or will we simply send the mail from a specific mail account (most likely one dedicated to automation)? Maybe the latter doesn’t even matter, but it might be less clear about who is driving the release, from automate

Re: [DISCUSS] Dropping LevelDB support in Spark

2025-06-05 Thread Jia Fan
they had been > using leveldb? I guess it could be probably re-parsed, but do we need to ask > users to perform some manual work to do that? > > On Wed, May 28, 2025 at 2:27 PM Yang Jie wrote: >> >> The project "org.fusesource.leveldbjni:leveldbjni" released its last

Re: [DISCUSS] SPIP: Upgrade Apache Hive to 4.x

2025-06-05 Thread Mich Talebzadeh
I started working on this by upgrading my hadoop to Hadoop 3.4.1 My Hive is Driver: Hive JDBC (version 4.0.1) Transaction isolation: TRANSACTION_REPEATABLE_READ Running init script /home/hduser/dba/bin/add_jars.hql 25/06/05 23:33:44 [main]: WARN util.NativeCodeLoader: Unable to load native-hadoo

Re: [DISCUSS] Automation of RC email

2025-06-05 Thread Wenchen Fan
+1 for email automation! On Thu, Jun 5, 2025 at 8:22 AM Yuanjian Li wrote: > +1 for option 1. > > Seems the only downside of option 1 is that some RC numbers may be > non-sequential. > > On Thu, Jun 5, 2025 at 07:57, Dongjoon Hyun wrote: > >> +1 for the proposal, Hyukjin. Thank you for the whole and seamless

Re: [DISCUSS] Automation of RC email

2025-06-05 Thread Yuanjian Li
+1 for option 1. Seems the only downside of option 1 is that some RC numbers may be non-sequential. On Thu, Jun 5, 2025 at 07:57, Dongjoon Hyun wrote: > +1 for the proposal, Hyukjin. Thank you for the whole and seamless > migration toward this direction. > > Please make sure that we explicitly show the hum

Re: [DISCUSS] Automation of RC email

2025-06-05 Thread Dongjoon Hyun
+1 for the proposal, Hyukjin. Thank you for the whole and seamless migration toward this direction. Please make sure that we explicitly show the human release manager name and email address (instead of bot sender) in the generated email. That's the only concern I have. Thanks, Dongjoon. On

Re: [DISCUSS] Automation of RC email

2025-06-04 Thread Mridul Muralidharan
We can always invalidate the vote with -1 in case it is found to be sent incorrectly ... As long as the automation does not end up generating a tonne of mails, that is, it should be fairly manageable :) I am in favor of automating it with option 1. Thanks for driving this Hyukjin ! Regards, Mri

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-04 Thread Boumalhab, Chris
Hi Lee, Thanks for the info. I'm familiar with theta and tuple sketches' implementations under the hood. Will let you know if I have any questions! Thank you for all the work your team does. Chris

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-04 Thread Lee Rhodes
Hi, This is Lee Rhodes (lee...@apache.org) from the Apache DataSketches team. I am pleased that there is interest in the Spark community for integrating our library more tightly into Spark! I would like to help if I can. Unfortunately, I am not Spark fluent so I'm not going to be very useful fo

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-04 Thread Mich Talebzadeh
And great effort by you Jerry to drive this proposal through. Let us see how it progresses. Will be interesting. Dr Mich Talebzadeh, Architect | Data Science | Financial Crime | Forensic Analysis | GDPR On

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-04 Thread Jerry Peng
Thank you all! Glad to see this much interest and support for this initiative! On Wed, Jun 4, 2025 at 1:27 PM L. C. Hsieh wrote: > Hi all, > > Thanks all for participating and your support! The vote has passed. > I'll send out the result in a separate thread. > > On Mon, Jun 2, 2025 at 7:5

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-04 Thread L. C. Hsieh
Hi all, Thanks all for participating and your support! The vote has passed. I'll send out the result in a separate thread. On Mon, Jun 2, 2025 at 7:53 PM Wenchen Fan wrote: > > +1 > > On Tue, Jun 3, 2025 at 10:16 AM bo yang wrote: >> >> +1 (non-binding) >> >> On Mon, Jun 2, 2025 at 7:13 PM

Re: [VOTE] Release Apache Spark K8s Operator 0.3.0 (RC1)

2025-06-04 Thread Dongjoon Hyun
Thank you all. This vote passed. I'll conclude this one. Dongjoon. On 2025/06/03 00:45:13 Xinrong Meng wrote: > +1 > > Thanks Dongjoon! > > On Mon, Jun 2, 2025 at 4:29 PM Jungtaek Lim > wrote: > > > +1 (non-binding) > > > > On Mon, Jun 2, 2025 at 11:18 PM Dongjoon Hyun wrote: > > > >> +1 >

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-04 Thread Dongjoon Hyun
Thank you all. This vote passed. I'll conclude this. Dongjoon. On 2025/06/03 00:45:20 Reynold Xin wrote: > +1 > > On Mon, Jun 2, 2025 at 5:45 PM Xinrong Meng wrote: > > > +1 > > > > Thank you Dongjoon! > > > > On Mon, Jun 2, 2025 at 4:29 PM Jungtaek Lim > > wrote: > > > >> +1 (non-binding) >

Re: Question Regarding Spark Dependencies in Scala

2025-06-04 Thread Nimrod Ofek
> libraryDependencies ++= Seq( >>>>>> >>>>>> "io.delta" %% "delta-spark" % deltaVersion % Provided, >>>>>> "org.apache.spark" %% "spark-avro" % sparkVersion, >>>>>> "org.apache.spark"

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Boumalhab, Chris
This looks good to me! I’m considering tuple too if we have theta. Theta can be the priority, but given that tuple is just an extension, it doesn’t hurt to add it down the line.

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Boumalhab, Chris
Hi Ryan, Thanks for the reply! Would you recommend I put in a JIRA ticket and consider developing this? I’m not familiar with the process. Chris From: Ryan Berti Date: Tuesday, June 3, 2025 at 6:13 PM To: "cboum...@amazon.com.invalid" Cc: "dev@spark.apache.org" Su

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Menelaos Karavelas
Following what Ryan did for HLL sketches, I would also add an aggregate expression for unions as the aggregate version of the binary union expression. The expressions that Ryan added are: hll_sketch_agg, hll_union, hll_union_agg, hll_sketch_estimate. Following the same naming convention I would prob
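
For reference, the existing HLL expressions named above can be exercised from Scala roughly as follows. This is a minimal sketch, assuming Spark 3.5 or later on the classpath and a local master; the object name, column names, and sample values are made up for illustration. The theta_* names proposed in this thread do not exist yet and are not used here.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; not part of Spark or of this thread.
object HllSketchSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("hll-sketch-sketch")
      .getOrCreate()

    // Build one sketch per column, union the two sketches (binary union),
    // then estimate the distinct count of the combined values.
    spark.sql(
      """SELECT hll_sketch_estimate(
        |         hll_union(hll_sketch_agg(col1), hll_sketch_agg(col2))
        |       ) AS approx_distinct
        |FROM VALUES (1, 4), (1, 4), (2, 5), (2, 5), (3, 6) AS tab(col1, col2)
        |""".stripMargin).show()

    // hll_union_agg is the aggregate counterpart: it merges a column of
    // already-built sketches into one before estimating.
    spark.sql(
      """SELECT hll_sketch_estimate(hll_union_agg(sketch)) AS approx_distinct
        |FROM (
        |  SELECT hll_sketch_agg(col) AS sketch FROM VALUES (1), (2) AS tab(col)
        |  UNION ALL
        |  SELECT hll_sketch_agg(col) AS sketch FROM VALUES (2), (3) AS tab(col)
        |)
        |""".stripMargin).show()

    spark.stop()
  }
}
```

A theta counterpart would presumably follow the same build / union / estimate shape, which is why the naming convention matters in this discussion.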

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Boumalhab, Chris
I think something like this could work: theta_sketch_agg(col) to build the sketch; theta_sketch_union(sketch1, sketch2) to union the sketches; theta_sketch_estimate(sketch) or theta_sketch_estimate_count(sketch) to estimate count … Something similar can be done for tuple support. Let me know what

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Menelaos Karavelas
From: Menelaos Karavelas <menelaos.karave...@gmail.com> > Date: Tuesday, June 3, 2025 at 6:15 PM > To: "Boumalhab, Chris" <cboum...@amazon.com.INVALID> > Cc: "dev@spark.apache.org" > mailto:dev@spark.apa

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Sean Owen
" % sparkVersion, >>>>> "org.apache.spark" %% "spark-streaming-kafka-0-10" % sparkVersion, >>>>> "za.co.absa" %% "abris" % "6.4.0", >>>>> "org.apache.avro" % "avro" % apacheA

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Boumalhab, Chris
, 2025 at 6:15 PM To: "Boumalhab, Chris" Cc: "dev@spark.apache.org" Subject: RE: [EXTERNAL] [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you ca

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Nimrod Ofek
" % "7.5.1", >>>> "com.github.pureconfig" %% "pureconfig" % "0.17.5" >>>> ) >>>> >>>> And not to add also >>>> >>>> "org.apache.spark" %% "spark-sql" % sparkVersion % Pr

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Menelaos Karavelas
Hello Chris. HLL sketches from the same project (Apache DataSketches) have already been integrated in Spark. How does your proposal fit given what I just mentioned? - Menelaos > On Jun 3, 2025, at 2:52 PM, Boumalhab, Chris > wrote: > > Hi all, > > I’d like to start a discussion about addi

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Ryan Berti
Hi Chris, We integrated DataSketches into Spark when we introduced the hll_sketch_* UDFs - see the PR from 2023 for more info. I'm sure there'd be interest in exposing other types of sketches, and I bet there'd be some potential for code-reuse between t

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Nimrod Ofek
They just need to configure spark-provided :) Thanks, Nimrod On Tue, Jun 3, 2025 at 8:57 PM Sean Owen wrote: > For sure, but, that is what Maven/SBT do. It resolves your project > dependencies, looking at all their transitive dependencies, according to > some rules. > You do not need to r

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Sean Owen
>>> >>> And not to add also >>> >>> "org.apache.spark" %% "spark-sql" % sparkVersion % Provided, >>> >>> >>> And to be honest - I don't think that the users really need to >>> understand the internal st

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Nimrod Ofek
sion % Provided, >> >> >> And to be honest - I don't think that the users really need to understand >> the internal structure to know what jar they need to add to use each >> feature... >> I don't think they need to know what project they need to depend

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Sean Owen
For sure, but, that is what Maven/SBT do. It resolves your project dependencies, looking at all their transitive dependencies, according to some rules. You do not need to re-declare Spark's dependencies in your project, no. I'm not quite sure what you mean. On Tue, Jun 3, 2025 at 12:55

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Sean Owen
d on - as > long as it's already provided... They just need to configure spark-provided > :) > > Thanks, > Nimrod > > > On Tue, Jun 3, 2025 at 8:57 PM Sean Owen wrote: > >> For sure, but, that is what Maven/SBT do. It resolves your project >> dependen

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Nimrod Ofek
Thanks Sean. There are other dependencies that you need to align with Spark if you need to use them as well - like Guava, Jackson etc. I find them more difficult to use - because you need to go to Spark repo to check the correct version used - and if there are upgrades between versions you need to

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Sean Owen
I think this is already how it works. Most apps would depend on just spark-sql (which depends on spark-core, IIRC). Maybe some optionally pull in streaming or mllib. I don't think it's intended that you pull in all submodules for any one app, although you could. I don't know if there's some common
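
The point about depending on just spark-sql can be sketched as a build.sbt fragment. This is only an illustration under stated assumptions: the Scala and Spark version numbers are placeholders chosen for the example, not values taken from this thread.

```scala
// build.sbt sketch; the version numbers below are illustrative assumptions.
scalaVersion := "2.13.16"

val sparkVersion = "4.0.0"

libraryDependencies ++= Seq(
  // Provided: compile against Spark, but rely on the jars the cluster
  // already ships at runtime, so they are not bundled into the app.
  // spark-core arrives transitively through spark-sql and is not
  // declared separately here.
  "org.apache.spark" %% "spark-sql" % sparkVersion % Provided
)
```

Other modules, such as streaming or mllib, would be added to the same Seq only when the application actually uses them.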

Re: Question Regarding Spark Dependencies in Scala

2025-06-03 Thread Nimrod Ofek
Hi all, Sorry for bumping this again - just trying to understand if it's worth adding a small feature for this - I think it can help Spark users and Spark libraries upgrade and support Spark versions a lot easier :) If instead of adding many provided dependencies we'll have one that will include t

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Cheng Pan
+1 (non-binding) Thanks, Cheng Pan > On Jun 2, 2025, at 03:00, L. C. Hsieh wrote: > > Hi all, > > I would like to start a vote on the new real-time mode in Apache Spark > Structured Streaming. > > Discussion thread: > https://lists.apache.org/thread/ovmfbzfkc3t9odvv5gs75fhpvdckn90f > SPIP:

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Wenchen Fan
+1 On Tue, Jun 3, 2025 at 10:16 AM bo yang wrote: > +1 (non-binding) > > On Mon, Jun 2, 2025 at 7:13 PM Reynold Xin > wrote: > >> +1 >> >> On Mon, Jun 2, 2025 at 7:10 PM Kent Yao wrote: >> >>> +1 >>> >>> On Mon, Jun 2, 2025 at 23:00, Sandy Ryza wrote: >>> +1 (non-binding) On Mon, Jun 2, 2025

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Reynold Xin
+1 On Mon, Jun 2, 2025 at 7:10 PM Kent Yao wrote: > +1 > > On Mon, Jun 2, 2025 at 23:00, Sandy Ryza wrote: > >> +1 (non-binding) >> >> On Mon, Jun 2, 2025 at 7:34 AM Chao Sun wrote: >> >>> +1 >>> >>> On Mon, Jun 2, 2025 at 7:31 AM Jungtaek Lim < >>> kabhwan.opensou...@gmail.com> wrote: >>> +1 (non-bindi

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Kent Yao
+1 On Mon, Jun 2, 2025 at 23:00, Sandy Ryza wrote: > +1 (non-binding) > > On Mon, Jun 2, 2025 at 7:34 AM Chao Sun wrote: > >> +1 >> >> On Mon, Jun 2, 2025 at 7:31 AM Jungtaek Lim >> wrote: >> >>> +1 (non-binding) >>> >>> On Mon, Jun 2, 2025 at 11:09 PM Wenchen Fan wrote: >>> +1 On Mon, Jun 2,

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread bo yang
+1 (non-binding) On Mon, Jun 2, 2025 at 7:13 PM Reynold Xin wrote: > +1 > > On Mon, Jun 2, 2025 at 7:10 PM Kent Yao wrote: > >> +1 >> >> On Mon, Jun 2, 2025 at 23:00, Sandy Ryza wrote: >> >>> +1 (non-binding) >>> >>> On Mon, Jun 2, 2025 at 7:34 AM Chao Sun wrote: >>> +1 On Mon, Jun 2, 2025 at

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-02 Thread Reynold Xin
+1 On Mon, Jun 2, 2025 at 5:45 PM Xinrong Meng wrote: > +1 > > Thank you Dongjoon! > > On Mon, Jun 2, 2025 at 4:29 PM Jungtaek Lim > wrote: > >> +1 (non-binding) >> >> On Tue, Jun 3, 2025 at 12:18 AM Denny Lee wrote: >> >>> +1 (non-binding) >>> >>> >>> On Mon, Jun 2, 2025 at 07:44 Sandy Ryza

Re: [VOTE] Release Apache Spark K8s Operator 0.3.0 (RC1)

2025-06-02 Thread Xinrong Meng
+1 Thanks Dongjoon! On Mon, Jun 2, 2025 at 4:29 PM Jungtaek Lim wrote: > +1 (non-binding) > > On Mon, Jun 2, 2025 at 11:18 PM Dongjoon Hyun wrote: > >> +1 >> >> Dongjoon >> >> On 2025/06/02 13:12:46 "Rozov, Vlad" wrote: >> > +1 (non-binding) >> > >> > Thank you, >> > >> > Vlad >> > >> > On Jun

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-02 Thread Xinrong Meng
+1 Thank you Dongjoon! On Mon, Jun 2, 2025 at 4:29 PM Jungtaek Lim wrote: > +1 (non-binding) > > On Tue, Jun 3, 2025 at 12:18 AM Denny Lee wrote: > >> +1 (non-binding) >> >> >> On Mon, Jun 2, 2025 at 07:44 Sandy Ryza >> wrote: >> >>> +1 (non-binding) >>> >>> On Mon, Jun 2, 2025 at 7:20 AM Don

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-02 Thread Jungtaek Lim
+1 (non-binding) On Tue, Jun 3, 2025 at 12:18 AM Denny Lee wrote: > +1 (non-binding) > > > On Mon, Jun 2, 2025 at 07:44 Sandy Ryza > wrote: > >> +1 (non-binding) >> >> On Mon, Jun 2, 2025 at 7:20 AM Dongjoon Hyun wrote: >> >>> +1 >>> >>> Dongjoon >>> >>> On 2025/06/02 13:13:45 "Rozov, Vlad" wr

Re: [VOTE] Release Apache Spark K8s Operator 0.3.0 (RC1)

2025-06-02 Thread Jungtaek Lim
+1 (non-binding) On Mon, Jun 2, 2025 at 11:18 PM Dongjoon Hyun wrote: > +1 > > Dongjoon > > On 2025/06/02 13:12:46 "Rozov, Vlad" wrote: > > +1 (non-binding) > > > > Thank you, > > > > Vlad > > > > On Jun 1, 2025, at 7:44 PM, Liu Cao wrote: > > > > +1 (non-binding) > > > > On Sun, Jun 1, 2025 at

RE: Inquiry: Best Practices for Replacing Snappy with LZ4/LZF Compression Across Spark Codebase (including test cases)

2025-06-02 Thread Balaji Sudharsanam V
I forgot to mention: we are exploring this in Spark 4.0, whether through a configuration change or explicit code changes throughout. We are keen to adopt the recommended, future-proof approach. Any guidance, insights, or pointers to relevant documentation, JIRAs, or previous discuss

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-02 Thread Denny Lee
+1 (non-binding) On Mon, Jun 2, 2025 at 07:44 Sandy Ryza wrote: > +1 (non-binding) > > On Mon, Jun 2, 2025 at 7:20 AM Dongjoon Hyun wrote: > >> +1 >> >> Dongjoon >> >> On 2025/06/02 13:13:45 "Rozov, Vlad" wrote: >> > +1 (non-binding) >> > >> > Thank you, >> > >> > Vlad >> > >> > On Jun 1, 2025

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-02 Thread Rozov, Vlad
+1 (non-binding) Thank you, Vlad On Jun 1, 2025, at 7:21 PM, Wenchen Fan wrote: +1 On Mon, Jun 2, 2025 at 9:55 AM Yuanjian Li <xyliyuanj...@gmail.com> wrote: +1 On Sun, Jun 1, 2025 at 18:30 DB Tsai <dbt...@dbtsai.com> wrote: +1 Sent from my iPhone > On Jun 1, 2025, at 2:32

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Sandy Ryza
+1 (non-binding) On Mon, Jun 2, 2025 at 7:34 AM Chao Sun wrote: > +1 > > On Mon, Jun 2, 2025 at 7:31 AM Jungtaek Lim > wrote: > >> +1 (non-binding) >> >> On Mon, Jun 2, 2025 at 11:09 PM Wenchen Fan wrote: >> >>> +1 >>> >>> On Mon, Jun 2, 2025 at 8:55 PM Peter Toth wrote: >>> +1

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-02 Thread Sandy Ryza
+1 (non-binding) On Mon, Jun 2, 2025 at 7:20 AM Dongjoon Hyun wrote: > +1 > > Dongjoon > > On 2025/06/02 13:13:45 "Rozov, Vlad" wrote: > > +1 (non-binding) > > > > Thank you, > > > > Vlad > > > > On Jun 1, 2025, at 7:21 PM, Wenchen Fan wrote: > > > > +1 > > > > On Mon, Jun 2, 2025 at 9:55 AM Yu

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Chao Sun
+1 On Mon, Jun 2, 2025 at 7:31 AM Jungtaek Lim wrote: > +1 (non-binding) > > On Mon, Jun 2, 2025 at 11:09 PM Wenchen Fan wrote: > >> +1 >> >> On Mon, Jun 2, 2025 at 8:55 PM Peter Toth wrote: >> >>> +1 >>> >>> On Mon, Jun 2, 2025 at 2:33 PM xianjin wrote: >>> +1. Sent from my iPhone

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Rozov, Vlad
+1 (non-binding) Thank you, Vlad On Jun 2, 2025, at 7:08 AM, Wenchen Fan wrote: +1 On Mon, Jun 2, 2025 at 8:55 PM Peter Toth wrote: +1 On Mon, Jun 2, 2025 at 2:33 PM xianjin wrote: +1. Sent from my iPhone On Jun 2, 2025, at 12:50 P

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Jungtaek Lim
+1 (non-binding) On Mon, Jun 2, 2025 at 11:09 PM Wenchen Fan wrote: > +1 > > On Mon, Jun 2, 2025 at 8:55 PM Peter Toth wrote: > >> +1 >> >> On Mon, Jun 2, 2025 at 2:33 PM xianjin wrote: >> >>> +1. >>> Sent from my iPhone >>> >>> On Jun 2, 2025, at 12:50 PM, DB Tsai wrote: >>> >>> +1 looking

Re: [VOTE] Release Apache Spark Connect Swift Client 0.3.0 (RC1)

2025-06-02 Thread Dongjoon Hyun
+1 Dongjoon On 2025/06/02 13:13:45 "Rozov, Vlad" wrote: > +1 (non-binding) > > Thank you, > > Vlad > > On Jun 1, 2025, at 7:21 PM, Wenchen Fan wrote: > > +1 > > On Mon, Jun 2, 2025 at 9:55 AM Yuanjian Li wrote: > +1 > > On Sun, Jun 1, 2025 at 18:30 DB Tsa

Re: [VOTE] Release Apache Spark K8s Operator 0.3.0 (RC1)

2025-06-02 Thread Dongjoon Hyun
+1 Dongjoon On 2025/06/02 13:12:46 "Rozov, Vlad" wrote: > +1 (non-binding) > > Thank you, > > Vlad > > On Jun 1, 2025, at 7:44 PM, Liu Cao wrote: > > +1 (non-binding) > > On Sun, Jun 1, 2025 at 7:22 PM Wenchen Fan wrote: > +1 > > On Mon, Jun 2, 2025 at 9:30 

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Wenchen Fan
+1 On Mon, Jun 2, 2025 at 8:55 PM Peter Toth wrote: > +1 > > On Mon, Jun 2, 2025 at 2:33 PM xianjin wrote: > >> +1. >> Sent from my iPhone >> >> On Jun 2, 2025, at 12:50 PM, DB Tsai wrote: >> >> +1 looking forward to seeing real-time mode. >> Sent from my iPhone >> >> On Jun 1, 2025, at 9:47 

Re: [VOTE] Release Apache Spark K8s Operator 0.3.0 (RC1)

2025-06-02 Thread Rozov, Vlad
+1 (non-binding) Thank you, Vlad On Jun 1, 2025, at 7:44 PM, Liu Cao wrote: +1 (non-binding) On Sun, Jun 1, 2025 at 7:22 PM Wenchen Fan wrote: +1 On Mon, Jun 2, 2025 at 9:30 AM DB Tsai wrote: +1 Sent from my iPhone > On Jun 1, 2025,

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Peter Toth
+1 On Mon, Jun 2, 2025 at 2:33 PM xianjin wrote: > +1. > Sent from my iPhone > > On Jun 2, 2025, at 12:50 PM, DB Tsai wrote: > > +1 looking forward to seeing real-time mode. > Sent from my iPhone > > On Jun 1, 2025, at 9:47 PM, Xiao Li wrote: > > +1 > > On Sun, Jun 1, 2025 at 20:00, huaxin gao wrote: >

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread xianjin
+1. Sent from my iPhone On Jun 2, 2025, at 12:50 PM, DB Tsai wrote: +1 looking forward to seeing real-time mode. Sent from my iPhone On Jun 1, 2025, at 9:47 PM, Xiao Li wrote: +1 On Sun, Jun 1, 2025 at 20:00, huaxin gao wrote: +1 On Sun, Jun 1, 2025 at 7:50 PM Tathagata Das

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Mich Talebzadeh
I am OK with +1. Having said that, there is merit IMO in adding a matrix to the SPIP highlighting the differences between real-time mode and Continuous Processing (Continuous Mode), unless the assumption is that Spark has abandoned Continuous Mode altogether. *Feature Real-time Processing (via

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Mark Hamstra
+0 I'm never going to like calling it real-time when it is not, but that's not enough to vote against the SPIP. On Mon, Jun 2, 2025 at 12:57 AM Liu Cao wrote: > > +1 (non-binding) > > > On Sun, Jun 1, 2025 at 11:42 PM Anish Shrigondekar > wrote: >> >> +1 (non-binding) - this will be really use

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-02 Thread Liu Cao
+1 (non-binding) On Sun, Jun 1, 2025 at 11:42 PM Anish Shrigondekar wrote: > +1 (non-binding) - this will be really useful for latency > sensitive workloads > > Thanks, > Anish > > On Sun, Jun 1, 2025 at 11:02 PM Sakthi wrote: > >> +1 (non-binding) >> >> On Sun, Jun 1, 2025 at 10:51 PM Genglia

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-01 Thread Anish Shrigondekar
+1 (non-binding) - this will be really useful for latency sensitive workloads Thanks, Anish On Sun, Jun 1, 2025 at 11:02 PM Sakthi wrote: > +1 (non-binding) > > On Sun, Jun 1, 2025 at 10:51 PM Gengliang Wang wrote: > >> +1 >> >> On Sun, Jun 1, 2025 at 10:20 PM Denny Lee wrote: >> >>> +1 (non

Re: [VOTE] SPIP: Real-Time Mode in Apache Spark Structured Streaming

2025-06-01 Thread Sakthi
+1 (non-binding) On Sun, Jun 1, 2025 at 10:51 PM Gengliang Wang wrote: > +1 > > On Sun, Jun 1, 2025 at 10:20 PM Denny Lee wrote: > >> +1 (non-binding) >> >> On Sun, Jun 1, 2025 at 22:18 L. C. Hsieh wrote: >> >>> +1 >>> >>> On Sun, Jun 1, 2025 at 9:48 PM DB Tsai wrote: >>> > >>> > +1 looking
