Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-04 Thread Boumalhab, Chris
Hi Lee, Thanks for the info. I'm familiar with theta and tuple sketches' implementations under the hood. Will let you know if I have any questions! Thank you for all the work your team does. Chris

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-04 Thread Lee Rhodes
Hi, This is Lee Rhodes (lee...@apache.org) from the Apache DataSketches team. I am pleased that there is interest in the Spark community for integrating our library more tightly into Spark! I would like to help if I can. Unfortunately, I am not Spark fluent so I'm not going to be very useful fo

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Boumalhab, Chris
This looks good to me! I’m considering tuple too if we have theta. Theta can be priority, but given that tuple is just an extension, it doesn’t hurt to add down the line.

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Boumalhab, Chris
bject: RE: [EXTERNAL] [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. Hi Chris, We integrated DataSketches into Spar

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Menelaos Karavelas
Following what Ryan did for HLL sketches, I would also add an aggregate expression for unions as the aggregate version of the binary union expression. The expressions that Ryan added are: hll_sketch_agg hll_union hll_union_agg hll_sketch_estimate Following the same naming convention I would prob

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Boumalhab, Chris
I think something like this could work: theta_sketch_agg(col) to build the sketch theta_sketch_union(sketch1, sketch2) to union the sketches theta_sketch_estimate(sketch) or theta_sketch_estimate_count(sketch) to estimate count … Something similar can be done for tuple support. Let me know what
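To make the build / union / estimate split proposed above concrete, here is a minimal, self-contained sketch of the idea behind a theta sketch (a K-Minimum-Values variant). It is a toy under stated assumptions: the Python function names simply mirror the SQL names floated in this thread, and `K`, `_hash01`, and `_trim` are hypothetical helpers. It is not the Apache DataSketches implementation and not a Spark API.

```python
import hashlib

K = 1024  # assumed nominal number of retained hashes; not a value from the thread


def _hash01(value):
    """Map a value to a float in [0, 1) using a 53-bit slice of a BLAKE2b digest."""
    digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
    return (int.from_bytes(digest, "big") >> 11) / 2**53


def _trim(theta, hashes, k):
    """Keep hashes below theta; if more than k remain, lower theta to the (k+1)-th smallest."""
    kept = sorted(h for h in hashes if h < theta)
    if len(kept) > k:
        theta = kept[k]
        kept = kept[:k]
    return theta, frozenset(kept)


def theta_sketch_agg(values, k=K):
    """Toy stand-in for the proposed theta_sketch_agg(col): build a sketch from raw values."""
    return _trim(1.0, {_hash01(v) for v in values}, k)


def theta_sketch_union(a, b, k=K):
    """Toy stand-in for the proposed theta_sketch_union(sketch1, sketch2)."""
    return _trim(min(a[0], b[0]), a[1] | b[1], k)


def theta_sketch_estimate(sketch):
    """Toy stand-in for the proposed theta_sketch_estimate(sketch): retained count / theta."""
    theta, hashes = sketch
    return len(hashes) / theta
```

In this toy version, unioning two sketches gives the same result as sketching the combined input directly, which is the property that makes a separate union expression (and an aggregate form of it) useful for pre-aggregated data.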

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Menelaos Karavelas
m: Menelaos Karavelas <mailto:menelaos.karave...@gmail.com>> > Date: Tuesday, June 3, 2025 at 6:15 PM > To: "Boumalhab, Chris" <mailto:cboum...@amazon.com.INVALID>> > Cc: "dev@spark.apache.org <mailto:dev@spark.apache.org>" > mailto:dev@spark.apa

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Boumalhab, Chris
, 2025 at 6:15 PM To: "Boumalhab, Chris" Cc: "dev@spark.apache.org" Subject: RE: [EXTERNAL] [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you ca

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Menelaos Karavelas
Hello Chris. HLL sketches from the same project (Apache DataSketches) have already been integrated in Spark. How does your proposal fit given what I just mentioned? - Menelaos > On Jun 3, 2025, at 2:52 PM, Boumalhab, Chris > wrote: > > Hi all, > > I’d like to start

Re: [DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Ryan Berti
Hi Chris, We integrated DataSketches into Spark when we introduced the hll_sketch_* UDFs - see the PR from 2023 for more info. I'm sure there'd be interest in exposing other types of sketches, and I bet there'd be some potential for code-reuse between t

[DISCUSS] Proposal to Add Theta and Tuple Sketches to Spark SQL

2025-06-03 Thread Boumalhab, Chris
Hi all, I’d like to start a discussion about adding support for [Apache DataSketches](https://datasketches.apache.org/) — specifically, Theta and Tuple Sketches — to Spark SQL and DataFrame APIs. ## Motivation These sketches allow scalable approximate set operations (like distinct count, union

Re: [RESULT][VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-04-04 Thread Jungtaek Lim
Maybe I will just update the VOTE result, since the rationale of this VOTE, and the VOTE result is public. On Tue, Mar 18, 2025 at 10:00 PM Jungtaek Lim wrote: > I'm definitely OK with modifying migration logic to exclude "databricks" > if people think it is better. I'm even having a code change

[RESULT][VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-18 Thread Hyukjin Kwon
The vote passes with 5 +1s (4 binding +1s) and 3 -1s (3 binding -1s). (* = binding) +1: - Mark Hamstra * - Jungtaek Lim - Wenchen Fan * - Reynold Xin * - Yuanjian Li * -1: - Holden Karau * - Hyukjin Kwon * - Dongjoon Hyun * Thanks.

Re: [RESULT][VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-18 Thread Jungtaek Lim
I'm definitely OK with modifying migration logic to exclude "databricks" if people think it is better. I'm even having a code change locally. The reason I didn't ask killing the VOTE despite I have the other way around is, I think we made a huge mistake/fault w.r.t. this event, and I don't want my

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-18 Thread Jungtaek Lim
reement to take it out, then >>> arguably the migration logic is left as it is. >>> >>> This way I never needed to drive such a long and sensitive DISCUSSION >>> and VOTE. But the PR for master and 4.0 weren't merged because of the >>> indivi

Re: [RESULT][VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-18 Thread Hyukjin Kwon
From my understanding, yes. On Tue, Mar 18, 2025 at 4:03 PM Wenchen Fan wrote: > Do I understand correctly that now we can merge > https://github.com/apache/spark/pull/49984 and unblock Spark 4.0? > > To make sure that everyone is on the same page as not all the people have > read all the discu

Re: [RESULT][VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-18 Thread Wenchen Fan
Do I understand correctly that now we can merge https://github.com/apache/spark/pull/49984 and unblock Spark 4.0? To make sure that everyone is on the same page as not all the people have read all the discussion threads: 1. This PR *DOES* *NOT* add back the misnamed configuration. People cannot se

Re: [RESULT][VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-17 Thread Mark Hamstra
As you noted previously, this does allow the original vote to proceed without a valid veto, but I will also note that this does not preclude modifying the migration logic later to avoid explicitly including “databricks” in the code if people think that is important and an agreeable, technically sou

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-17 Thread Hyukjin Kwon
a huge mistake before >> moving on? We should have merged the same content in 3.5 to master/4.0 as >> well, and then have a PR to remove the config. This is totally swapped >> which does not make sense to me. >> >>

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-17 Thread Yuanjian Li
lly swapped > which does not make sense to me. > > On Mon, Mar 17, 2025 at 1:57 PM Dongjoon Hyun wrote: > >> I reviewed Wenchen's reverting PR. Although it's a proposal for >> discussion, it is another breaking change against Apache Spark 3.5.5, isn'

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-17 Thread Wenchen Fan
> ASF Voting Process page: > > https://www.apache.org/foundation/voting.html#Veto > > A -1 vote by a qualified voter stops a code-modification proposal in its >> tracks. This constitutes a veto, and it cannot be overruled nor overridden >> by anyone. Vetoes stand until and unless the

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Jungtaek Lim
Mon, Mar 17, 2025 at 1:57 PM Dongjoon Hyun wrote: > I reviewed Wenchen's reverting PR. Although it's a proposal for > discussion, it is another breaking change against Apache Spark 3.5.5, isn't > it? If we consider Apache Spark 3.5.4

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Dongjoon Hyun
I reviewed Wenchen's reverting PR. Although it's a proposal for discussion, it is another breaking change against Apache Spark 3.5.5, isn't it? If we consider Apache Spark 3.5.4 users, I believe we need to consider Apache Spark 3.5.5 users

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Jungtaek Lim
If we are really wanting to make a "correct" discussion going forward, I believe the revert PR has to be merged. After that, either my proposal gets not accepted, or he starts to DISCUSS and eventually reaches the VOTE pass, or we just leave the config to be kept deprecated instead of re

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Jungtaek Lim
h7wj>), > which is definitely less better than this proposal so I still want to see > this VOTE to go forward, but it's somewhat better in this situation that we > no longer talk about vendor name, hence no need to debate for more minor > versions. > > Though I still want to he

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Jungtaek Lim
Holden, I think I have some workaround like I posted in dev@ (link <https://lists.apache.org/thread/xsq58800smtc5xo15kfzyj5kfw5yh7wj>), which is definitely less better than this proposal so I still want to see this VOTE to go forward, but it's somewhat better in this situation that w

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Reynold Xin
lready, so I think > that an immediate vote on the validity of Dongjoon's technical > justification for his veto of the "Retain migration logic ... in Spark > 4.0.x" proposal is in order. That technical justification has been > called into question, and the guidance at >

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Holden Karau
we now have a clear foundation for discussing solutions. As it >> stands, the misnamed configuration will be released in 4.0.0. I like >> Jungtaek’s proposal to deprecate it, but the decision is up to the >> community. >> >> On Mon, Mar 17, 2025 at 10:19 AM Jungtaek Lim <

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Jungtaek Lim
" we were never decided about the direction of Spark 4.0.0 behavior. (link <https://github.com/apache/spark/pull/49983#issuecomment-2676531485>) 2. Dongjoon "agreed" my proposal is technically correct. (link <https://github.com/apache/spark/pull/49983#issuecomment-2676531485>

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Wenchen Fan
tion for discussing solutions. As it > stands, the misnamed configuration will be released in 4.0.0. I like > Jungtaek’s proposal to deprecate it, but the decision is up to the > community. > > On Mon, Mar 17, 2025 at 10:19 AM Jungtaek Lim < > kabhwan.opensou...@gmail.com> wr

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Wenchen Fan
e misnamed configuration will be released in 4.0.0. I like Jungtaek’s proposal to deprecate it, but the decision is up to the community. On Mon, Mar 17, 2025 at 10:19 AM Jungtaek Lim wrote: > OK, let's be super honest. > > Again, I think you agree that *"both" proposals are "

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Jungtaek Lim
's very easy for me to VETO to his proposal (although I don't have a binding vote, I think I have people who agree with me) if we think we want to definitely expand the interpretation of VETO criteria in the Apache Voting Process. You said it is up to the PMC member exercising the veto to use thei

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Holden Karau
ser/holdenkarau >>> Pronouns: she/her >>> >>> >>> On Sat, Mar 15, 2025 at 4:44 PM Mark Hamstra >>> wrote: >>> >>>> Quick administrative note: I don't see any reason why this vote should >>>> take a long time, so I

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Jungtaek Lim
reason why this vote should >>> take a long time, so I expect to close the process and tally the votes >>> in not much more than 48 hours. >>> >>> On Sat, Mar 15, 2025 at 4:35 PM Mark Hamstra >>> wrote: >>> > >>> > There has

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Wenchen Fan
nded >> everyone about "what" we are going to VOTE. >> >> Dongjoon casted a VETO against code change VOTE. That VETO is described >> in ASF Voting Process page: >> >> https://www.apache.org/foundation/voting.html#Veto >> >> A -1 vote by a qualifi

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Jungtaek Lim
qualified voter stops a code-modification proposal in its > tracks. This constitutes a veto, and it cannot be overruled nor overridden > by anyone. Vetoes stand until and unless the individual withdraws their > veto. > > To prevent vetoes from being used capriciously, the voter must

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Jungtaek Lim
the conversation >> here, there are some facts: >> >> 1. Dongjoon "knew" we were never decided about the direction of Spark >> 4.0.0 behavior. (link >> <https://github.com/apache/spark/pull/49983#issuecomment-2676531485>) >> 2. Dongjoon "agreed

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Holden Karau
thub.com/apache/spark/pull/49983 > > I might be missing another timeline, but, if you follow the conversation > here, there are some facts: > > 1. Dongjoon "knew" we were never decided about the direction of Spark > 4.0.0 behavior. (link > <https://github.co

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Mich Talebzadeh
>>> in not much more than 48 hours. >>> >>> On Sat, Mar 15, 2025 at 4:35 PM Mark Hamstra >>> wrote: >>> > >>> > There has been enough discussion on this topic already, so I think >>> > that an immediate vote on the validity of Dongj

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-16 Thread Holden Karau
>> > There has been enough discussion on this topic already, so I think >> > that an immediate vote on the validity of Dongjoon's technical >> > justification for his veto of the "Retain migration logic ... in Spark >> > 4.0.x" proposal is in order. T

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-15 Thread Jungtaek Lim
Just one hope, I believe I have said I will hear about support of migration logic for next release. The scope of the VOTE is nothing beyond 4.0.x. Please do not interpret this in your own way. I continuously see that I am attacked by what I am not saying and I have to prove that I haven't said. Ple

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-15 Thread Jungtaek Lim
Dongjoon, I'm now OK with whatever you think, but I argue your vote is technically moot since it's about your vote justification, and I have no binding vote to counter you. Let's be fair. On Sun, Mar 16, 2025 at 3:07 PM Dongjoon Hyun wrote: > Thank you for focusing on this, Mark. > > I also agr

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-15 Thread Dongjoon Hyun
Thank you for focusing on this, Mark. I also agree with you that this should be decided by the Apache Spark PMC and appreciate the effort to help us move forward in the Apache way. As you mentioned, there is no ASF policy. That's true. > I am not aware of any ASF policy that strictly forbids th

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-15 Thread Mich Talebzadeh
t. > > I’m open to summary my justification again, but as a tl;dr, I have a > strong evidence that he knew we never had a consensus about 4.0 which > destroys his claim for “we agreed to release Spark 4.0.0 as it is”, and > also he said my proposal is technically correct, so he is

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-15 Thread Jungtaek Lim
claim for “we agreed to release Spark 4.0.0 as it is”, and > also he said my proposal is technically correct, so he is objecting himself > if he is really casting “veto”. > > Worth noting that his last post is all about technical justification of > “his own proposal”, not about techn

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-15 Thread Jungtaek Lim
about 4.0 which destroys his claim for “we agreed to release Spark 4.0.0 as it is”, and also he said my proposal is technically correct, so he is objecting himself if he is really casting “veto”. Worth noting that his last post is all about technical justification of “his own proposal”, not about

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-15 Thread Holden Karau
's technical > > justification for his veto of the "Retain migration logic ... in Spark > > 4.0.x" proposal is in order. That technical justification has been > > called into question, and the guidance at > > https://www.apache.org/foundation/glossary.html#Veto leave

Re: [VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-15 Thread Mark Hamstra
so I think > that an immediate vote on the validity of Dongjoon's technical > justification for his veto of the "Retain migration logic ... in Spark > 4.0.x" proposal is in order. That technical justification has been > called into question, and the guidance

[VOTE] Technical Justification for the veto of the "Retain migration logic..." code change proposal is not valid

2025-03-15 Thread Mark Hamstra
There has been enough discussion on this topic already, so I think that an immediate vote on the validity of Dongjoon's technical justification for his veto of the "Retain migration logic ... in Spark 4.0.x" proposal is in order. That technical justification has been called into qu

Re: [PROPOSAL] Unified PySpark-Pandas API to Bridge Data Engineering and ML Workflows

2025-02-13 Thread Khalid Mammadov
teams could choose the most > > efficient execution engine based on their needs: > > o *Data Engineers* would continue leveraging Spark's > > distributed processing for large-scale data transformations. > > o *ML Teams* could r

Re: [PROPOSAL] Unified PySpark-Pandas API to Bridge Data Engineering and ML Workflows

2025-02-12 Thread Sem
s, and optimize performance across both batch and real-time workflows. *The Proposal:* *Introduce an API that allows PySpark syntax while processing DataFrame using either Spark or Pandas depending on the session context.* * * *Simple, but intuitive example:* import pyspark.sql.f

Re: [PROPOSAL] Unified PySpark-Pandas API to Bridge Data Engineering and ML Workflows

2025-02-10 Thread Sean Owen
ld continue leveraging Spark's distributed > processing for large-scale data transformations. > - *ML Teams* could run the same PySpark transformations using > *Pandas* as the processing engine for faster, on-the-fly feature > generation in mo

Re: [PROPOSAL] Unified PySpark-Pandas API to Bridge Data Engineering and ML Workflows

2025-02-10 Thread Mich Talebzadeh
are missing the point, > the proposal is not to break Spark ways of processing, but to use spark as > a wrapper to process pandas, same as `pandas_api()`, but the inverse. > > Most of the cases to serve ML models require low latency (ms) and ideal is > to re-generate the features

Re: [PROPOSAL] Unified PySpark-Pandas API to Bridge Data Engineering and ML Workflows

2025-02-10 Thread Tornike Gurgenidze
Wenchen Fan wrote: > >> Interesting, so this is PySpark on pandas which is the reverse of Koalas. >> >> If performance is the only problem, maybe we can improve local-mode Spark >> performance to be on par with these single-node engines. + @Hyukjin Kwon >> >> >>

Re: [PROPOSAL] Unified PySpark-Pandas API to Bridge Data Engineering and ML Workflows

2025-02-10 Thread José Müller
single-node engines. + @Hyukjin Kwon > > > On Mon, Feb 10, 2025 at 8:40 PM José Müller > wrote: > >> Hi Mitch, >> >> All you said is well understood, but I believe you are missing the point, >> the proposal is not to break Spark ways of processing, but to use s

Re: [PROPOSAL] Unified PySpark-Pandas API to Bridge Data Engineering and ML Workflows

2025-02-10 Thread Wenchen Fan
> All you said is well understood, but I believe you are missing the point, > the proposal is not to break Spark ways of processing, but to use spark as > a wrapper to process pandas, same as `pandas_api()`, but the inverse. > > Most of the cases to serve ML models require low late

Re: [PROPOSAL] Unified PySpark-Pandas API to Bridge Data Engineering and ML Workflows

2025-02-10 Thread José Müller
Hi Mitch, All you said is well understood, but I believe you are missing the point, the proposal is not to break Spark ways of processing, but to use spark as a wrapper to process pandas, same as `pandas_api()`, but the inverse. Most of the cases to serve ML models require low latency (ms) and

Re: [PROPOSAL] Unified PySpark-Pandas API to Bridge Data Engineering and ML Workflows

2025-02-10 Thread Mich Talebzadeh
tions. > - *ML Teams* could run the same PySpark transformations using > *Pandas* as the processing engine for faster, on-the-fly feature > generation in model training and API serving. > > This unified approach would eliminate redundant codebases, ensure

[PROPOSAL] Unified PySpark-Pandas API to Bridge Data Engineering and ML Workflows

2025-02-10 Thread José Müller
g and API serving. This unified approach would eliminate redundant codebases, ensure consistent feature definitions, and optimize performance across both batch and real-time workflows. *The Proposal:* *Introduce an API that allows PySpark syntax while processing DataFrame using either Spark or

Re: Proposal to improve data skew debugging

2025-01-29 Thread Mich Talebzadeh
Hi Rob, As a matter of interest, have you got an indication of a ballpark figure for percentage of queries that end up with skewed distribution? Thanks Mich Talebzadeh, Architect | Data Science | Financial Crime | Forensic Analysis | GDPR view my Linkedin profile

Re: Proposal to improve data skew debugging

2025-01-27 Thread Rob Reeves
The counting does use count-min sketch and publishes the top K keys above a skew threshold to an accumulator. The core implementation in my prototype is in InlineApproxCountExec
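As a rough illustration of the approach described above (count keys with a count-min sketch, then report the top K keys whose share exceeds a skew threshold), here is a minimal pure-Python sketch. It is not the prototype's `InlineApproxCountExec` code: the class, the function name `top_skewed_keys`, and the default width, depth, `k`, and `skew_threshold` are all assumptions chosen for the example, and the Spark accumulator is not modeled; the function simply returns the list that would be published.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency table; estimates never undercount, but may overcount."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # One independent hash per row, derived by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key):
        return min(self.table[row][self._index(key, row)] for row in range(self.depth))


def top_skewed_keys(keys, k=3, skew_threshold=0.2):
    """Return up to k (key, estimated_count) pairs whose share of rows exceeds skew_threshold.

    Keys are assumed to be strings.
    """
    sketch = CountMinSketch()
    candidates = {}  # bounded set of current heavy-hitter candidates
    total = 0
    for key in keys:
        sketch.add(key)
        total += 1
        candidates[key] = sketch.estimate(key)
        if len(candidates) > k:
            # Evict the weakest candidate; a truly hot key re-enters on its next occurrence.
            del candidates[min(candidates, key=candidates.get)]
    hot = [(key, est) for key, est in candidates.items() if est > skew_threshold * total]
    return sorted(hot, key=lambda pair: pair[1], reverse=True)
```

Memory stays bounded by the sketch size plus `k` candidates regardless of how many distinct keys flow through, which is what makes this kind of counting cheap enough to run inline.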

Re: Proposal to improve data skew debugging

2025-01-24 Thread Mich Talebzadeh
Ok so the catalyst optimizer will use this method of inline key counting to provide spark optimizer with prior notification, so it identifies the hot keys? What is this inline key counting based? Likely Count-Min Sketch algorithm! HTH Mich Talebzadeh, Architect | Data Science | Financial Crime |

Proposal to improve data skew debugging

2025-01-24 Thread Rob Reeves
Hi Spark devs, I recently worked on a prototype to make it easier to identify the root cause of data skew in Spark. I wanted to see if the community was interested in it before working on contributing the changes (SPIP and PRs). *Problem* When a query has data skew today, you see outlier tasks ta

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-17 Thread Allison Wang
I'm a big +1 on this proposal. We should be able to continue improving the programming guides to enhance their quality and make this process easier. > Move the programming guide to the spark-website repo, to allow faster iterations and releases This is a great idea. It should work for st

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-11 Thread serge rielau . com
for a version, it is called out in the guides. I agree with Wenchen's 3 points. I don't think we need to say that they have to go to the old page, but that if they want to, they can. Neil On Wed, Jun 5, 2024 at 12:04 PM Wenchen Fan mailto:cloud0...@gmail.com>> wrote: I agree wi

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-11 Thread Wenchen Fan
ogramming guides are >>>>>> not consistent: >>>>>> >>>>>> * The Structured Streaming programming guide >>>>>> <https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html> >>>>>> is one

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-11 Thread Neil Ramaswamy
>>>>> navigate to the other programming guides anymore >>>>> >>>>> I am looking forward to collaborating with the community and improving >>>>> the docs to 1. delight existing users and 2. attract new users. Docs are >>>

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-11 Thread Wenchen Fan
Wed, Jun 5, 2024 at 3:22 PM Neil Ramaswamy >>>> wrote: >>>> >>>>> Thanks all for the responses. Let me try to address everything. >>>>> >>>>> > the programming guides are also different between versions since >>>>>

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-10 Thread Nimrod Ofek
rgue that mentioning >>>> an improvement that a version brings motivates users to upgrade more than >>>> keeping docs improvement to "new releases to keep the community updating". >>>> Users should upgrade to get a better Spark, not better Spark document

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-10 Thread Nicholas Chammas
>>> >>>> > having a programming guide that refers to features or API methods that >>>> > does not exist in that version is confusing and detrimental >>>> >>>> I don't think that we'd do this. Again, programming guides shou

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-10 Thread Mridul Muralidharan
hat version is confusing and detrimental >>> >>> I don't think that we'd do this. Again, programming guides should teach >>> fundamentals that do not change version-to-version. TypeScript >>> <https://www.typescriptlang.org/docs/handbook/typescript-from-scra

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-10 Thread Hyukjin Kwon
version-to-version. TypeScript >> <https://www.typescriptlang.org/docs/handbook/typescript-from-scratch.html> >> (which >> has one of the best DX's and docs) does this exceptionally well. Their >> guides are refined, versionless pages, new features ar

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-05 Thread Matthew Powers
elaborated upon in > release notes (analogous to our version-specific docs), and for the > occasional caveat for a version, it is called out in the guides. > > I agree with Wenchen's 3 points. I don't think we need to say that they > *have* to go to the old page, but that if

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-05 Thread Neil Ramaswamy
e with the idea of a versionless programming guide. But one thing we > need to make sure of is we give clear messages for things that are only > available in a new version. My proposal is: > >1. keep the old versions' programming guide unchanged. For example, >people

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-05 Thread Wenchen Fan
I agree with the idea of a versionless programming guide. But one thing we need to make sure of is we give clear messages for things that are only available in a new version. My proposal is: 1. keep the old versions' programming guide unchanged. For example, people can still access

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-05 Thread Martin Andersson
must be a better way to allow updating documentation more often? Best Regards, Martin From: Nimrod Ofek Sent: Wednesday, June 5, 2024 08:26 To: Neil Ramaswamy Cc: Praveen Gattu ; dev Subject: Re: [DISCUSS] Versionless Spark Programming Guide Proposal EXTERNAL

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-04 Thread Nimrod Ofek
n have only 6 months support cycle until eol for documentation- there are no major security concerns for documentation... Nimrod On Wed, Jun 5, 2024 at 08:28, Neil Ramaswamy wrote: > Hi Nimrod, > > Quick clarification—my proposal will not touch API-specific documentation >

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-04 Thread Neil Ramaswamy
Hi Nimrod, Quick clarification—my proposal will not touch API-specific documentation for the specific reasons you mentioned (signatures, behavior, etc.). It just aims to make the *programming guides *versionless. Programming guides should teach fundamentals of Spark, and the fundamentals of Spark

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-04 Thread Nimrod Ofek
mproving docs. However, we might > still need a way to provide version specific information isn't it, i.e. > what features are available in which version etc. > > On Mon, Jun 3, 2024 at 3:08 PM Neil Ramaswamy wrote: > >> Hi all, >> >> I've written up a p

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-04 Thread Praveen Gattu
+1. This helps for greater velocity in improving docs. However, we might still need a way to provide version specific information isn't it, i.e. what features are available in which version etc. On Mon, Jun 3, 2024 at 3:08 PM Neil Ramaswamy wrote: > Hi all, > > I've writt

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-23 Thread Jay Han
>>>> unlock a wealth of collective wisdom to enhance your experience and >>> drive success." >>> >>> I don't know the logistics of setting it up, but I am sure that should >>> not be that difficult. If anyone is supportive of this proposal, le

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-19 Thread Mich Talebzadeh
to enhance your experience and >> drive success." >> >> I don't know the logistics of setting it up, but I am sure that should >> not be that difficult. If anyone is supportive of this proposal, let >> the usual +1, 0, -1 decide >> >> HTH >> >>

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-19 Thread Steve Loughran
bers like YOU can exchange > knowledge, tips, and best practices. Join the conversation today and > unlock a wealth of collective wisdom to enhance your experience and > drive success." > > I don't know the logistics of setting it up, but I am sure that should > not be that d

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-19 Thread Mich Talebzadeh
> Good idea. Will be useful >>> >>> >>> >>> +1 >>> >>> >>> >>> >>> >>> >>> >>> *From: *ashok34...@yahoo.com.INVALID >>> *Date: *Monday, March 18, 2024 at 6:36 AM >>> *To: *user

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Varun Shah
don't know the logistics of setting it up, but I am sure that should > not be that difficult. If anyone is supportive of this proposal, let > the usual +1, 0, -1 decide > > HTH > > Mich Talebzadeh, > Dad | Technologist | Solutions Architect | Engineer > London

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Deepak Sharma
>> >> >> >> >> >> >> *From: *ashok34...@yahoo.com.INVALID >> *Date: *Monday, March 18, 2024 at 6:36 AM >> *To: *user @spark , Spark dev list < >> dev@spark.apache.org>, Mich Talebzadeh >> *Cc: *Matei Zaharia >> *Subject: *R

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Hyukjin Kwon
org/wiki/Wernher_von_Braun>)". > > > On Mon, 18 Mar 2024 at 16:23, Parsian, Mahmoud > wrote: > >> Good idea. Will be useful >> >> >> >> +1 >> >> >> >> >> >> >> >> *From: *ashok34...@yahoo.com.INVALID

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
>> >>> >>> On Mon, 18 Mar 2024 at 17:26, Parsian, Mahmoud < >>> mpars...@illumina.com.invalid> wrote: >>> >>>> Good idea. Will be useful >>>> >>>> >>>> >>>> +1 >>>>

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Reynold Xin
>> >>> >>> >>> +1 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> *From:* ashok34668@ yahoo. com. INVAL

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
ars 2024 at 17:26, Parsian, Mahmoud > wrote: > >> Good idea. Will be useful >> >> >> >> +1 >> >> >> >> >> >> >> >> *From: *ashok34...@yahoo.com.INVALID >> *Date: *Monday, March 18, 2024 at 6:36 AM >> *To: *

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Bjørn Jørgensen
y, March 18, 2024 at 6:36 AM > *To: *user @spark , Spark dev list < > dev@spark.apache.org>, Mich Talebzadeh > *Cc: *Matei Zaharia > *Subject: *Re: A proposal for creating a Knowledge Sharing Hub for Apache > Spark Community > > External message, be mindful when clicking l

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
> dev@spark.apache.org>, Mich Talebzadeh > *Cc: *Matei Zaharia > *Subject: *Re: A proposal for creating a Knowledge Sharing Hub for Apache > Spark Community > > External message, be mindful when clicking links or attachments > > > > Good idea. Will be useful >

A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
uld not be that difficult. If anyone is supportive of this proposal, let the usual +1, 0, -1 decide HTH Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh Disclaimer: The informat

Proposal about moving on from the Shepherd terminology in SPIPs

2024-02-23 Thread Mich Talebzadeh
engths and contributions. It also avoids any potentially offensive or hierarchical connotations. Great if you share your thoughts and participate in discussion to consider this proposal and discuss any potential challenges or solutions during the transition period. in SPIP (assuming we accept thi

[VOTE][RESULT] SPIP: Support Customized Kubernetes Schedulers Proposal

2022-01-20 Thread Yikun Jiang
Hi all, The vote passed with the following 14 +1 votes and no -1 or +0 votes: Bowen Li, Weiwei Yang, Chenya Zhang, Chaoran Yu, William Wang, Holden Karau *, bo yang, Mich Talebzadeh, John Zhuge, Thomas Graves *, Kent Yao, Mridul Muralidharan *, Ryan Blue, Yikun Jiang. * = binding. Thank you guys all for your

Re: [VOTE][SPIP] Support Customized Kubernetes Schedulers Proposal

2022-01-20 Thread Yikun Jiang
haven't had time to look at the implementation >>>> details is please make sure resource aware scheduling and the stage >>>> level scheduling still work or any caveats are documented. Feel free >>>> to ping me if questions in these areas. >>>>

Re: [VOTE][SPIP] Support Customized Kubernetes Schedulers Proposal

2022-01-12 Thread Ryan Blue
e documented. Feel free >>> to ping me if questions in these areas. >>> >>> Tom >>> >>> On Wed, Jan 5, 2022 at 7:07 PM Yikun Jiang wrote: >>> > >>> > Hi all, >>> > >>> > I’d like to start a vote

Re: [VOTE][SPIP] Support Customized Kubernetes Schedulers Proposal

2022-01-12 Thread Mridul Muralidharan
rote: >> > >> > Hi all, >> > >> > I’d like to start a vote for SPIP: "Support Customized Kubernetes >> Schedulers Proposal" >> > >> > The SPIP is to support customized Kubernetes schedulers in Spark on >> Kubernetes.

Re: [VOTE][SPIP] Support Customized Kubernetes Schedulers Proposal

2022-01-12 Thread Kent Yao
Feel free > to ping me if questions in these areas. > > Tom > > On Wed, Jan 5, 2022 at 7:07 PM Yikun Jiang wrote: > > > > Hi all, > > > > I’d like to start a vote for SPIP: "Support Customized Kubernetes > Schedulers Proposal" > > >

Re: [VOTE][SPIP] Support Customized Kubernetes Schedulers Proposal

2022-01-11 Thread Thomas Graves
07 PM Yikun Jiang wrote: > > Hi all, > > I’d like to start a vote for SPIP: "Support Customized Kubernetes Schedulers > Proposal" > > The SPIP is to support customized Kubernetes schedulers in Spark on > Kubernetes. > > Please also refer to: > > - P
