OK guys, great discussion all around.

It would be *interesting* to observe the sentiments of the participants in
this thread. The source is the Spark dev archive for this topic, as attached.
The code uses Python with the relevant libraries; see the code and embedded
explanation.

*Approach for Sentiment Analysis*

*Data Extraction using Python*
Parsed the attached text file from the Spark dev archive to extract the
messages per user.
Processed each message individually.
*Sentiment Scoring*
Used NLTK's VADER (Valence Aware Dictionary and sEntiment Reasoner)
<https://www.sciencebuddies.org/science-fair-projects/project-ideas/ArtificialIntelligence_p023/artificial-intelligence/sentiment_analysis#:~:text=Sentiment%20analysis%20is%20a%20technique%20in%20Natural%20Language,the%20overall%20mood%20is%20positive%2C%20negative%2C%20or%20neutral.>
to analyze the sentiment, with help from AI.
Assigned each message a score based on the positive, neutral, and negative
sentiment expressed.
*Aggregation*
Computed an average sentiment score per user by taking the mean sentiment
across all their messages.

The final sentiment score for each user was calculated as:

Average Sentiment Score = (Sum of Compound Sentiment Scores) / (Total
Messages Sent)
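
A minimal sketch of the per-message averaging described above, for anyone
who wants to reproduce it. Note that the script at the end of this message
scores each user's concatenated text with TextBlob instead, so this is an
illustration of the formula, not the exact code behind the chart. The helper
names vader_compound and average_sentiment are hypothetical, and the VADER
call assumes nltk is installed and nltk.download("vader_lexicon") has been
run once.

```python
from statistics import mean
from typing import Callable, Dict, List


def vader_compound(text: str) -> float:
    """Return VADER's compound score (between -1 and 1) for one message.

    Assumption: nltk is installed and the "vader_lexicon" resource has
    already been downloaded.
    """
    # Imported lazily so the averaging helper below works without nltk.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    return SentimentIntensityAnalyzer().polarity_scores(text)["compound"]


def average_sentiment(
    messages_by_user: Dict[str, List[str]],
    score: Callable[[str], float] = vader_compound,
) -> Dict[str, float]:
    """Average score per user: sum of per-message scores / messages sent."""
    return {
        user: mean(score(message) for message in messages)
        for user, messages in messages_by_user.items()
        if messages  # users without messages are skipped (no division by zero)
    }
```

Passing a different score function (for example TextBlob polarity) keeps the
aggregation step unchanged, which makes it easy to compare scorers.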

[image: sentiment_score.png]
Dongjoon's sentiment seems to be pretty neutral, and the rest mildly positive.

HTH



Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>





On Sun, 16 Mar 2025 at 02:22, Jungtaek Lim <kabhwan.opensou...@gmail.com>
wrote:

> For sure, I’m +1 (non-binding).
>
> I believe I don’t need to explain more: I spent the whole weekend on this,
> and my justification is on record in the history of the mailing list.
>
> I’m open to summarizing my justification again, but as a tl;dr, I have
> strong evidence that he knew we never had a consensus about 4.0, which
> destroys his claim that “we agreed to release Spark 4.0.0 as it is”, and
> he also said my proposal is technically correct, so he is objecting to
> himself if he is really casting a “veto”.
>
> Worth noting that his last post is all about the technical justification of
> “his own proposal”, not about a technical objection to my proposal. I’m even
> unsure he really intended to cast a vote. I feel like we are overreacting,
> but I’m happy to make progress on this at least.
>
> On Sun, 16 Mar 2025 at 8:44 AM, Mark Hamstra <markhams...@gmail.com> wrote:
>
>> Quick administrative note: I don't see any reason why this vote should
>> take a long time, so I expect to close the process and tally the votes
>> in not much more than 48 hours.
>>
>> On Sat, Mar 15, 2025 at 4:35 PM Mark Hamstra <markhams...@gmail.com>
>> wrote:
>> >
>> > There has been enough discussion on this topic already, so I think
>> > that an immediate vote on the validity of Dongjoon's technical
>> > justification for his veto of the "Retain migration logic ... in Spark
>> > 4.0.x" proposal is in order. That technical justification has been
>> > called into question, and the guidance at
>> > https://www.apache.org/foundation/glossary.html#Veto leaves it to the
>> > PMC to determine whether the technical justification is valid: "In
>> > case of doubt, deciding whether a technical justification is valid is
>> > up to the PMC." As such, only PMC votes will decide the outcome of
>> > this vote. This is neither a vote on a code change itself nor a vote
>> > on whether a package is ready for release, so it is a procedural vote on
>> > whether the technical justification is valid. As such, the vote will
>> > be decided by a simple majority where +1 votes hold that the technical
>> > justification is not valid and -1 votes hold that the technical
>> > justification is valid.
>> >
>> > I would request that at least PMC members post more than just a naked
>> > vote, but instead endeavor to give some reason why they have assessed
>> > the technical justification as they have. I'll start:
>> >
>> > Despite all of the discussion related to Dongjoon's -1 vote, I must
>> > confess to still not being entirely clear on what is his technical
>> > justification for that veto. I see claims that including an admonition
>> > in the Spark 4.0.x release notes that a prior upgrade to 3.5.5 is
>> > required to maintain the integrity of already existing data streams,
>> > and I see assertions about the maintenance burden that including the
>> > migration logic would impose on future Spark versions, but I don't
>> > think that I see any other technical objections. I do not believe that
>> > the claimed technical justification is valid.
>> >
>> > In requiring that a veto of a code change be accompanied by a
>> > technical justification for the veto, the Apache Voting Process states
>> > that: "To prevent vetoes from being used capriciously, the voter must
>> > provide with the veto a technical justification showing why the change
>> > is bad (opens a security exposure, negatively affects performance,
>> > etc. ). A veto without a justification is invalid and has no weight."
>> > This strongly implies that there must be something objectively wrong
>> > with the proposed code change in that it causes significant harm in
>> > the way of opening a security exposure, negatively affecting
>> > performance, or presumably other significant user harms or perhaps
>> > even developer burdens.
>> >
>> > The proposed addition of the migration logic to Spark 4.0.x does not
>> > cause any harm to Spark's users. For many users, those not using
>> > streaming data, the change will have no effect. For streaming users
>> > the change will be beneficial, not harmful.
>> >
>> > Neither do I find the claim of excessive, ongoing developer burden to
>> > be persuasive. The changes are tiny and easily maintained -- in fact,
>> > it wouldn't surprise me if no further changes to this migration logic
>> > would be needed for a very long time.
>> >
>> > Some of what we are left with is just an expression of preference for
>> > a technical alternative to the migration logic -- i.e. including in
>> > the release notes an admonition to first upgrade to 3.5.5. But the
>> > Apache Voting Process does not say that in the face of code
>> > alternatives A and B, a qualified voter is justified in vetoing A if
>> > they prefer B. Instead, the Voting Process strongly implies that
>> > something more is needed to justify a veto, as I've already covered.
>> > Thus I don't find Dongjoon's preference for the release notes option
>> > to be adequate justification for the veto.
>> >
>> > The only remaining question I see is whether including "databricks" in
>> > the Apache Code is ever allowed or if any such instance must be
>> > expunged as soon as possible. I am not aware of any ASF policy that
>> > strictly forbids the mention of a vendor in Apache code for any
>> > reason, even if that vendor has a product based on Apache code, even
>> > if that vendor enjoys a uniquely influential position vis a vis some
>> > Apache code or project. Certainly the PMC has a duty to see to it that
>> > neither Databricks nor any other vendor exercises influence or control
>> > over Apache Spark outside of the established Apache process, but the
>> > proposed migration code changes do not advantage Databricks -- if
>> > anything they remove a minor avenue of influence, and simply need to
>> > mention "databricks" once in order to match and transform a configuration
>> > into a vendor neutral equivalent. While not optimal, I can't find such
>> > a one-time inclusion of "databricks" to be truly offensive to any
>> > non-technical policy concern -- certainly not offensive to the point
>> > that it outweighs the user advantage of including the migration logic
>> > in Spark 4.0.x.
>> >
>> > In summary, I do not find Dongjoon's given technical justification to
>> > be valid relative to the Apache requirements for a veto of a code
>> > change, so I must vote...
>> >
>> > +1
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
[VOTE][RESULT] Retain migration logic of incorrect `spark.databricks.*` 
configuration in Spark 4.0.x
The vote passes with 7 +1s (3 binding +1s) and 1 -1s (1 binding -1s).
Thanks to all who helped with the vote!

I'm going to make a code change in branch-4.0 quickly so that we don't have
to trigger another RC for Spark 4.0.0 just because of this.

(* = binding)
+1:
- Sean R. Owen *
- Jungtaek Lim
- Nicholas Chammas
- Wenchen Fan *
- Adam Binford
- Russell Jurney
- Yang Jie *

-1:
- Dongjoon Hyun *

Thanks,
Jungtaek Lim (HeartSaVioR)

Mark Hamstra - Friday 14 March 2025 02:42:34 GMT
This vote has not passed.

The proposed code change has been vetoed by a qualified voter. The
validity of that veto has been called into question since "the voter
must provide with the veto a technical justification showing why the
change is bad (opens a security exposure, negatively affects
performance, etc. )." It has been less than 24 hours since Dongjoon's
veto was called into question. He should be given a chance to explain
why there is technical justification for it.



Sean Owen - Friday 14 March 2025 02:57:09 GMT
This has been ongoing for a week, the vote has been open for 3 days,
Dongjoon has replied today (not sure if you saw it), and I think this is
all around in circles; I don't see any basis for waiting 24 hours (? where
is this from?) I don't know if this is a code change vote - there is no
code changing. But if it were, I think everyone's still missing the
technical justification part, so, same result. I think this is definitely
the correct result by spirit and letter of policy.

It's not like we can't all change minds if some new legitimate concern or
angle comes out, but, I'd say it's better not to keep entertaining this
conversation if there is no movement on the substance of the discussion.
There is just clear support for the position in this vote.




Jungtaek Lim - Friday 14 March 2025 02:57:33 GMT
But can you please explain how we can be placed in a fair situation?
Let's say, assume we have the migration code in the current codebase.
Dongjoon will never be able to remove the code, no?

There are only two choices and I believe the codebase as it is is
"accidentally" following his proposal. I am looking forward to seeing your
resolution on this.


Wenchen Fan - Friday 14 March 2025 03:12:37 GMT
As the release manager for 4.0, I’m awaiting a conclusion on this topic
before cutting the next RC. There have been many debates, and I’m trying to
understand what has happened so far.
1. A mistake was made, leading to a vendor name being included in the
configuration released in Spark 3.5.4.
2. Dongjoon initiated a vote to deprecate the incorrect configuration name
in 3.5.5, and the vote passed. Thanks to Dongjoon, 3.5.5 was released
shortly after.
3. A PR <https://github.com/apache/spark/pull/49897> that simply renamed
(rather than deprecated) the configuration was merged into master/4.0. This
is a breaking change and was not backed by a vote.
4. This vote concerns adding migration logic to prevent the breaking change
from affecting streaming queries.

Personally, I don’t think this vote is necessary, as the standard approach
would be to revert the breaking change if there is disagreement. However,
if we revert it, leaving the misnamed configuration in 4.0 is clearly not
ideal. Following the previously voted decision for 3.5.5 and applying the
same approach to 4.0.0 seems like a reasonable path forward.


Jungtaek Lim - Friday 14 March 2025 03:16:53 GMT
Actually, this was initially triggered 3 weeks ago, so we have spent more
than just a week on it.
https://github.com/apache/spark/pull/49983#issuecomment-2676531485

Mark, do you still want me to persuade Dongjoon while I clearly saw his
stance on this on the VOTE thread? He can correct me, but from what I
understand, he just wanted to leave the status to "agree to disagree", and
I'm OK with that as long as I'm not blocked.

We have asked about the rationale for being against the proposal, such as
which ASF policy he is referring to. I haven't heard anything. This didn't
just happen in a day or so, and I think he had enough time to discuss it with
us if he wanted to persuade the others, that is, to influence them in the
opposite direction.


Mark Hamstra - Friday 14 March 2025 03:17:19 GMT
There is not clear support for the position in this vote. There is a
clear effort to veto by Dongjoon. He provided some technical
justification with his -1 vote. That justification has been called
into question. We have not yet seen his response to that, and it has
been less than 24 hours.

Frankly, the appearance that Databricks employees are trying to
diminish issues and to ramrod a vote is very much of a piece with the
non-technical concern associated with this issue. When it comes to
Spark, Databricks is unquestionably in a unique position not shared by
any other vendor, so saying that other vendor names appear in the
codebase is at least not dispositive on the issue of whether
"spark.databricks" can be tolerated in the Apache Spark codebase. The
PMC does have a legitimate interest in preventing even the appearance
that Databricks is being given any preferential treatment or control
over the Apache Spark project. That by itself is not a technical
issue, and has to be weighed against user technical interests.

I'm not convinced of Dongjoon's technical objection, but he does
deserve a realistic chance to respond to the asserted rejection of his
veto. That is not too much to ask, nor are we facing a strict timeline
that demands an immediate conclusion to the vote.



Mark Hamstra - Friday 14 March 2025 03:25:50 GMT
Characterizing Dongjoon's position as just "agree to disagree" without
any valid technical issue is your position. I have not seen any
endorsement from him on list that this is a correct characterization
of his position.

I see recent questioning of whether Dongjoon's veto is justified by a
valid technical issue. I see no response yet to that challenge. There
is little to no harm in giving him some more time to respond to the
recent challenge to his veto.




Jungtaek Lim - Friday 14 March 2025 03:42:52 GMT
I am open to waiting for a day, but please be sure to remember that 3 weeks
have passed and he had plenty of time to persuade people like I did.

Also, I'd like to remind you that I did not attempt "just one time" to get
his voice (yeah, persuade, actually).

This is the post I sent to ask for revisiting the decision.
https://lists.apache.org/thread/v35ld522hgtsrghfzkbk8bhf6sopw1kn

This is what I got.
https://lists.apache.org/thread/ty8svwbp7hqqd325dhd0gohxrpybd2fk

I don't see the feedback to be something that leads to productive
discussion. I feel like discussion is just blocked.

My greatest worry is that we might end up in another cycle of
discussion/debate based on his feedback. We have spent 3 weeks already,
and I think I got users' feedback as well. The people who will be hitting
this are users, not contributors, committers, and PMC members. Even PMC
members need to respect users. That's what the project is for. Likewise
veto, PMC members can't override it.



Mark Hamstra - Friday 14 March 2025 04:37:19 GMT
The relevant time window is since Dongjoon's veto was challenged, not
any other that you choose to assert. It has been less than a day since
that challenge.

Dongjoon presented a prima facie correct veto to the proposal. The
technical justification he gave was challenged or asserted to be
invalid. We should either see his response to the challenge or at
least wait a reasonable time for that response before declaring the
veto invalid.



Jungtaek Lim - Friday 14 March 2025 04:45:52 GMT
I'd love to hear what a reasonable time is here. If you say 1 week, it
doesn't make sense at all. So what deadline do you suggest? Will you be
fine with the end of this week?

Don't leave the status ambiguous. We have already spent 3 weeks on this. I
don't want to let this drag on.


Mark Hamstra - Friday 14 March 2025 04:50:30 GMT
Again, we have not spent 3 weeks on the matter at hand: whether
Dongjoon's veto is valid. Please stop asserting irrelevant timeframes
and extraneous issues.

The end of this week appears more than adequate and fair to me.



Jungtaek Lim - Friday 14 March 2025 05:04:04 GMT
And the criterion for justifying the -1 must be whether he has answered all 4
questions from me.

https://lists.apache.org/thread/kdtto3poz28q4yrqdqk6839y965sfn5c

Where is the evidence that having a vendor name in the codebase is

I believe the last one is the most important one to hear, but I argue we
should say we don't hear about the justification if he doesn't answer any
of them.



Mark Hamstra - Friday 14 March 2025 05:18:18 GMT
It is not Dongjoon's responsibility to clarify ASF policy for you. If
you have ASF policy questions, there are ways to address them through
the PMC, legal and the board, not by making demands on Dongjoon. I
don't presume to speak for the whole PMC as to whether or not having
"spark.databricks" appear in the code is strictly forbidden or not. I
will reassert my opinion that the PMC does have a legitimate interest
in precluding even the appearance that Databricks has any influence or
control over Apache Spark not available to other contributors. I don't
believe that that interest necessarily overrides any Spark user
interests, but it shouldn't be diminished or ignored.

Please stop trying to claim control over the process, and let's
patiently await Dongjoon's clarification of his technical issues with
the proposal -- or his failure to do so.



Jungtaek Lim - Friday 14 March 2025 05:43:37 GMT
It is definitely his responsibility to explain why he cast -1. He
explicitly cast his -1 in DISCUSS, and I don't believe I have heard anything.
The only thing I heard is "users can upgrade to Spark 3.5.5 before upgrading
to Spark 4.0.0". How do you justify that this is a technical objection
when we have heard users' feedback that that is "bad"?

What I heard about "Databricks' mistake" does not back up his proposal.
Everyone makes mistakes, and we are mitigating it. Shouldn't we figure out
the best way for users rather than blaming whoever made a mistake? Isn't that
the preferred way to do a postmortem? I sincerely apologize for having missed
this in review, if that was needed to smooth the progress.

The major argument here is, I (and some others) do not see his
justification as "technical objection", so -1 cannot be considered as veto.
You also said, "Valid -1 votes are not restricted to technical objections."
which makes me super confused. I expect the answer to counter that it is
indeed a technical objection. If you really thought it's valid -1 even if
it's not having a technical objection, it is obviously not a veto.

I'll be happy to wait for his answer. Again, I have 4 questions, and I have
to hear everything to be considered that "technical objection" is explained.
(You said question 1 is not valid, I'm fine not to hear about it, but we
will still have an argument "where" the vendor name could be accepted and
where to be disallowed. I said, Apple (not a common noun) is there in the
codebase.)


Jungtaek Lim - Saturday 15 March 2025 01:10:25 GMT
UPDATE:

We were having a discussion about the type of VOTE, since Dongjoon's -1
should be considered as a veto if we see this as a code change VOTE.
Dongjoon clarified that he does not see this VOTE as a code change, hence
he gave -1 but did not intend to block the VOTE.

That said, we have confirmed that Dongjoon's -1 is not a veto. I think the
VOTE result is correct as it is. I'll proceed with the next steps.


Mark Hamstra - Saturday 15 March 2025 11:18:54 GMT
Once again, I have to object. Dongjoon said that the vote is a time
limited procedure, not that the vote itself is a procedural vote as
distinct from a code change vote or a package release vote.

Frankly, this feels like you are trying to manipulate the vote
procedure by misrepresenting Dongjoon, and you are quickly losing my
confidence in your ability to administer a fair voting procedure.

I still consider the proposal to be vetoed.




Jungtaek Lim - Saturday 15 March 2025 12:51:52 GMT

Didn’t you see the part “we agreed”? Who is we in the context?

I don’t think he answered my questions - he explained the reasoning for his
proposal, which the majority does not agree with. You even said you were not
persuaded, and I want to ask whether you are now persuaded by his last post.

Again, I haven’t heard answers to my questions. He showed his reasoning, but
there is nothing about evidence for the validity of the “technical”
objection. I think I have asked people who judged his -1 as a veto for their
reasoning on how this could be a “technical” objection, and I don’t think I
heard anything.

I can be corrected if you can point out what the “technical” objection is.
If you or Dongjoon do not provide this by the end of the week, I have to
consider that I haven’t heard about it, and the veto (although Dongjoon
stated it is not a veto) will be ignored.

On Sat, 15 Mar 2025 at 8:19 PM, Mark Hamstra <ma...@gmail.com> wrote:


Jungtaek Lim - Saturday 15 March 2025 13:19:47 GMT
To summarize, the main arguments of both proposals are "whether we can
force users to upgrade to Spark 3.5.5 first before upgrading Spark 4.0.0"
vs "we should include migration logic to Spark 4.0.0 because that is not
realistic". Where is the "technical objection" here? If you say there was
politics I can clearly say never, but even if you interpret there was
politics, politics is not "technical objection". I can quote the relevant
ASF page for you.

https://www.apache.org/foundation/voting.html#Veto

tracks. This constitutes a veto, and it cannot be overruled nor overridden
by anyone. Vetoes stand until and unless the individual withdraws their
veto.

To prevent vetoes from being used capriciously, the voter must provide
with the veto a technical justification showing why the change is bad
(opens a security exposure, negatively affects performance, etc. ). A veto
without a justification is invalid and has no weight.

The justification for a veto must be a "technical one". I wish the ASF listed
most of the cases rather than leaving this as "etc.", but I think the ASF
trusts the individual's judgement, and I claim there is no "technical reason".
Having to put in 4 more lines is never a technical reason. A veto is never
meant to be used for blocking different opinions. It must be used for
blocking "incidents which impact users", while we are here to do the
opposite, saving users' lives.


Mark Hamstra - Saturday 15 March 2025 18:12:54 GMT
Once again, a vote procedure is not a procedural vote.

This is unarguably a code change vote: the whole point of the vote is
to free you to change the 4.0 branch to include migration logic code
present in 3.5. We are not voting on whether a package is ready to
release, nor are we voting on a procedural matter separable from code
changes and enduring beyond this specific code change proposal.

Your code change proposal has received a -1 vote from a qualified
voter. That vote has been accompanied by a claimed technical
justification. The code change has been vetoed. Full stop.

At this point there are only two ways to get beyond the veto and
proceed with the proposed code change: 1) Dongjoon can withdraw his
veto (which he has not done to my knowledge at this point); 2) The PMC
can decide that the purported technical justification is not valid,
and thus the claimed veto is invalid and has no weight. The validity
of the technical justification has been called into doubt, and ASF
Voting Process provides that "[i]n case of doubt, deciding whether a
technical justification is valid is up to the PMC."

Absent either of those, the proposal remains vetoed.



Mark Hamstra - Saturday 15 March 2025 18:16:13 GMT
You do not have the authority to declare Dongjoon's technical
justification invalid. That is up to the PMC: "In case of doubt,
deciding whether a technical justification is valid is up to the PMC."



Sean Owen - Saturday 15 March 2025 18:18:05 GMT
Mark et al - this thread has gone on way too long. Everyone has expressed
their opinion. The result stands.
Anyone who is really upset about it, please escalate to the board or
something, but, this thread and decision point has now concluded.



Mark Hamstra - Saturday 15 March 2025 18:24:54 GMT
That is utter nonsense, Sean! You do not have any authority to declare
the matter concluded, and I will escalate to the board if you persist
in this approach.

The proposed code change has been vetoed. As I delineated previously,
there are two and only two ways forward under the ASF Voting Process.
That does not include any individual simply declaring that the matter
has been concluded regardless of the veto and ASF process.



Mich Talebzadeh - Saturday 15 March 2025 20:36:58 GMT
This is my gist

Mark, from your passionate language I gather you see this as a "Code Change"
veto. Your reasoning seems to be straightforward, i.e. the vote's purpose
is to decide whether to add code (migration logic) to the Spark 4.0 branch.
In your view, the outcome of the vote directly alters the software's code?

However, if we see it as a *procedural matter only*, like some others
including myself, it involves a vote and the interpretation of rules.

In summary

1. If it is a code change vote, a -1 can be seen as a veto, blocking the
change unless specific conditions are met (like the PMC overriding the
veto).
2. If it is just a procedural vote, a -1 might simply be a dissenting
vote, *not necessarily carrying the power to block the entire action.*

FYI, I recall I voted -1 (non-binding) on another thread and Dongjoon asked
me to explain, which was within his rights.

I can see the following votes cast (1)

- Jungtaek Lim: +1 (non-binding)
- Sean Owen: +1 to retain
- Yang Jie: -1, later withdraws it and casts +1
- Adam Binford: +1 (non-binding)
- Russell Jurney: +1 non-binding
- Yang Jie: +1
- Mridul Muralidharan: +1
- Dongjoon Hyun: -1


*(1) Summary of Voting from 21 emails in the attached file* from
https://lists.apache.org/thread/nm3p1zjcybdl0p0mc56t2rl92hb9837n :

For Retaining Migration Logic (+1): Jungtaek Lim, Sean Owen, Yang Jie
(initially -1, then +1), Adam Binford (non-binding), Russell Jurney
(non-binding), Mridul Muralidharan
Against Retaining Migration Logic (-1): Dongjoon Hyun

Maybe we should put a bar on it and allow Dongjoon to qualify his statement
as 1 or 2 above; then it could be escalated if needed or put to rest.

HTH





Attachment(s):VOTE_RetainMigrationLogi_cof_incorrectsparkdatabricksConfigurationInSpark4.0.x.txt
 35 KB

Holden Karau - Saturday 15 March 2025 21:32:48 GMT
My $0.02:

I do not believe that this vote has passed. I believe there is a valid
veto. On a personal level from a migration point of view I think Spark 4 is
the perfect time to drop this configuration.

Given the disagreement over whether this is a valid veto, I think we should
pause until the board has been given time to provide its input.

Previous perceived dismissing of committer and PMC input has resulted in
this being a more sensitive topic in this project than perhaps some others.

How do folks feel about writing up their views for the board, sending them
over, and seeing if the board wants to provide input?

Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/
<https://www.fighthealthinsurance.com/?q=hk_email>
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her



Ángel Álvarez Pascua - Saturday 15 March 2025 21:58:12 GMT
I agree with Holden regarding Spark 4 being the perfect time to drop this
configuration.

On Sat, 15 Mar 2025, 22:33, Holden Karau <ho...@gmail.com> wrote:


Mark Hamstra - Saturday 15 March 2025 23:35:19 GMT
I don’t think we quite need to involve the board yet. There is still the
open question of whether the technical justification for the veto is valid,
and it is purely up to the PMC to resolve those doubts.

I don’t understand the apparent reluctance to call for resolution of that
question even though the technical justification (and thus the veto) has
been challenged in comments, so I’ll do it myself momentarily….


"""
Reads the source file containing user comments.
Extracts usernames and their corresponding messages.
Performs sentiment analysis using TextBlob:
Positive Sentiment: Score > 0
Neutral Sentiment: Score = 0
Negative Sentiment: Score < 0
Sorts the results by sentiment score.
Visualizes the data using a bar chart with labels for sentiment scores.

Example

Mich Talebzadeh's sentiment score is 0.10, indicating a slightly positive sentiment overall.

Breakdown:
Negative Sentiment: Expressions of frustration about ongoing issues and lack of resolution.
Neutral Sentiment: Discussion about transparency and communication.
Positive Sentiment: Acknowledgment of efforts being made.
Despite the complaints, the slight appreciation of efforts leads to an overall mildly positive score. 
"""

import re

import matplotlib.pyplot as plt
import pandas as pd
from textblob import TextBlob

# Path to the attached archive export of the thread
file_path = "VOTE_RetainMigrationLogi_cof_incorrectsparkdatabricksConfigurationInSpark4.0.x.txt"

# Read the file
with open(file_path, "r", encoding="utf-8") as file:
    lines = file.readlines()

# Message headers look like "Mark Hamstra - Friday 14 March 2025 02:42:34 GMT"
header_pattern = re.compile(r"^(.*) - \w+\s\d+\s\w+\s\d{4}")

# Extract user comments: every non-empty line after a header belongs to that user
user_comments = {}
current_user = None
for line in lines:
    match = header_pattern.match(line)
    if match:
        current_user = match.group(1).strip()
        user_comments.setdefault(current_user, [])
    elif current_user and line.strip():
        user_comments[current_user].append(line.strip())

# Exclude Ángel Álvarez Pascua. Both spellings are listed because the archive
# header carries the accented full name, which "Angel Alvarez" alone never matches.
excluded_users = {"Angel Alvarez", "Ángel Álvarez Pascua"}
sentiment_scores = {}
for user, comments in user_comments.items():
    if user in excluded_users:
        continue  # Skip excluded user
    full_text = " ".join(comments)  # Combine all comments for a user
    # TextBlob polarity ranges from -1.0 (negative) to 1.0 (positive)
    sentiment_scores[user] = TextBlob(full_text).sentiment.polarity

# Ensure Dongjoon Hyun is included (score 0 if no message header was found for him).
# Done before sorting so that his bar lands in the right position.
sentiment_scores.setdefault("Dongjoon Hyun", 0.0)

# Convert to DataFrame for plotting, sorted from lowest to highest score
df_sentiment = pd.DataFrame(list(sentiment_scores.items()), columns=["User", "Sentiment Score"])
df_sentiment = df_sentiment.sort_values(by="Sentiment Score", ascending=True)

# Plot the sentiment analysis results
plt.figure(figsize=(12, 6))
bars = plt.bar(df_sentiment["User"], df_sentiment["Sentiment Score"], color="blue")

# Add sentiment scores as labels on top of bars
for bar, score in zip(bars, df_sentiment["Sentiment Score"]):
    plt.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.01, f"{score:.3f}",
             ha="center", va="bottom", fontsize=10, color="black", fontweight="bold")

plt.xlabel("Users", fontsize=12)
plt.ylabel("Sentiment Score", fontsize=12)
plt.title("Sentiment Analysis of User Comments (From Source File)", fontsize=14, fontweight="bold")
plt.xticks(rotation=45, ha="right", fontsize=10)
plt.grid(axis="y", linestyle="--", alpha=0.7)
plt.tight_layout()

# Save the plot (uncomment plt.show() to display it interactively)
plt.savefig("sentiment_score.png")
# plt.show()
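
As a quick sanity check of the "User - Date Time" header pattern used in the
script above, the sketch below applies the same regular expression to single
lines. The helper name parse_author is hypothetical and only for illustration;
the sample header is taken from the archive text.

```python
import re

# Same pattern as in the script: "<name> - <weekday> <day> <month> <year>"
HEADER = re.compile(r"^(.*) - \w+\s\d+\s\w+\s\d{4}")


def parse_author(line):
    """Return the author name if the line is a message header, else None."""
    match = HEADER.match(line)
    return match.group(1).strip() if match else None


# A header line from the archive yields the author; body text yields None.
print(parse_author("Mark Hamstra - Friday 14 March 2025 02:42:34 GMT"))
print(parse_author("This vote has not passed."))
```

Any participant whose messages appear only as quoted text, without such a
header line, will not be picked up by this pattern.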

