OK guys, great discussion all around. It would be *interesting* to observe the sentiments of the participants in this thread. The source is the Spark dev archive for this topic, as attached.

The code uses Python with the relevant libraries; see the code and embedded explanation.

*Approach for Sentiment Analysis*

*Data Extraction using Python*
Parsed the attached text file from the Spark dev archive to extract messages per user.
Processed each message individually.

*Sentiment Scoring*
Used NLTK's VADER (Valence Aware Dictionary and sEntiment Reasoner) <https://www.sciencebuddies.org/science-fair-projects/project-ideas/ArtificialIntelligence_p023/artificial-intelligence/sentiment_analysis#:~:text=Sentiment%20analysis%20is%20a%20technique%20in%20Natural%20Language,the%20overall%20mood%20is%20positive%2C%20negative%2C%20or%20neutral.> to analyze the sentiment, with help from AI.
Assigned each message a score based on the positive, neutral, and negative sentiment expressed.

*Aggregation*
Computed an average sentiment score per user by taking the mean sentiment across all their messages.
The final sentiment score for each user was calculated as:

Average Sentiment Score = (Sum of Compound Sentiment Scores) / (Total Messages Sent)

[image: sentiment_score.png]

Dongjoon's sentiment seems to be pretty neutral and the rest mildly positive.

HTH

Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

On Sun, 16 Mar 2025 at 02:22, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

> For sure, I’m +1 (non-binding).
>
> I believe I don’t need to explain more and I spent whole weekend and we
> have a history about my justification in the history of mailing list.
>
> I’m open to summary my justification again, but as a tl;dr, I have a
> strong evidence that he knew we never had a consensus about 4.0 which
> destroys his claim for “we agreed to release Spark 4.0.0 as it is”, and
> also he said my proposal is technically correct, so he is objecting himself
> if he is really casting “veto”.
>
> Worth noting that his last post is all about technical justification of
> “his own proposal”, not about technical objection of my proposal. I’m even
> unsure he really intended to cast a vote. I feel like we are overreacting
> but I’m happy to make progress of this at least.
>
> On Sun, Mar 16, 2025 at 8:44 AM, Mark Hamstra <markhams...@gmail.com> wrote:
>
>> Quick administrative note: I don't see any reason why this vote should
>> take a long time, so I expect to close the process and tally the votes
>> in not much more than 48 hours.
>>
>> On Sat, Mar 15, 2025 at 4:35 PM Mark Hamstra <markhams...@gmail.com>
>> wrote:
>> >
>> > There has been enough discussion on this topic already, so I think
>> > that an immediate vote on the validity of Dongjoon's technical
>> > justification for his veto of the "Retain migration logic ... in Spark
>> > 4.0.x" proposal is in order. That technical justification has been
>> > called into question, and the guidance at
>> > https://www.apache.org/foundation/glossary.html#Veto leaves it to the
>> > PMC to determine whether the technical justification is valid: "In
>> > case of doubt, deciding whether a technical justification is valid is
>> > up to the PMC." As such, only PMC votes will decide the outcome of
>> > this vote. This is neither a vote on a code change itself nor a vote
>> > on whether a package is ready for release, so it is a procedural vote on
>> > whether the technical justification is valid. As such, the vote will
>> > be decided by a simple majority where +1 votes hold that the technical
>> > justification is not valid and -1 votes hold that the technical
>> > justification is valid.
>> >
>> > I would request that at least PMC members post more than just a naked
>> > vote, but instead endeavor to give some reason why they have assessed
>> > the technical justification as they have. I'll start:
>> >
>> > Despite all of the discussion related to Dongjoon's -1 vote, I must
>> > confess to still not being entirely clear on what is his technical
>> > justification for that veto. I see claims that including an admonition
>> > in the Spark 4.0.x release notes that a prior upgrade to 3.5.5 is
>> > required to maintain the integrity of already existing data streams,
>> > and I see assertions about the maintenance burden that including the
>> > migration logic would impose on future Spark versions, but I don't
>> > think that I see any other technical objections. I do not believe that
>> > the claimed technical justification is valid.
>> >
>> > In requiring that a veto of a code change be accompanied by a
>> > technical justification for the veto, the Apache Voting Process states
>> > that: "To prevent vetoes from being used capriciously, the voter must
>> > provide with the veto a technical justification showing why the change
>> > is bad (opens a security exposure, negatively affects performance,
>> > etc. ). A veto without a justification is invalid and has no weight."
>> > This strongly implies that there must be something objectively wrong
>> > with the proposed code change in that it causes significant harm in
>> > the way of opening a security exposure, negatively affecting
>> > performance, or presumably other significant user harms or perhaps
>> > even developer burdens.
>> >
>> > The proposed addition of the migration logic to Spark 4.0.x does not
>> > cause any harm to Spark's users. For many users, those not using
>> > streaming data, the change will have no effect. For streaming users
>> > the change will be beneficial, not harmful.
>> >
>> > Neither do I find the claim of excessive, ongoing developer burden to
>> > be persuasive. The changes are tiny and easily maintained -- in fact,
>> > it wouldn't surprise me if no further changes to this migration logic
>> > would be needed for a very long time.
>> >
>> > Some of what we are left with is just an expression of preference for
>> > a technical alternative to the migration logic -- i.e. including in
>> > the release notes an admonition to first upgrade to 3.5.5. But the
>> > Apache Voting Process does not say that in the face of code
>> > alternatives A and B, a qualified voter is justified in vetoing A if
>> > they prefer B. Instead, the Voting Process strongly implies that
>> > something more is needed to justify a veto, as I've already covered.
>> > Thus I don't find Dongjoon's preference for the release notes option
>> > to be adequate justification for the veto.
>> >
>> > The only remaining question I see is whether including "databricks" in
>> > the Apache Code is ever allowed or if any such instance must be
>> > expunged as soon as possible. I am not aware of any ASF policy that
>> > strictly forbids the mention of a vendor in Apache code for any
>> > reason, even if that vendor has a product based on Apache code, even
>> > if that vendor enjoys a uniquely influential position vis a vis some
>> > Apache code or project. Certainly the PMC has a duty to see to it that
>> > neither Databricks nor any other vendor exercises influence or control
>> > over Apache Spark outside of the established Apache process, but the
>> > proposed migration code changes do not advantage Databricks -- if
>> > anything they remove a minor avenue of influence, and simply need to
>> > mention "databricks" once in order to match and transform a configuration
>> > into a vendor neutral equivalent. While not optimal, I can't find such
>> > a one-time inclusion of "databricks" to be truly offensive to any
>> > non-technical policy concern -- certainly not offensive to the point
>> > that it outweighs the user advantage of including the migration logic
>> > in Spark 4.0.x.
>> >
>> > In summary, I do not find Dongjoon's given technical justification to
>> > be valid relative to the Apache requirements for a veto of a code
>> > change, so I must vote...
>> >
>> > +1
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>
[VOTE][RESULT] Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x

The vote passes with 7 +1s (3 binding +1s) and 1 -1s (1 binding -1s).
Thanks to all who helped with the vote!

I'm going to make a code change in branch-4.0 quickly so that we don't have to trigger another RC for Spark 4.0.0 just because of this.

(* = binding)
+1:
- Sean R. Owen *
- Jungtaek Lim
- Nicholas Chammas
- Wenchen Fan *
- Adam Binford
- Russell Jurney
- Yang Jie *

-1:
- Dongjoon Hyun *

Thanks,
Jungtaek Lim (HeartSaVioR)

Mark Hamstra - Friday 14 March 2025 02:42:34 GMT

This vote has not passed. The proposed code change has been vetoed by a qualified voter. The validity of that veto has been called into question since "the voter must provide with the veto a technical justification showing why the change is bad (opens a security exposure, negatively affects performance, etc. )."

It has been less than 24 hours since Dongjoon's veto was called into question. He should be given a chance to explain why there is technical justification for it.

Sean Owen - Friday 14 March 2025 02:57:09 GMT

This has been ongoing for a week, the vote has been open for 3 days, Dongjoon has replied today (not sure if you saw it), and I think this is all around in circles; I don't see any basis for waiting 24 hours (? where is this from?)

I don't know if this is a code change vote - there is no code changing. But if it were, I think everyone's still missing the technical justification part, so, same result. I think this is definitely the correct result by spirit and letter of policy.

It's not like we can't all change minds if some new legitimate concern or angle comes out, but, I'd say it's better not to keep entertaining this conversation if there is no movement on the substance of the discussion. There is just clear support for the position in this vote.

Jungtaek Lim - Friday 14 March 2025 02:57:33 GMT

But can you please explain how we can be placed to be a fair situation? Let's say, assume we have the migration code in the current codebase. Dongjoon will never be able to remove the code, no? There are only two choices and I believe the codebase as it is is "accidentally" following his proposal.

I am looking forward to seeing your resolution on this.

Wenchen Fan - Friday 14 March 2025 03:12:37 GMT

As the release manager for 4.0, I’m awaiting a conclusion on this topic before cutting the next RC. There have been many debates, and I’m trying to understand what has happened so far.

1. A mistake was made, leading to a vendor name being included in the configuration released in Spark 3.5.4.
2. Dongjoon initiated a vote to deprecate the incorrect configuration name in 3.5.5, and the vote passed. Thanks to Dongjoon, 3.5.5 was released shortly after.
3. A PR <https://github.com/apache/spark/pull/49897> that simply renamed (rather than deprecated) the configuration was merged into master/4.0. This is a breaking change and was not backed by a vote.
4. This vote concerns adding migration logic to prevent the breaking change from affecting streaming queries.

Personally, I don’t think this vote is necessary, as the standard approach would be to revert the breaking change if there is disagreement. However, if we revert it, leaving the misnamed configuration in 4.0 is clearly not ideal. Following the previously voted decision for 3.5.5 and applying the same approach to 4.0.0 seems like a reasonable path forward.

Jungtaek Lim - Friday 14 March 2025 03:16:53 GMT

Actually, this has been initially triggered from 3 weeks ago, not just a week we have spent.
https://github.com/apache/spark/pull/49983#issuecomment-2676531485

Mark, do you still want me to persuade Dongjoon while I clearly saw his stance on this on the VOTE thread? He can correct me, but from what I understand, he just wanted to leave the status to "agree to disagree", and I'm OK with that as long as I'm not blocked.

We have asked about the rationale of being against the proposal, like, what is the ASF policy he is referring to. I don't hear anything. It's not just happen in a day or so, and I think he had enough time to discuss it with us if he wanted to persuade the others, like, influencing the opposite direction.

Mark Hamstra - Friday 14 March 2025 03:17:19 GMT

There is not clear support for the position in this vote. There is a clear effort to veto by Dongjoon. He provided some technical justification with his -1 vote. That justification has been called into question. We have not yet seen his response to that, and it has been less than 24 hours.

Frankly, the appearance that Databricks employees are trying to diminish issues and to ramrod a vote is very much of a piece with the non-technical concern associated with this issue. When it comes to Spark, Databricks is unquestionably in a unique position not shared by any other vendor, so saying that other vendor names appear in the codebase is at least not dispositive on the issue of whether "spark.databricks" can be tolerated in the Apache Spark codebase. The PMC does have a legitimate interest in preventing even the appearance that Databricks is being given any preferential treatment or control over the Apache Spark project. That by itself is not a technical issue, and has to be weighed against user technical interests.

I'm not convinced of Dongjoon's technical objection, but he does deserve a realistic chance to respond to the asserted rejection of his veto. That is not too much to ask, nor are we facing a strict timeline that demands an immediate conclusion to the vote.

Mark Hamstra - Friday 14 March 2025 03:25:50 GMT

Characterizing Dongjoon's position as just "agree to disagree" without any valid technical issue is your position. I have not seen any endorsement from him on list that this is a correct characterization of his position. I see recent questioning of whether Dongjoon's veto is justified by a valid technical issue. I see no response yet to that challenge. There is little to no harm in giving him some more time to respond to the recent challenge to his veto.

Jungtaek Lim - Friday 14 March 2025 03:42:52 GMT

I am open to waiting for a day, but please be sure to remember that 3 weeks have passed and he had plenty of time to persuade people like I did.

Also, I'd like to remind you that I did not attempt "just one time" to get his voice (yeah, persuade, actually).

This is the post I sent to ask for revisiting the decision.
https://lists.apache.org/thread/v35ld522hgtsrghfzkbk8bhf6sopw1kn

This is what I got.
https://lists.apache.org/thread/ty8svwbp7hqqd325dhd0gohxrpybd2fk

I don't see the feedback to be something that leads to productive discussion. I feel like discussion is just blocked.

My greatest worry is, we might be in a situation where we have another cycle of discussion/debate based on his feedback. We have 3 week already and I think I got users' feedback as well. The people who will be hitting this are users, not contributors, committers, and PMC members. Even PMC members need to respect users. That's what the project is for. Likewise veto, PMC members can't override it.

Mark Hamstra - Friday 14 March 2025 04:37:19 GMT

The relevant time window is since Dongjoon's veto was challenged, not any other that you choose to assert. It has been less than a day since that challenge. Dongjoon presented a prima facie correct veto to the proposal. The technical justification he gave was challenged or asserted to be invalid. We should either see his response to the challenge or at least wait a reasonable time for that response before declaring the veto invalid.

Jungtaek Lim - Friday 14 March 2025 04:45:52 GMT

I love to hear what is the reasonable time here. If you say 1 week, it doesn't make sense at all.

So what time do you suggest on the deadline? Will you be fine by the end of this week? Don't leave the status to be ambiguous. We already spent 3 weeks there. I don't want to let this be dragged.

Mark Hamstra - Friday 14 March 2025 04:50:30 GMT

Again, we have not spent 3 weeks on the matter at hand: whether Dongjoon's veto is valid. Please stop asserting irrelevant timeframes and extraneous issues.

The end of this week appears more than adequate and fair to me.

Jungtaek Lim - Friday 14 March 2025 05:04:04 GMT

And the criteria of justifying -1 must be whether he answered all 4 questions from me.
https://lists.apache.org/thread/kdtto3poz28q4yrqdqk6839y965sfn5c

Where is the evidence that having a vendor name in the codebase is

I believe the last one is the most important one to hear, but I argue we should say we don't hear about the justification if he doesn't answer any of them.

Mark Hamstra - Friday 14 March 2025 05:18:18 GMT

It is not Dongjoon's responsibility to clarify ASF policy for you. If you have ASF policy questions, there are ways to address them through the PMC, legal and the board, not by making demands on Dongjoon.

I don't presume to speak for the whole PMC as to whether or not having "spark.databricks" appear in the code is strictly forbidden or not. I will reassert my opinion that the PMC does have a legitimate interest in precluding even the appearance that Databricks has any influence or control over Apache Spark not available to other contributors. I don't believe that that interest necessarily overrides any Spark user interests, but it shouldn't be diminished or ignored.

Please stop trying to claim control over the process, and let's patiently await Dongjoon's clarification of his technical issues with the proposal -- or his failure to do so.

Jungtaek Lim - Friday 14 March 2025 05:43:37 GMT

It is definitely his responsibility to explain why he casted -1. He explicitly casted his -1 in DISCUSS and I don't believe I hear anything. The only thing I heard is "users can upgrade Spark 3.5.5 before upgrading to Spark 4.0.0". How do you justify that this is a technical objection while we have heard users' feedback that that is "bad"?

What I heard about "Databricks' mistake" does not back up his proposal. Everyone makes mistakes, and we are mitigating it. Shouldn't we figure out the best way for users rather than blaming who made a mistake? Isn't it the preferred way we do postmortem? I sincerely apologize for being missed on reviewing, if this was needed to smooth the progress.

The major argument here is, I (and some others) do not see his justification as "technical objection", so -1 cannot be considered as veto. You also said, "Valid -1 votes are not restricted to technical objections." which makes me super confused. I expect the answer to counter that it is indeed a technical objection. If you really thought it's valid -1 even if it's not having a technical objection, it is obviously not a veto.

I'll be happy to wait for his answer. Again, I have 4 questions, and I have to hear everything to be considered that "technical objection" is explained.
(You said question 1 is not valid, I'm fine not to hear about it, but we will still have an argument "where" the vendor name could be accepted and where to be disallowed. I said, Apple (not a common noun) is there in the codebase.)

Jungtaek Lim - Saturday 15 March 2025 01:10:25 GMT

UPDATE: We were having a discussion about the type of VOTE, since Dongjoon's -1 should be considered as a veto if we see this as a code change VOTE. Dongjoon clarified that he does not see this VOTE as a code change, hence he gave -1 but not intended to block the VOTE.

That said, we have confirmed that Dongjoon's -1 is not a veto. I think the VOTE result is correct as it is. I'll proceed with the next steps.

Mark Hamstra - Saturday 15 March 2025 11:18:54 GMT

Once again, I have to object. Dongjoon said that the vote is a time limited procedure, not that the vote itself is a procedural vote as distinct from a code change vote or a package release vote. Frankly, this feels like you are trying to manipulate the vote procedure by misrepresenting Dongjoon, and you are quickly losing my confidence in your ability to administer a fair voting procedure.

I still consider the proposal to be vetoed.

Jungtaek Lim - Saturday 15 March 2025 12:51:52 GMT

Didn’t you see the part “we agreed”? Who is we in the context?

I don’t think he answered my questions - he explained his reasoning of his proposal which majorly does not agree with. You even said you are not persuaded and I want to ask you now you were persuaded from his last post.

Again I haven’t heard my answers. He showed his reasoning but there is nothing about the evidence of the validity of “technical” objection. I think I have asked people who judged his -1 as veto for their reasoning of how this could be “technical” objection and I don’t think I heard anything.

I can be corrected if you can point out what is the “technical” objection. If you or Dongjoon do not provide this to the end of the week, I have to consider I haven’t heard about that and the veto (although Dongjoon stated it is not a veto) will be ignored.

On Sat, Mar 15, 2025 at 8:19 PM, Mark Hamstra <ma...@gmail.com> wrote:

Jungtaek Lim - Saturday 15 March 2025 13:19:47 GMT

To summarize, the main arguments of both proposals are "whether we can force users to upgrade to Spark 3.5.5 first before upgrading Spark 4.0.0" vs "we should include migration logic to Spark 4.0.0 because that is not realistic". Where is the "technical objection" here? If you say there was politics I can clearly say never, but even if you interpret there was politics, politics is not "technical objection".

I can quote the relevant ASF page for you.
https://www.apache.org/foundation/voting.html#Veto

tracks. This constitutes a veto, and it cannot be overruled nor overridden by anyone. Vetoes stand until and unless the individual withdraws their veto.

with the veto a technical justification showing why the change is bad (opens a security exposure, negatively affects performance, etc. ). A veto without a justification is invalid and has no weight.

The justification must be "technical one" for vote. I hope ASF just lists the most cases rather than leaving this as etc, but I think ASF believes individual's judgement, and I claim there is no "technical reason". Having to put 4 more lines is never a technical reason.

It is never meant to be used for blocking different opinions. It must be used for blocking "incidents which impact users", while we are here to do the opposite, saving users' life.

Mark Hamstra - Saturday 15 March 2025 18:12:54 GMT

Once again, a vote procedure is not a procedural vote. This is unarguably a code change vote: the whole point of the vote is to free you to change the 4.0 branch to include migration logic code present in 3.5.
We are not voting on whether a package is ready to release, nor are we voting on a procedural matter separable from code changes and enduring beyond this specific code change proposal.

Your code change proposal has received a -1 vote from a qualified voter. That vote has been accompanied by a claimed technical justification. The code change has been vetoed. Full stop.

At this point there are only two ways to get beyond the veto and proceed with the proposed code change:
1) Dongjoon can withdraw his veto (which he has not done to my knowledge at this point);
2) The PMC can decide that the purported technical justification is not valid, and thus the claimed veto is invalid and has no weight. The validity of the technical justification has been called into doubt, and ASF Voting Process provides that "[i]n case of doubt, deciding whether a technical justification is valid is up to the PMC."

Absent either of those, the proposal remains vetoed.

Mark Hamstra - Saturday 15 March 2025 18:16:13 GMT

You do not have the authority to declare Dongjoon's technical justification invalid. That is up to the PMC: "In case of doubt, deciding whether a technical justification is valid is up to the PMC."

Sean Owen - Saturday 15 March 2025 18:18:05 GMT

Mark et al - this thread has gone on way too long. Everyone has expressed their opinion. The result stands. Anyone who is really upset about it, please escalate to the board or something, but, this thread and decision point has now concluded.

Mark Hamstra - Saturday 15 March 2025 18:24:54 GMT

That is utter nonsense, Sean! You do not have any authority to declare the matter concluded, and I will escalate to the board if you persist in this approach.

The proposed code change has been vetoed.
As I delineated previously, there are two and only two ways forward under the ASF Voting Process. That does not include any individual simply declaring that the matter has been concluded regardless of the veto and ASF process.

Mich Talebzadeh - Saturday 15 March 2025 20:36:58 GMT

This is my gist.

Mark, from your passionate language I gather you see this as a "Code Change" veto. Your reasoning seems to be straightforward, i.e. the vote's purpose is to decide whether to add code (migration logic) to the Spark 4.0 branch. In your view, the outcome of the vote directly alters the software's code?

However, if we see it as a *procedural matter only*, like some others including myself, it involves a vote and the interpretation of rules.

In summary

1. If it is a code change vote, a -1 can be seen as a veto, blocking the change unless specific conditions are met (like the PMC overriding the veto).
2. If it is just a procedural vote, a -1 might simply be a dissenting vote, *not necessarily carrying the power to block the entire action.*

FYI, I recall I voted -1 (non binding) on another thread and Dongjoon asked me to explain, which was within his rights.

I can see the following votes cast (1)

- Jungtaek Lim: +1 (non-binding)
- Sean Owen: +1 to retain
- Yang Jie: -1, later withdraws it and casts +1
- Adam Binford: +1 (non-binding)
- Russell Jurney: +1 non-binding
- Yang Jie: +1
- Mridul Muralidharan: +1
- Dongjoon Hyun: -1

*(1) Summary of Voting from 21 emails in the attached file* from https://lists.apache.org/thread/nm3p1zjcybdl0p0mc56t2rl92hb9837n :

For Retaining Migration Logic (+1): Jungtaek Lim, Sean Owen, Yang Jie (initially -1, then +1), Adam Binford (non-binding), Russell Jurney (non-binding), Mridul Muralidharan
Against Retaining Migration Logic (-1): Dongjoon Hyun

Maybe we should put a bar on it and allow Dongjoon to qualify his statement as 1 or 2 above, then it could be escalated if needed or put to rest.

HTH

Attachment(s): VOTE_RetainMigrationLogi_cof_incorrectsparkdatabricksConfigurationInSpark4.0.x.txt 35 KB

Holden Karau - Saturday 15 March 2025 21:32:48 GMT

My $0.02: I do not believe that this vote has passed. I believe there is a valid veto.

On a personal level from a migration point of view I think Spark 4 is the perfect time to drop this configuration.

Given the disagreement of if this is a valid veto I think we should pause until the board has been given time to provide its input. Previous perceived dismissing of committer and PMC input has resulted in this being a more sensitive topic in this project than perhaps some others.

How do folks feel about writing up their views for the board sending it over to them and seeing if they want to provide input?

Twitter: https://twitter.com/holdenkarau
Fight Health Insurance: https://www.fighthealthinsurance.com/ <https://www.fighthealthinsurance.com/?q=hk_email>
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau
Pronouns: she/her

Ángel Álvarez Pascua - Saturday 15 March 2025 21:58:12 GMT

I agree with Holden regarding Spark 4 being the perfect time to drop this configuration.

On Sat, 15 Mar 2025, 22:33, Holden Karau <ho...@gmail.com> wrote:

Mark Hamstra - Saturday 15 March 2025 23:35:19 GMT

I don’t think we quite need to involve the board yet. There is still the open question of whether the technical justification for the veto is valid, and it is purely up to the PMC to resolve those doubts. I don’t understand the apparent reluctance to call for resolution of that question even though the technical justification (and thus the veto) has been challenged in comments, so I’ll do it myself momentarily….

"""
Reads the source file containing user comments.
Extracts usernames and their corresponding messages.
Performs sentiment analysis using TextBlob:
    Positive Sentiment: Score > 0
    Neutral Sentiment: Score = 0
    Negative Sentiment: Score < 0
Sorts the results by sentiment score.
Visualizes the data using a bar chart with labels for sentiment scores.

Example
Mich Talebzadeh's sentiment score is 0.10, indicating a slightly positive sentiment overall.
Breakdown:
    Negative Sentiment: Expressions of frustration about ongoing issues and lack of resolution.
    Neutral Sentiment: Discussion about transparency and communication.
    Positive Sentiment: Acknowledgment of efforts being made.
Despite the complaints, the slight appreciation of efforts leads to an overall mildly positive score.
"""
import re

import matplotlib.pyplot as plt
import pandas as pd
from textblob import TextBlob

# Load the file
file_path = "VOTE_RetainMigrationLogi_cof_incorrectsparkdatabricksConfigurationInSpark4.0.x.txt"

# Read the file
with open(file_path, "r", encoding="utf-8") as file:
    lines = file.readlines()

# Extract user comments
user_comments = {}
current_user = None

for line in lines:
    # Identify usernames based on known format (assuming "User - Date Time" pattern)
    match = re.match(r"^(.*) - \w+\s\d+\s\w+\s\d{4}", line)
    if match:
        current_user = match.group(1).strip()
        if current_user not in user_comments:
            user_comments[current_user] = []
    elif current_user and line.strip():
        user_comments[current_user].append(line.strip())

# Exclude Angel Alvarez and ensure Dongjoon Hyun is included.
# The name must match the message header exactly; the thread header spells it
# "Ángel Álvarez Pascua", so both spellings are listed.
excluded_users = {"Angel Alvarez", "Ángel Álvarez Pascua"}
sentiment_scores = {}

for user, comments in user_comments.items():
    if user in excluded_users:
        continue  # Skip excluded user
    full_text = " ".join(comments)  # Combine all comments for a user
    sentiment_score = TextBlob(full_text).sentiment.polarity  # Compute sentiment
    sentiment_scores[user] = sentiment_score

# Ensure Dongjoon Hyun is included (even if no sentiment detected).
# Done before sorting so the added row does not break the sort order.
sentiment_scores.setdefault("Dongjoon Hyun", 0)

# Convert to DataFrame for plotting
df_sentiment = pd.DataFrame(list(sentiment_scores.items()), columns=["User", "Sentiment Score"])
df_sentiment = df_sentiment.sort_values(by="Sentiment Score", ascending=True)  # Sort users

# Plot the sentiment analysis results
plt.figure(figsize=(12, 6))
bars = plt.bar(df_sentiment["User"], df_sentiment["Sentiment Score"], color="blue")

# Add sentiment scores as labels on top of bars
for bar, score in zip(bars, df_sentiment["Sentiment Score"]):
    plt.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.01, f"{score:.3f}",
             ha="center", va="bottom", fontsize=10, color="black", fontweight="bold")

plt.xlabel("Users", fontsize=12)
plt.ylabel("Sentiment Score", fontsize=12)
plt.title("Sentiment Analysis of User Comments (From Source File)", fontsize=14, fontweight="bold")
plt.xticks(rotation=45, ha="right", fontsize=10)
plt.grid(axis="y", linestyle="--", alpha=0.7)
plt.tight_layout()

# Save and show plot
plt.savefig("sentiment_score.png")
# plt.show()
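
The covering note describes the aggregation as a mean of per-message compound scores, whereas the script scores each user's concatenated text once with TextBlob. A minimal sketch of the per-message averaging is below; it is not the method actually used for the attached chart. It assumes each message starts with a header on its own line in the form "Name - Friday 14 March 2025 02:42:34 GMT", and the names `HEADER`, `split_messages`, `average_score_per_user` and the `score` callable are hypothetical helpers introduced only for illustration.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed header shape: "Name - Friday 14 March 2025 02:42:34 GMT" on its own line.
HEADER = re.compile(
    r"^(?P<user>.+?) - \w+ \d{1,2} \w+ \d{4} \d{2}:\d{2}:\d{2} GMT\s*$"
)


def split_messages(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Group body lines into (user, message_text) pairs, one pair per header."""
    messages: List[Tuple[str, str]] = []
    user, body = None, []
    for line in lines:
        stripped = line.strip()
        match = HEADER.match(stripped)
        if match:
            # A new header closes the previous message, if any.
            if user is not None:
                messages.append((user, " ".join(body)))
            user, body = match.group("user").strip(), []
        elif user is not None and stripped:
            body.append(stripped)
    if user is not None:
        messages.append((user, " ".join(body)))
    return messages


def average_score_per_user(
    messages: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
) -> Dict[str, float]:
    """Average score per user = sum of message scores / number of messages."""
    per_user: Dict[str, List[float]] = defaultdict(list)
    for user, text in messages:
        per_user[user].append(score(text))
    return {user: mean(scores) for user, scores in per_user.items()}
```

The scorer is passed in so the aggregation can be checked without any model. To follow the VADER description, one option would be to pass a function that returns `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` from `nltk.sentiment.vader`, after downloading the `vader_lexicon` resource. Note that averaging per message and scoring concatenated text can give different numbers for the same user.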