Re: [DISCUSS?] Adding some empty string check in to_date built-in function + warning in documentation

2024-10-10 Thread Ángel
t runtime errors when > processing big dataset. > > On Thu, Oct 10, 2024 at 11:05 AM Ángel > wrote: > >> Hi, >> >> I opened a Jira ticket back in August, but it seems to have been >> overlooked. While it may not be a critical issue, I would appreciate if you >

Re: [VOTE] Officialy Deprecate GraphX in Spark 4

2024-10-04 Thread Ángel
The GraphFrames library depends on GraphX and has changed recently (3 months ago). https://github.com/graphframes/graphframes/blob/master/src/main/scala/org/graphframes/GraphFrame.scala On Fri, 4 Oct 2024, 11:35, Nimrod Ofek wrote: > Hi, > > Did anyone do any search about the GraphX API i

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-05 Thread Ángel
inkedin.com/in/mich-talebzadeh-ph-d-5205b2/> >>> >>> >>> https://en.everybodywiki.com/Mich_Talebzadeh >>> >>> >>> >>> *Disclaimer:* The information provided is correct to the best of my >>> knowledge but of course cannot be gua

Re: [Question] Why driver doesn't shutdown executors gracefully on k8s?

2024-10-10 Thread Ángel
Do you know by any chance if that config also applies to Databricks? On Thu, 10 Oct 2024 at 10:02, Ángel wrote: > Thanks a lot for the clarification. Interesting... I've never needed it, > even though I've been using Spark for over 8 years. > > On Thu, 10 Oct 20

Re: [Question] Why driver doesn't shutdown executors gracefully on k8s?

2024-10-10 Thread Ángel
> https://spark.apache.org/releases/spark-release-3-1-1.html > > The fact that you didn’t see it in the 3.3 site is simply a lack of > documentation. The missing documentation was added in 3.4, thanks to > https://github.com/apache/spark/pull/38131/files > > On Wed, Oc

[DISCUSS?] Adding some empty string check in to_date built-in function + warning in documentation

2024-10-09 Thread Ángel
Hi, I opened a Jira ticket back in August, but it seems to have been overlooked. While it may not be a critical issue, I would appreciate if you could take a moment to consider it before deciding whether to close it. Here is the ticket for reference: SPARK-49288

Re: [Question] Why driver doesn't shutdown executors gracefully on k8s?

2024-10-09 Thread Ángel
Looks like it actually exists ... but only for the Spark Synapse implementation ... https://learn.microsoft.com/en-us/answers/questions/1496283/purpose-of-spark-yarn-executor-decommission-enable Jay Han was asking for some config on k8s, so we shouldn't bring this config to the table, shoul

Re: [DISCUSS] Support spark.ml on Spark Connect

2024-10-09 Thread Ángel
You have my vote (btw, great idea, ML is so sexy nowadays 😉) On Thu, 10 Oct 2024 at 3:19, Bobby wrote: > Hi, > > I'd like to start a discussion about support spark.ml on Connect. With > this feature, Users don't need to change their code to run Spark ML cases > on Connect. > > Please re

Re: [Question] Why driver doesn't shutdown executors gracefully on k8s?

2024-10-09 Thread Ángel
fig came out in Spark 3.4.0: https://archive.apache.org/dist/spark/docs/3.3.4/configuration.html https://archive.apache.org/dist/spark/docs/3.4.0/configuration.html On Thu, 10 Oct 2024 at 4:44, Jungtaek Lim wrote: > Ángel, > > https://spark.apache.org/docs/latest/configuration.html > sear

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-10-04 Thread Ángel
I completely agree with everyone here. I don’t think the issue is deprecating it; to me, the problem lies in not providing a new and better solution for handling graphs in Spark. In the past, I used GraphX via GraphFrames for record linkage, and I found it both useful and effective. Is there any di

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-11-12 Thread Ángel
ut carrying out any impact analysis and in the middle of an active (and interesting, btw) discussion. On Tue, 12 Nov 2024, 21:59, Russell Jurney wrote: > That is unfortunate. I saw someone volunteer to review my PRs. I thought > there was a holdout? > > On Tue, Nov 12, 2024

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-11-12 Thread Ángel
I guess you missed where Reynold Xin suggested we instead bring > GraphFrames into Spark and others agreed? > > On Tue, Nov 12, 2024 at 12:08 PM Ángel > wrote: > >> You only have to look at the subject of this thread of mails. It says >> nothing about graphframes. I

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-11-12 Thread Ángel
/2MaRAG9> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau > Pronouns: she/her > > > On Tue, Nov 12, 2024 at 6:47 PM Ángel > wrote: > >> I thought that too ... until I read the message from Matei Zaharia: >> >> "Votes to deprecate bo

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-11-12 Thread Ángel
that would have addressed > these? > > Just trying to understand the objection or thinking here > > > On Tue, Nov 12, 2024 at 8:48 PM Ángel > wrote: > >> I thought that too ... until I read the message from Matei Zaharia: >> >> "Votes to deprecate both Sp

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-11-13 Thread Ángel
Btw, Sean, you work for Databricks ... would deprecating GraphX mean ... that Databricks won't support this API anymore? For all supported versions or only for the new ones? I'm just curious about that. On Wed, 13 Nov 2024 at 16:16, Ángel wrote: > If by "at length"

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-11-13 Thread Ángel
's somehow widely used). > They may do it and the work will be available forever for users. > I don't see any new ground here. > > > On Tue, Nov 12, 2024 at 10:47 PM Ángel > wrote: > >> When you deprecate something, the message you're sending out is: "This

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-11-12 Thread Ángel
he hackathon. Need to figure out a time > that works for the people who've already expressed interest. > > Thanks, > Russell Jurney > > > > > On Tue, Nov 12, 2024 at 6:48 AM Ángel > wrote: > >> But the goal wasn't to fix bugs in GraphX? What has that to

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-11-13 Thread Ángel
ve no volunteers (especially no one familiar with the graph X >>>>>>> code). >>>>>>> >>>>>>> Twitter: https://twitter.com/holdenkarau >>>>>>> Fight Health Insurance: https://www.fighthealthinsurance.

Re: [DISCUSS] Deprecate GraphX OR Find new maintainers interested in GraphX OR leave it as is?

2024-11-12 Thread Ángel
icket. I'm going to work on the project a bit >> and then name a date and time. >> >> https://github.com/graphframes/graphframes/issues/460 >> >> On Tue, Oct 15, 2024 at 7:48 PM Ángel >> wrote: >> >>> We could create a prioritized list of the mo

Re: [VOTE] Officialy Deprecate GraphX in Spark 4

2024-10-03 Thread Ángel
-1 Don’t deprecate GraphX, because it may be useful for some people and ... would there be any replacement for that API? Anyway, I don't think deprecating an API only because it hasn't been updated in ages is a good practice (but I could be perfectly wrong). On Thu, 3 Oct 2024, 16:31, Wenchen Fan wrote

Re: [VOTE] Single-pass Analyzer for Catalyst

2024-10-03 Thread Ángel
+1 On Thu, 3 Oct 2024, 20:06, Wenchen Fan wrote: > +1 > > On Wed, Oct 2, 2024 at 7:50 AM Peter Toth wrote: > >> +1 >> >> On Tue, Oct 1, 2024, 08:33 Yang Jie wrote: >> >>> +1, Thanks >>> >>> Jie Yang >>> >>> On 2024/10/01 03:26:40 John Zhuge wrote: >>> > +1 (non-binding) >>> > >>> > On Mon,

Re: Spark Docker image with added packages

2024-10-17 Thread Ángel
Creating a custom classloader to load classes from those jars? On Thu, 17 Oct 2024, 19:47, Nimrod Ofek wrote: > > Hi, > > Thanks all for the replies. > > I am adding the Spark dev list as well - as I think this might be an issue > that needs to be addressed. > > The options presented here wil

Re: ASF board report draft for February 2025

2025-02-06 Thread Ángel
things and, despite fixing it ... the root issue with performance and OOM still persisted. PS: Some nodes got stringified thousands of times. I was ... totally in shock nobody had noticed it before. On Thu, 6 Feb 2025 at 8:55, Ángel wrote: > If I'm not wrong, the events were still

Re: ASF board report draft for February 2025

2025-02-05 Thread Ángel
is expensive, and we shouldn't do it for every AQE plan change. > Maybe we should do it only once to report the final plan for AQE? Let's > continue the discussion on the PR. > > On Thu, Feb 6, 2025 at 1:48 PM Ángel > wrote: > >> I'd like to add that Spark

Re: ASF board report draft for February 2025

2025-02-06 Thread Ángel
nsic Analysis | GDPR > >view my Linkedin profile > <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> > > > > > > On Thu, 6 Feb 2025 at 08:05, Ángel wrote: > >> Btw, while analyzing this issue, I've also noticed that exactly the same >>

Re: ASF board report draft for February 2025

2025-02-05 Thread Ángel
I'd like to add that Spark is not as fast as it should be, primarily due to its internal verbosity, as reported in ticket *SPARK-50992*. After submitting this PR, I received some comments, which I quic

Re: [外部邮件] Re: Spark Connect the default API in Spark 4.0

2024-12-18 Thread Ángel
f Spark users with better classpath >> isolation, better upgrade behavior and better application integration. The >> goal is to optimize for the new users and workloads that will come over >> time while allowing all existing workloads to run by setting exactly one >> spark c

Re: [外部邮件] Re: Spark Connect the default API in Spark 4.0

2024-12-13 Thread Ángel
-1 On Sat, 14 Dec 2024 at 1:36, Dongjoon Hyun wrote: > For the RDD part, I also disagree with Martin. > I believe RDD should be supported permanently as the public API. > Otherwise, it would be a surprise to me and my colleagues at least. > > > I would assume that we all agree that > >

Re: Re: Increasing Shading & Relocating for 4.0

2025-01-19 Thread Ángel
The higher the level of abstraction, the less control and insight you typically have into its internal workings. If the goal is to create users rather than developers, Spark Connect is the right API to achieve that purpose. On Sun, 19 Jan 2025, 13:10, Mich Talebzadeh wrote: > I believe by act

Re: [DISCUSS] Ongoing projects for Spark 4.0

2025-01-22 Thread Ángel
e. Regards, Ángel On Wed, 22 Jan 2025 at 23:17, Mich Talebzadeh wrote: > Interesting points: client server architecture has been around since the > days of Sybase. A client written in any language, say Python, Scala makes a > request to spark cluster. This remote access mod

Re: Re: Increasing Shading & Relocating for 4.0

2025-01-18 Thread Ángel
What about introducing isolated class loaders, similar to the approach used by web servers? Perhaps OSGi bundles or something similar? On Sat, 18 Jan 2025, 22:43, Holden Karau wrote: > I would say the short answer is "mostly not" and the longer answer is that > the connect APIs are explicitly

Re: [DISCUSS] Ongoing projects for Spark 4.0

2025-01-26 Thread Ángel
Hi, I'd also like to include this other one I opened last summer: https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-49288. Regards, Ángel. On Mon, 27 Jan 2025, 6:17, Wenchen Fan wrote: > Hi all, > > Thanks for sharing the progress of ongoing projects! Le

Re: Behaviour of operators like Outer Join when using indeterministic joining keys seems to be full of contradictions

2025-01-26 Thread Ángel
Hi Asif, Could you provide an example (code+dataset) to analyze this? Looks interesting ... Regards, Ángel On Sun, 26 Jan 2025 at 20:58, Asif Shahid wrote: > Hi, > On further thoughts, I concur that leaf expressions like AttributeRefs can > always be considered to be dete

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Ángel
I'm passionate about and have lots of experience fixing OOMs. Contact me if you need some help. On Wed, 22 Jan 2025, 1:10, Hyukjin Kwon wrote: > Just a quick note on that: the major reason is 1. OOM we should figure out > and fix the CI environment. 2. structured streaming test failure that i

Re: [DISCUSS] Ongoing projects for Spark 4.0

2025-01-25 Thread Ángel
umulation in the heap. To address this, I suggest introducing a new "off" mode that completely disables plan string generation. Unless I'm mistaken, this mode should ideally be the default configuration, replacing the current verbose "formatted" mode. On Thu, 23 Jan

Re: [VOTE] Release Apache Spark 3.5.5 deprecating `spark.databricks.*` configuration

2025-02-18 Thread Ángel
+1 (non-binding) On Wed, 19 Feb 2025, 7:43, Wenchen Fan wrote: > +1 > > On Wed, Feb 19, 2025 at 2:36 PM Sakthi wrote: > >> +1 (non-binding) >> >> On Tue, Feb 18, 2025 at 10:21 PM Yang Jie wrote: >> >>> +1 >>> >>> On 2025/02/19 05:57:53 Mark Hamstra wrote: >>> > +1 >>> > >>> > On Tue, Feb 18

Re: [DISCUSS] Upgrade Hive compile time dependency to 4.0

2025-03-12 Thread Ángel
Not an easy task, I guess, but I'm totally for it too. The issue SPARK-49910 is related to this. On Tue, 11 Mar 2025 at 23:06, Mich Talebzadeh wrote: > Yes I am all for it, as I use Hive with Oracle as its metastore > extensively. >

Re: [VOTE] SPIP: Add the TIME data type

2025-02-23 Thread Ángel
+1 (non-binding) On Sun, 23 Feb 2025, 16:51, Max Gekk wrote: > Hi Spark devs, > > Following the discussion [1], I'd like to start the vote for the SPIP [2]. > The SPIP aims to add a new data type TIME to Spark SQL types. New type > should conform to TIME(n) WITHOUT TIME ZONE as defined by th

Re: [VOTE][RESULT] Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x

2025-03-15 Thread Ángel Álvarez Pascua
I agree with Holden regarding Spark 4 being the perfect time to drop this configuration. On Sat, 15 Mar 2025, 22:33, Holden Karau wrote: > My $0.02: > > I do not believe that this vote has passed. I believe there is a valid > veto. On a personal level from a migration point of view I think Sp

Re: [VOTE] Release Spark 4.0.0 (RC3)

2025-03-22 Thread Ángel Álvarez Pascua
n't happen again? I'm planning to write an article about Spark Connect, so ... thanks again for providing the project to start playing around. On Sun, 23 Mar 2025, 2:01, Bobby wrote: > Thx @Ángel , I had a PR > https://github.com/apache/spark/pull/50334 to fix it. Ple

Re: Spark 3.5.2 and Hadoop 3.4.1 slow performance

2025-03-22 Thread Ángel Álvarez Pascua
Could you take three thread dumps from one of the executors while Spark is performing the conversion? You can use the Spark UI for that. On Sun, 23 Mar 2025 at 3:20, Ángel Álvarez Pascua (< angel.alvarez.pas...@gmail.com>) wrote: > Without the data, it's difficult to anal

[SPARK-51264][SQL][JDBC] JDBC write keeps driver connection open unnecessarily

2025-03-20 Thread Ángel Álvarez Pascua
Hi, I've created a new PR for a minor update. Could someone review it? Thanks in advance!

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-21 Thread Ángel Álvarez Pascua
-1 (non-binding): Breaks the Chain of Responsibility. Constraints should be defined and enforced by the data sources themselves, not Spark. Spark is a processing engine, and enforcing constraints at this level blurs architectural boundaries, making Spark responsible for something it does not contro

Re: [VOTE] Release Spark 4.0.0 (RC3)

2025-03-22 Thread Ángel Álvarez Pascua
Is anyone looking into this issue? If not, I'd like to try fixing it. I've never tried out Spark Connect, so... 2x1! (way better than spending the weekend binge-watching on Netflix 😅🤣). @Bobby , thanks a lot, not only for reporting the issue but also for providing a time-saving project for testin

Re: Spark 3.5.2 and Hadoop 3.4.1 slow performance

2025-03-24 Thread Ángel Álvarez Pascua
ris > > On Sat, Mar 22, 2025 at 10:30 PM Prem Sahoo wrote: > >> This is inside my current project , I can’t move data to public domain . >> But it seems there is something changed which made this slowness . >> Sent from my iPhone >> >> On Mar 22, 2025, at 10:

Re: [VOTE][RESULT] Single-pass Analyzer for Catalyst

2025-03-18 Thread Ángel Álvarez Pascua
This SPIP looked like a great idea. Does anybody know if there's actually someone working on it? On Fri, 4 Oct 2024 at 10:09, Vladimir Golubev wrote: > Hi folks! > > The vote for 'SPIP: Single-pass Analyzer for Catalyst' passed with 14 +1s > (8 bindings, * = binding): > > +1: > Reynold X

Re: Spark build failed> File line length exceeds 100 characters

2025-04-05 Thread Ángel Álvarez Pascua
like there's a flaw in the PR pipeline. On Fri, 21 Mar 2025 at 7:01, Ángel Álvarez Pascua (< angel.alvarez.pas...@gmail.com>) wrote: > Hi, > > > I'm trying to build the project, but I'm encountering multiple errors due > to long lines. Is this expected

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-04-05 Thread Ángel Álvarez Pascua
share the same perspective — if the Spark community > believes it would benefit from having basic geospatial support built in, > the Sedona community would be happy to collaborate on this effort. We’re > open to contributing the necessary functionality and, if appropriate, > having Spa

Spark build failed> File line length exceeds 100 characters

2025-03-27 Thread Ángel Álvarez Pascua
Hi, I'm trying to build the project, but I'm encountering multiple errors due to long lines. Is this expected? I built the project a few weeks ago and don’t recall seeing these errors. Is anyone else experiencing the same issue? Thanks in advance.

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-25 Thread Ángel Álvarez Pascua
ctors have > the ability to do data validation by themselves, such as file formats that > do not have a backend service. > > On Wed, Mar 26, 2025 at 12:56 AM Gengliang Wang wrote: > >> Hi Ángel, >> >> Thanks for the feedback. Besides the existing NOT NULL constraint, the

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-25 Thread Ángel Álvarez Pascua
I meant ... a data validation API would be great, but why in DSv2? Isn't data validation something more general? Do we have to use DSv2 to have our data validated? On Wed, 26 Mar 2025, 6:15, Ángel Álvarez Pascua < angel.alvarez.pas...@gmail.com> wrote: > For me, data val

Re: [DISCUSS] SPIP: Declarative Pipelines

2025-04-09 Thread Ángel Álvarez Pascua
+1 (non-binding) On Thu, 10 Apr 2025, 1:50, Burak Yavuz wrote: > +1 > > On Wed, Apr 9, 2025 at 4:33 PM Szehon Ho wrote: > >> +1 really excited to finally see Materialized View finally make its way >> to Spark, as many other ecosystem projects (Trino, Starrocks, soon Iceberg) >> already suppo

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-28 Thread Ángel Álvarez Pascua
What about adding support for WKT <https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry>/ WKB <https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary> ? On Fri, 28 Mar 2025 at 20:50, Ángel Álvarez Pascua (< angel.alvarez.p

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-28 Thread Ángel Álvarez Pascua
2025, 21:27, Menelaos Karavelas wrote: > In the SPIP Jira the proposal is to add the expressions ST_AsBinary, > ST_GeomFromWKB, and ST_GeogFromWKB. > Is there anything else that you think should be added? > > Regarding WKT, what do you think should be added? > > - Menelaos >

Re: [VOTE][RESULT] Retain migration logic of incorrect `spark.databricks.*` configuration in Spark 4.0.x

2025-03-15 Thread Ángel Álvarez Pascua
Isn't it a bit excessive to talk about "making the voice of the community very clear" when only a very small fraction of Spark users have participated in the discussion or cast their votes? That said, I read the initial emails and understood the proposal. Since it seemed obvious and straightforwar

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-29 Thread Ángel Álvarez Pascua
>> Spark itself, which you may be right belongs in pluggable frameworks. >> Menelaus may explain more about the SPIP goal. >> > > >> > > I do hope there can be more collaboration across communities (like in >> Iceberg/Parquet collaboration) in getting Sedona commun

Re: [VOTE] SPIP: Constraints in DSv2

2025-03-22 Thread Ángel Álvarez Pascua
bles performance optimizations. > > - Anton > > On Fri, 21 Mar 2025 at 12:59, Ángel Álvarez Pascua < > angel.alvarez.pas...@gmail.com> wrote: > >> -1 (non-binding): Breaks the Chain of Responsibility. Constraints should >> be defined and enforced by the data sour

Re: Spark 3.5.2 and Hadoop 3.4.1 slow performance

2025-03-22 Thread Ángel Álvarez Pascua
correct means it has > all the columns available in csv . So we can take out this issue for > slowness . May be there is some other contributing options . > Sent from my iPhone > > On Mar 22, 2025, at 10:05 PM, Ángel Álvarez Pascua < > angel.alvarez.pas...@gmail.com> wrote:

Re: Spark 3.5.2 and Hadoop 3.4.1 slow performance

2025-03-22 Thread Ángel Álvarez Pascua
ibió: > Hello , > I read the csv file having size of 2.7 gb which is having 100 columns , > when I am converting this to parquet with Spark 3.2 and Hadoop 2.7.6 it > takes 28 secs but in Spark 3.5.2 and Hadoop 3.4.1 it takes 34 secs . This > stat is bad . > Sent from my iPhone

Re: Spark 3.5.2 and Hadoop 3.4.1 slow performance

2025-03-22 Thread Ángel Álvarez Pascua
in . > But it seems there is something changed which made this slowness . > Sent from my iPhone > > On Mar 22, 2025, at 10:23 PM, Ángel Álvarez Pascua < > angel.alvarez.pas...@gmail.com> wrote: > >  > Could you take three thread dumps from one of the executors while Spark is

Re: Spark 3.5.2 and Hadoop 3.4.1 slow performance

2025-03-22 Thread Ángel Álvarez Pascua
Sure. I love performance challenges and mysteries! Please, could you provide an example project or the steps to build one? Thanks. On Sun, 23 Mar 2025, 2:17, Prem Sahoo wrote: > Hello Team, > I was working with Spark 3.2 and Hadoop 2.7.6 and writing to MinIO object > storage . It was slower

Re: [DISCUSS] Upgrade Hive compile time dependency to 4.0

2025-03-23 Thread Ángel Álvarez Pascua
ch Talebzadeh, > Architect | Data Science | Financial Crime | Forensic Analysis | GDPR > >view my Linkedin profile > <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> > > > > > > On Sun, 23 Mar 2025 at 17:13, Ángel Álvarez Pascua < > angel.alvarez.pas.

Re: [DISCUSS] SPIP: Add geospatial types to Spark

2025-03-28 Thread Ángel Álvarez Pascua
+1 (non-binding) On Fri, 28 Mar 2025, 18:48, Menelaos Karavelas wrote: > Dear Spark community, > > I would like to propose the addition of new geospatial data types > (GEOMETRY and GEOGRAPHY) which represent geospatial values as recently > added as new logical types in the Parquet specificati

Re: [DISCUSS] Upgrade Hive compile time dependency to 4.0

2025-04-04 Thread Ángel Álvarez Pascua
h-d-5205b2/> > > > > > > On Wed, 12 Mar 2025 at 12:21, Ángel > wrote: > >> Not an easy task, I guess, but I'm totally for it too. >> >> The issue SPARK-49910 <https://issues.apache.org/jira/browse/SPARK-49910> >> is related to th

Re: Question Regarding Spark Dependencies in Scala

2025-06-06 Thread Ángel Álvarez Pascua
But... is it not like that in any other Java/Scala/Python/... app that uses dependencies that also have their own dependencies? If you want to provide a library, maybe you should give the user the option to decide if they want an all-in-one uber jar with shaded (more difficult to debug) dependenc

Re: [DISCUSS] SPIP: Upgrade Apache Hive to 4.x

2025-06-07 Thread Ángel Álvarez Pascua
I'm also interested in this SPIP. There was someone else also working on this, if I remember correctly. @Mich Talebzadeh, if you need any help with that issue, let me know. On Fri, 6 Jun 2025, 1:07, Mich Talebzadeh wrote: > i started working on this by upgrading my hadoop to > > Hadoop 3.4

Re: [VOTE] Release Spark 4.1.0-preview1 (RC1)

2025-07-10 Thread Ángel Álvarez Pascua
+1 (non-binding) On Thu, 10 Jul 2025, 21:07, Jules Damji wrote: > +1 (non-binding) > — > Sent from my iPhone > Pardon the dumb thumb typos :) > > On Jul 10, 2025, at 8:04 AM, Peter Toth wrote: > > > +1 > > On Thu, Jul 10, 2025 at 9:12 AM Kent Yao wrote: > >> Thank you all for the verifica

Re: Which committers care about Kafka?

2014-12-18 Thread Luis Ángel Vicente Sánchez
But idempotency is not that easy to achieve sometimes. A strong only once semantic through a proper API would be super useful; but I'm not implying this is easy to achieve. On 18 Dec 2014 21:52, "Cody Koeninger" wrote: > If the downstream store for the output data is idempotent or transactional, >

Rationale behind scala enumerations instead of sealed traits and case objects

2014-06-19 Thread Luis Ángel Vicente Sánchez
While I was trying to execute a job using spark-submit, I discovered a scala.MatchError at runtime... a DriverStateChanged.FAILED message was sent to an actor, and the match statement used was not taking that value into account. When I inspected that DriverStateChange.scala file I discovered that it
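
The trade-off behind that question can be sketched in a few lines of Scala. This is only an illustration under stated assumptions: the names `StateEnum`, `State`, `describeEnum` and `describeSealed` are hypothetical and are not taken from Spark's source, and the snippet does not reproduce the actual driver-state code the message refers to. It shows that a match over a `scala.Enumeration` value that omits a case typically compiles without an exhaustiveness warning and fails with `scala.MatchError` at runtime, whereas a match over a sealed trait lets the compiler flag the missing case.

```scala
import scala.util.Try

object EnumVsSealed {

  // Variant 1: scala.Enumeration (hypothetical states, not Spark's).
  // All values share the type StateEnum.Value, so the compiler cannot
  // check whether a match covers every value.
  object StateEnum extends Enumeration {
    val Running, Finished, Failed = Value
  }

  def describeEnum(s: StateEnum.Value): String = s match {
    case StateEnum.Running  => "running"
    case StateEnum.Finished => "finished"
    // StateEnum.Failed is deliberately missing: passing it
    // throws scala.MatchError at runtime.
  }

  // Variant 2: sealed trait with case objects (hypothetical states).
  // Every subtype is known at compile time, so leaving out a case
  // produces a "match may not be exhaustive" warning.
  sealed trait State
  case object Running extends State
  case object Finished extends State
  case object Failed extends State

  def describeSealed(s: State): String = s match {
    case Running  => "running"
    case Finished => "finished"
    case Failed   => "failed"
  }

  def main(args: Array[String]): Unit = {
    // The sealed-trait version handles every state.
    println(describeSealed(Failed))

    // The Enumeration version fails only when the unhandled value arrives.
    val attempt = Try(describeEnum(StateEnum.Failed))
    println(attempt.failed.map(_.getClass.getName).getOrElse("no error"))
  }
}
```

With the sealed-trait variant, compiling with warnings treated as errors (for example via `-Xfatal-warnings`) turns a forgotten case into a build failure rather than a runtime surprise.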