Re: FYI: SPARK-49700 Unified Scala Interface for Connect and Classic

2025-01-26 Thread Dongjoon Hyun
Thank you! Dongjoon On Sat, Jan 25, 2025 at 20:01 Yang Jie wrote: > I reported a test issue that is suspected to be related to this PR: > > - https://github.com/apache/spark/pull/48818/files#r1929652392 > > and it seems to be causing the failure of the Maven daily test. > > Thanks, > Jie Yang >

Re: [DISCUSS] Ongoing projects for Spark 4.0

2025-01-26 Thread Wenchen Fan
Hi all, Thanks for sharing the progress of ongoing projects! Let me summarize them here: - Add Spark Connect config to allow simple switch [PR ] - ML algorithms on Spark Connect (doesn't block 4.0) [JIRA

Re: [DISCUSS] Ongoing projects for Spark 4.0

2025-01-26 Thread Dongjoon Hyun
Thank you, Wenchen, for the summarization and management. Dongjoon. On Sun, Jan 26, 2025 at 9:17 PM Wenchen Fan wrote: > Hi all, > > Thanks for sharing the progress of ongoing projects! Let me summarize them > here: > - Add Spark Connect config to allow simple switch [PR >

Re: [DISCUSS] Ongoing projects for Spark 4.0

2025-01-26 Thread Ángel
Hi, I'd also like to include this other one I opened last summer: https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-49288. Regards, Ángel. On Mon, 27 Jan 2025, 6:17, Wenchen Fan wrote: > Hi all, > > Thanks for sharing the progress of ongoing projects! Let me summarize them

Re: Behaviour of operators like Outer Join when using indeterministic joining keys seems to be full of contradictions

2025-01-26 Thread Ángel
Hi Asif, Could you provide an example (code + dataset) so we can analyze this? It looks interesting... Regards, Ángel On Sun, 26 Jan 2025 at 20:58, Asif Shahid () wrote: > Hi, > On further thoughts, I concur that leaf expressions like AttributeRefs can > always be considered to be deterministic, a

Re: Behaviour of operators like Outer Join when using indeterministic joining keys seems to be full of contradictions

2025-01-26 Thread Asif Shahid
Shouldn't it be possible to determine from static data whether the output will be deterministic? Expressions already have a deterministic flag. So when an attribute is created from an alias, it should be possible to know whether the attribute points to an inDeterminate component. On Sun, Jan 26, 2025, 11:09 PM We
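The following toy model in Python makes the idea concrete. It is not Catalyst code: the classes `Expr`, `Rand`, `Add`, `AttrRef`, and `Alias`, and the `source` / `output_deterministic` members, are hypothetical. A leaf attribute reference reports itself as deterministic because it has no children. A lineage-aware check, assumed here to follow the alias back to the expression it names, can still tell that the attribute carries a `rand()` value.

```python
class Expr:
    """Hypothetical expression node (a toy stand-in, not Catalyst)."""

    def __init__(self, *children):
        self.children = children

    @property
    def deterministic(self):
        # Deterministic only if every child is deterministic;
        # a leaf (no children) is therefore deterministic.
        return all(c.deterministic for c in self.children)


class Rand(Expr):
    # A random-number expression is inherently nondeterministic.
    @property
    def deterministic(self):
        return False


class Add(Expr):
    pass


class AttrRef(Expr):
    """Leaf column reference; `source` is a hypothetical lineage link."""

    def __init__(self, name, source=None):
        super().__init__()  # leaf: no children, so `deterministic` is True
        self.name = name
        self.source = source

    @property
    def output_deterministic(self):
        # Hypothetical static check: look through the alias to its source.
        return self.source is None or self.source.deterministic


class Alias(Expr):
    def __init__(self, child, name):
        super().__init__(child)
        self.name = name

    def to_attribute(self):
        # The attribute produced by an alias remembers what it aliases.
        return AttrRef(self.name, source=self.children[0])


key = Alias(Add(AttrRef("id"), Rand()), "key").to_attribute()
print(key.deterministic)         # True  (leaf reference)
print(key.output_deterministic)  # False (it aliases id + rand())
```

Storing a back-pointer on the attribute is only one way to keep this information. A planner could equally compute it by walking the child plan's output expressions.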

Re: [Connect] Install additional python packages after session creation

2025-01-26 Thread Hyukjin Kwon
I think there's a separate email thread "Java Client for Spark Connect" by Martin. On Mon, 27 Jan 2025 at 15:17, Balaji Sudharsanam V wrote: > Hi Hyukjin Kwon, > > Sorry for bringing in an off-topic discussion. > > Is there a Java client similar to PySpark for working with Spark > Connect? >

Spark 4.0 vulnerable with hive-metastore-2.3.x.jar versions

2025-01-26 Thread Balaji Sudharsanam V
Hi All, A vulnerability of 'High' severity has been found in the Apache Spark 3.x and 4.0.0 preview (2) releases, in hive-metastore-2.3.x.jar. It is described here: Apache Hive security bypass CVE-2021-34538 Vulnerability Report

Re: Behaviour of operators like Outer Join when using indeterministic joining keys seems to be full of contradictions

2025-01-26 Thread Asif Shahid
Sure. I will send a prototypical query tomorrow. Though it's difficult to simulate the issue using a unit test, I think the problem is that Rdd.isIndeterminate is not returning true for the query. As a result, on retry, the shuffle stage is not fully reattempted. And the RDD is not returning inDeterminate as true
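The consequence described here can be shown with a small Python simulation. It is a sketch under stated assumptions, not Spark's scheduler. Two hypothetical map tasks bucket rows by a random join key. One map output is lost after reducer 0 has already finished, and the recomputation assigns keys differently. That difference is modelled by a new seed, standing in for something like a changed input order. Redoing only the lost output leaves some rows duplicated and others missing. Redoing the whole stage and all its reducers keeps the result consistent.

```python
import random
from collections import Counter

NUM_REDUCERS = 2


def map_task(rows, attempt_seed):
    """Toy shuffle map task: give each row a random join key and bucket it
    by key. `attempt_seed` stands in for whatever makes a re-execution
    differ from the first attempt (e.g. a different input order)."""
    rng = random.Random(attempt_seed)
    buckets = [[] for _ in range(NUM_REDUCERS)]
    for row in rows:
        key = rng.randrange(1_000_000)
        buckets[key % NUM_REDUCERS].append(row)
    return buckets


def reduce_task(outputs, r):
    # A reducer fetches its bucket from every map output.
    return [row for out in outputs for row in out[r]]


def simulate(rerun_whole_stage, rows_per_map=1000):
    inputs = [range(m * rows_per_map, (m + 1) * rows_per_map) for m in range(2)]
    outputs = [map_task(rows, attempt_seed=m) for m, rows in enumerate(inputs)]
    done = {0: reduce_task(outputs, 0)}  # reducer 0 finishes first

    # Executor lost: map output 1 is gone and must be recomputed,
    # and this time the keys come out differently.
    if rerun_whole_stage:
        # Stage known to be indeterminate: recompute every map output and
        # rerun every reducer, including the one that already finished.
        outputs = [map_task(rows, attempt_seed=10 + m) for m, rows in enumerate(inputs)]
        done = {}
    else:
        # Stage wrongly treated as determinate: only the lost output is redone.
        outputs[1] = map_task(inputs[1], attempt_seed=11)

    for r in range(NUM_REDUCERS):
        if r not in done:
            done[r] = reduce_task(outputs, r)
    result = Counter(row for rows in done.values() for row in rows)
    expected = Counter(row for rows in inputs for row in rows)
    return result == expected


print(simulate(rerun_whole_stage=True))   # True
print(simulate(rerun_whole_stage=False))  # False: some rows duplicated, others lost
```

With 1,000 rows per map task, the partial retry is all but certain to misplace some rows. This is why a stage whose output is wrongly reported as determinate is dangerous for operators like outer joins.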

Re: Behaviour of operators like Outer Join when using indeterministic joining keys seems to be full of contradictions

2025-01-26 Thread Asif Shahid
I am using the test below. As a unit test, though, it will pass, since simulating an executor loss in a single VM is difficult, but there is definitely a bug. If you check with a debugger, ShuffleStage.isDeterminate turns out to be true, though it clearly should not be. As a result, if you look at Da

RE: [Connect] Install additional python packages after session creation

2025-01-26 Thread Balaji Sudharsanam V
Hi Hyukjin Kwon, Sorry for bringing in an off-topic discussion. Is there a Java client similar to PySpark for working with Spark Connect? Thanks, Balaji From: Hyukjin Kwon Sent: 25 January 2025 04:46 To: Deependra Patel Cc: dev@spark.apache.org Subject: [EXTERNAL] Re: [Connect] Install a

Re: Behaviour of operators like Outer Join when using indeterministic joining keys seems to be full of contradictions

2025-01-26 Thread Wenchen Fan
It looks like a hard problem to statically analyze the query plan and decide whether a Spark stage is deterministic or not. When I added RDD DeterministicLevel, I thought it would not be hard for the callers to specify it, but it seems I was wrong. Maybe we should do it at runtime: if Spark retr
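Below is a simplified Python model of how such a level can be derived from the caller-supplied flags. It is loosely modelled on the rules in Spark's `RDD.getOutputDeterministicLevel`; treat the details as an approximation. The model works as follows:

- A narrow dependency inherits its parent's level.
- A plain shuffle yields `UNORDERED`.
- An indeterminate parent stays indeterminate.
- An order-sensitive step over unordered input becomes `INDETERMINATE`.

`Level` and `Node` are hypothetical stand-ins, not Spark APIs. The example reproduces the failure mode discussed in this thread: if the caller does not declare the order-sensitive step, nothing downstream is flagged as `INDETERMINATE`.

```python
from enum import IntEnum


class Level(IntEnum):
    # Ordered from safest to least safe to recompute, mirroring the
    # DETERMINATE < UNORDERED < INDETERMINATE ordering (toy stand-in).
    DETERMINATE = 0
    UNORDERED = 1
    INDETERMINATE = 2


class Node:
    """Hypothetical RDD-like node; not Spark's API."""

    def __init__(self, parents=(), shuffle=False, order_sensitive=False,
                 sorted_and_aggregated=False):
        self.parents = parents
        self.shuffle = shuffle                    # wide dependency on parents
        self.order_sensitive = order_sensitive    # must be declared by the caller
        self.sorted_and_aggregated = sorted_and_aggregated

    def level(self):
        if not self.parents:
            return Level.DETERMINATE  # source data assumed stable
        candidates = []
        for p in self.parents:
            pl = p.level()
            if not self.shuffle:
                candidates.append(pl)                   # narrow: inherit
            elif pl == Level.INDETERMINATE:
                candidates.append(Level.INDETERMINATE)  # stays indeterminate
            elif self.sorted_and_aggregated:
                candidates.append(Level.DETERMINATE)    # fixed key order
            else:
                candidates.append(Level.UNORDERED)      # blocks arrive in any order
        result = max(candidates)
        # An order-sensitive computation (e.g. a rand()-based key) over
        # unordered input can yield different values when recomputed.
        if self.order_sensitive and result == Level.UNORDERED:
            return Level.INDETERMINATE
        return result


scan = Node()
shuffled = Node((scan,), shuffle=True)
rand_key = Node((shuffled,))                         # caller forgot the flag
rand_key_flagged = Node((shuffled,), order_sensitive=True)
print(Node((rand_key,), shuffle=True).level().name)          # UNORDERED
print(Node((rand_key_flagged,), shuffle=True).level().name)  # INDETERMINATE
```

The model only sees what callers declare. Its output is therefore exactly as good as the `order_sensitive` flags, which is the weakness that a runtime check would avoid.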

Re: Behaviour of operators like Outer Join when using indeterministic joining keys seems to be full of contradictions

2025-01-26 Thread Asif Shahid
Hi, On further thought, I concur that leaf expressions like AttributeRefs can always be considered deterministic, since, like a Java variable, the value they hold per iteration is invariant (except when changed by some deterministic logic). So in that sense what I said in the above mail a