is someone else also seeing a hang in DataFrameSubquerySuite.simple uncorrelated scalar subquery - eom?

2025-04-10 Thread Asif Shahid

Re: Requesting advice, thought

2025-03-27 Thread Asif Shahid
r 27, 2025 at 6:53 PM Asif Shahid wrote: > >> Hi Experts, >> Could you please allow me to pick your brain on the following: >> >> For Hive tables (managed), the scan operator is FileSourceScanExec. >> Is there any particular reason why its underlying HadoopFSRelat

Requesting advice, thought

2025-03-27 Thread Asif Shahid
Hi Experts, Could you please allow me to pick your brain on the following: For Hive tables (managed), the scan operator is FileSourceScanExec. Is there any particular reason why the FileFormat field of its underlying HadoopFsRelation does not implement an interface like SupportsRuntimeFiltering? Li

Re: Code formatting tech debt

2025-03-15 Thread Asif Shahid
I am not 100% sure, but I think you should run ./dev/lint-scala. Some time back I also ran the target you mentioned, and it resulted in a plethora of files being modified. At that time I realized that the target was run only for some specific modules (I think connect...). Regards Asif On Fri

Re: Behaviour of operators like Outer Join when using indeterministic joining keys seems to be full of contradictions

2025-02-21 Thread Asif Shahid
end functional test, the bugrepro,patch and bugTest attached can be used, but I cannot productize it due to the nature of the code. The bugreprod.patch with BugTest will pass if both of the above PRs are included. Regards Asif On Sun, Feb 16, 2025 at 9:43 AM Asif Shahid wrote: > Hi. > Ok . d

Re: Behaviour of operators like Outer Join when using indeterministic joining keys seems to be full of contradictions

2025-02-16 Thread Asif Shahid
Hi. OK, I did the final check-in. Please feel free to review. Regards Asif On Sat, Feb 15, 2025 at 6:42 PM Asif Shahid wrote: > Pls hold on reviewing the patch, as I need to do one more checkin. > I have still left a window of race , by releasing the read lock early, for > the case of f

Re: Behaviour of operators like Outer Join when using indeterministic joining keys seems to be full of contradictions

2025-02-15 Thread Asif Shahid
utor sides..? Then there > might be condition where shuffle files could be lost before > driver/executors are communicated checksum ? > Regards > Asif > > > On Thu, Feb 13, 2025 at 7:39 PM Asif Shahid wrote: > >> The bugrepro patch , when applied on current mas

Re: Behaviour of operators like Outer Join when using indeterministic joining keys seems to be full of contradictions

2025-02-13 Thread Asif Shahid
condition where shuffle files could be lost before the checksum is communicated to the driver/executors? Regards Asif On Thu, Feb 13, 2025 at 7:39 PM Asif Shahid wrote: > The bugrepro patch , when applied on current master, will show failure > with incorrect results. > While on the PR branch , it

Re: Behaviour of operators like Outer Join when using indeterministic joining keys seems to be full of contradictions

2025-02-13 Thread Asif Shahid
nfirm my understanding of the >> problem, plus educate internal users and platform team: >> https://issues.apache.org/jira/browse/SPARK-38388. Checksum approach was >> brought up in that JIRA too and I feel that is the balanced way to look at >> this problem. >>

Re: Behaviour of operators like Outer Join when using indeterministic joining keys seems to be full of contradictions

2025-02-13 Thread Asif Shahid
The bugrepro patch, when applied on current master, fails with incorrect results, while on the PR branch it passes. The number of iterations in the test is 100. Regards Asif On Thu, Feb 13, 2025 at 7:35 PM Asif Shahid wrote: > Hi, > Following up on this issue. > The

Re: Behaviour of operators like Outer Join when using indeterministic joining keys seems to be full of contradictions

2025-02-12 Thread Asif Shahid
race. Regards Asif On Sun, Jan 26, 2025 at 11:19 PM Asif Shahid wrote: > Shouldn't it be possible to determine with static data , if output will be > deterministic ?. Expressions already have deterministic flag. So when an > attribute is created from alias, it will be possible to kno

Re: FYI: SPARK-49700 Unified Scala Interface for Connect and Classic

2025-01-28 Thread Asif Shahid
er. >- ... > > > On Tue, Jan 28, 2025 at 1:53 PM Asif Shahid wrote: > >> I am genuinely curious to know, as to how do those commits which are >> reliably failing the build, end up in master ? Is there some window of race >> where two conflicting PRs in terms o

Re: FYI: SPARK-49700 Unified Scala Interface for Connect and Classic

2025-01-28 Thread Asif Shahid
I am genuinely curious to know how commits that reliably fail the build end up in master. Is there some race window in which two logically conflicting PRs mess up the final state in master? I have seen in the past few months, while syncing up my open PRs, f

Re: Behaviour of operators like Outer Join when using indeterministic joining keys seems to be full of contradictions

2025-01-26 Thread Asif Shahid
> > Maybe we should do it at runtime: if Spark retries a shuffle stage but the > data becomes different (e.g. use checksum to check it), then Spark should > retry all the partitions of this stage. I'll look into this repro after I'm > back from the national holiday. > > On

Re: Behaviour of operators like Outer Join when using indeterministic joining keys seems to be full of contradictions

2025-01-26 Thread Asif Shahid
G, Encoders.STRING)).toDF("pkRight", > "strright") > > > innerDf.write.format("parquet").partitionBy("strright").saveAsTable("inner") > > val innerInnerDf = spark.createDataset( > Seq((1L, "111"), (2L, "

Re: Behaviour of operators like Outer Join when using indeterministic joining keys seems to be full of contradictions

2025-01-26 Thread Asif Shahid
personal note.. thanks for your interest. This is a very rare attitude. Regards Asif On Sun, Jan 26, 2025, 9:45 PM Ángel wrote: > Hi Asif, > > Could you provide an example (code+dataset) to analyze this? Looks > interesting ... > > > Regards, > Ángel > > El dom, 26 ene

Re: Behaviour of operators like Outer Join when using indeterministic joining keys seems to be full of contradictions

2025-01-26 Thread Asif Shahid
as that an issue is incorrect. But I think that AttributeRef should have a boolean method that tells whether the value it represents comes from an indeterminate source. Regards Asif On Fri, Jan 24, 2025 at 5:18 PM Asif Shahid wrote: > Hi, > While testing a use case where the query

Behaviour of operators like Outer Join when using indeterministic joining keys seems to be full of contradictions

2025-01-24 Thread Asif Shahid
Hi, I was testing a use case where the query had an outer join such that the joining key of the left outer table had either a valid value or a random value (salting to avoid skew). The case was reported to produce incorrect results on node failure with retry. On debugging the code, have found followi