Re: Apache Spark 3.2.2 Release?

2022-07-07 Thread Andrew Ray
+1 (non-binding) Thanks! On Thu, Jul 7, 2022 at 7:00 AM Yang,Jie(INF) wrote: > +1 (non-binding) Thank you Dongjoon ~ > > *From:* Ruifeng Zheng > *Date:* Thursday, July 7, 2022 16:28 > *To:* dev > *Subject:* Re: Apache Spark 3.2.2 Release? > > +1 thank you Dongjoon! --

Re: Scala left join with multiple columns Join condition is missing or trivial. Use the CROSS JOIN syntax to allow cartesian products between these relations.

2017-04-03 Thread Andrew Ray
You probably don't want null-safe equals (<=>) with a left join. On Mon, Apr 3, 2017 at 5:46 PM gjohnson35 wrote: > The join condition with && is throwing an exception: > > val df = baseDF.join(mccDF, mccDF("medical_claim_id") <=> baseDF("medical_claim_id") && mccDF("medical_claim_det
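
A minimal sketch (not from the thread) contrasting a standard equality join with the null-safe <=> operator that Andrew cautions against. The DataFrames and column names are illustrative stand-ins for the poster's baseDF/mccDF:

```scala
import org.apache.spark.sql.SparkSession

object LeftJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("left-join-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data standing in for the poster's baseDF and mccDF.
    val baseDF = Seq((1L, "a"), (2L, "b"), (3L, "c")).toDF("medical_claim_id", "base_val")
    val mccDF  = Seq((1L, "x"), (3L, "y")).toDF("medical_claim_id", "mcc_val")

    // Standard equality (===): null keys never match, and unmatched base rows
    // still come through the left join with nulls on the right side.
    val standard = baseDF.join(
      mccDF,
      baseDF("medical_claim_id") === mccDF("medical_claim_id"),
      "left")

    // Null-safe equality (<=>): treats null <=> null as true, so null keys on
    // both sides match each other, which is rarely what you want here.
    val nullSafe = baseDF.join(
      mccDF,
      baseDF("medical_claim_id") <=> mccDF("medical_claim_id"),
      "left")

    standard.show()
    nullSafe.show()
    spark.stop()
  }
}
```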

Re: [discuss] ending support for Java 7 in Spark 2.0

2016-03-25 Thread Andrew Ray
+1 on removing Java 7 and Scala 2.10 support. It looks to be entirely possible to support Java 8 containers in a YARN cluster otherwise running Java 7 (example code for alt JAVA_HOME https://issues.apache.org/jira/secure/attachment/12671739/YARN-1964.patch) so really there should be no big problem
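
A minimal sketch of one way to point YARN containers at a Java 8 install while the cluster default stays on Java 7, using Spark's documented per-container environment settings. Whether this is the exact mechanism in the YARN-1964 patch linked above is an assumption, and the JDK path is hypothetical:

```scala
import org.apache.spark.sql.SparkSession

// Assumes a JDK 8 is installed at the same (hypothetical) path on every node.
val spark = SparkSession.builder
  .appName("java8-on-java7-yarn")
  // JAVA_HOME for the YARN application master container.
  .config("spark.yarn.appMasterEnv.JAVA_HOME", "/opt/jdk1.8.0")
  // JAVA_HOME for the executor containers.
  .config("spark.executorEnv.JAVA_HOME", "/opt/jdk1.8.0")
  .getOrCreate()
```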

Re: SparkSQL - Limit pushdown on BroadcastHashJoin

2016-04-18 Thread Andrew Ray
While you can't automatically push the limit *through* the join, we could push it *into* the join (stop processing after generating 10 records). I believe that is what Rajesh is suggesting. On Tue, Apr 12, 2016 at 7:46 AM, Herman van Hövell tot Westerflier < hvanhov...@questtec.nl> wrote: > I am
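
A minimal sketch (not from the thread) of the shape of query being discussed: a LIMIT sitting above a broadcast hash join. Pushing the limit *through* the join would change results, but in principle the join could stop *after* emitting 10 output rows, which is the optimization suggested above. Table and column names are illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder
  .appName("limit-over-broadcast-join")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical fact and dimension tables.
val facts = Seq((1, 100.0), (2, 250.0), (3, 75.0)).toDF("key", "amount")
val dims  = Seq((1, "US"), (2, "EU")).toDF("key", "region")

// broadcast() hints a BroadcastHashJoin; the limit sits above it in the plan.
val result = facts.join(broadcast(dims), "key").limit(10)
result.explain()
result.show()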

Re: HDFS as Shuffle Service

2016-04-28 Thread Andrew Ray
Yes, HDFS has serious problems with creating lots of files. But we can always just create a single merged file on HDFS per task. On Apr 28, 2016 11:17 AM, "Reynold Xin" wrote: Hm while this is an attractive idea in theory, in practice I think you are substantially overestimating HDFS' ability to