Re: spark sql left join gives KryoException: Buffer overflow

2014-08-05 Thread Michael Armbrust
For outer joins I'd recommend upgrading to master or waiting for a 1.1 release candidate (which should be out this week). On Tue, Aug 5, 2014 at 7:38 AM, Dima Zhiyanov wrote: > I am also experiencing this kryo buffer problem. My join is left outer with > under 40mb on the right side. I would ex

Re: spark sql left join gives KryoException: Buffer overflow

2014-08-05 Thread Dima Zhiyanov
Yes > On Aug 5, 2014, at 7:38 AM, "Dima Zhiyanov [via Apache Spark User List]" > wrote: > > I am also experiencing this kryo buffer problem. My join is left outer with > under 40mb on the right side. I would expect the broadcast join to succeed > in this case (hive did)

Re: spark sql left join gives KryoException: Buffer overflow

2014-08-05 Thread Dima Zhiyanov
I am also experiencing this Kryo buffer problem. My join is a left outer join with under 40 MB on the right side. I would expect the broadcast join to succeed in this case (Hive did). Another problem is that the optimizer chose a nested loop join for some reason; I would expect a broadcast (map-side) hash join.
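To make the difference between the two strategies concrete, here is a minimal single-machine sketch in Python. It is an illustration under simplifying assumptions, not Spark SQL's implementation: rows are plain dicts, and `nested_loop_left_outer_join` and `broadcast_hash_left_outer_join` are hypothetical names. In a real broadcast join the hash table built from the small side is shipped to every worker; here it is just a local dict.

```python
from collections import defaultdict


def nested_loop_left_outer_join(left, right, key):
    """Compare every left row with every right row: O(len(left) * len(right))."""
    out = []
    for l_row in left:
        matched = False
        for r_row in right:
            if l_row[key] == r_row[key]:
                out.append((l_row, r_row))
                matched = True
        if not matched:
            out.append((l_row, None))  # unmatched: pad the right side with None
    return out


def broadcast_hash_left_outer_join(left, right, key):
    """Build a hash table from the small right side (the side a broadcast join
    would ship to every worker), then stream the large left side through it.
    Roughly O(len(left) + len(right)) instead of their product."""
    table = defaultdict(list)
    for r_row in right:
        table[r_row[key]].append(r_row)
    out = []
    for l_row in left:
        matches = table.get(l_row[key])  # .get() does not insert empty lists
        if matches:
            out.extend((l_row, r_row) for r_row in matches)
        else:
            out.append((l_row, None))
    return out
```

Both functions return the same rows; the hash version avoids comparing every pair, which is why a small right side is expected to favor it.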

Re: spark sql left join gives KryoException: Buffer overflow

2014-07-21 Thread Michael Armbrust
> > When SPARK-2211 is done, will spark sql automatically choose join > algorithms? > Is there some way to manually hint the optimizer? > Ideally we will select the best algorithm for you. We are also considering ways to allow the user to hint.

Re: spark sql left join gives KryoException: Buffer overflow

2014-07-21 Thread Pei-Lun Lee
Hi Michael, Thanks for the suggestion. In my query, both tables are too large to use a broadcast join. When SPARK-2211 is done, will Spark SQL automatically choose join algorithms? Is there some way to manually hint the optimizer? 2014-07-19 5:23 GMT+08:00 Michael Armbrust : > Unfortunately, this
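When neither table is small enough to broadcast, a common general technique is a partitioned ("shuffled") hash join: both sides are hash-partitioned on the join key so that matching keys land in the same partition, and each pair of partitions is joined independently. The Python sketch below shows the idea under toy assumptions (rows are dicts, partitions are in-memory lists, and `hash_partition` and `shuffled_hash_left_outer_join` are hypothetical names, not Spark APIs). The thread does not say how SPARK-2211 or SPARK-2212 implement their joins.

```python
from collections import defaultdict


def hash_partition(rows, key, num_partitions):
    """Route each row to a bucket by hashing its join key (the 'shuffle' step)."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[key]) % num_partitions].append(row)
    return parts


def shuffled_hash_left_outer_join(left, right, key, num_partitions=4):
    """Co-partition both sides, then run a small hash join per partition.
    Only one right-side partition is held in a hash table at a time."""
    out = []
    left_parts = hash_partition(left, key, num_partitions)
    right_parts = hash_partition(right, key, num_partitions)
    for l_part, r_part in zip(left_parts, right_parts):
        table = defaultdict(list)
        for r_row in r_part:
            table[r_row[key]].append(r_row)
        for l_row in l_part:
            matches = table.get(l_row[key])
            if matches:
                out.extend((l_row, r_row) for r_row in matches)
            else:
                out.append((l_row, None))  # left outer: keep unmatched left rows
    return out
```

Output order follows partition order rather than input order, so results should be compared as sets of rows.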

Re: spark sql left join gives KryoException: Buffer overflow

2014-07-18 Thread Michael Armbrust
Unfortunately, this is a query where we just don't have an efficient implementation yet. You might try switching the table order. Here is the JIRA for doing something more efficient: https://issues.apache.org/jira/browse/SPARK-2212 On Fri, Jul 18, 2014 at 7:05 AM, Pei-Lun Lee wrote: > Hi, >
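Switching the table order only preserves the result if the join direction is flipped as well: `a LEFT OUTER JOIN b` returns the same rows as `b RIGHT OUTER JOIN a`, with the columns in the other order. The Python check below demonstrates that equivalence with toy nested-loop implementations (`left_outer_join` and `right_outer_join` are hypothetical helpers). It only shows that the rewrite is semantically safe; which physical plan Spark SQL picks for either form is not covered here.

```python
def left_outer_join(left, right, key):
    """Keep every row of `left`; pad the right side with None when unmatched."""
    out = []
    for l_row in left:
        matches = [r_row for r_row in right if r_row[key] == l_row[key]]
        if matches:
            out.extend((l_row, r_row) for r_row in matches)
        else:
            out.append((l_row, None))
    return out


def right_outer_join(left, right, key):
    """Keep every row of `right`; pad the left side with None when unmatched."""
    out = []
    for r_row in right:
        matches = [l_row for l_row in left if l_row[key] == r_row[key]]
        if matches:
            out.extend((l_row, r_row) for l_row in matches)
        else:
            out.append((None, r_row))
    return out
```

Swapping each output pair of `right_outer_join(b, a, key)` yields exactly `left_outer_join(a, b, key)`.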