Congrats!
Sent from my iPad
> On Feb 23, 2016, at 2:43 AM, Mohannad Ali wrote:
>
> Hello Everyone,
>
> Thanks a lot for the help. We also managed to solve it, but without resorting
> to Spark 1.6.
>
> The problem we were having was caused by a really bad join condition:
>
> ON ((a.col1 = b.col1) or (a.col1 is null and b.col1 is null)) AND ((a.col2
> = b.col2) or (a.col2 is null and b.col2 is null))
Thanks for sharing the know-how, guys.
Alonso Isidoro Roman.
My favorite quotes (today):
"If debugging is the process of removing software bugs, then programming
must be the process of putting them in..."
- Edsger Dijkstra
Hello Everyone,
Thanks a lot for the help. We also managed to solve it, but without
resorting to Spark 1.6.
The problem we were having was caused by a really bad join condition:
ON ((a.col1 = b.col1) or (a.col1 is null and b.col1 is null)) AND ((a.col2
= b.col2) or (a.col2 is null and b.col2 is null))
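For what it's worth, a sketch of how the same null-safe match can be expressed
with Spark's null-safe equality operator <=> instead of the or / is null
pattern (dfA and dfB are placeholder names for the two DataFrames, not the
actual code from this job):

    import org.apache.spark.sql.functions.col

    // dfA and dfB stand in for the two DataFrames being joined on col1 and col2.
    // <=> (null-safe equality) treats NULL = NULL as a match, so the
    // OR / IS NULL branches above are not needed.
    val joined = dfA.as("a").join(
      dfB.as("b"),
      (col("a.col1") <=> col("b.col1")) && (col("a.col2") <=> col("b.col2"))
    )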
Good article! Thanks for sharing!
> On Feb 22, 2016, at 11:10 AM, Davies Liu wrote:
>
> This link may help:
> https://forums.databricks.com/questions/6747/how-do-i-get-a-cartesian-product-of-a-huge-dataset.html
>
> Spark 1.6 has improved CartesianProduct; you should turn off auto
> broadcast
This link may help:
https://forums.databricks.com/questions/6747/how-do-i-get-a-cartesian-product-of-a-huge-dataset.html
Spark 1.6 has improved CartesianProduct; you should turn off auto
broadcast and go with CartesianProduct in 1.6.
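Roughly, assuming a Spark 1.6 SQLContext named sqlContext and two DataFrames
df1 and df2 (placeholder names), that suggestion looks like:

    // Disable automatic broadcast joins, then join with no condition,
    // which Spark plans as a CartesianProduct.
    sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "-1")
    val product = df1.join(df2)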
On Mon, Feb 22, 2016 at 1:45 AM, Mohannad Ali wrote:
> Hello everyone,
Hello everyone,
I'm working with Tamara and I wanted to give you guys an update on the
issue:
1. Here is the output of .explain():
> Project
> [sk_customer#0L,customer_id#1L,country#2,email#3,birthdate#4,gender#5,fk_created_at_date#6,age_range#7,first_name#8,last_name#9,inserted_at#10L,updated_a
Sorry,
please add the following questions to the list above:
the Spark version?
whether you are using RDDs or DataFrames?
whether the code runs locally, in Spark cluster mode, or on AWS EMR?
Regards,
Gourav Sengupta
On Sun, Feb 21, 2016 at 7:37 PM, Gourav Sengupta
wrote:
> Hi Tamara,
>
> few basic questions first.
Hi Tamara,
A few basic questions first.
How many executors are you using?
Is the data getting all cached into the same executor?
How many partitions do you have of the data?
How many fields are you trying to use in the join?
If you need any help in finding answers to these questions, please let me
know.
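As a rough sketch, some of these can be checked from the spark-shell, assuming
the DataFrame in question is called df (placeholder name):

    // Number of partitions of the data.
    println(df.rdd.partitions.length)

    // Rough executor count; getExecutorMemoryStatus also includes the driver.
    println(sc.getExecutorMemoryStatus.size)

    // Which RDDs are cached, and on which executors, is visible in the
    // Storage tab of the Spark UI.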
Try this setting in your Spark defaults:
spark.sql.autoBroadcastJoinThreshold=-1
I had a similar problem with joins hanging and that resolved it for me.
You might be able to pass that value from the driver as a --conf option, but I
have not tried that and am not sure whether it will work.
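For reference, one way that should be equivalent is setting it in the driver
program itself rather than in spark-defaults.conf (Spark 1.x API; the names
conf, sc and sqlContext are just illustrative):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Set the threshold to -1 before the contexts are created so that
    // subsequent DataFrame joins skip automatic broadcasting.
    val conf = new SparkConf().set("spark.sql.autoBroadcastJoinThreshold", "-1")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)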
Please include the output of running explain() when reporting performance
issues with DataFrames.
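That is, assuming the DataFrame is called df (placeholder name):

    df.explain()      // prints the physical plan
    df.explain(true)  // prints the logical and physical plans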
On Fri, Feb 19, 2016 at 9:31 AM, Tamara Mendt wrote:
> Hi all,
>
> I am running a Spark job that gets stuck attempting to join two
> dataframes. The dataframes are not very large, one is about 2 M r