Congrats!
Sent from my iPad
> On Feb 23, 2016, at 2:43 AM, Mohannad Ali wrote:
>
> Hello Everyone,
>
> Thanks a lot for the help. We also managed to solve it, but without resorting
> to Spark 1.6.
>
> The problem we were having was caused by a really bad join condition:
>
> ON ((a.col1 = b.col1) or (a.col1 is null and b.col1 is null)) AND ((a.col2
> = b.col2) or (a.col2 is null and b.col2 is null))
Thanks for sharing the know-how, guys.
Alonso Isidoro Roman.
My favorite quotes (today):
"If debugging is the process of removing software bugs, then programming
must be the process of putting them in..."
- Edsger Dijkstra
Hello Everyone,
Thanks a lot for the help. We also managed to solve it, but without
resorting to Spark 1.6.
The problem we were having was caused by a really bad join condition:
ON ((a.col1 = b.col1) or (a.col1 is null and b.col1 is null)) AND ((a.col2
= b.col2) or (a.col2 is null and b.col2 is null))
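For what it's worth, a sketch of how the same null-safe match can be expressed
with Spark's null-safe equality operator <=> instead of the or / is null
pattern (dfA and dfB are placeholder names for the two DataFrames, not the
actual code from this job):

    import org.apache.spark.sql.functions.col

    // dfA and dfB stand in for the two DataFrames being joined on col1 and col2.
    // <=> (null-safe equality) treats NULL = NULL as a match, so the
    // OR / IS NULL branches above are not needed.
    val joined = dfA.as("a").join(
      dfB.as("b"),
      (col("a.col1") <=> col("b.col1")) && (col("a.col2") <=> col("b.col2"))
    )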
Good article! Thanks for sharing!
> On Feb 22, 2016, at 11:10 AM, Davies Liu wrote:
>
> This link may help:
> https://forums.databricks.com/questions/6747/how-do-i-get-a-cartesian-product-of-a-huge-dataset.html
>
> Spark 1.6 has improved CartesianProduct; you should turn off auto
> broadcast
This link may help:
https://forums.databricks.com/questions/6747/how-do-i-get-a-cartesian-product-of-a-huge-dataset.html
Spark 1.6 has improved CartesianProduct; you should turn off auto
broadcast and go with CartesianProduct in 1.6.
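Roughly, assuming a Spark 1.6 SQLContext named sqlContext and two DataFrames
df1 and df2 (placeholder names), that suggestion looks like:

    // Disable automatic broadcast joins, then join with no condition,
    // which Spark plans as a CartesianProduct.
    sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "-1")
    val product = df1.join(df2)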
On Mon, Feb 22, 2016 at 1:45 AM, Mohannad Ali wrote:
> Hello everyone,
Hello everyone,
I'm working with Tamara and I wanted to give you guys an update on the
issue:
1. Here is the output of .explain():
> Project
> [sk_customer#0L,customer_id#1L,country#2,email#3,birthdate#4,gender#5,fk_created_at_date#6,age_range#7,first_name#8,last_name#9,inserted_at#10L,updated_a
Sorry,
please add the following questions to the list above:
the Spark version?
whether you are using RDDs or DataFrames?
whether the code runs locally, in Spark cluster mode, or on AWS EMR?
Regards,
Gourav Sengupta
On Sun, Feb 21, 2016 at 7:37 PM, Gourav Sengupta
wrote:
> Hi Tamara,
>
> few basic questions first.
Hi Tamara,
A few basic questions first.
How many executors are you using?
Is the data getting all cached into the same executor?
How many partitions do you have of the data?
How many fields are you trying to use in the join?
If you need any help in finding answers to these questions, please let me
know.
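As a rough sketch, some of these can be checked from the spark-shell, assuming
the DataFrame in question is called df (placeholder name):

    // Number of partitions of the data.
    println(df.rdd.partitions.length)

    // Rough executor count; getExecutorMemoryStatus also includes the driver.
    println(sc.getExecutorMemoryStatus.size)

    // Which RDDs are cached, and on which executors, is visible in the
    // Storage tab of the Spark UI.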
Try this setting in your Spark defaults:
spark.sql.autoBroadcastJoinThreshold=-1
I had a similar problem with joins hanging and that resolved it for me.
You might be able to pass that value from the driver as a --conf option, but I
have not tried that and am not sure whether it will work.
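For reference, one way that should be equivalent is setting it in the driver
program itself rather than in spark-defaults.conf (Spark 1.x API; the names
conf, sc and sqlContext are just illustrative):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Set the threshold to -1 before the contexts are created so that
    // subsequent DataFrame joins skip automatic broadcasting.
    val conf = new SparkConf().set("spark.sql.autoBroadcastJoinThreshold", "-1")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)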
Please include the output of running explain() when reporting performance
issues with DataFrames.
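That is, assuming the DataFrame is called df (placeholder name):

    df.explain()      // prints the physical plan
    df.explain(true)  // prints the logical and physical plans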
On Fri, Feb 19, 2016 at 9:31 AM, Tamara Mendt wrote:
> Hi all,
>
> I am running a Spark job that gets stuck attempting to join two
> dataframes. The dataframes are not very large, one is about 2 M r