Re: How to force sorted merge join to broadcast join

2019-07-28 Thread Rubén Berenguel
Hi, I hope this answers your question. You can hint the broadcast in SQL as detailed here: https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-joins-broadcast.html (thanks Jacek :) ) I'd recommend creating a temporary table with the trimming you use in the join (for clarity). Also kee
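
A minimal sketch of the two suggestions above, using the query from the original question. The view name `A_trimmed` and the column `country_trimmed` are hypothetical, introduced only for illustration; the `BROADCAST` hint is the Spark SQL join hint available since Spark 2.2.

```sql
-- Assumption: A_trimmed and country_trimmed are illustrative names, not from the original thread.
-- Pre-trim the join key of the small table once, so the join condition stays simple.
CREATE OR REPLACE TEMPORARY VIEW A_trimmed AS
SELECT A.*, trim(A.country) AS country_trimmed
FROM A;

-- Ask the planner to broadcast the small side (A_trimmed) instead of relying on table statistics.
-- Note: A_trimmed.* also returns the extra country_trimmed column.
SELECT /*+ BROADCAST(A_trimmed) */
       A_trimmed.*,
       B.nsf_cards_ratio * 1.00 / A_trimmed.nsf_on_entry AS nsf_ratio_to_pop
FROM B
LEFT JOIN A_trimmed
  ON A_trimmed.country_trimmed = trim(B.cntry_code);
```

Because A is on the right side of the left join, it is a side Spark can broadcast for this join type. Running `EXPLAIN` on the query should show whether a broadcast hash join was actually chosen over a sort merge join.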

Logistic Regression Iterations causing High GC in Spark 2.3

2019-07-28 Thread Dhrubajyoti Hati
Hi, We were running Logistic Regression in Spark 2.2.X and then tried to see how it performs in Spark 2.3.X. We are now facing an issue while running a Logistic Regression model in Spark 2.3.X on top of YARN (GCP Dataproc). In the TreeAggregate method it takes a very long time due to very High GC Acti

How to force sorted merge join to broadcast join

2019-07-28 Thread zhangliyun
Hi all, I want to ask a question about broadcast join in Spark SQL. ``` select A.*,B.nsf_cards_ratio * 1.00 / A.nsf_on_entry as nsf_ratio_to_pop from B left join A on trim(A.country) = trim(B.cntry_code); ``` Here A is a small table with only 8 rows, but somehow the statistics of table A has

Re: Ask for ARM CI for spark

2019-07-28 Thread Tianhua huang
@Sean Owen Thank you very much. I saw your reply in https://issues.apache.org/jira/browse/SPARK-28519. I will test with the modification to see whether other similar tests fail, and will address them together in one pull request. On Sat, Jul 27, 2019 at 9:04 PM Sean Owen w