Re: Re: A question about broadcast nest loop join

2019-10-23 Thread Wenchen Fan
configuration. Such a plan is generally generated by using a NOT IN (subquery); if you are OK with slightly different NULL semantics then you could use NOT EXISTS (subquery). The latter should perform a lot better. On Wed, Oct 23, 2019 at 12:02 PM zhangliyun wrot
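The "slightly different NULL semantics" mentioned in this reply can be illustrated with a small sketch. It uses Python's built-in sqlite3 module rather than Spark, on the assumption that only the standard SQL three-valued logic matters here (Spark SQL follows the same rule for NOT IN). The tables a and b and their contents are hypothetical, chosen only so that the subquery contains a NULL.

```python
import sqlite3

# Hypothetical tables: b contains a NULL, which is what makes the two forms differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1), (NULL);
""")

# NOT IN: for id = 2 the predicate "2 NOT IN (1, NULL)" evaluates to NULL
# (unknown), so the row is filtered out. One NULL in the subquery empties the result.
not_in = conn.execute(
    "SELECT id FROM a WHERE id NOT IN (SELECT id FROM b) ORDER BY id"
).fetchall()

# NOT EXISTS: the correlated equality never matches the NULL row, so rows of a
# without a match in b are kept.
not_exists = conn.execute(
    "SELECT id FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id) ORDER BY id"
).fetchall()

print("NOT IN     ->", not_in)      # []
print("NOT EXISTS ->", not_exists)  # [(2,), (3,)]
```

If the subquery column can never be NULL, both forms return the same rows, and the rewrite to NOT EXISTS only changes the plan.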

Re:Re: A question about broadcast nest loop join

2019-10-23 Thread zhangliyun
generated by using a NOT IN (subquery); if you are OK with slightly different NULL semantics then you could use NOT EXISTS (subquery). The latter should perform a lot better. On Wed, Oct 23, 2019 at 12:02 PM zhangliyun wrote: Hi all: I want to ask a question about broadcast nestloop join?

Re:Re: A question about broadcast nest loop join

2019-10-23 Thread zhangliyun
hen OOM happens. Maybe there is an algorithm to implement left/right join in a distributed environment without broadcast, but currently Spark is only able to deal with it using broadcast. On Wed, Oct 23, 2019 at 6:02 PM zhangliyun wrote: Hi all: I want to ask a question about broadcast nest

Re: A question about broadcast nest loop join

2019-10-23 Thread Wenchen Fan
, but currently Spark is only able to deal with it using broadcast. On Wed, Oct 23, 2019 at 6:02 PM zhangliyun wrote: Hi all: I want to ask a question about broadcast nestloop join? From Google I know that left outer/semi join and right outer/semi join will use broadcast

A question about broadcast nest loop join

2019-10-23 Thread zhangliyun
Hi all: I want to ask a question about broadcast nestloop join? From Google I know that left outer/semi join and right outer/semi join will use broadcast nestloop, and in some cases, when the input data is very small, it is suitable to use. So here, how do we define the input data as very small

Re: about broadcast join of base table in spark sql

2017-07-02 Thread Yong Zhang
t: Friday, June 30, 2017 6:57 AM To: d...@spark.org; user@spark.apache.org; paleyl Subject: Re: about broadcast join of base table in spark sql Hello. If you want to allow broadcast join with larger broadcasts you can set spark

Re: about broadcast join of base table in spark sql

2017-07-02 Thread paleyl
spark.apache.org Computes the numeric value of the first character of the string column, and returns the result as an int column. -- From: Bryan Jeffrey Sent: Friday, June 30, 2017 6:57

Re: about broadcast join of base table in spark sql

2017-07-01 Thread Xiaoye Sun
the string column, and returns the result as an int column. -- From: Bryan Jeffrey Sent: Friday, June 30, 2017 6:57 AM To: d...@spark.org; user@spark.apache.org; paleyl Subject: Re: about broadcast join of base table in sp

Re: about broadcast join of base table in spark sql

2017-07-01 Thread Paley Louie
returns the result as an int column. From: Bryan Jeffrey Sent: Friday, June 30, 2017 6:57 AM To: d...@spark.org; user@spark.apache.org; paleyl Subject: Re: about broadcast join of base table in spark sql Hello. If you want to a

Re: about broadcast join of base table in spark sql

2017-07-01 Thread Paley Louie
From: paleyl Sent: Wednesday, June 28, 10:42 PM Subject: about broadcast join of base table in spark sql To: d...@spark.org, user@spark.apache.org Hi All, Recently I ran into a problem with broadcast join: I want to left join table A and

Re: about broadcast join of base table in spark sql

2017-06-30 Thread Yong Zhang
Friday, June 30, 2017 6:57 AM To: d...@spark.org; user@spark.apache.org; paleyl Subject: Re: about broadcast join of base table in spark sql Hello. If you want to allow broadcast join with larger broadcasts you can set spark.sql.autoBroadcastJoinThreshold to a higher value. This will cause the p

Re: about broadcast join of base table in spark sql

2017-06-30 Thread Bryan Jeffrey
eyl Sent: Wednesday, June 28, 10:42 PM Subject: about broadcast join of base table in spark sql To: d...@spark.org, user@spark.apache.org Hi All, Recently I ran into a problem with broadcast join: I want to left join table A and B, where A is the smaller one and the left table, so I wrote A =

Fwd: about broadcast join of base table in spark sql

2017-06-29 Thread paleyl
Hi All, Recently I ran into a problem with broadcast join: I want to left join table A and B, where A is the smaller one and the left table, so I wrote A = A.join(B, A("key1") === B("key2"), "left"), but I found that A is not broadcast, as the shuffle size is still very large. I guess this is a designed mech

about broadcast join of base table in spark sql

2017-06-28 Thread paleyl
Hi All, Recently I ran into a problem with broadcast join: I want to left join table A and B, where A is the smaller one and the left table, so I wrote A = A.join(B, A("key1") === B("key2"), "left"), but I found that A is not broadcast, as the shuffle size is still very large. I guess this is a designed mech

question about Broadcast value NullPointerException

2016-08-23 Thread Chong Zhang
Hello, I'm using Spark Streaming to process Kafka messages, and want to use a properties file as the input and broadcast the properties: val props = new Properties() props.load(new FileInputStream(args(0))) val sc = initSparkContext() val propsBC = sc.broadcast(props) println(s"propFileBC 1: " + propsB

Re: problem about broadcast variable in iteration

2014-05-29 Thread randylu
Hi Andrew Ash, thanks for your reply. In fact, I have already used unpersist(), but it has no effect. One reason I chose version 1.0.0 is precisely that it provides the unpersist() interface. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/problem-

Re: problem about broadcast variable in iteration

2014-05-25 Thread Andrew Ash
st(kv) // broadcast kv 12 rdd2 = rdd1 13 } 14 rdd2.saveAsTextFile() -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/problem-about-broadcast-variable-in-iteration-tp5479p5497.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: problem about broadcast variable in iteration

2014-05-15 Thread randylu
} 14 rdd2.saveAsTextFile() -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/problem-about-broadcast-variable-in-iteration-tp5479p5497.html

Re: problem about broadcast variable in iteration

2014-05-15 Thread randylu
ar tmp = rdd1.reduceByKey().collect() 10 kv = updateKV(tmp) // update kv for each iteration 11 rdd2 = rdd1 12 } 13 rdd2.saveAsTextFile() -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/problem-about-broadcast-varia

Re: problem about broadcast variable in iteration

2014-05-15 Thread Earthson
RDD is not cached? Because recomputation may be required, every broadcast object is included in the dependencies of the RDDs; this may also cause memory issues (when n and kv are too large in your case). -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/problem-about

problem about broadcast variable in iteration

2014-05-15 Thread randylu
t just reading broadcast_/n/. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/problem-about-broadcast-variable-in-iteration-tp5479.html

Re: problem about broadcast variable in iteration

2014-05-10 Thread randylu
I run on Spark 1.0.0, the newest under-development version. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/problem-about-broadcast-variable-in-iteration-tp5479p5480.html

Re: about broadcast

2014-05-06 Thread randylu
I found that the small broadcast variable always took about 10s, never 5s or anything else. Is there some property/conf (with a default of 10) that controls the timeout? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/about-broadcast-tp5416p5439.html

Re: about broadcast

2014-05-05 Thread randylu
Additionally, reading the big broadcast variable always took about 2s. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/about-broadcast-tp5416p5417.html

about broadcast

2014-05-05 Thread randylu
.nabble.com/about-broadcast-tp5416.html