And it should be generic for HashJoin, not only broadcast join, right?
Chrysan Wu
吴晓菊
Phone:+86 17717640807
Sorry for the mistake. You are right: the output ordering of a broadcast join
can be the order of the big table for some types of join. I will prepare a PR
and let you review it later. Thanks a lot!
Chrysan Wu
吴晓菊
Phone:+86 17717640807
Yep, that's right. There were a bunch of things that were removed from
those scripts that made it tricky to build 2.1 (like Scala 2.10
support). I think it's good to keep the scripts working for older
releases since that allows us to fix things / add features to them
without having to backport to o
If I recall, we stopped releasing Hadoop 2.3 and 2.4 builds in newer releases
(2.2+?); that might be why they are not in the release script.
From: Marcelo Vanzin
Sent: Thursday, June 28, 2018 11:12:45 AM
To: Sean Owen
Cc: Marcelo Vanzin; dev
Subject: Re: [VOTE] Spark 2.1.3 (R
Alright, uploaded the missing packages.
I'll send a PR to update the release scripts just in case...
+1
Tested on CentOS 7.4 and Oracle JDK 1.8.0_171.
Bests,
Dongjoon.
If it's easy enough to produce them, I agree you can just add them to the
RC dir.
I just noticed this RC is missing builds for hadoop 2.3 and 2.4, which
existed in the previous version:
https://dist.apache.org/repos/dist/release/spark/spark-2.1.2/
How important do we think those are? I think I can just build them and
publish them to the RC directory without having to create a n
+1
+1. Thanks, Saisai!
The impact of SPARK-24495 is large. We should release Spark 2.3.2 ASAP.
Thanks,
Xiao
BTW, that would be a great fix in the docs now that a 2.3.2 is being
prepared.
Exactly...
From: Marcelo Vanzin
Sent: Thursday, June 28, 2018 9:16:08 AM
To: Tom Graves
Cc: Felix Cheung; dev
Subject: Re: [VOTE] Spark 2.1.3 (RC2)
Yeah, we should be more careful with that in general. Like we state
that "Spark runs on Java 8+"...
Right, we say we support R 3.1+ but we never actually did, so I agree it's a
bug, but it's not a regression since we never really supported them or tested
with them, and it's not a logic or security bug that ends in corruption or bad
behavior, so in my opinion it's not a blocker. Again, I'm fine with ad
Yap will do
From: Marcelo Vanzin
Sent: Thursday, June 28, 2018 9:04:41 AM
To: Felix Cheung
Cc: Spark dev list
Subject: Re: Time for 2.3.2?
Could you mark that bug as blocker and set the target version, in that case?
SortMergeJoin sorts its children by join key, but broadcast join does not.
I think the output ordering of a broadcast join has nothing to do with the
join key.
+1
I’d like to fix SPARK-24535 first though
From: Stavros Kontopoulos
Sent: Thursday, June 28, 2018 3:50:34 AM
To: Marco Gaido
Cc: Takeshi Yamamuro; Xingbo Jiang; Wenchen Fan; Spark dev list; Saisai Shao;
van...@cloudera.com.invalid
Subject: Re: Time for 2.3.2?
I think the outputOrdering would be that of the big table (if any), and it
wouldn't matter whether this involves the join keys or not. Am I wrong?
Not pushing back, but our support message has always been R 3.1+, so it is a
bit off to say we don't support newer releases.
https://spark.apache.org/docs/2.1.2/
But looking back, this was found during the 2.1.2 RC2 and wasn't fixed (in
time) for 2.1.2?
http://apache-spark-developers-list.1001551.n3.nab
Thanks for the reply.
By looking into SortMergeJoinExec, I think we can follow what SortMergeJoin
does: for some types of join, if the children are ordered on the join keys,
we can report the join keys as the output ordering.
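As a loose illustration of that idea, here is a sketch over plain Scala collections rather than Spark internals (the object and method names are invented for the example): when both inputs are already sorted on the join key, a merge join emits its matches in join-key order, which is why the operator can honestly report that ordering.

```scala
// Hypothetical, simplified sketch -- NOT Spark's actual SortMergeJoinExec.
// Both inputs are assumed sorted on the join key; the output then comes out
// in join-key order as well, with no extra sort needed.
object MergeJoinSketch {
  def mergeJoin[K: Ordering, A, B](
      left: Seq[(K, A)],   // assumed sorted by key
      right: Seq[(K, B)]   // assumed sorted by key
  ): Seq[(K, A, B)] = {
    val ord = implicitly[Ordering[K]]
    val out = scala.collection.mutable.ArrayBuffer.empty[(K, A, B)]
    var i = 0
    var j = 0
    while (i < left.length && j < right.length) {
      val c = ord.compare(left(i)._1, right(j)._1)
      if (c < 0) i += 1        // left key too small: advance left
      else if (c > 0) j += 1   // right key too small: advance right
      else {
        // keys match: emit every right row with this key for the current left row
        var jj = j
        while (jj < right.length && ord.equiv(right(jj)._1, left(i)._1)) {
          out += ((left(i)._1, left(i)._2, right(jj)._2))
          jj += 1
        }
        i += 1
      }
    }
    out.toSeq
  }
}
```

Spark's real operator additionally handles null keys, outer joins, and code generation, which this sketch ignores.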
Chrysan Wu
吴晓菊
Phone:+86 17717640807
SortMergeJoin only reports the ordering of the join keys, not the output
ordering of any child.
It seems reasonable to me that broadcast join should respect the output
ordering of its children. Feel free to submit a PR to fix it, thanks!
+1
I ran tests on an EC2 m4.2xlarge instance:
[ec2-user]$ java -version
openjdk version "1.8.0_171"
OpenJDK Runtime Environment (build 1.8.0_171-b10)
OpenJDK 64-Bit Server VM (build 25.171-b10, mixed mode)
Why can't we use the output order of the big table?
Chrysan Wu
Phone:+86 17717640807
If this is just supporting newer versions of R that 2.1 never supported, then
I would say it's not a blocker. But if you feel it's useful enough, then I
would say it's up to Marcelo whether he wants to pull it in and spin another RC.
Tom
On Wednesday, June 27, 2018, 8:57:25 PM CDT, Felix Cheung
wrote:
The easy answer to this is that SortMergeJoin ensures an outputOrdering,
while BroadcastHashJoin doesn't, i.e. after running a BroadcastHashJoin you
don't know what the order of the output is going to be, since nothing
enforces it.
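To make that concrete, here is a toy sketch over plain Scala collections rather than Spark's actual BroadcastHashJoinExec (the names are invented for the example): the small side is collected into a hash table and the big side is streamed through it, so the output simply follows the streamed side's row order; the join key plays no role in the ordering.

```scala
// Hypothetical, simplified sketch -- NOT Spark's actual BroadcastHashJoinExec.
// The small (broadcast) side becomes a hash map; the big side is streamed,
// so output order == the big side's arrival order, regardless of key order.
object BroadcastJoinSketch {
  def broadcastHashJoin[K, A, B](
      big: Seq[(K, A)],   // streamed side, in whatever order it arrives
      small: Seq[(K, B)]  // broadcast side, built into a hash table
  ): Seq[(K, A, B)] = {
    val table: Map[K, Seq[B]] =
      small.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) }
    for {
      (k, a) <- big                        // preserves the big side's order
      b <- table.getOrElse(k, Seq.empty)   // probe the hash table
    } yield (k, a, b)
  }
}
```

Note that the output here would only be sorted on the join key if the big side happened to arrive sorted on it, which nothing guarantees.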
Hope this helps.
Thanks.
Marco
We see that SortMergeJoinExec implements both outputPartitioning and
outputOrdering, while BroadcastHashJoinExec only implements
outputPartitioning. Why was it designed this way?
Chrysan Wu
Phone:+86 17717640807
Spark JIRA:
https://issues.apache.org/jira/projects/SPARK/issues/SPARK-24630
Benefits:
Firstly, users who are unfamiliar with streaming can easily use SQL to run
Structured Streaming, especially when migrating offline tasks to real-time
processing tasks.
Secondly, supporting a SQL API in Structured Streaming c
+1 makes sense.
+1 too. I'd also consider including SPARK-24208 if we can solve it
timely...
2018-06-28 8:28 GMT+02:00 Takeshi Yamamuro :
> +1, I heard some Spark users have skipped v2.3.1 because of these bugs.