And it should be generic for HashJoin, not only broadcast join, right?
Chrysan Wu
吴晓菊
Phone:+86 17717640807
Sorry for the mistake. You are right: the output ordering of a broadcast join
can be the order of the big table for some types of join. I will prepare a PR
and let you review it later. Thanks a lot!
Chrysan Wu
吴晓菊
Phone:+86 17717640807
Yep, that's right. There were a bunch of things that were removed from
those scripts that made it tricky to build 2.1 (like Scala 2.10
support). I think it's good to keep the scripts working for older
releases since that allows us to fix things / add features to them
without having to backport to o
If I recall, we stopped releasing Hadoop 2.3 and 2.4 builds in newer releases
(2.2+?); that might be why they are not in the release script.
From: Marcelo Vanzin
Sent: Thursday, June 28, 2018 11:12:45 AM
To: Sean Owen
Cc: Marcelo Vanzin; dev
Subject: Re: [VOTE] Spark 2.1.3 (R
Alright, uploaded the missing packages.
I'll send a PR to update the release scripts just in case...
+1
Tested on CentOS 7.4 and Oracle JDK 1.8.0_171.
Bests,
Dongjoon.
If it's easy enough to produce them, I agree you can just add them to the
RC dir.
I just noticed this RC is missing builds for hadoop 2.3 and 2.4, which
existed in the previous version:
https://dist.apache.org/repos/dist/release/spark/spark-2.1.2/
How important do we think those are? I think I can just build them and
publish them to the RC directory without having to create a n
+1
+1. Thanks, Saisai!
The impact of SPARK-24495 is large. We should release Spark 2.3.2 ASAP.
Thanks,
Xiao
BTW, that would be a great fix in the docs now that a 2.3.2 is being
prepared.
Exactly...
From: Marcelo Vanzin
Sent: Thursday, June 28, 2018 9:16:08 AM
To: Tom Graves
Cc: Felix Cheung; dev
Subject: Re: [VOTE] Spark 2.1.3 (RC2)
Yeah, we should be more careful with that in general. Like we state
that "Spark runs on Java 8+"...
Right, we say we support R 3.1+ but we never actually did, so I agree it's a
bug, but it's not a regression since we never really supported them or tested
with them, and it's not a logic or security bug that ends in corruption or bad
behavior, so in my opinion it's not a blocker. Again, I'm fine with ad
Yap will do
From: Marcelo Vanzin
Sent: Thursday, June 28, 2018 9:04:41 AM
To: Felix Cheung
Cc: Spark dev list
Subject: Re: Time for 2.3.2?
Could you mark that bug as blocker and set the target version, in that case?
SortMergeJoin sorts its children by join key, but broadcast join does not.
I think the output ordering of a broadcast join has nothing to do with the
join key.
+1
I’d like to fix SPARK-24535 first though
From: Stavros Kontopoulos
Sent: Thursday, June 28, 2018 3:50:34 AM
To: Marco Gaido
Cc: Takeshi Yamamuro; Xingbo Jiang; Wenchen Fan; Spark dev list; Saisai Shao;
van...@cloudera.com.invalid
Subject: Re: Time for 2.3.2?
I think the outputOrdering would be that of the big table (if any), and it
wouldn't matter whether this involves the join keys or not. Am I wrong?
Not pushing back, but our support message has always been R 3.1+, so it is a
bit off to say we don't support newer releases.
https://spark.apache.org/docs/2.1.2/
But looking back, this was found during the 2.1.2 RC2 and wasn't fixed (in
time) for 2.1.2?
http://apache-spark-developers-list.1001551.n3.nab
Thanks for the reply.
By looking into SortMergeJoinExec, I think we can follow what SortMergeJoin
does: for some types of join, if the children are ordered on the join keys,
we can report the join keys as the output ordering.
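As a loose illustration of that idea, here is a sketch over plain Scala collections rather than Spark internals (the object and method names are invented for the example): when both inputs are already sorted on the join key, a merge join emits its matches in join-key order, which is why the operator can honestly report that ordering.

```scala
// Hypothetical, simplified sketch -- NOT Spark's actual SortMergeJoinExec.
// Both inputs are assumed sorted on the join key; the output then comes out
// in join-key order as well, with no extra sort needed.
object MergeJoinSketch {
  def mergeJoin[K: Ordering, A, B](
      left: Seq[(K, A)],   // assumed sorted by key
      right: Seq[(K, B)]   // assumed sorted by key
  ): Seq[(K, A, B)] = {
    val ord = implicitly[Ordering[K]]
    val out = scala.collection.mutable.ArrayBuffer.empty[(K, A, B)]
    var i = 0
    var j = 0
    while (i < left.length && j < right.length) {
      val c = ord.compare(left(i)._1, right(j)._1)
      if (c < 0) i += 1        // left key too small: advance left
      else if (c > 0) j += 1   // right key too small: advance right
      else {
        // keys match: emit every right row with this key for the current left row
        var jj = j
        while (jj < right.length && ord.equiv(right(jj)._1, left(i)._1)) {
          out += ((left(i)._1, left(i)._2, right(jj)._2))
          jj += 1
        }
        i += 1
      }
    }
    out.toSeq
  }
}
```

Spark's real operator additionally handles null keys, outer joins, and code generation, which this sketch ignores.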
Chrysan Wu
吴晓菊
Phone:+86 17717640807
SortMergeJoin only reports the ordering of the join keys, not the output
ordering of any child.
It seems reasonable to me that broadcast join should respect the output
ordering of its children. Feel free to submit a PR to fix it, thanks!
+1
I ran tests on an EC2 m4.2xlarge instance:
[ec2-user]$ java -version
openjdk version "1.8.0_171"
OpenJDK Runtime Environment (build 1.8.0_171-b10)
OpenJDK 64-Bit Server VM (build 25.171-b10, mixed mode)
Why can't we use the output order of the big table?
Chrysan Wu
Phone:+86 17717640807
If this is just supporting newer versions of R that 2.1 never supported, then
I would say it's not a blocker. But if you feel it's useful enough, then I
would say it's up to Marcelo whether he wants to pull it in and spin another RC.
Tom
On Wednesday, June 27, 2018, 8:57:25 PM CDT, Felix Cheung
wrote:
The easy answer to this is that SortMergeJoin ensures an outputOrdering,
while BroadcastHashJoin doesn't, i.e. after running a BroadcastHashJoin you
don't know what the order of the output is going to be, since nothing
enforces it.
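To make that concrete, here is a toy sketch over plain Scala collections rather than Spark's actual BroadcastHashJoinExec (the names are invented for the example): the small side is collected into a hash table and the big side is streamed through it, so the output simply follows the streamed side's row order; the join key plays no role in the ordering.

```scala
// Hypothetical, simplified sketch -- NOT Spark's actual BroadcastHashJoinExec.
// The small (broadcast) side becomes a hash map; the big side is streamed,
// so output order == the big side's arrival order, regardless of key order.
object BroadcastJoinSketch {
  def broadcastHashJoin[K, A, B](
      big: Seq[(K, A)],   // streamed side, in whatever order it arrives
      small: Seq[(K, B)]  // broadcast side, built into a hash table
  ): Seq[(K, A, B)] = {
    val table: Map[K, Seq[B]] =
      small.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) }
    for {
      (k, a) <- big                        // preserves the big side's order
      b <- table.getOrElse(k, Seq.empty)   // probe the hash table
    } yield (k, a, b)
  }
}
```

Note that the output here would only be sorted on the join key if the big side happened to arrive sorted on it, which nothing guarantees.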
Hope this helps.
Thanks.
Marco
We see that SortMergeJoinExec implements both outputPartitioning and
outputOrdering, while BroadcastHashJoinExec only implements
outputPartitioning. Why was it designed this way?
Chrysan Wu
Phone:+86 17717640807
Spark JIRA:
https://issues.apache.org/jira/projects/SPARK/issues/SPARK-24630
Benefits:
Firstly, users who are unfamiliar with streaming can easily use SQL to run
Structured Streaming, especially when migrating offline tasks to real-time
processing tasks.
Secondly, supporting a SQL API in Structured Streaming c
+1 makes sense.
+1 too. I'd also consider including SPARK-24208 if we can solve it
timely...
2018-06-28 8:28 GMT+02:00 Takeshi Yamamuro :
> +1, I heard some Spark users have skipped v2.3.1 because of these bugs.