Hi, all:
I've read the source code, and it seems that the order in which RDDs are
processed is not guaranteed, since jobs are simply submitted to a thread
pool. I believe this is quite important in streaming, since updates should
be ordered.
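For illustration, here is a minimal plain-Java sketch (no Spark involved; the class and method names are made up) of why submitting jobs to a thread pool only preserves processing order when the pool has a single worker thread:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolOrdering {
    // Submits nTasks numbered tasks to a pool of nThreads and records
    // the order in which they actually run.
    static List<Integer> runTasks(int nThreads, int nTasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        List<Integer> order = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < nTasks; i++) {
            final int id = i;
            pool.submit(() -> order.add(id));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return order;
    }

    public static void main(String[] args) throws InterruptedException {
        // With one thread, the queue is drained FIFO, so order is deterministic.
        System.out.println(runTasks(1, 5)); // prints [0, 1, 2, 3, 4]
        // With several threads, the order tasks finish in is not guaranteed.
        System.out.println(runTasks(4, 5));
    }
}
```

With a single-threaded pool the task queue is drained first-in-first-out, which is deterministic; with more threads, only submission order is fixed, not execution order.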
Hi, all,
After turning on tracing, I saw a strange exclamation mark in
the intermediate plans. This happened in the Catalyst analyzer.
Join Inner, Some((col1#0 = col1#6))
Project [col1#0,col2#1,col3#2,col2_alias#24,col3#2 AS col3_alias#13]
Project [col1#0,col2#1,col3#2,col2#1 AS col2_alias#24]
Hi Joseph,
That's great. It would also be great if Spark extended PMML support to
models that are not currently supported, e.g.:
- Decision Tree
- Random Forest
- Naive Bayes
On Sun, Oct 18, 2015 at 2:55 AM, Joseph Bradley
wrote:
> Thanks for bringing this up! We need to add
Hi Gsvic, can you please provide detailed code / steps to reproduce that?
Hao
-----Original Message-----
From: gsvic [mailto:victora...@gmail.com]
Sent: Monday, October 19, 2015 3:55 AM
To: dev@spark.apache.org
Subject: ShuffledHashJoin Possible Issue
I am doing some experiments with join algorithms
Nicholas,
FWIW, the --ip option seems to have been deprecated in commit d90d2af1,
but that was a pretty big commit with lots of other changes, and there
isn't any hint in the log message as to the reason for changing --ip.
best,
Robert Dodier
---
Good catches, Robert.
I had actually typed up a draft email a couple of days ago citing those
same two blocks of code. I deleted it when I realized, like you, that the
snippets did not explain why IP addresses weren't working.
Something seems wrong here, but I’m not sure what exactly. Maybe this is
I am doing some experiments with join algorithms in SparkSQL and I am facing
the following issue:
I have constructed two "dummy" JSON tables, t1.json and t2.json. Each of them
has two columns, ID and Value. The ID is an incrementing integer (unique) and
the Value is a random value. I am running an equi-
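The setup described above can be sketched in plain Java (the row type and method names here are hypothetical, just to make the equi-join concrete): build a hash table on one table's ID column and probe it with the other, which is the core of what a hash join does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class ToyEquiJoin {
    // A row with an incremental (unique) ID and a random Value,
    // mirroring the dummy t1/t2 tables described above.
    record Row(int id, double value) {}

    static List<Row> makeTable(int rows, long seed) {
        Random rnd = new Random(seed);
        List<Row> table = new ArrayList<>();
        for (int id = 0; id < rows; id++) {
            table.add(new Row(id, rnd.nextDouble()));
        }
        return table;
    }

    // Equi-join on ID: build a hash map on t1, then probe it with t2.
    static List<double[]> equiJoin(List<Row> t1, List<Row> t2) {
        Map<Integer, Row> built = new HashMap<>();
        for (Row r : t1) built.put(r.id(), r);
        List<double[]> out = new ArrayList<>();
        for (Row probe : t2) {
            Row match = built.get(probe.id());
            if (match != null) out.add(new double[]{match.value(), probe.value()});
        }
        return out;
    }

    public static void main(String[] args) {
        // IDs 0..99 exist in both tables, so every probe row finds a match.
        List<double[]> joined = equiJoin(makeTable(100, 1L), makeTable(100, 2L));
        System.out.println(joined.size()); // prints 100
    }
}
```

Since the IDs are unique and identical in both tables, the join produces exactly one output row per probe row.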
When I ran the following command on Linux with the latest master branch:
~/apache-maven-3.3.3/bin/mvn clean -Phive -Phive-thriftserver -Pyarn
-Phadoop-2.4 -Dhadoop.version=2.7.0 package
I saw some test failures:
http://pastebin.com/1VYZYy5K
Has anyone seen similar test failures before?
Thanks
From
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=spark-test/3846/console:
SparkListenerSuite:
- basic creation and shutdown of LiveListenerBus
- bus.stop() waits for the event queue to completely drain
- basic creation of StageInfo
- basic c
> On 18 Oct 2015, at 03:23, vonnagy wrote:
>
> Has anyone tried to go from streaming directly to GCS or S3 and overcome the
> unacceptable performance? It can never keep up.
The problem here is that they aren't really filesystems (certainly not S3 via
the s3n & s3a clients); flush() is a no-op, an
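As a toy sketch of the behavior being described (this is not the actual Hadoop client code, and the class name is made up): an output stream whose flush() persists nothing, and whose data only becomes visible when close() performs the final upload.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;

// Mimics the object-store behavior described above: bytes are buffered
// locally, flush() is a no-op, and data only becomes visible once
// close() performs the upload.
public class BufferedUploadStream extends OutputStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private byte[] uploaded; // stands in for the remote object

    @Override public void write(int b) { buffer.write(b); }

    @Override public void flush() {
        // No-op: unlike a real filesystem, nothing is durable at this point.
    }

    @Override public void close() {
        uploaded = buffer.toByteArray(); // the "upload" happens only here
    }

    // What a reader of the object store would see.
    public byte[] visibleData() {
        return uploaded == null ? new byte[0] : uploaded;
    }
}
```

A streaming sink that relies on flush() for durability gets nothing until the stream is closed, which is why such stores struggle to keep up with continuous output.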
On 18 Oct 2015, at 11:09, Sean Owen wrote:
These are still too low, I think. Try 4g heap and 1g permgen. That's what the
error tells you, right?
On Sat, Oct 17, 2015, 10:58 PM Chester Chen wrote:
Yes, I have tried MAVEN_OPTS with
-Xmx
These are still too low, I think. Try 4g heap and 1g permgen. That's what
the error tells you, right?
On Sat, Oct 17, 2015, 10:58 PM Chester Chen wrote:
> Yes, I have tried MAVEN_OPTS with
>
> -Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m
>
> -Xmx4g -XX:MaxPermSize=512M -XX:ReservedCo