Guaranteed processing orders of each batch in Spark Streaming

2015-10-18 Thread Renjie Liu
Hi all, I've read the source code and it seems that there is no guarantee on the order in which each RDD is processed, since jobs are just submitted to a thread pool. I believe ordering is quite important in streaming, since updates should be applied in order.
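
A minimal sketch of the concern above, in plain Python rather than Spark code: when batches are handed to a thread pool with more than one worker, the processing order depends on thread scheduling, whereas a single-worker pool takes them strictly in submission order. The names `run_batches` and `process_batch` are hypothetical stand-ins for whatever per-batch work a streaming job does.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def run_batches(num_batches, workers):
    """Submit batches 0..num_batches-1 to a pool and record the order in which they run."""
    processed = []
    lock = threading.Lock()

    def process_batch(batch_id):
        # Hypothetical per-batch work: just record which batch ran.
        with lock:
            processed.append(batch_id)

    # Leaving the `with` block waits for every submitted batch to finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch_id in range(num_batches):
            pool.submit(process_batch, batch_id)
    return processed


ordered = run_batches(num_batches=20, workers=1)
maybe_unordered = run_batches(num_batches=20, workers=4)
```

With `workers=1` the recorded order always equals the submission order. With more workers every batch is still processed exactly once, but nothing guarantees the order in which they run.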

Spark SQL: what does an exclamation mark mean in the plan?

2015-10-18 Thread Xiao Li
Hi all, After turning on the trace, I saw a strange exclamation mark in the intermediate plans. This happened in the Catalyst analyzer. Join Inner, Some((col1#0 = col1#6)) Project [col1#0,col2#1,col3#2,col2_alias#24,col3#2 AS col3_alias#13] Project [col1#0,col2#1,col3#2,col2#1 AS col2_alias#24]

Re: PMML export for LinearRegressionModel

2015-10-18 Thread Fazlan Nazeem
Hi Joseph, That's great. It would also be great if Spark extended PMML support to models that are not PMML-supported right now, e.g. Decision Tree, Random Forest, Naive Bayes. On Sun, Oct 18, 2015 at 2:55 AM, Joseph Bradley wrote: > Thanks for bringing this up! We need to add

RE: ShuffledHashJoin Possible Issue

2015-10-18 Thread Cheng, Hao
Hi Gsvic, Can you please provide detailed code / steps to reproduce that? Hao -----Original Message----- From: gsvic [mailto:victora...@gmail.com] Sent: Monday, October 19, 2015 3:55 AM To: dev@spark.apache.org Subject: ShuffledHashJoin Possible Issue I am doing some experiments with join algorit

Re: SPARK_MASTER_IP actually expects a DNS name, not IP address

2015-10-18 Thread Robert Dodier
Nicholas, FWIW the --ip option seems to have been deprecated in commit d90d2af1, but that was a pretty big commit, lots of other stuff changed, and there isn't any hint in the log message as to the reason for changing --ip. best, Robert Dodier ---

Re: SPARK_MASTER_IP actually expects a DNS name, not IP address

2015-10-18 Thread Nicholas Chammas
Good catches, Robert. I had actually typed up a draft email a couple of days ago citing those same two blocks of code. I deleted it when I realized, like you, that the snippets did not explain why IP addresses weren’t working. Something seems wrong here, but I’m not sure what exactly. Maybe this is

ShuffledHashJoin Possible Issue

2015-10-18 Thread gsvic
I am doing some experiments with join algorithms in SparkSQL and I am facing the following issue: I have constructed two "dummy" JSON tables, t1.json and t2.json. Each of them has two columns, ID and Value. The ID is a unique, incrementing integer and the Value is a random value. I am running an equi-
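
The setup described above can be sketched as follows, assuming one JSON object per line (the layout Spark SQL's JSON reader expects by default). The helper name `write_dummy_table`, the row count, the seeds, and the value range are illustrative assumptions, not details from the original experiment.

```python
import json
import random


def write_dummy_table(path, num_rows, seed):
    """Write a table with a unique incrementing ID and a random Value, one JSON object per line."""
    rng = random.Random(seed)  # seeded so the table is reproducible
    with open(path, "w") as f:
        for row_id in range(num_rows):
            record = {"ID": row_id, "Value": rng.randint(0, 1000)}
            f.write(json.dumps(record) + "\n")


# Row count and seeds are arbitrary choices for illustration.
write_dummy_table("t1.json", num_rows=100, seed=1)
write_dummy_table("t2.json", num_rows=100, seed=2)
```

Because both tables use the same ID range, an equi-join on ID should return exactly one row per ID, which gives a simple expected row count to check a join result against.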

streaming test failure

2015-10-18 Thread Ted Yu
When I ran the following command on Linux with the latest master branch: ~/apache-maven-3.3.3/bin/mvn clean -Phive -Phive-thriftserver -Pyarn -Phadoop-2.4 -Dhadoop.version=2.7.0 package I saw some test failures: http://pastebin.com/1VYZYy5K Has anyone seen a similar test failure before? Thanks

test failed due to OOME

2015-10-18 Thread Ted Yu
From https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=spark-test/3846/console : SparkListenerSuite: - basic creation and shutdown of LiveListenerBus - bus.stop() waits for the event queue to completely drain - basic creation of StageInfo - basic c

Re: Streaming and storing to Google Cloud Storage or S3

2015-10-18 Thread Steve Loughran
> On 18 Oct 2015, at 03:23, vonnagy wrote: > > Has anyone tried to go from streaming directly to GCS or S3 and overcome the > unacceptable performance. It can never keep up. The problem here is that they aren't really filesystems (certainly S3 via the s3n & s3a clients); flush() is a no-op, an

Re: Build spark 1.5.1 branch fails

2015-10-18 Thread Steve Loughran
On 18 Oct 2015, at 11:09, Sean Owen <so...@cloudera.com> wrote: These are still too low I think. Try 4g heap and 1g permgen. That's what the error tells you right? On Sat, Oct 17, 2015, 10:58 PM Chester Chen <ches...@alpinenow.com> wrote: Yes, I have tried MAVEN_OPTS with -Xmx

Re: Build spark 1.5.1 branch fails

2015-10-18 Thread Sean Owen
These are still too low, I think. Try 4g heap and 1g permgen. That's what the error tells you, right? On Sat, Oct 17, 2015, 10:58 PM Chester Chen wrote: > Yes, I have tried MAVEN_OPTS with > > -Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m > > -Xmx4g -XX:MaxPermSize=512M -XX:ReservedCo