Re: running tests selectively

2014-04-20 Thread Mark Hamstra
You should add the hub command-line wrapper of git for GitHub to that wiki page: https://github.com/github/hub -- it doesn't look like I have edit access to the wiki, or I've forgotten a password, or something. Once you've got hub installed and aliased, you've got some nice additional options, suc

Re: Java heap space and spark.akka.frameSize

2014-04-20 Thread Akhil Das
Hi Chieh, You can increase the heap size by exporting the Java options (see below; this will increase the heap size to 10 GB): export _JAVA_OPTIONS="-Xmx10g" On Mon, Apr 21, 2014 at 11:43 AM, Chieh-Yen wrote: > Can anybody help me? > Thanks. > > Chieh-Yen > > > On Wed, Apr 16, 2014 at 5:18 PM, Chie
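A quick sanity check that the variable is set as intended (the 10g figure is just the value from the thread, not a tuned recommendation):

```shell
# _JAVA_OPTIONS is read by every JVM launched from this shell, so the Spark
# driver and any locally spawned workers inherit the larger heap.
export _JAVA_OPTIONS="-Xmx10g"
echo "$_JAVA_OPTIONS"   # → -Xmx10g
```

Note that _JAVA_OPTIONS applies to every JVM started from that environment, which can be heavy-handed; Spark-specific memory settings are usually preferable when available.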

Re: Java heap space and spark.akka.frameSize

2014-04-20 Thread Chieh-Yen
Can anybody help me? Thanks. Chieh-Yen On Wed, Apr 16, 2014 at 5:18 PM, Chieh-Yen wrote: > Dear all, > > I developed an application whose communication message size > sometimes exceeds 10 MB. > For smaller datasets it works fine, but it fails for larger datasets. > Please check the er
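Since the failing messages exceed 10 MB, the Akka frame-size limit (about 10 MB by default in Spark of this vintage) is the likely cause. A hedged sketch of raising it, assuming a Spark 0.9-style deployment where SPARK_JAVA_OPTS is still honored; the 64 MB value is illustrative, not a recommendation:

```shell
# spark.akka.frameSize is given in MB; pick a value above your largest message.
export SPARK_JAVA_OPTS="-Dspark.akka.frameSize=64"
echo "$SPARK_JAVA_OPTS"   # → -Dspark.akka.frameSize=64
```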

Re: running tests selectively

2014-04-20 Thread Arun Ramakrishnan
Ah, great. Thanks -- I had missed the quotes. On Sun, Apr 20, 2014 at 9:01 PM, Patrick Wendell wrote: > I put some notes in this doc: > https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools > > > On Sun, Apr 20, 2014 at 8:58 PM, Arun Ramakrishnan < > sinchronized.a...@gmail.com> wrote
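The fix Arun alludes to is quoting: without quotes, sbt parses the suite name as a second command and falls back to running everything. A sketch using the suite from the thread (scoping to the core subproject is an assumption for illustration):

```shell
SUITE="org.apache.spark.rdd.RDDSuite"
# Quote the whole invocation so sbt sees one command with an argument:
echo "./sbt/sbt \"test-only $SUITE\""        # one suite, all projects
echo "./sbt/sbt \"core/test-only $SUITE\""   # scoped to the core subproject
```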

Re: Task splitting among workers

2014-04-20 Thread Patrick Wendell
For a HadoopRDD, the Spark scheduler first calculates the number of tasks based on input splits. Usually people use this with HDFS data, so in that case it's based on HDFS blocks. If the HDFS datanodes are co-located with the Spark cluster, then it will try to run the tasks on the datanode that cont
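The arithmetic behind that task count can be sketched with made-up numbers (the 1280 MB file and 128 MB block size below are assumptions for illustration):

```shell
# Tasks for a HadoopRDD ≈ number of input splits; with HDFS input a split is
# typically one block, i.e. roughly ceil(file_size / block_size).
FILE_SIZE=$((1280 * 1024 * 1024))   # hypothetical 1280 MB input file
BLOCK_SIZE=$((128 * 1024 * 1024))   # hypothetical 128 MB HDFS block size
TASKS=$(( (FILE_SIZE + BLOCK_SIZE - 1) / BLOCK_SIZE ))
echo "tasks=$TASKS"   # → tasks=10
```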

Re: running tests selectively

2014-04-20 Thread Patrick Wendell
I put some notes in this doc: https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools On Sun, Apr 20, 2014 at 8:58 PM, Arun Ramakrishnan < sinchronized.a...@gmail.com> wrote: > I would like to run some of the tests selectively. I am in branch-1.0 > > Tried the following two comm

running tests selectively

2014-04-20 Thread Arun Ramakrishnan
I would like to run some of the tests selectively. I am on branch-1.0. I tried the following two commands, but each seems to run everything. ./sbt/sbt testOnly org.apache.spark.rdd.RDDSuite ./sbt/sbt test-only org.apache.spark.rdd.RDDSuite Also, how do I run the tests of only one of the subprojects?

Re: Are there any plans to develop Graphx Streaming?

2014-04-20 Thread Qi Song
Hi Ankurdave~ Now I have another question: I realized that GraphX provides four different graph partition methods: RandomVertexCut, CanonicalRandomVertexCut, EdgePartition1D and EdgePartition2D. I've tested the running time of these four methods using PageRank on several different datasets and found th

Long running time for GraphX pagerank in dataset com-Friendster

2014-04-20 Thread Qi Song
Hello~ I was running some PageRank tests with GraphX on my 8-node cluster. I allocated each worker 32 GB of memory and 8 CPU cores. The LiveJournal dataset took 370s, which seems reasonable to me. But when I tried the com-Friendster data ( http://snap.stanford.edu/data/com-Friendster.html ) with 656083

Re: Hung inserts?

2014-04-20 Thread Rahul Chugh
On Sunday, April 20, 2014, Brad Heller wrote: > Hey list, > > I've got some CSV data I'm importing from S3. I can create the external > table well enough, and I can also do a CREATE TABLE ... AS SELECT ... from > it to pull the data internal to Spark. > > Here's the HQL for my

Hung inserts?

2014-04-20 Thread Brad Heller
Hey list, I've got some CSV data I'm importing from S3. I can create the external table well enough, and I can also do a CREATE TABLE ... AS SELECT ... from it to pull the data internal to Spark. Here's the HQL for my external table: https://gist.github.com/bradhe/11126024 Now I'd like to add pa

evaluate spark

2014-04-20 Thread Joe L
I want to evaluate Spark performance by measuring the running time of transformation operations such as map and join. To do so, is it enough to materialize them with a count action? Because, as far as I know, transformations are lazy operations and don't do any computation until we call an action on them, but when I

Re: Valid spark streaming use case?

2014-04-20 Thread xargsgrep
Great, this should give me enough to go on. Appreciate the help! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Valid-spark-streaming-use-case-tp4410p4507.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Ooyala Server - plans to merge it into Apache ?

2014-04-20 Thread Andrew Ash
The homepage for Ooyala's job server is here: https://github.com/ooyala/spark-jobserver They decided (I think with input from the Spark team) that it made more sense to keep the jobserver in a separate repository for now. Andrew On Fri, Apr 18, 2014 at 5:42 AM, Azuryy Yu wrote: > Hi, > Good t

Spark recovery from bad nodes

2014-04-20 Thread rama0120
Hi, I am unable to see how Shark (and eventually Spark) can recover from a bad node in the cluster. One of my EC2 clusters with 50 nodes ended up with a single node with datanode corruption, and I see the following error when I try to load a simple file into memory using CTAS: org.apache.hadoop.

Re: Help with error initializing SparkR.

2014-04-20 Thread tongzzz
Problem solved, Shivaram's answer in the github post is the perfect solution for me. See https://github.com/amplab-extras/SparkR-pkg/issues/46# Thanks! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Help-with-error-initializing-SparkR-tp4495p4504.html Sent

Re: Anyone using value classes in RDDs?

2014-04-20 Thread Luis Ángel Vicente Sánchez
Type aliases aren't safe, as you could use any string as a name or id. On 20 Apr 2014 14:18, "Surendranauth Hiraman" wrote: > If the purpose is only aliasing, rather than adding additional methods and > avoiding runtime allocation, what about type aliases? > > type ID = String > type Name = String >

Re: Anyone using value classes in RDDs?

2014-04-20 Thread Surendranauth Hiraman
Oh, sorry, I think your point was probably that you wouldn't need runtime allocation. I guess that is the key question. I would be interested to hear if this works for you. -Suren On Sun, Apr 20, 2014 at 9:18 AM, Surendranauth Hiraman < suren.hira...@velos.io> wrote: > If the purpose is only aliasing, rat

Re: Anyone using value classes in RDDs?

2014-04-20 Thread Surendranauth Hiraman
If the purpose is only aliasing, rather than adding additional methods and avoiding runtime allocation, what about type aliases? type ID = String type Name = String On Sat, Apr 19, 2014 at 9:26 PM, kamatsuoka wrote: > No, you can wrap other types in value classes as well. You can try it in >

question about the SocketReceiver

2014-04-20 Thread YouPeng Yang
Hi, I am studying the structure of Spark Streaming (my Spark version is 0.9.0). I have a question about the SocketReceiver. In the onStart function: --- protected def onStart() { logInfo("Connecting to " + host + ":" + port) val sock

Re: Help with error initializing SparkR.

2014-04-20 Thread Shivaram Venkataraman
I just updated the github issue -- In case anybody is curious, this was a problem with R resolving the right java version installed in the VM. Thanks Shivaram On Sat, Apr 19, 2014 at 7:12 PM, tongzzz wrote: > I can't initialize sc context after a successful install on Cloudera > quickstart VM.