Re: error in streaming word count API?

2014-03-02 Thread Aaron Kimball
Filed SPARK-1173 and sent a pull request. As an aside, I think this should probably have been filed under the STREAMING project in JIRA, but JIRA seemed adamant that it would only let me create new issues in the SPARK project. Not sure if that's a JIRA permissions thing, or me losing a fight with Atlassi

Re: error in streaming word count API?

2014-03-02 Thread Aaron Kimball
Running `nc -lk 1234` in one terminal and `nc localhost 1234` in another demonstrates line-buffered behavior. It's a mystery! Thanks for the link on implicit conversions. The example makes sense, and it makes the code easier to trace too. I'll send a JIRA + pull req to touch up the docs.

Re: Job initialization performance of Spark standalone mode vs YARN

2014-03-02 Thread polkosity
Thanks for the advice Mayur. I thought I'd report back on the performance difference... Spark standalone mode has executors processing at capacity in under a second :)

Re: Help with groupByKey

2014-03-02 Thread Cheng Lian
Actually it should be rdd.reduceByKey(_ ++ _) On Mar 3, 2014, at 11:56, Andrew Ash wrote: > rdd.reduceByKey(_+_) using list concatenation? > > Sent from my mobile phone > > On Mar 2, 2014 7:05 PM, "David Thomas" wrote: > I have an RDD of (K, Array[V]) pairs. > > For example: ((key1, (1,2,3))
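Cheng Lian's `rdd.reduceByKey(_ ++ _)` can be sketched on plain Scala collections, with no Spark required. The object and method names below are illustrative stand-ins, not Spark API:

```scala
// Local stand-in for rdd.reduceByKey(_ ++ _): concatenate every Seq[V]
// that shares a key, yielding one (K, Seq[V]) pair per key.
object ReduceByKeyDemo {
  def mergeByKey[K, V](pairs: Seq[(K, Seq[V])]): Map[K, Seq[V]] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(_ ++ _) }

  def main(args: Array[String]): Unit = {
    val pairs = Seq("key1" -> Seq(1, 2, 3), "key2" -> Seq(3, 2, 4), "key1" -> Seq(4, 3, 2))
    // key1 -> (1, 2, 3, 4, 3, 2), key2 -> (3, 2, 4)
    println(mergeByKey(pairs))
  }
}
```

On an actual `RDD[(K, Array[V])]` the same merge is just `rdd.reduceByKey(_ ++ _)`, which combines values map-side before the shuffle instead of moving every value to one place as `groupByKey` does.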

Re: Help with groupByKey

2014-03-02 Thread Andrew Ash
rdd.reduceByKey(_+_) using list concatenation? Sent from my mobile phone On Mar 2, 2014 7:05 PM, "David Thomas" wrote: > I have an RDD of (K, Array[V]) pairs. > > For example: ((key1, (1,2,3)), (key2, (3,2,4)), (key1, (4,3,2))) > > How can I do a groupByKey such that I get back an RDD of the for

Re: Incrementally add/remove vertices in GraphX

2014-03-02 Thread Aditya Varun Chadha
Or Titan on HBase, so you could try reading graphs directly via custom I/O formats

Help with groupByKey

2014-03-02 Thread David Thomas
I have an RDD of (K, Array[V]) pairs. For example: ((key1, (1,2,3)), (key2, (3,2,4)), (key1, (4,3,2))) How can I do a groupByKey such that I get back an RDD of (K, Array[V]) pairs with the values for each key concatenated? Ex: ((key1, (1,2,3,4,3,2)), (key2, (3,2,4)))

Re: flatten RDD[RDD[T]]

2014-03-02 Thread Josh Rosen
Nope, nested RDDs aren't supported: https://groups.google.com/d/msg/spark-users/_Efj40upvx4/DbHCixW7W7kJ https://groups.google.com/d/msg/spark-users/KC1UJEmUeg8/N_qkTJ3nnxMJ https://groups.google.com/d/msg/spark-users/rkVPXAiCiBk/CORV5jyeZpAJ On Sun, Mar 2, 2014 at 5:37 PM, Cosmin Radoi wrote:

flatten RDD[RDD[T]]

2014-03-02 Thread Cosmin Radoi
I'm trying to flatten an RDD of RDDs. The straightforward approach: a: [RDD[RDD[Int]] a flatMap { _.collect } throws a java.lang.NullPointerException at org.apache.spark.rdd.RDD.collect(RDD.scala:602) In a more complex scenario I also got: Task not serializable: java.io.NotSerializableExcepti
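Since nested RDDs aren't supported (per Josh's reply above), the usual workaround is to keep a driver-side collection of RDDs and combine them with `sc.union(rdds)` rather than calling `collect` inside `flatMap`. A plain-collections sketch of that flattening, where `Seq[Seq[T]]` stands in for the unsupported `RDD[RDD[T]]`:

```scala
// Seq[Seq[T]] stands in for RDD[RDD[T]]; in Spark you would keep a
// driver-side Seq[RDD[T]] and call sc.union(rdds) instead of nesting RDDs.
object FlattenSketch {
  def flattenAll[T](rdds: Seq[Seq[T]]): Seq[T] =
    rdds.foldLeft(Seq.empty[T])(_ ++ _) // conceptually what sc.union does

  def main(args: Array[String]): Unit =
    println(flattenAll(Seq(Seq(1, 2), Seq(3), Seq(4, 5)))) // List(1, 2, 3, 4, 5)
}
```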

Re: Incrementally add/remove vertices in GraphX

2014-03-02 Thread psnively
Does this suggest value in an integration of GraphX and Neo4j? Sent from my Verizon Wireless Phone - Reply message - From: "Matei Zaharia" Subject: Incrementally add/remove vertices in GraphX Date: Sun, Mar 2, 2014 4:52 pm You can create a ticket, but note that real-time upda

Re: Incrementally add/remove vertices in GraphX

2014-03-02 Thread Matei Zaharia
Good catch, I’ve fixed those. On Mar 2, 2014, at 5:25 PM, Nicholas Chammas wrote: > Quick side-note on that page, Matei: Several versions up to and including > 0.9.0 are still marked as "unreleased" in JIRA. Dunno if that's intentional > (or if it matters any). > > > On Sun, Mar 2, 2014 at 7

Re: Incrementally add/remove vertices in GraphX

2014-03-02 Thread Nicholas Chammas
Quick side-note on that page, Matei: Several versions up to and including 0.9.0 are still marked as "unreleased" in JIRA. Dunno if that's intentional (or if it matters any). On Sun, Mar 2, 2014 at 7:52 PM, Matei Zaharia wrote: > You can create a ticket, but note that real-time updates to the gra

Re: error in streaming word count API?

2014-03-02 Thread Matei Zaharia
Hi Aaron, On Feb 28, 2014, at 8:46 PM, Aaron Kimball wrote: > Hi folks, > > I was trying to work through the streaming word count example at > http://spark.incubator.apache.org/docs/latest/streaming-programming-guide.html > and couldn't get the code as-written to run. In fairness, I was tryin

Re: Incrementally add/remove vertices in GraphX

2014-03-02 Thread Matei Zaharia
You can create a ticket, but note that real-time updates to the graph are outside the scope of GraphX right now. It’s meant to be a graph analysis system, not a graph storage system. I’ve added it as a component on https://spark-project.atlassian.net/browse/SPARK. Matei On Mar 2, 2014, at 3:32

Re: Incrementally add/remove vertices in GraphX

2014-03-02 Thread Deepak Nulu
Hi Matei, Thanks for the quick response. Is there a plan to support this? Any ticket I can follow? I don't see a GraphX component at https://spark-project.atlassian.net; is there a different bug database for GraphX? Thanks. -deepak

Re: Lazyoutput format in spark

2014-03-02 Thread Matei Zaharia
You can probably use LazyOutputFormat directly. If there’s one for the hadoop.mapred API, you can use it with PairRDDFunctions.saveAsHadoopRDD() today, otherwise there’s going to be a version of that for the hadoop.mapreduce API as well in Spark 1.0. Matei On Feb 28, 2014, at 5:18 PM, Mohit Si

Re: Incrementally add/remove vertices in GraphX

2014-03-02 Thread Matei Zaharia
Right now there isn’t. It’s meant for analysis once you have a graph. If you just need a few vertices at the beginning you could add them to the vertex and edge RDDs using RDD.union() before creating a Graph. Matei On Mar 2, 2014, at 2:38 PM, Deepak Nulu wrote: > Hi, > > Is there a way to in
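Matei's suggestion, sketched with local collections: in real GraphX the inputs would be a vertex `RDD[(VertexId, VD)]` and an edge `RDD[Edge[ED]]`, combined with `RDD.union` before calling `Graph(vertices, edges)`. The tuple shapes and names below only mirror that, they are not Spark API:

```scala
// Append extra vertices and edges to the base datasets before graph
// construction -- the local analogue of RDD.union on GraphX inputs.
object GraphUnionSketch {
  type Vertex = (Long, String)       // mirrors (VertexId, attr)
  type Edge   = (Long, Long, String) // mirrors Edge(srcId, dstId, attr)

  def withExtras(vs: Seq[Vertex], es: Seq[Edge],
                 extraVs: Seq[Vertex], extraEs: Seq[Edge]): (Seq[Vertex], Seq[Edge]) =
    (vs ++ extraVs, es ++ extraEs)

  def main(args: Array[String]): Unit = {
    val (vs, es) = withExtras(
      Seq(1L -> "a", 2L -> "b"), Seq((1L, 2L, "ab")),
      Seq(3L -> "c"), Seq((2L, 3L, "bc")))
    println(s"${vs.size} vertices, ${es.size} edges") // 3 vertices, 2 edges
  }
}
```

Note this builds a new graph from the unioned inputs; it does not mutate an existing one, which matches Matei's point that GraphX is an analysis system, not a graph store.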

Incrementally add/remove vertices in GraphX

2014-03-02 Thread Deepak Nulu
Hi, Is there a way to incrementally add/remove vertices in GraphX? I have read the documentation and looked at the API, but I don't see a way to incrementally add/remove vertices in GraphX. Thanks. -deepak

Re: Python 2.7 + numpy break sortByKey()

2014-03-02 Thread Nicholas Chammas
So this issue appears to be related to the other Python 2.7-related issue I reported in this thread. Shall I open a bug in JIRA about this and include the wikistat repro? Nick On

Re: java.net.SocketException on reduceByKey() in pyspark

2014-03-02 Thread Nicholas Chammas
Alright, so this issue is related to the upgrade to Python 2.7, which relates it to the other Python 2.7 issue I reported in this thread. I modified my code not to rely on Python 2.7, spun up a new c

Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1

2014-03-02 Thread Aureliano Buendia
Is there a reason for Spark using the older Akka? On Sun, Mar 2, 2014 at 1:53 PM, 1esha wrote: > The problem is in akka remote. It contains files compiled with 2.4.*. When > you run it with 2.5.* in classpath it fails like above. > > Looks like moving to akka 2.3 will solve this issue. Check th

Re: Spark 0.9.0 - local mode - sc.addJar problem (bug?)

2014-03-02 Thread Pierre B
I'm still puzzled why wget with my IP is not working properly, whereas it works if I use 127.0.0.1 or localhost... ?

Re: Spark 0.9.0 - local mode - sc.addJar problem (bug?)

2014-03-02 Thread Pierre Borckmans
Hi Nan, Must be a local network config problem with my machine. When I replace the 10.0.1.7 ip with localhost, it works perfectly… Thanks for trying though… Pierre On 02 Mar 2014, at 16:02, Nan Zhu wrote: > Cannot reproduce it….even I add spark-assembly jar > > scala> > sc.addJar("/Users/nan

Re: Spark 0.9.0 - local mode - sc.addJar problem (bug?)

2014-03-02 Thread Nan Zhu
Cannot reproduce it… even when I add the spark-assembly jar scala> sc.addJar("/Users/nanzhu/code/spark-0.9.0-incubating/assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-hadoop1.0.4.jar") 14/03/02 09:59:47 INFO SparkContext: Added JAR /Users/nanzhu/code/spark-0.9.0-incubating/assembly/target/

Spark 0.9.0 - local mode - sc.addJar problem (bug?)

2014-03-02 Thread Pierre B
Hi all! In spark 0.9.0, local mode, whenever I try to add jar(s), using either SparkConf.addJars or SparkConfiguration.addJar, in the shell or in a standalone mode, I observe a strange behaviour. I investigated this because my standalone app works perfectly on my cluster but is getting stuck in l

Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1

2014-03-02 Thread 1esha
The problem is in akka remote. It contains files compiled with 2.4.*. When you run it with 2.5.* in classpath it fails like above. Looks like moving to akka 2.3 will solve this issue. Check this issue - https://www.assembla.com/spaces/akka/tickets/3154-use-protobuf-version-2-5-0#/activity/ticket:

Re: Unable to load realm info from SCDynamicStore

2014-03-02 Thread Sean Owen
This is completely normal for Hadoop. Unless you install optional native libraries like Snappy you will get this warning, but it does not hurt. -- Sean Owen | Director, Data Science | London On Sun, Mar 2, 2014 at 8:40 AM, xiiik wrote: > hi all, > > i have build spark-0.9.0-incubating-bin-hadoop2.tgz on

Unable to load realm info from SCDynamicStore

2014-03-02 Thread xiiik
Hi all, I have built spark-0.9.0-incubating-bin-hadoop2.tgz on my MacBook, and pyspark works well, but I got the message below. (I don't have Hadoop installed on my MacBook.) …... 14/03/02 15:31:59 INFO HttpServer: Starting HTTP Server 14/03/02 15:31:59 INFO SparkUI: Started Spark Web UI at