Re: error in streaming word count API?

2014-03-02 Thread Aaron Kimball
Filed SPARK-1173 and sent a pull request. As an aside, I think this should probably have been filed under the STREAMING project in JIRA, but JIRA seemed adamant that it would only let me create new issues in the SPARK project. Not sure if that's a JIRA permissions thing, or me losing a fight with Atlassi

Re: error in streaming word count API?

2014-03-02 Thread Aaron Kimball
Running `nc -lk 1234` in one terminal and `nc localhost 1234` in another demonstrates line-buffered behavior. It's a mystery! Thanks for the link on implicit conversions. The example makes sense, and it makes the code easier to trace too. I'll send a JIRA + pull req to touch up the docs.

Re: Job initialization performance of Spark standalone mode vs YARN

2014-03-02 Thread polkosity
Thanks for the advice Mayur. I thought I'd report back on the performance difference... Spark standalone mode has executors processing at capacity in under a second :)

Re: Help with groupByKey

2014-03-02 Thread Cheng Lian
Actually it should be rdd.reduceByKey(_ ++ _) On Mar 3, 2014, at 11:56, Andrew Ash wrote: > rdd.reduceByKey(_+_) using list concatenation? > > Sent from my mobile phone > > On Mar 2, 2014 7:05 PM, "David Thomas" wrote: > I have an RDD of (K, Array[V]) pairs. > > For example: ((key1, (1,2,3))
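Cheng Lian's `rdd.reduceByKey(_ ++ _)` can be sketched on plain Scala collections, with no Spark required. The object and method names below are illustrative stand-ins, not Spark API:

```scala
// Local stand-in for rdd.reduceByKey(_ ++ _): concatenate every Seq[V]
// that shares a key, yielding one (K, Seq[V]) pair per key.
object ReduceByKeyDemo {
  def mergeByKey[K, V](pairs: Seq[(K, Seq[V])]): Map[K, Seq[V]] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(_ ++ _) }

  def main(args: Array[String]): Unit = {
    val pairs = Seq("key1" -> Seq(1, 2, 3), "key2" -> Seq(3, 2, 4), "key1" -> Seq(4, 3, 2))
    // key1 -> (1, 2, 3, 4, 3, 2), key2 -> (3, 2, 4)
    println(mergeByKey(pairs))
  }
}
```

On an actual `RDD[(K, Array[V])]` the same merge is just `rdd.reduceByKey(_ ++ _)`, which combines values map-side before the shuffle instead of moving every value to one place as `groupByKey` does.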

Re: Help with groupByKey

2014-03-02 Thread Andrew Ash
rdd.reduceByKey(_+_) using list concatenation? Sent from my mobile phone On Mar 2, 2014 7:05 PM, "David Thomas" wrote: > I have an RDD of (K, Array[V]) pairs. > > For example: ((key1, (1,2,3)), (key2, (3,2,4)), (key1, (4,3,2))) > > How can I do a groupByKey such that I get back an RDD of the for

Re: Incrementally add/remove vertices in GraphX

2014-03-02 Thread Aditya Varun Chadha
Or Titan on HBase, so you could try reading graphs directly via custom I/O formats

Help with groupByKey

2014-03-02 Thread David Thomas
I have an RDD of (K, Array[V]) pairs. For example: ((key1, (1,2,3)), (key2, (3,2,4)), (key1, (4,3,2))) How can I do a groupByKey such that I get back an RDD of (K, Array[V]) pairs with the values for each key concatenated? Ex: ((key1, (1,2,3,4,3,2)), (key2, (3,2,4)))

Re: flatten RDD[RDD[T]]

2014-03-02 Thread Josh Rosen
Nope, nested RDDs aren't supported: https://groups.google.com/d/msg/spark-users/_Efj40upvx4/DbHCixW7W7kJ https://groups.google.com/d/msg/spark-users/KC1UJEmUeg8/N_qkTJ3nnxMJ https://groups.google.com/d/msg/spark-users/rkVPXAiCiBk/CORV5jyeZpAJ On Sun, Mar 2, 2014 at 5:37 PM, Cosmin Radoi wrote:

flatten RDD[RDD[T]]

2014-03-02 Thread Cosmin Radoi
I'm trying to flatten an RDD of RDDs. The straightforward approach: a: [RDD[RDD[Int]] a flatMap { _.collect } throws a java.lang.NullPointerException at org.apache.spark.rdd.RDD.collect(RDD.scala:602) In a more complex scenario I also got: Task not serializable: java.io.NotSerializableExcepti
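Since nested RDDs aren't supported (per Josh's reply above), the usual workaround is to keep a driver-side collection of RDDs and combine them with `sc.union(rdds)` rather than calling `collect` inside `flatMap`. A plain-collections sketch of that flattening, where `Seq[Seq[T]]` stands in for the unsupported `RDD[RDD[T]]`:

```scala
// Seq[Seq[T]] stands in for RDD[RDD[T]]; in Spark you would keep a
// driver-side Seq[RDD[T]] and call sc.union(rdds) instead of nesting RDDs.
object FlattenSketch {
  def flattenAll[T](rdds: Seq[Seq[T]]): Seq[T] =
    rdds.foldLeft(Seq.empty[T])(_ ++ _) // conceptually what sc.union does

  def main(args: Array[String]): Unit =
    println(flattenAll(Seq(Seq(1, 2), Seq(3), Seq(4, 5)))) // List(1, 2, 3, 4, 5)
}
```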

Re: Incrementally add/remove vertices in GraphX

2014-03-02 Thread psnively
Does this suggest value in an integration of GraphX and Neo4j? Sent from my Verizon Wireless Phone - Reply message - From: "Matei Zaharia" Subject: Incrementally add/remove vertices in GraphX Date: Sun, Mar 2, 2014 4:52 pm You can create a ticket, but note that real-time upda

Re: Incrementally add/remove vertices in GraphX

2014-03-02 Thread Matei Zaharia
Good catch, I’ve fixed those. On Mar 2, 2014, at 5:25 PM, Nicholas Chammas wrote: > Quick side-note on that page, Matei: Several versions up to and including > 0.9.0 are still marked as "unreleased" in JIRA. Dunno if that's intentional > (or if it matters any). > > > On Sun, Mar 2, 2014 at 7

Re: Incrementally add/remove vertices in GraphX

2014-03-02 Thread Nicholas Chammas
Quick side-note on that page, Matei: Several versions up to and including 0.9.0 are still marked as "unreleased" in JIRA. Dunno if that's intentional (or if it matters any). On Sun, Mar 2, 2014 at 7:52 PM, Matei Zaharia wrote: > You can create a ticket, but note that real-time updates to the gra

Re: error in streaming word count API?

2014-03-02 Thread Matei Zaharia
Hi Aaron, On Feb 28, 2014, at 8:46 PM, Aaron Kimball wrote: > Hi folks, > > I was trying to work through the streaming word count example at > http://spark.incubator.apache.org/docs/latest/streaming-programming-guide.html > and couldn't get the code as-written to run. In fairness, I was tryin

Re: Incrementally add/remove vertices in GraphX

2014-03-02 Thread Matei Zaharia
You can create a ticket, but note that real-time updates to the graph are outside the scope of GraphX right now. It’s meant to be a graph analysis system, not a graph storage system. I’ve added it as a component on https://spark-project.atlassian.net/browse/SPARK. Matei On Mar 2, 2014, at 3:32

Re: Incrementally add/remove vertices in GraphX

2014-03-02 Thread Deepak Nulu
Hi Matei, Thanks for the quick response. Is there a plan to support this? Any ticket I can follow? I don't see a GraphX component at https://spark-project.atlassian.net; is there a different bug database for GraphX? Thanks. -deepak

Re: Lazyoutput format in spark

2014-03-02 Thread Matei Zaharia
You can probably use LazyOutputFormat directly. If there’s one for the hadoop.mapred API, you can use it with PairRDDFunctions.saveAsHadoopRDD() today, otherwise there’s going to be a version of that for the hadoop.mapreduce API as well in Spark 1.0. Matei On Feb 28, 2014, at 5:18 PM, Mohit Si

Re: Incrementally add/remove vertices in GraphX

2014-03-02 Thread Matei Zaharia
Right now there isn’t. It’s meant for analysis once you have a graph. If you just need a few vertices at the beginning you could add them to the vertex and edge RDDs using RDD.union() before creating a Graph. Matei On Mar 2, 2014, at 2:38 PM, Deepak Nulu wrote: > Hi, > > Is there a way to in
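Matei's suggestion, sketched with local collections: in real GraphX the inputs would be a vertex `RDD[(VertexId, VD)]` and an edge `RDD[Edge[ED]]`, combined with `RDD.union` before calling `Graph(vertices, edges)`. The tuple shapes and names below only mirror that, they are not Spark API:

```scala
// Append extra vertices and edges to the base datasets before graph
// construction -- the local analogue of RDD.union on GraphX inputs.
object GraphUnionSketch {
  type Vertex = (Long, String)       // mirrors (VertexId, attr)
  type Edge   = (Long, Long, String) // mirrors Edge(srcId, dstId, attr)

  def withExtras(vs: Seq[Vertex], es: Seq[Edge],
                 extraVs: Seq[Vertex], extraEs: Seq[Edge]): (Seq[Vertex], Seq[Edge]) =
    (vs ++ extraVs, es ++ extraEs)

  def main(args: Array[String]): Unit = {
    val (vs, es) = withExtras(
      Seq(1L -> "a", 2L -> "b"), Seq((1L, 2L, "ab")),
      Seq(3L -> "c"), Seq((2L, 3L, "bc")))
    println(s"${vs.size} vertices, ${es.size} edges") // 3 vertices, 2 edges
  }
}
```

Note this builds a new graph from the unioned inputs; it does not mutate an existing one, which matches Matei's point that GraphX is an analysis system, not a graph store.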

Incrementally add/remove vertices in GraphX

2014-03-02 Thread Deepak Nulu
Hi, Is there a way to incrementally add/remove vertices in GraphX? I have read the documentation and looked at the API, but I don't see a way to incrementally add/remove vertices in GraphX. Thanks. -deepak

Re: Python 2.7 + numpy break sortByKey()

2014-03-02 Thread Nicholas Chammas
So this issue appears to be related to the other Python 2.7-related issue I reported in this thread. Shall I open a bug in JIRA about this and include the wikistat repro? Nick On

Re: java.net.SocketException on reduceByKey() in pyspark

2014-03-02 Thread Nicholas Chammas
Alright, so this issue is related to the upgrade to Python 2.7, which relates it to the other Python 2.7 issue I reported in this thread. I modified my code not to rely on Python 2.7, spun up a new c

Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1

2014-03-02 Thread Aureliano Buendia
Is there a reason for Spark using the older Akka? On Sun, Mar 2, 2014 at 1:53 PM, 1esha wrote: > The problem is in akka remote. It contains files compiled with 2.4.*. When > you run it with 2.5.* in classpath it fails like above. > > Looks like moving to akka 2.3 will solve this issue. Check th

Re: Spark 0.9.0 - local mode - sc.addJar problem (bug?)

2014-03-02 Thread Pierre B
I'm still puzzled why wget with my IP is not working properly, whereas it works if I use 127.0.0.1 or localhost... ?

Re: Spark 0.9.0 - local mode - sc.addJar problem (bug?)

2014-03-02 Thread Pierre Borckmans
Hi Nan, Must be a local network config problem with my machine. When I replace the 10.0.1.7 ip with localhost, it works perfectly… Thanks for trying though… Pierre On 02 Mar 2014, at 16:02, Nan Zhu wrote: > Cannot reproduce it….even I add spark-assembly jar > > scala> > sc.addJar("/Users/nan

Re: Spark 0.9.0 - local mode - sc.addJar problem (bug?)

2014-03-02 Thread Nan Zhu
Cannot reproduce it… even when I add the spark-assembly jar scala> sc.addJar("/Users/nanzhu/code/spark-0.9.0-incubating/assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-hadoop1.0.4.jar") 14/03/02 09:59:47 INFO SparkContext: Added JAR /Users/nanzhu/code/spark-0.9.0-incubating/assembly/target/

Spark 0.9.0 - local mode - sc.addJar problem (bug?)

2014-03-02 Thread Pierre B
Hi all! In spark 0.9.0, local mode, whenever I try to add jar(s), using either SparkConf.addJars or SparkConfiguration.addJar, in the shell or in a standalone mode, I observe a strange behaviour. I investigated this because my standalone app works perfectly on my cluster but is getting stuck in l

Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1

2014-03-02 Thread 1esha
The problem is in akka remote. It contains files compiled with 2.4.*. When you run it with 2.5.* in classpath it fails like above. Looks like moving to akka 2.3 will solve this issue. Check this issue - https://www.assembla.com/spaces/akka/tickets/3154-use-protobuf-version-2-5-0#/activity/ticket:

Re: Unable to load realm info from SCDynamicStore

2014-03-02 Thread Sean Owen
This is completely normal for Hadoop. Unless you install optional native libraries like Snappy you will get this warning, but it does not hurt. -- Sean Owen | Director, Data Science | London On Sun, Mar 2, 2014 at 8:40 AM, xiiik wrote: > hi all, > > i have build spark-0.9.0-incubating-bin-hadoop2.tgz on

Unable to load realm info from SCDynamicStore

2014-03-02 Thread xiiik
Hi all, I have built spark-0.9.0-incubating-bin-hadoop2.tgz on my MacBook, and pyspark works well, but I got the message below. (I don't have Hadoop installed on my MacBook.) …... 14/03/02 15:31:59 INFO HttpServer: Starting HTTP Server 14/03/02 15:31:59 INFO SparkUI: Started Spark Web UI at