Re: RDD Blocks skewing to just few executors

2015-03-20 Thread Alessandro Lulli
Hi All, I'm experiencing the same issue with Spark 1.2.0 (not verified with previous versions). Could you please help us with this? Thanks Alessandro On Tue, Nov 18, 2014 at 1:40 AM, mtimper wrote: > Hi I'm running a standalone cluster with 8 worker servers. > I'm developing a streaming app that is adding

Re: RDD Partition number

2015-02-20 Thread Alessandro Lulli
What file system are you using? >> >> If you use HDFS, the documentation you cited is pretty clear on how >> partitions are determined. >> >> bq. file X replicated on 4 machines >> >> I don't think replication factor plays a role w.r.t. par

RDD Partition number

2015-02-19 Thread Alessandro Lulli
Hi All, Could you please help me understand how Spark determines the number of partitions of an RDD when it is not specified? I found the following in the documentation for files loaded from HDFS: *The textFile method also takes an optional second argument for controlling the number of partitions of the

Re: Job aborted due to stage failure: TID x failed for unknown reasons

2014-07-22 Thread Alessandro Lulli
Hi All, Can someone help with this? I'm encountering exactly the same issue in a very similar scenario with the same Spark version. Thanks Alessandro On Fri, Jul 18, 2014 at 8:30 PM, Shannon Quinn wrote: > Hi all, > > I'm dealing with some strange error messages that I *think* come down to >

Re: Incrementally add/remove vertices in GraphX

2014-03-19 Thread Alessandro Lulli
Hi All, Thanks for your answer. Regarding GraphX streaming: - Is there an issue (pull request) to follow to keep track of the update? - Where is it possible to find a description and details of what will be provided? Thanks for your help and for taking the time to answer my questions Alessandro O

Re: Incrementally add/remove vertices in GraphX

2014-03-17 Thread Alessandro Lulli
Hi All, Is somebody looking into this? I think this is correlated with the discussion "Are there any plans to develop Graphx Streaming?". Using union / subtract on a VertexRDD or EdgeRDD leads to the creation of a new RDD but NOT to the modification of the RDD in the graph. Is creating a new graph th

Computation time increasing every super-step

2014-03-11 Thread Alessandro Lulli
Hi All, I'm facing a performance degradation running an iterative algorithm built using Spark 0.9 and GraphX. I'm using org.apache.spark.graphx.Pregel to run the iterative algorithm. My graph has 2395 vertices and 7462 edges. With every super-step the computation time increases significantly. The steps 1-5