Re: how to create a Graph in GraphX?

2014-11-11 Thread ankurdave
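You should be able to construct the edges in a single map() call without using collect():

  import org.apache.spark.graphx._
  import org.apache.spark.rdd.RDD

  // Each line is "srcId,dstId,label"; GraphX vertex IDs are Longs, so parse them.
  val edges: RDD[Edge[String]] = sc.textFile(...).map { line =>
    val row = line.split(",")
    Edge(row(0).toLong, row(1).toLong, row(2))
  }
  val graph: Graph[Int, String] = Graph.fromEdges(edges, defaultValue = 1)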
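The per-line parsing inside the map() can be pulled out into a plain function and checked without a SparkContext. This is a minimal sketch, assuming comma-separated src,dst,label lines with numeric vertex IDs; EdgeParsing and parseEdgeLine are hypothetical names, not part of GraphX. Inside the Spark job the returned tuple would be wrapped as Edge(src, dst, label).

```scala
// Hypothetical helper object; defining the function on a top-level object
// also keeps the closure from capturing an enclosing class.
object EdgeParsing {
  // Parses one "src,dst,label" line into (srcId, dstId, label).
  // Vertex IDs must be numeric because GraphX's VertexId is a Long.
  def parseEdgeLine(line: String): (Long, Long, String) = {
    val row = line.split(",").map(_.trim)
    require(row.length >= 3, s"expected src,dst,label but got: $line")
    (row(0).toLong, row(1).toLong, row(2))
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("1,2,follows", "2,3,likes", " 3 , 1 , follows ")
    lines.map(parseEdgeLine).foreach(println)
  }
}
```

Keeping the parser pure like this makes malformed-input handling (here, a require that fails fast) easy to test before running it over a large file.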

Re: counting degrees graphx

2014-05-25 Thread ankurdave
Sorry, I missed vertex 6 in that example. It should be [{1}, {1}, {1}, {1}, {1, 6}, {6}, {7}, {7}]. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/counting-degrees-graphx-tp6370p6378.html Sent from the Apache Spark User List mailing list archive at Nabble.c

Re: Benchmarking Graphx

2014-05-19 Thread ankurdave
On May 17, 2014 at 2:59pm, Hari wrote:
> a) Is there a way to get the total time taken for the execution from start to finish?

Assuming you're running the benchmark as a standalone program, such as by invoking the Analytics driver
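A minimal sketch of measuring end-to-end wall-clock time in a standalone driver, assuming the work can be wrapped in a single block; timed is a hypothetical helper, not part of Spark or the Analytics driver. Because Spark transformations are lazy, the timed block should end with an action (for example a count()) so the computation actually runs inside the measured interval.

```scala
object TimingSketch {
  // Runs `body` once and returns its result together with elapsed wall-clock milliseconds.
  def timed[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    val elapsedMs = (System.nanoTime() - start) / 1e6
    (result, elapsedMs)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical stand-in for a benchmark job; in a Spark driver this block would
    // load the graph, run the algorithm, and finish with an action.
    val (sum, ms) = timed { (1L to 1000000L).sum }
    println(f"result=$sum elapsed=$ms%.1f ms")
  }
}
```

System.nanoTime is used rather than currentTimeMillis because it is monotonic and intended for measuring elapsed time.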

Re: Variables outside of mapPartitions scope

2014-05-13 Thread ankurdave
In general, you can find out exactly what's not serializable by adding -Dsun.io.serialization.extendedDebugInfo=true to SPARK_JAVA_OPTS. Since an implicit reference to the enclosing class (this) is often what causes the problem, a general workaround is to move the mapPartitions call to a static method where
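The capture problem can be reproduced with plain Java serialization, which is what Spark uses to ship closures. This is a minimal sketch without Spark: Processor, ProcessorFns and isSerializable are hypothetical names. A closure that reads a field of a non-serializable class drags the whole instance along; the same logic written as a method on a top-level object (Scala's closest equivalent of a static method) closes over only the value it needs.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical driver-side class that is NOT Serializable.
class Processor(val factor: Int) {
  // Reading the field `factor` compiles to this.factor, so the closure captures `this`.
  def capturingFn: Iterator[Int] => Iterator[Int] = iter => iter.map(_ * factor)
}

// Same logic on a top-level object: the closure closes over the Int argument,
// not over a Processor. The object extends Serializable as a safety measure in
// case a compiler version references the module from inside the closure.
object ProcessorFns extends Serializable {
  def scaleFn(factor: Int): Iterator[Int] => Iterator[Int] = iter => iter.map(_ * factor)
}

object SerializationCheck {
  // Returns true if Java serialization of `obj` succeeds.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    println(isSerializable(new Processor(2).capturingFn)) // fails: pulls in Processor
    println(isSerializable(ProcessorFns.scaleFn(2)))      // succeeds: no Processor captured
  }
}
```

In a Spark job, the second form corresponds to calling rdd.mapPartitions(ProcessorFns.scaleFn(2)) instead of building the function inside the non-serializable class.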

Re: Caching in graphX

2014-05-13 Thread ankurdave
Unfortunately it's very difficult to get uncaching right with GraphX due to the complicated internal dependency structure that it creates. It's necessary to know exactly what operations you're doing on the graph in order to unpersist correctly (i.e., in a way that avoids recomputation). I have a p

Re: Is there any problem on the spark mailing list?

2014-05-11 Thread ankurdave
I haven't been getting mail either. This was the last message I received: http://apache-spark-user-list.1001560.n3.nabble.com/master-attempted-to-re-register-the-worker-and-then-took-all-workers-as-unregistered-tp553p5491.html

Re: sample data for pagerank?

2014-03-18 Thread ankurdave
The examples in graphx/data are meant to show the input data format, but if you want to play around with larger and more interesting datasets, we've been using the following ones, among others:
- SNAP's web-Google dataset (5M edges): https://snap.stanford.edu/data/web-Google.html
- SNAP's soc-Live
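The SNAP datasets are plain-text edge lists: header lines start with '#', and each data line holds a whitespace-separated FromNodeId/ToNodeId pair. The minimal sketch below parses that layout and counts distinct vertices as a quick sanity check on a download; SnapEdgeList is a hypothetical name, and the sample input is a toy, not real SNAP data.

```scala
object SnapEdgeList {
  // Parses SNAP-style edge-list text: lines starting with '#' are comments,
  // data lines hold whitespace-separated "FromNodeId ToNodeId" pairs.
  def parse(lines: Iterator[String]): Vector[(Long, Long)] =
    lines
      .map(_.trim)
      .filter(line => line.nonEmpty && !line.startsWith("#"))
      .map { line =>
        val parts = line.split("\\s+")
        (parts(0).toLong, parts(1).toLong)
      }
      .toVector

  // Number of distinct vertex IDs appearing in the edge list.
  def vertexCount(edges: Seq[(Long, Long)]): Int =
    edges.flatMap { case (src, dst) => Seq(src, dst) }.distinct.size

  def main(args: Array[String]): Unit = {
    // Toy input in the same layout as the SNAP files.
    val sample = Iterator("# Toy directed graph", "# FromNodeId\tToNodeId", "0\t1", "0\t2", "1\t2")
    val edges = parse(sample)
    println(s"edges=${edges.size} vertices=${vertexCount(edges)}")
  }
}
```

For the full datasets you would load the file with GraphX's GraphLoader.edgeListFile rather than into driver memory; this sketch only illustrates the format.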

Re: Are there any plans to develop Graphx Streaming?

2014-03-18 Thread ankurdave
Yes, Joey Gonzalez and I are working on a streaming version of GraphX. It's not usable yet, but we will announce when an alpha is ready, likely in a few months. Ankur

Re: There is an error in Graphx

2014-03-18 Thread ankurdave
> The workaround is to force a copy using graph.triplets.map(_.copy()).

Sorry, this actually won't copy the entire triplet, only the attributes defined in Edge (srcId, dstId, and attr), not srcAttr and dstAttr. The right workaround is to copy the EdgeTriplet explicitly: graph.triplets.map { et => val et2 = new EdgeTriplet[VD, ED] // Replace

Re: There is an error in Graphx

2014-03-18 Thread ankurdave
This problem occurs because graph.triplets generates an iterator that reuses the same EdgeTriplet object for every triplet in the partition. The workaround is to force a copy using graph.triplets.map(_.copy()). The solution in the AMPCamp tutorial is mistaken -- I'm not sure if that ever worked.
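The object-reuse mechanism can be reproduced without Spark. In this minimal pure-Scala sketch, MutableTriplet and reusingIterator are hypothetical stand-ins for EdgeTriplet and the iterator behind graph.triplets, not GraphX code. Buffering the references before reading them (as toList does here) shows the last edge repeated, while copying the fields out inside map preserves each triplet.

```scala
// Hypothetical mutable record standing in for GraphX's EdgeTriplet.
final class MutableTriplet {
  var srcId: Long = 0L
  var dstId: Long = 0L
  var attr: String = ""
}

object ReuseDemo {
  // Yields the SAME MutableTriplet instance for every element, overwriting its fields.
  def reusingIterator(edges: Seq[(Long, Long, String)]): Iterator[MutableTriplet] = {
    val shared = new MutableTriplet
    edges.iterator.map { case (s, d, a) =>
      shared.srcId = s
      shared.dstId = d
      shared.attr = a
      shared
    }
  }

  // Buffers references first, then reads them: every element is the same object.
  def bufferedThenRead(edges: Seq[(Long, Long, String)]): List[(Long, Long, String)] =
    reusingIterator(edges).toList.map(t => (t.srcId, t.dstId, t.attr))

  // Copies each element's fields into an immutable tuple before advancing.
  def copiedPerElement(edges: Seq[(Long, Long, String)]): List[(Long, Long, String)] =
    reusingIterator(edges).map(t => (t.srcId, t.dstId, t.attr)).toList

  def main(args: Array[String]): Unit = {
    val edges = Seq((1L, 2L, "a"), (2L, 3L, "b"), (3L, 1L, "c"))
    println(bufferedThenRead(edges)) // last edge repeated three times
    println(copiedPerElement(edges)) // all three distinct edges
  }
}
```

By analogy, one option in GraphX is to map each triplet to an immutable value holding all five fields, e.g. graph.triplets.map(t => (t.srcId, t.srcAttr, t.dstId, t.dstAttr, t.attr)), before buffering it.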