Hi,

Vertices are simply hash-partitioned by their 64-bit IDs, so they are evenly spread over partitions.
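To make the vertex side concrete, here is a minimal sketch in plain Python (not Spark code). The function name `vertex_partition` and the folding arithmetic are assumptions for illustration; it is only loosely modeled on hashing a 64-bit ID and taking it modulo the partition count, not Spark's exact implementation.

```python
from collections import Counter


def vertex_partition(vertex_id: int, num_partitions: int) -> int:
    """Hypothetical stand-in for a hash partitioner keyed on a 64-bit vertex ID."""
    vid = vertex_id & 0xFFFFFFFFFFFFFFFF      # treat the ID as an unsigned 64-bit value
    folded = (vid ^ (vid >> 32)) & 0xFFFFFFFF  # fold the high half into the low 32 bits
    return folded % num_partitions             # map the folded hash to a partition index


# Count how many of 100,000 consecutive vertex IDs land in each of 8 partitions.
counts = Counter(vertex_partition(v, 8) for v in range(100_000))
for partition in sorted(counts):
    print(partition, counts[partition])
```

With consecutive IDs the counts come out balanced; how even the spread is in practice depends on how the IDs themselves are distributed.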

As for edges, GraphLoader#edgeListFile builds the edge partitions through hadoopFile(), so the initial partitions depend on the InputFormat#getSplits implementation (e.g., on HDFS the partitions mostly correspond to 64MB blocks).

Edges can be re-partitioned with a PartitionStrategy; the strategy uses each edge's source ID and destination ID as partition keys, so the resulting layout follows the graph structure. The partitions might still be skewed depending on graph properties (e.g., hub nodes).

Thanks,
takeshi

On Tue, Mar 10, 2015 at 2:21 AM, Matthew Bucci <[email protected]> wrote:

> Hello,
>
> I am working on a project where we want to split graphs of data into
> snapshots across partitions, and I was wondering what would happen if one
> of the snapshots we had was too large to fit into a single partition.
> Would the snapshot be split over the two partitions equally, for example,
> and how is a single snapshot spread over multiple partitions?
>
> Thank You,
> Matthew Bucci
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/GraphX-Snapshot-Partitioning-tp21977.html

--
---
Takeshi Yamamuro
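
A footnote on the skew point in the reply above: the following is a rough sketch in plain Python (not GraphX code) of how a hub node can unbalance edge partitions. The two functions `by_source` and `by_source_and_dest` are hypothetical simplifications, not the GraphX PartitionStrategy implementations; the first keys each edge on its source ID only, the second mixes source and destination IDs. The toy graph is also made up for illustration.

```python
from collections import Counter

NUM_PARTITIONS = 4


def by_source(src: int, dst: int, n: int) -> int:
    """Hypothetical strategy: key each edge on its source ID only."""
    return src % n


def by_source_and_dest(src: int, dst: int, n: int) -> int:
    """Hypothetical strategy: key each edge on both endpoint IDs."""
    return (src * 31 + dst) % n


# Toy graph: hub vertex 0 points at 1000 other vertices,
# plus a short chain of low-degree edges i -> i + 1.
edges = [(0, d) for d in range(1, 1001)] + [(i, i + 1) for i in range(1, 100)]


def partition_sizes(strategy):
    """Return the number of edges assigned to each partition index."""
    counts = Counter(strategy(s, d, NUM_PARTITIONS) for s, d in edges)
    return [counts.get(p, 0) for p in range(NUM_PARTITIONS)]


src_only = partition_sizes(by_source)
src_dst = partition_sizes(by_source_and_dest)
print("source only:       ", src_only)
print("source/destination:", src_dst)
```

In this toy setup the source-only keying puts all 1000 hub edges into a single partition, while keying on both endpoints spreads them across all four. Real graphs and the real strategies will behave differently in detail, so treat this only as an illustration of why hub nodes matter.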
