Re: GraphX Snapshot Partitioning

Takeshi Yamamuro Mon, 09 Mar 2015 20:09:27 -0700

Hi,

Vertices are simply hash-partitioned by their 64-bit IDs, so
they are spread roughly evenly over partitions.
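
To make the vertex case concrete, here is a minimal standalone sketch
(plain Python, not GraphX code). It assumes a Java-style hash of the
64-bit ID followed by a non-negative modulo, which is my understanding of
how a default hash partitioner behaves; the function names are made up
for illustration.

```python
from collections import Counter


def long_hash_code(vertex_id: int) -> int:
    """Fold a 64-bit ID into a signed 32-bit hash, in the style of Java's Long.hashCode."""
    v = vertex_id & 0xFFFFFFFFFFFFFFFF           # view the ID as an unsigned 64-bit value
    h = (v ^ (v >> 32)) & 0xFFFFFFFF             # XOR high and low halves, keep 32 bits
    return h - (1 << 32) if h >= (1 << 31) else h  # reinterpret as a signed 32-bit int


def vertex_partition(vertex_id: int, num_partitions: int) -> int:
    """Assumed placement rule: hash modulo partition count (always non-negative in Python)."""
    return long_hash_code(vertex_id) % num_partitions


# Count how many of the IDs 1..1000 land in each of 4 partitions.
vertex_counts = Counter(vertex_partition(v, 4) for v in range(1, 1001))
print(sorted(vertex_counts.items()))
```

With dense or well-mixed IDs the counts come out close to equal; IDs that
share a pattern aligned with the partition count would spread less evenly.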

As for edges, GraphLoader#edgeListFile builds edge partitions
through hadoopFile(), so the initial partitions depend
on the InputFormat#getSplits implementation
(e.g., partitions mostly correspond to 64MB blocks for HDFS).
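
As a back-of-the-envelope sketch of what that means for the initial
partition count, the helper below simply divides the file size by the
block size. It is an approximation with a hypothetical name: real
getSplits implementations also apply minimum/maximum split settings and
treat non-splittable (e.g. compressed) files differently.

```python
def estimated_edge_partitions(file_bytes: int, block_bytes: int = 64 * 1024 * 1024) -> int:
    """Rough estimate: one partition per block, rounding up, and at least one."""
    return max(1, -(-file_bytes // block_bytes))  # ceiling division on integers


one_gib = 1024 ** 3
print(estimated_edge_partitions(one_gib))
```

So the number of initial edge partitions is driven by the input file
layout, not by anything in the graph itself.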

Edges can be re-partitioned with a PartitionStrategy;
the graph is partitioned with its structure in mind, using
the source ID and the destination ID as partition keys.
The resulting partitions might be skewed, depending
on graph properties (for example, hub nodes).
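
The skew caused by a hub node can be illustrated with a toy simulation.
The two key functions below are deliberately simplified stand-ins (one
keyed on the source ID only, one on the source/destination pair); they are
not GraphX's actual PartitionStrategy implementations, and the graph is
made up.

```python
from collections import Counter

NUM_PARTITIONS = 4

# Toy graph: hub vertex 0 points at 1000 vertices, plus a short chain 1 -> 2 -> ... -> 100.
edges = [(0, dst) for dst in range(1, 1001)]
edges += [(i, i + 1) for i in range(1, 100)]


def by_source(src: int, dst: int) -> int:
    """Toy strategy keyed on the source ID only: all out-edges of a vertex stay together."""
    return src % NUM_PARTITIONS


def by_pair(src: int, dst: int) -> int:
    """Toy strategy keyed on both endpoints: a hub's out-edges can be spread out."""
    return (src * 1000003 + dst) % NUM_PARTITIONS


def partition_sizes(key_fn) -> list:
    """Number of edges assigned to each partition under the given key function."""
    counts = Counter(key_fn(s, d) for s, d in edges)
    return [counts.get(p, 0) for p in range(NUM_PARTITIONS)]


sizes_by_source = partition_sizes(by_source)
sizes_by_pair = partition_sizes(by_pair)
print("keyed on source:", sizes_by_source)
print("keyed on pair:  ", sizes_by_pair)
```

Keying on the source alone puts every out-edge of the hub into one
partition, while keying on the pair spreads them; neither is perfectly
balanced, which is the kind of skew to expect on graphs with hubs.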

Thanks,
takeshi

On Tue, Mar 10, 2015 at 2:21 AM, Matthew Bucci <[email protected]> wrote:

> Hello,
>
> I am working on a project where we want to split graphs of data into
> snapshots across partitions and I was wondering what would happen if one of
> the snapshots we had was too large to fit into a single partition. Would the
> snapshot be split over the two partitions equally, for example, and how is a
> single snapshot spread over multiple partitions?
>
> Thank You,
> Matthew Bucci

-- 
---
Takeshi Yamamuro
