At 2014-08-25 11:23:37 -0700, Sunita Arvind <sunitarv...@gmail.com> wrote:
> Does this "We introduce GraphX, which combines the advantages of both
> data-parallel and graph-parallel systems by efficiently expressing graph
> computation within the Spark data-parallel framework. We leverage new ideas
> in distributed graph representation to efficiently distribute graphs as
> tabular data-structures. Similarly, we leverage advances in data-flow
> systems to exploit in-memory computation and fault-tolerance." mean that
> GraphX makes the typical RDBMS operations possible even when the data is
> persisted in a GDBMS and not vice versa?

This quote refers to the research idea that while previous graph-parallel 
systems (Pregel, GraphLab, etc.) were built as specialized systems for 
performance, it's actually possible to avoid the trouble of a separate system 
by embedding graph computation efficiently in a general data-parallel system. 
Here "data-parallel" refers generally to any system that can support the join 
optimizations, including Spark and, with some work on the optimizer, relational 
databases as well. So GraphX uses data-parallel (relational) operators to
provide graph computation, not the other way around.
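
To make "graph computation from relational operators" concrete, here is a
minimal sketch in plain Python rather than Spark. It computes connected
components using only a join of the edge table with the vertex table, a
group-by with a min aggregate, and a join back to the vertices. The function
names and the choice of connected components are my own for illustration, not
GraphX's API, and none of the paper's optimizations are modeled.

```python
from collections import defaultdict


def join_edges_with_vertices(edges, labels):
    """Relational join: attach the source and destination labels to each edge."""
    return [(src, dst, labels[src], labels[dst]) for src, dst in edges]


def min_label_step(labels, edges):
    """One round of label propagation written as join -> group-by -> join."""
    triplets = join_edges_with_vertices(edges, labels)

    # Each joined row sends the label of one endpoint to the other endpoint.
    messages = defaultdict(list)
    for src, dst, src_label, dst_label in triplets:
        messages[dst].append(src_label)
        messages[src].append(dst_label)

    # Group by vertex id, aggregate with min, and join back to the vertex table.
    return {v: min([label] + messages.get(v, [])) for v, label in labels.items()}


def connected_components(vertex_ids, edges):
    """Repeat the step until no label changes (edges are treated as undirected)."""
    labels = {v: v for v in vertex_ids}
    while True:
        new_labels = min_label_step(labels, edges)
        if new_labels == labels:
            return labels
        labels = new_labels
```

Calling connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]) labels
every vertex with the smallest vertex id in its component.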

> From what I initially thought, it looked like GraphX could be applied to data
> stored in RDBMSs as Spark could translate the relational data into graphical
> representation. However, there seems to be no conversion and everything
> presented in GraphX implementations AFAIK, works on vertices and edges. So
> does it mean that GraphX is only relevant when the backend is a GDBMS?

GraphX, the library on top of Spark, can be applied indirectly to relational 
data as you described: you can use Spark to load vertex and edge tables from a 
relational database, then process them with GraphX. This isn't discussed in the 
GraphX documentation because it's a concern of Spark. GraphX is only relevant 
once you have the vertices and edges in RDD form.

GraphX, the research concept, can in theory be implemented directly in a 
relational database by augmenting the query optimizer to support the 
optimizations described in the paper and setting up the appropriate indexes on 
the vertex and edge tables.
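
As a rough sketch of what that could look like, the following uses Python's
built-in sqlite3 module with vertex and edge tables, indexes on both edge
columns, and a join plus GROUP BY query for each propagation round. The table
names, column names, and the connected-components example are assumptions of
mine for illustration. SQLite's optimizer does not implement the optimizations
from the paper, so this shows only the relational formulation, not the
performance.

```python
import sqlite3

# One propagation round: join edges with vertices in both directions, then
# group by the receiving vertex and keep only labels that would decrease.
STEP_QUERY = """
SELECT m.id, MIN(m.label)
FROM (
    SELECT e.dst AS id, v.label AS label
    FROM edges e JOIN vertices v ON v.id = e.src
    UNION ALL
    SELECT e.src AS id, v.label AS label
    FROM edges e JOIN vertices v ON v.id = e.dst
) AS m
JOIN vertices cur ON cur.id = m.id
GROUP BY m.id, cur.label
HAVING MIN(m.label) < cur.label
"""


def connected_components_sql(vertex_ids, edges):
    """Label each vertex with the smallest vertex id in its component."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE vertices (id INTEGER PRIMARY KEY, label INTEGER NOT NULL);
        CREATE TABLE edges (src INTEGER NOT NULL, dst INTEGER NOT NULL);
        CREATE INDEX edges_by_src ON edges (src);
        CREATE INDEX edges_by_dst ON edges (dst);
    """)
    conn.executemany("INSERT INTO vertices VALUES (?, ?)",
                     [(v, v) for v in vertex_ids])
    conn.executemany("INSERT INTO edges VALUES (?, ?)", edges)

    while True:
        # Read all changes first so every round sees a consistent snapshot.
        changes = conn.execute(STEP_QUERY).fetchall()
        if not changes:
            break
        conn.executemany("UPDATE vertices SET label = ? WHERE id = ?",
                         [(label, vid) for vid, label in changes])

    result = dict(conn.execute("SELECT id, label FROM vertices"))
    conn.close()
    return result
```

In a real deployment the vertex and edge tables would already exist in the
database, and only the indexes and the per-round query would be needed.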

Ankur
