At 2014-08-25 11:23:37 -0700, Sunita Arvind <sunitarv...@gmail.com> wrote:
> Does this "We introduce GraphX, which combines the advantages of both
> data-parallel and graph-parallel systems by efficiently expressing graph
> computation within the Spark data-parallel framework. We leverage new ideas
> in distributed graph representation to efficiently distribute graphs as
> tabular data-structures. Similarly, we leverage advances in data-flow
> systems to exploit in-memory computation and fault-tolerance." mean that
> GraphX makes the typical RDBMS operations possible even when the data is
> persisted in a GDBMS and not vice versa?

This quote refers to the research idea that while previous graph-parallel
systems (Pregel, GraphLab, etc.) were built as specialized systems for
performance, it's actually possible to avoid the trouble of a separate system
by embedding graph computation efficiently in a general data-parallel system.
Here "data-parallel" refers generally to any system that can support the join
optimizations described in the paper, including Spark and, with some work on
the optimizer, relational databases as well.

So GraphX uses data-parallel or relational operators to provide graph
computation, not the other way around.

> From what I initially thought, it looked like GraphX could be applied to data
> stored in RDBMSs as Spark could translate the relational data into graphical
> representation. However, there seems to be no conversion and everything
> presented in GraphX implementations AFAIK, works on vertices and edges. So
> does it mean that GraphX is only relevant when the backend is a GDBMS?

GraphX, the library on top of Spark, can be applied indirectly to relational
data as you described: you can use Spark to load vertex and edge tables from a
relational database, then process them with GraphX. This isn't discussed in the
GraphX documentation because it's a concern of Spark. GraphX is only relevant
once you have the vertices and edges in RDD form.

GraphX, the research concept, can in theory be implemented directly in a
relational database by augmenting the query optimizer to support the
optimizations described in the paper and setting up the appropriate indexes on
the vertex and edge tables.

Ankur
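
To make "relational operators providing graph computation" concrete, here is a minimal sketch in plain Python. It is not GraphX code and it omits the optimizations from the paper. It assumes an in-memory SQLite database standing in for the relational backend, and the table names (vertices, edges), the index name, and the function connected_components are all hypothetical names chosen for illustration. Each iteration is a join of the edge table against the vertex table followed by a group-by, which is the "send messages, then aggregate per vertex" step that graph-parallel systems expose.

```python
import sqlite3


def connected_components(vertex_ids, edge_pairs, max_iters=20):
    """Label each vertex with the smallest vertex id in its component.

    Illustration only: the graph lives in two ordinary tables and every
    step is a relational join plus aggregation.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vertices (id INTEGER PRIMARY KEY, label INTEGER)")
    conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
    # Hypothetical index so the join below can look up edges by endpoint.
    conn.execute("CREATE INDEX edges_by_src ON edges (src)")

    # Every vertex starts out labelled with its own id.
    conn.executemany(
        "INSERT INTO vertices VALUES (?, ?)", [(v, v) for v in vertex_ids]
    )
    # Store each undirected edge in both directions so labels flow both ways.
    both_directions = [(s, d) for s, d in edge_pairs] + [(d, s) for s, d in edge_pairs]
    conn.executemany("INSERT INTO edges VALUES (?, ?)", both_directions)

    for _ in range(max_iters):
        # Join edges with the source vertex's label, then take the minimum
        # incoming label per destination vertex.
        messages = conn.execute(
            """
            SELECT e.dst, MIN(v.label)
            FROM edges AS e
            JOIN vertices AS v ON v.id = e.src
            GROUP BY e.dst
            """
        ).fetchall()

        # Apply a message only when it lowers the vertex's current label.
        changed = 0
        for dst, label in messages:
            cursor = conn.execute(
                "UPDATE vertices SET label = ? WHERE id = ? AND label > ?",
                (label, dst, label),
            )
            changed += cursor.rowcount
        if changed == 0:
            break  # No label changed, so the computation has converged.

    result = dict(conn.execute("SELECT id, label FROM vertices"))
    conn.close()
    return result


if __name__ == "__main__":
    print(connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
```

With Spark the shape is the same, except that the vertex and edge tables would first be loaded into RDDs and GraphX would take over from there.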