Re: Apache Spark and Graphx for Real Time Analytics

2014-04-08 Thread Reynold Xin
Nick and Koert summarized it pretty well. Just to clarify and give some concrete examples: if you want to start with a specific vertex and follow some path, it is probably easier and faster to use a key-value store, or even MySQL or a graph database. If you want to count the average length of

Re: Apache Spark and Graphx for Real Time Analytics

2014-04-08 Thread Nick Pentreath
Likely neither will give real-time for full-graph traversal, no. And once in memory, GraphX would definitely be faster for "breadth-first" traversal. But for "vertex-centric" traversals (starting from a vertex and traversing edges from there, such as "friends of friends" queries etc) then Titan is

Re: Apache Spark and Graphx for Real Time Analytics

2014-04-08 Thread Koert Kuipers
it all depends on what kind of traversing. if it's point traversing then something random-access based would be great. if it's more scan-like traversal then spark will fit
On Tue, Apr 8, 2014 at 4:56 PM, Evan Chan wrote:
> I doubt Titan would be able to give you traversal of billions of nodes i
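
To make the distinction between point traversal and scan-like traversal concrete, here is a minimal sketch in plain Python. It does not use Spark, GraphX or Titan; the toy edge list and the function names are hypothetical and only illustrate the two access patterns being discussed.

```python
from collections import defaultdict

# Hypothetical toy graph (not from the thread): undirected edges.
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]

# Random-access index: vertex -> set of neighbours.
# This plays the role of a key-value store or graph database lookup.
adjacency = defaultdict(set)
for src, dst in edges:
    adjacency[src].add(dst)
    adjacency[dst].add(src)


def friends_of_friends(vertex):
    """Point traversal: touch only the neighbourhood of one vertex."""
    direct = adjacency.get(vertex, set())
    two_hop = set()
    for friend in direct:
        two_hop |= adjacency.get(friend, set())
    # Exclude the start vertex and its direct friends.
    return two_hop - direct - {vertex}


def average_degree(edge_list):
    """Scan-like computation: every edge is read once."""
    degree = defaultdict(int)
    for src, dst in edge_list:
        degree[src] += 1
        degree[dst] += 1
    return sum(degree.values()) / len(degree)


print(friends_of_friends(1))
print(average_degree(edges))
```

The first function only needs fast lookups for a handful of keys, which is where a random-access store tends to do well. The second reads the whole edge list, which is the kind of workload a scan-oriented engine such as Spark is built for.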

Re: Apache Spark and Graphx for Real Time Analytics

2014-04-08 Thread Evan Chan
I doubt Titan would be able to give you traversal of billions of nodes in real-time either. In-memory traversal is typically much faster than Cassandra-based tree traversal, even including in-memory caching.
On Tue, Apr 8, 2014 at 1:23 PM, Nick Pentreath wrote:
> GraphX, like Spark, will not t

Re: Apache Spark and Graphx for Real Time Analytics

2014-04-08 Thread Nick Pentreath
GraphX, like Spark, will not typically be "real-time" (where by "real-time" I assume you mean on the order of tens to hundreds of milliseconds, up to a few seconds). Spark can in some cases approach the upper boundary of this definition (a second or two, possibly less) when data is cached in memory and the