For real-time graph mutations and queries, you might consider a graph database like Neo4j or TitanDB. Titan can use HBase as its storage backend, and you're already using HBase, so it's probably worth a look.
On Thu, Feb 25, 2016, 9:55 AM Udbhav Agarwal <udbhav.agar...@syncoms.com> wrote:

> That’s a good thing you pointed out. Let me check that. Thanks.
>
> Another thing I have been struggling with: while vertices are being added to the graph (named *inputGraph*), I am not able to access it or run queries on it. When I query the graph during the addition of vertices, the result only comes back after the addition is over. I have also tried creating and querying another variable, tempInputGraph, which stores the state of inputGraph and is updated whenever the addition process finishes. But queries on it are also delayed by the background process.
>
> I have set the number of executors to 8, matching my 8-core system.
>
> Please suggest how I can keep this graph always available to users even while a background process is running on it. Let me know whether this is possible at all, given that, as you said, GraphX is not really designed for real-time needs.
>
> If not GraphX, which other tool could I consider for real-time needs? To elaborate, I want a real-time system that stores data as it arrives and lets me query it in real time.
>
> At present I am using GraphX. My data enters the system via Kafka and Spark Streaming and then updates a graph of, let’s say, orders. One copy is sent to HBase, where the data is persisted for later use. Now I want to query this graph to get various insights into the orders data. I was using GraphX because graphs are really helpful when you want to analyse related/connected information, e.g. friends of friends and similar.
>
> I really appreciate your valuable help, Robin. Thank you in advance.
>
> Udbhav.
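On the question of keeping the graph queryable during updates: one common pattern is to let queries read only the last fully built graph while the next version is built on another thread, and swap the reference once the new version has been materialized. The sketch below assumes the `(Int, Int)` vertex attributes from the `addVertex` code in this thread and, as an assumption, `Int` edge attributes; `GraphSnapshot` and its methods are hypothetical names, not part of GraphX. It does not make GraphX real-time.

```scala
import java.util.concurrent.atomic.AtomicReference

import scala.concurrent.{ExecutionContext, Future}

import org.apache.spark.graphx.Graph

// Hypothetical holder: queries read the last fully built graph while the
// next version is built on another thread.
// Assumes (Int, Int) vertex attributes and Int edge attributes.
object GraphSnapshot {
  type G = Graph[(Int, Int), Int]

  private val current = new AtomicReference[G]()

  // Publish a graph that queries may use from now on.
  // Call this once with an initial graph before updateInBackground.
  def publish(g: G): Unit = current.set(g)

  // Queries call this; they only ever see a graph that finished building
  def snapshot: G = current.get()

  // Build the next version from the current one, materialize it, then swap it in.
  // The previous snapshot is not unpersisted here, since in-flight queries may still use it.
  def updateInBackground(build: G => G)(implicit ec: ExecutionContext): Future[Unit] =
    Future {
      val next = build(current.get())
      next.cache()
      next.vertices.count() // force the vertices to be computed and cached
      next.edges.count()    // likewise for the edges
      current.set(next)
    }
}
```

This could be paired with `spark.scheduler.mode=FAIR` on the `SparkConf`, so that query jobs submitted from other threads get a share of the cores instead of queueing behind the update job (Spark's default within an application is FIFO). Even so, on a single 8-core machine the rebuild and the queries still compete for the same cores, so this only hides the rebuild from readers; it does not make it cheaper.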
> *From:* Robin East [mailto:robin.e...@xense.co.uk]
> *Sent:* Thursday, February 25, 2016 7:42 PM
> *To:* Udbhav Agarwal <udbhav.agar...@syncoms.com>
> *Cc:* user@spark.apache.org
> *Subject:* Re: Reindexing in graphx
>
> So, first up, GraphX is not really designed for real-time graph mutation situations. That’s not to say it can’t be done, but you may be butting up against some of the design limitations in that area. As a first point of investigation, look at the Web UI to see which particular tasks/stages are taking a long time, and which resource (CPU, IO, network, shuffles) they seem to be bottlenecking on.
>
> Robin East
> *Spark GraphX in Action*, Michael Malak and Robin East
> Manning Publications Co.
> http://www.manning.com/books/spark-graphx-in-action
>
> On 24 Feb 2016, at 12:05, Udbhav Agarwal <udbhav.agar...@syncoms.com> wrote:
>
> Sounds useful, Robin. Thanks. I will try that. But FYI, in another case I tested adding only one vertex to the graph at a time. In that case too, the latency of each subsequent addition kept increasing: the first addition of a vertex took 3 seconds, the second took 7 seconds, and so on. This is the case where I want to add vertices to the graph as and when they arrive in our system; since it’s a real-time system I am trying to build, vertices will keep on coming.
>
> Thanks.
>
> *From:* Robin East [mailto:robin.e...@xense.co.uk]
> *Sent:* Wednesday, February 24, 2016 3:54 PM
> *To:* Udbhav Agarwal <udbhav.agar...@syncoms.com>
> *Cc:* user@spark.apache.org
> *Subject:* Re: Reindexing in graphx
>
> It looks like you’re adding vertices one by one; you definitely don’t want to do that. What happens when you batch together 400 vertices into an RDD and then add all 400 in one go?
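Robin's batching suggestion can be sketched as follows. This assumes the new vertex IDs arrive as strings in one RDD, as in the `addVertex` method quoted further down the thread, and keeps its `(100, 100)` vertex attribute and `(0, 0)` default; `addVerticesBatched` is a hypothetical name and the `Int` edge attribute type is an assumption. The difference from the original is that the whole batch becomes one RDD and one `union`, instead of one `parallelize` plus one `union` per vertex collected to the driver.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

object BatchedVertexAdd {
  // Hypothetical helper: turns a whole batch of incoming IDs into vertices and
  // builds the new graph once. Assumes every string parses as a Long vertex ID
  // and (as an assumption) that edges carry Int attributes.
  def addVerticesBatched(
      newIds: RDD[String],
      vertices: RDD[(VertexId, (Int, Int))],
      edges: RDD[Edge[Int]]): Graph[(Int, Int), Int] = {
    // One distributed map over the batch: no collect() and no per-vertex parallelize
    val batch: RDD[(VertexId, (Int, Int))] = newIds.map(id => (id.toLong, (100, 100)))
    val defaultAttr = (0, 0)
    // A single union per batch rather than one union per vertex
    val graph = Graph(vertices.union(batch), edges, defaultAttr)
    graph.cache()
    graph
  }
}
```

In a Spark Streaming job this would typically be called once per micro-batch (for example from `foreachRDD`), not once per record.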
> On 24 Feb 2016, at 05:49, Udbhav Agarwal <udbhav.agar...@syncoms.com> wrote:
>
> Thank you, Robin, for your reply.
>
> Actually, I am adding a bunch of vertices to a graph in GraphX using the following method, and I am facing a latency problem. The first addition of, say, 400 vertices to a graph with 100,000 nodes takes around 7 seconds. The next time it takes 15 seconds. So every subsequent add takes more time than the previous one. Hence I tried reindex(), so that subsequent operations could also be performed fast.
>
> FYI, my cluster currently has one machine with 8 cores and 8 GB RAM. I am running in local mode.
>
> // gVertices, gEdges and inputGraph are vars defined outside this method
> def addVertex(rdd: RDD[String], sc: SparkContext, session: String): Long = {
>   val defaultUser = (0, 0)
>   // one parallelize + one union per incoming vertex
>   rdd.collect().foreach { x =>
>     val aVertex: RDD[(VertexId, (Int, Int))] =
>       sc.parallelize(Array((x.toLong, (100, 100))))
>     gVertices = gVertices.union(aVertex)
>   }
>   inputGraph = Graph(gVertices, gEdges, defaultUser)
>   inputGraph.cache()
>   gVertices = inputGraph.vertices
>   gVertices.cache()
>   val count = gVertices.count
>   println(count)
>
>   return 1
> }
>
> *From:* Robin East [mailto:robin.e...@xense.co.uk]
> *Sent:* Tuesday, February 23, 2016 8:15 PM
> *To:* Udbhav Agarwal <udbhav.agar...@syncoms.com>
> *Subject:* Re: Reindexing in graphx
>
> Hi,
>
> Well, this is the line that is failing in VertexRDDImpl:
>
>     require(partitionsRDD.partitioner.isDefined)
>
> But really you shouldn’t need to call the reindex() function, as it deals with some internals of the GraphX implementation; it looks to me like it ought to be a private method.
> Perhaps you could explain what you are trying to achieve.
>
> On 23 Feb 2016, at 12:18, Udbhav Agarwal <udbhav.agar...@syncoms.com> wrote:
>
> Hi,
>
> I am trying to add vertices to a graph in GraphX, and I want to reindex the graph. I can see there is a vertices.reindex() option in GraphX. But when I call graph.vertices.reindex(), I get
>
> java.lang.IllegalArgumentException: requirement failed.
>
> Please help me understand what I am missing in the syntax, as the API documentation I have seen only mentions vertices.reindex().
>
> *Thanks,*
> *Udbhav Agarwal*
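The `require(partitionsRDD.partitioner.isDefined)` check Robin quotes fails whenever the underlying RDD has no partitioner. As a general illustration of how an RDD loses its partitioner (not a claim about the exact code path inside `reindex()`), the sketch below compares plain `map`, which Spark assumes may change keys and so drops the partitioner, with `mapPartitions(..., preservesPartitioning = true)`, which keeps it. `PartitionerCheck` is a hypothetical name.

```scala
import org.apache.spark.{HashPartitioner, SparkContext}

object PartitionerCheck {
  // Returns (partitioner kept after map, partitioner kept after mapPartitions)
  def survives(sc: SparkContext): (Boolean, Boolean) = {
    val partitioned =
      sc.parallelize(Seq((1L, "a"), (2L, "b"))).partitionBy(new HashPartitioner(2))
    // map may change the keys, so Spark drops the partitioner
    val viaMap = partitioned.map(kv => kv)
    // preservesPartitioning = true promises the keys are unchanged, so it is kept
    val viaMapPartitions = partitioned.mapPartitions(iter => iter, preservesPartitioning = true)
    (viaMap.partitioner.isDefined, viaMapPartitions.partitioner.isDefined)
  }
}
```

As Robin notes, `reindex()` deals with GraphX internals, so in application code the batching approach is the more practical route.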