Re: Reindexing in graphx

Karl Higley Thu, 25 Feb 2016 07:24:49 -0800

For real-time graph mutations and queries, you might consider a graph
database like Neo4j or TitanDB. Titan can be backed by HBase, which you're
already using, so that's probably worth a look.


On Thu, Feb 25, 2016, 9:55 AM Udbhav Agarwal <udbhav.agar...@syncoms.com>
wrote:

> That’s a good thing you pointed out. Let me check that. Thanks.
>
>
>
> Another thing I am struggling with: while vertices are being added to the
> graph (named *inputGraph*), I am not able to access it or query it. When I
> query the graph during the addition of vertices, the result only comes back
> after the addition is over. I have also tried creating and querying another
> variable, tempInputGraph, which stores the state of inputGraph and is
> updated whenever the addition process finishes. But querying this is also
> delayed by the background process.
>
> I have set the number of executors to 8 to match my 8-core system.
>
> Please suggest how I can keep this graph always available to users even
> while a background process is running over it. Let me know whether this is
> possible at all, given that, as you said, GraphX is not really designed for
> real-time needs.
>
>
>
> If not GraphX, which other tool can I consider for real-time needs? To
> elaborate, I want a real-time system that can store data as it arrives and
> that I can query in real time.
>
> In the present case I am using GraphX. Data enters my system via Kafka and
> Spark Streaming and then updates a graph of, let’s say, orders. One copy is
> sent to HBase, where the data is persisted for later use. Now I want to
> query this graph to get various insights into the orders data. I was using
> GraphX because graphs are really helpful for analysing related/connected
> information, e.g. friends of friends.
>
>
>
> I really appreciate your valuable help Robin. Thank you in advance.
>
>
>
> Udbhav.
>
> *From:* Robin East [mailto:robin.e...@xense.co.uk]
> *Sent:* Thursday, February 25, 2016 7:42 PM
>
>
> *To:* Udbhav Agarwal <udbhav.agar...@syncoms.com>
> *Cc:* user@spark.apache.org
> *Subject:* Re: Reindexing in graphx
>
>
>
> So first up, GraphX is not really designed for real-time graph mutation
> situations. That’s not to say it can’t be done, but you may be butting
> up against some of the design limitations in that area. As a first point of
> interrogation you should look at the WebUI to see which particular
> tasks/stages are taking a long time, and which resource (CPU, IO, network,
> shuffles) they seem to be bottlenecking on.
>
>
> -------------------------------------------------------------------------------
>
> Robin East
>
> *Spark GraphX in Action *Michael Malak and Robin East
>
> Manning Publications Co.
>
> http://www.manning.com/books/spark-graphx-in-action
>
> On 24 Feb 2016, at 12:05, Udbhav Agarwal <udbhav.agar...@syncoms.com>
> wrote:
>
>
>
> Sounds useful Robin. Thanks. I will try that. But FYI, in another case I
> tested adding only one vertex at a time to the graph. In that case too the
> latency of each subsequent addition kept increasing: about 3 seconds for
> the first vertex, 7 seconds for the second, and so on. This is the case I
> care about: I want to add vertices to the graph as they arrive in our
> system, since I am trying to build a real-time system and vertices will
> keep coming.
>
>
>
> Thanks.
>
> *From:* Robin East [mailto:robin.e...@xense.co.uk]
> *Sent:* Wednesday, February 24, 2016 3:54 PM
> *To:* Udbhav Agarwal <udbhav.agar...@syncoms.com>
> *Cc:* user@spark.apache.org
> *Subject:* Re: Reindexing in graphx
>
>
>
> It looks like you are adding vertices one by one; you definitely don’t
> want to do that. What happens when you batch together 400 vertices into an
> RDD and then add all 400 in one go?
>
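The batching suggested above might look roughly like the following. This is only a sketch under stated assumptions: the vertex attribute type `(Int, Int)` and the `(100, 100)` / `(0, 0)` values are taken from the `addVertex` method quoted further down the thread, while the `Int` edge attribute, the object name `BatchVertexAdd`, the helper `addVertices`, and the tiny sample graph are hypothetical and exist only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

object BatchVertexAdd {
  // Assumed types: (Int, Int) vertex attributes as in the thread's addVertex;
  // Int edge attributes are a guess, since the thread never shows gEdges.
  type G = Graph[(Int, Int), Int]

  // Adds every id in `newIds` with one distributed map and one union,
  // instead of collect() followed by a parallelize/union per vertex.
  def addVertices(graph: G, newIds: RDD[String]): G = {
    val newVertices: RDD[(VertexId, (Int, Int))] =
      newIds.map(id => (id.toLong, (100, 100)))
    Graph(graph.vertices.union(newVertices), graph.edges, (0, 0))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("batch-vertex-add")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical two-vertex starting graph.
      val vertices: RDD[(VertexId, (Int, Int))] =
        sc.parallelize(Seq((1L, (0, 0)), (2L, (0, 0))))
      val edges: RDD[Edge[Int]] = sc.parallelize(Seq(Edge(1L, 2L, 1)))
      val base: G = Graph(vertices, edges, (0, 0))

      // A batch of 400 new ids arriving as strings, as in the thread.
      val batch = sc.parallelize((1000 until 1400).map(_.toString))
      val updated = addVertices(base, batch).cache()
      println(updated.vertices.count())
    } finally {
      sc.stop()
    }
  }
}
```

The new ids never leave the executors here, and the graph is rebuilt once per batch rather than once per vertex. Whether that is enough to flatten the growing latency would still need to be checked in the WebUI.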
> On 24 Feb 2016, at 05:49, Udbhav Agarwal <udbhav.agar...@syncoms.com>
> wrote:
>
>
>
> Thank you Robin for your reply.
>
> Actually I am adding a bunch of vertices to a graph in GraphX using the
> following method, and I am facing a latency problem. The first addition of,
> say, 400 vertices to a graph with 100,000 nodes takes around 7 seconds; the
> next one takes 15 seconds. So every subsequent addition takes more time
> than the previous one. Hence I tried calling reindex() so that subsequent
> operations would also be fast.
>
> FYI, my cluster presently consists of one machine with 8 cores and 8 GB of
> RAM. I am running in local mode.
>
>
>
> def addVertex(rdd: RDD[String], sc: SparkContext, session: String): Long = {
>     val defaultUser = (0, 0)
>     rdd.collect().foreach { x =>
>       {
>         val aVertex: RDD[(VertexId, (Int, Int))] =
>           sc.parallelize(Array((x.toLong, (100, 100))))
>         gVertices = gVertices.union(aVertex)
>       }
>     }
>     inputGraph = Graph(gVertices, gEdges, defaultUser)
>     inputGraph.cache()
>     gVertices = inputGraph.vertices
>     gVertices.cache()
>     val count = gVertices.count
>     println(count);
>
>     return 1;
>   }
>
>
>
>
>
> *From:* Robin East [mailto:robin.e...@xense.co.uk]
> *Sent:* Tuesday, February 23, 2016 8:15 PM
> *To:* Udbhav Agarwal <udbhav.agar...@syncoms.com>
> *Subject:* Re: Reindexing in graphx
>
>
>
> Hi
>
>
>
> Well this is the line that is failing in VertexRDDImpl:
>
>
>
> require(partitionsRDD.partitioner.isDefined)
>
>
>
> But really you shouldn’t need to be calling the reindex() function as it
> deals with some internals of the GraphX implementation - it looks to me
> like it ought to be a private method. Perhaps you could explain what you
> are trying to achieve.
>
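For readers wondering why the exception carries so little detail: Scala's `require` with no message argument throws `IllegalArgumentException` with the text "requirement failed", which matches the error reported in the original question. The sketch below reproduces that behaviour in plain Scala. `FakePartitions` and `reindexLike` are hypothetical stand-ins invented for illustration; they are not GraphX types or methods.

```scala
import scala.util.Try

object RequireDemo {
  // Hypothetical stand-in for an RDD that may or may not have a partitioner.
  final case class FakePartitions(partitioner: Option[String])

  // Mirrors the shape of the failing check: a bare require on isDefined.
  def reindexLike(p: FakePartitions): String = {
    require(p.partitioner.isDefined)
    "reindexed with " + p.partitioner.get
  }

  def main(args: Array[String]): Unit = {
    // No partitioner: require throws IllegalArgumentException("requirement failed").
    println(Try(reindexLike(FakePartitions(None))))
    // Partitioner present: the check passes.
    println(Try(reindexLike(FakePartitions(Some("hash")))))
  }
}
```

So the message in the question only says that some precondition was false; the quoted `require` line is what identifies the missing partitioner as the precondition.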
> On 23 Feb 2016, at 12:18, Udbhav Agarwal <udbhav.agar...@syncoms.com>
> wrote:
>
>
>
> Hi,
>
> I am trying to add vertices to a graph in GraphX and I want to reindex the
> graph. I can see there is a vertices.reindex() option in GraphX, but when I
> call graph.vertices.reindex() I get
>
> java.lang.IllegalArgumentException: requirement failed.
>
> Please help me understand what I am missing in the syntax; the API
> documentation I have seen only mentions vertices.reindex().
>
>
>
> *Thanks,*
>
> *Udbhav Agarwal*
>
>
>
