I'd think id is the unique identifier by default.
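
To make the id-keyed reading concrete, here is a minimal sketch in plain Scala collections. It is not the GraphX API: `SimpleGraph`, `SimpleEdge`, `unionById` and `intersectById` are hypothetical names made up for illustration, and letting the left graph's attribute win on an id clash is an assumption, not something settled in this thread.

```scala
// Hypothetical stand-ins for a graph's vertex and edge collections (not GraphX types).
final case class SimpleEdge(src: Long, dst: Long, attr: String)
final case class SimpleGraph(vertices: Map[Long, String], edges: Set[SimpleEdge])

object GraphSetOps {
  // Union keyed on vertex id. With ++ the right operand wins on duplicate keys,
  // so writing b ++ a keeps the LEFT graph's attribute on a clash (assumed policy).
  def unionById(a: SimpleGraph, b: SimpleGraph): SimpleGraph =
    SimpleGraph(b.vertices ++ a.vertices, a.edges ++ b.edges)

  // Intersection keyed on vertex id: keep ids present in both graphs
  // and only the edges that appear (src, dst and attribute equal) in both.
  def intersectById(a: SimpleGraph, b: SimpleGraph): SimpleGraph = {
    val sharedIds = a.vertices.keySet intersect b.vertices.keySet
    SimpleGraph(
      a.vertices.filter { case (id, _) => sharedIds(id) },
      a.edges intersect b.edges
    )
  }
}

object GraphSetOpsDemo {
  def main(args: Array[String]): Unit = {
    val g1 = SimpleGraph(
      Map(1L -> "US", 2L -> "DE"),
      Set(SimpleEdge(1L, 2L, "trade")))
    val g2 = SimpleGraph(
      Map(1L -> "U.S.", 2L -> "DE", 3L -> "FR"),
      Set(SimpleEdge(1L, 2L, "trade"), SimpleEdge(1L, 3L, "trade")))

    // Vertex 1 is merged because the ids match, even though "US" != "U.S.".
    println(GraphSetOps.unionById(g1, g2))
    println(GraphSetOps.intersectById(g1, g2))
  }
}
```

With ids as the key, the "US" and "U.S." vertices merge only because both happen to carry id 1. Matching on attributes instead would need some notion of attribute equality plus id remapping, which is the harder variant Tarek describes below.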

On Wed, Jun 3, 2015 at 12:13 AM, Tarek Auel <tarek.a...@gmail.com> wrote:

> Hi,
>
> The graph is already there (GraphX) and has the two RDDs you described. My
> question is whether the community thinks these operators would be a
> benefit. If yes, I would like to contribute them to GraphX
> (either as part of GraphOps or as an external library).
>
> An interesting question for me is whether these operators take the attributes
> into account (and ignore the ids) or whether the id has to match too. I believe
> that ignoring the id and focusing on the attributes is harder but more
> powerful. If I think of a graph with a node for the country US, for instance,
> and I want to merge a second graph which contains a U.S. node too, I would
> expect these two nodes to be merged (ignoring the id). Is this
> thought valid, or does it make more sense to merge based on the id?
>
> Thanks for your input!
>
> Tarek
>
> On Tue 2 Jun 2015 at 11:58 pm Reynold Xin <r...@databricks.com> wrote:
>
>> Hi Tarek,
>>
>> I took a quick look at the materials you shared. It actually seems to me
>> it'd be super easy to express a graph as two DataFrames: one for edges
>> (srcid, dstid, and other edge attributes) and one for vertices (vid, and
>> other vertex attributes).
>>
>> Then
>>
>> intersection is just
>>
>> edges1.intersect(edges2)
>>
>> "join" is just
>>
>> edges1.union(edges2).distinct
>>
>> On Tue, Jun 2, 2015 at 12:12 AM, Tarek Auel <tarek.a...@gmail.com> wrote:
>>
>>> Okay, thanks for your feedback.
>>>
>>> What is the expected behavior of union? Like UNION and/or UNION ALL in
>>> SQL? Union all would be more or less trivial if we just concatenate the
>>> vertices and edges (vertex id conflicts have to be resolved). Should union
>>> look for duplicates on the actual attribute (VD) or just the vertex id? If
>>> it compares the attribute, it might be necessary to change the id of some
>>> vertices in order to resolve conflicts.
>>>
>>> Many thanks in advance for your input!
>>>
>>> On Mon 1 Jun 2015 at 11:55 pm Ankur Dave <ankurd...@gmail.com> wrote:
>>>
>>>> I think it would be good to have more basic operators like union or
>>>> difference, as long as they have an efficient distributed implementation
>>>> and are plausibly useful.
>>>>
>>>> If they can be written in terms of the existing GraphX API, it would be
>>>> best to put them into GraphOps to keep the core GraphX implementation
>>>> small. The `mask` operation should actually be in GraphOps -- it's only in
>>>> GraphImpl for historical reasons. On the other hand, `subgraph` needs to be
>>>> in GraphImpl for performance: it accesses EdgeRDDImpl#filter(epred, vpred),
>>>> which can't be a public EdgeRDD method because its semantics rely on an
>>>> implementation detail (vertex replication).
>>>>
>>>> Ankur <http://www.ankurdave.com/>
>>>>
>>>> On Mon, Jun 1, 2015 at 8:54 AM, Tarek Auel <tarek.a...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Someone proposed in a Jira issue to implement new graph operations.
>>>>> Sean Owen recommended checking with the mailing list first to see
>>>>> whether this is of interest.
>>>>>
>>>>> So I would like to know whether it is of interest for GraphX to
>>>>> implement operators like these:
>>>>> http://en.wikipedia.org/wiki/Graph_operations and/or
>>>>> http://techieme.in/complex-graph-operations/
>>>>>
>>>>> If yes, should they be integrated into GraphImpl (like mask, subgraph
>>>>> etc.) or provided as an external library? My feeling is that they are
>>>>> similar to mask, so for consistency they should be part of the graph
>>>>> implementation itself.
>>>>>
>>>>> What do you guys think? I really would like to bring GraphX forward
>>>>> and help to implement some of these.
>>>>>
>>>>> Looking forward to hearing your opinions
>>>>> Tarek