Try this: https://www.dropbox.com/s/xf34l0ta496bdsn/tttt.txt

This code:

    println(g.numEdges)
    println(g.numVertices)
    println(g.edges.distinct().count())

gave me the following (numEdges, numVertices, and the distinct edge count, in that order):

    10000
    9294
    2

On Tue, Apr 22, 2014 at 5:14 PM, Ankur Dave <ankurd...@gmail.com> wrote:
> I wasn't able to reproduce this with a small test file, but I did change the
> file parsing to use x(1).toLong instead of x(2).toLong. Did you mean to take
> the third column rather than the second?
>
> If so, would you mind posting a larger sample of the file, or even the whole
> file if possible?
>
> Here's the test that succeeded:
>
>   test("graph.edges.distinct.count") {
>     withSpark { sc =>
>       val edgeFullStrRDD: RDD[String] = sc.parallelize(List(
>         "394365859\t136153151", "589404147\t1361045425"))
>       val edgeTupRDD = edgeFullStrRDD.map(x => x.split("\t"))
>         .map(x => (x(0).toLong, x(1).toLong))
>       val g = Graph.fromEdgeTuples(edgeTupRDD, defaultValue = 123,
>         uniqueEdges = Option(CanonicalRandomVertexCut))
>       assert(edgeTupRDD.distinct.count() === 2)
>       assert(g.numEdges === 2)
>       assert(g.edges.distinct.count() === 2)
>     }
>   }
>
> Ankur
