Friends of GraphFrames (github.com/graphframes/graphframes), I have a
question for you...

I can't get the unit test 'two components and two dangling vertices' in the
org.graphframes.lib.ConnectedComponentsSuite
<https://github.com/graphframes/graphframes/blob/649094caf58cfda0eea3e8cd66785aa38104d771/src/test/scala/org/graphframes/lib/ConnectedComponentsSuite.scala#L138-L148>
to pass. It fails with an 'OutOfMemoryError: Java heap space' error. This
issue is blocking a docs release that includes a motif finding tutorial
<https://github.com/graphframes/graphframes/pull/473>.

The problem is outlined in this gist:
https://gist.github.com/rjurney/6abeffbd59c67df5e5243c8f6619b6bf

Can someone else please try this and see if it passes on the master branch?

> build/sbt clean compile package test

I've tried giving it much more RAM just to see if it would help, up to 32g
for the driver and 16g for the executors, and it has no effect. The test
graph has only 8 nodes and 6 edges
<https://gist.github.com/rjurney/6abeffbd59c67df5e5243c8f6619b6bf#file-connectedcomponentsuite-scala-L22-L26>,
so memory shouldn't be a problem. Yet when the test runs, all 24 cores of my
CPU are used and usage spikes, as shown in the image in the gist.
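
To show how small this problem is, here is a minimal plain-Python sketch of
the answer the test should produce. It does not use Spark or GraphFrames. The
edge list is an assumption based on the test name (two triangles plus two
isolated vertices); the gist has the real data. The union-find helper is
purely illustrative and is not how GraphFrames computes components.

```python
# Hypothetical stand-in for the test graph: 8 vertices, 6 edges.
# Assumed layout: two triangles plus two isolated ("dangling") vertices.
vertices = range(8)
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]


def connected_components(vertices, edges):
    """Union-find: map each vertex to the smallest vertex id in its component."""
    parent = {v: v for v in vertices}

    def find(v):
        # Walk up to the root, halving the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for src, dst in edges:
        a, b = find(src), find(dst)
        if a != b:
            # Keep the smaller id as the root so labels are deterministic.
            parent[max(a, b)] = min(a, b)

    return {v: find(v) for v in vertices}


components = connected_components(vertices, edges)
print(components)
```

Under that assumed layout there are four components: two of size three and
two singletons. A graph this size takes a few dictionary operations to solve,
which is why the heap exhaustion looks unrelated to the input data.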

I am running the following setup:

* Ubuntu 20.04 (22.04 in the Docker image)
* OpenJDK 11 (I also tried 8, same problem)
* Scala 2.12.20 (I also tried 2.13, same problem)
* Python 3.11 (I also tried 3.9, same problem)

Alternatively, I run it in Docker using the Dockerfile in the gist
<https://gist.github.com/rjurney/6abeffbd59c67df5e5243c8f6619b6bf#file-dockerfile>.

Any help is much appreciated! Thanks.

-----------------------------------------------------------------
Oh, and some new community stuff for GraphFrames. Hackathon to be announced
next week :)


   - GraphFrames Mailing List <https://groups.google.com/g/graphframes/>:
     ask questions about GraphFrames on our Google Group
   - #graphframes Discord Channel on GraphGeeks
     <https://discord.com/channels/1162999022819225631/1326257052368113674>

Thanks!
Russell Jurney @rjurney <http://twitter.com/rjurney>
russell.jur...@gmail.com LI <http://linkedin.com/in/russelljurney> FB
<http://facebook.com/jurney> datasyndrome.com
