I would be happy to see some generators in Gelly for exactly the reasons you've mentioned. It's always difficult for me to get some testing data when running Flink on a new cluster, so this would help me ;)
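On the requirement that generated graphs be seedable and identical regardless of parallelism, here is a rough sketch of one way that could work for an Erdos-Renyi G(n, p) graph. This is not Gelly API and not necessarily the approach the proposal has in mind: the class and method names are made up for illustration, and the "workers" are simulated with a plain loop. The idea is to derive each edge decision from a hash of (seed, u, v) instead of from a shared random stream, so the result does not depend on how vertices are split across workers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in Flink or Gelly.
final class SeededErdosRenyi {

    // SplitMix64 finalizer, used here as a stateless 64-bit mixing function.
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // The edge (u, v) exists or not purely as a function of (seed, u, v, p).
    static boolean hasEdge(long seed, long u, long v, double p) {
        long h = mix(mix(seed ^ mix(u)) ^ v);
        double r = (h >>> 11) * 0x1.0p-53; // top 53 bits mapped to [0, 1)
        return r < p;
    }

    // One "worker": emit the edges whose source vertex lies in [from, to).
    static List<long[]> edgesForRange(long seed, long n, double p, long from, long to) {
        List<long[]> edges = new ArrayList<>();
        for (long u = from; u < to; u++) {
            for (long v = u + 1; v < n; v++) {
                if (hasEdge(seed, u, v, p)) {
                    edges.add(new long[] {u, v});
                }
            }
        }
        return edges;
    }

    // Simulate `parallelism` workers, each handling a contiguous vertex slice,
    // and collect the union of their edges as sorted "u-v" strings.
    static TreeSet<String> generate(long seed, long n, double p, int parallelism) {
        TreeSet<String> all = new TreeSet<>();
        long chunk = (n + parallelism - 1) / parallelism;
        for (int w = 0; w < parallelism; w++) {
            long from = Math.min(n, w * chunk);
            long to = Math.min(n, from + chunk);
            for (long[] e : edgesForRange(seed, n, p, from, to)) {
                all.add(e[0] + "-" + e[1]);
            }
        }
        return all;
    }

    public static void main(String[] args) {
        // The same seed with different parallelism should give the same edge set.
        boolean same = generate(42L, 50, 0.1, 1).equals(generate(42L, 50, 0.1, 7));
        System.out.println("identical across parallelism: " + same);
    }
}
```

This checks every vertex pair, so it is quadratic in the number of vertices and only meant to show the determinism property; a real generator would need something smarter for large sparse graphs.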
On Thu, Sep 24, 2015 at 11:03 AM, Vasiliki Kalavri <vasilikikala...@gmail.com> wrote:

> Hi Greg,
>
> thank you for this proposal!
> I think graph generators will be a very useful addition to Gelly.
>
> I'm not quite familiar with the state-of-the-art algorithms for distributed
> graph generation.
> I suppose that we could easily provide an efficient random graph generator
> and I've also seen some work on parallel/distributed algorithms for R-MAT
> [1, 2].
> Are you aware of similar work for Erdos-Renyi, Kronecker or other types of
> graphs?
> Another place we might want to look at is Giraph's Watts-Strogatz generator
> [3].
>
> Cheers,
> Vasia.
>
> [1]: https://github.com/farkhor/PaRMAT/
> [2]: http://arxiv.org/pdf/1210.0187.pdf
> [3]: https://giraph.apache.org/apidocs/org/apache/giraph/io/formats/WattsStrogatzVertexInputFormat.html
>
> On 23 September 2015 at 19:49, Greg Hogan <c...@greghogan.com> wrote:
>
> > I would like to propose that Flink include a selection of graph generators
> > in Gelly. Generated graphs will be useful for performing scalability,
> > stress, and regression testing as well as benchmarking and comparing
> > algorithms, both for Flink users and developers. Generated data is
> > infinitely scalable yet described by a few simple parameters and can often
> > substitute for user data or sharing large files when reporting issues.
> >
> > Spark's GraphX includes a modest GraphGenerators class [1].
> >
> > The initial implementation would focus on Erdos-Renyi, R-Mat [2], and
> > Kronecker [3] generators.
> >
> > A key consideration is that the graphs should be seedable and generate the
> > same Graph regardless of parallelism.
> >
> > Generated data is a complement to my proposed "Checksum method for DataSet
> > and Graph" [4].
> >
> > [1] http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.graphx.util.GraphGenerators$
> > [2] R-MAT: A Recursive Model for Graph Mining;
> > http://snap.stanford.edu/class/cs224w-readings/chakrabarti04rmat.pdf
> > [3] Kronecker graphs: An Approach to Modeling Networks;
> > http://arxiv.org/pdf/0812.4905v2.pdf
> > [4] https://issues.apache.org/jira/browse/FLINK-2716
> >
> > Greg Hogan