Take a look at the Kaggle competition datasets: https://www.kaggle.com/competitions
For SVM, there are a couple of ad click prediction datasets there that are fairly large. For graph data, SNAP has large network datasets: https://snap.stanford.edu/data/

On Thu, Jul 3, 2014 at 3:25 PM, AlexanderRiggers <[email protected]> wrote:

> Hello!
>
> I want to play around with several different cluster settings and measure
> performance for MLlib and GraphX, and was wondering if anybody here could
> hit me up with datasets for these applications from 5GB onwards?
> I am mostly interested in SVM and Triangle Count, but would be glad for any
> help.
>
> Best regards,
> Alex
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Sample-datasets-for-MLlib-and-Graphx-tp8760.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
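If it helps when checking Triangle Count results, below is a small single-machine Python sketch (not GraphX code) for a subsample of a SNAP edge list. It assumes the usual SNAP text layout: lines starting with "#" are comments, and every other line holds a whitespace-separated pair of node ids. The function names load_snap_edges and count_triangles are hypothetical. Edges are stored in canonical orientation (smaller id first), so duplicate or reversed edges collapse, and self-loops are dropped.

```python
from collections import defaultdict


def load_snap_edges(lines):
    """Parse SNAP-style edge-list lines into a set of undirected edges (u, v) with u < v."""
    edges = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and SNAP header comments
        src, dst = line.split()[:2]
        u, v = int(src), int(dst)
        if u == v:
            continue  # self-loops can never be part of a triangle
        # canonical orientation: smaller id first, so (2, 1) and (1, 2) collapse
        edges.add((min(u, v), max(u, v)))
    return edges


def count_triangles(edges):
    """Count each undirected triangle exactly once by intersecting neighbour sets."""
    # keep only higher-numbered neighbours, so a triangle a < b < c is found
    # only from its smallest vertex a via its middle vertex b
    higher = defaultdict(set)
    for u, v in edges:
        higher[u].add(v)
    total = 0
    for u, nbrs in higher.items():
        for v in nbrs:
            # .get avoids inserting new keys into the dict while iterating it
            total += len(nbrs & higher.get(v, set()))
    return total


if __name__ == "__main__":
    sample = ["# FromNodeId\tToNodeId", "1\t2", "2\t3", "3\t1", "3\t4", "4\t1", "2\t1"]
    print(count_triangles(load_snap_edges(sample)))
```

For the sample above this prints 2 (triangles {1, 2, 3} and {1, 3, 4}). GraphX's triangle count gives a per-vertex count, so summing those counts and dividing by three should give the same total on the same subsample.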
