Spark ML DataFrame vs. LabeledPoint RDD (MLlib) speed

2016-01-18 Thread jarias
when loading from a text file. I'm sorry if I'm just mixing up some concepts from the documentation, but after extensive experimentation I don't see a clear strategy for using these different elements. Any thoughts would be really appreciated :) Cheers, jarias

saveAsTextFile creates an empty folder in HDFS

2015-10-02 Thread jarias
parallelize(l)
scala> dist.saveAsTextFile("hdfs://node1.i3a.info/user/jarias/test/")
15/10/02 10:19:22 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
15/10/02 10:19:22 INFO SparkContext: Starting job: saveAsTextFile at :27
15/10/02 10:19:22 INFO DAGSche