There are some convenience methods you might consider, including
MLUtils.loadLibSVMFile and MLUtils.loadLabeledPoints.

2015-03-26 14:16 GMT-07:00 Ulanov, Alexander <alexander.ula...@hp.com>:

> Hi,
>
> Could you suggest what would be a reasonable file format to store
> feature vector data for machine learning in Spark MLlib? Are there any best
> practices for Spark?
>
> My data is dense feature vectors with labels. Some of the requirements are
> that the format should be easily loaded/serialized, randomly accessible, with
> a small footprint (binary). I am considering Parquet, hdf5, protocol buffer
> (protobuf), but I have little to no experience with them, so any
> suggestions would be really appreciated.
>
> Best regards, Alexander
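
For reference, here is a rough sketch of the LIBSVM text format that MLUtils.loadLibSVMFile reads: one example per line, written as a label followed by index:value pairs with one-based indices, where zero entries are omitted. This is plain Python with no Spark dependency, and the helper names (format_libsvm_line, parse_libsvm_line, roundtrip) are made up for illustration; they are not part of MLlib. Note that this is a text format, so it does not by itself address the binary/small-footprint requirement.

```python
import os
import tempfile


def format_libsvm_line(label, features):
    """Render one dense example as 'label idx:val ...' (one-based indices, zeros skipped)."""
    pairs = ["%d:%r" % (i + 1, float(v)) for i, v in enumerate(features) if v != 0.0]
    return " ".join([repr(float(label))] + pairs)


def parse_libsvm_line(line, num_features):
    """Parse one LIBSVM-style line back into (label, dense feature list)."""
    items = line.split()
    label = float(items[0])
    dense = [0.0] * num_features
    for item in items[1:]:
        index, value = item.split(":")
        dense[int(index) - 1] = float(value)  # indices in the file are one-based
    return label, dense


def roundtrip(examples, num_features):
    """Write examples to a temporary file (hypothetical name) and read them back."""
    path = os.path.join(tempfile.mkdtemp(), "sample_libsvm.txt")
    with open(path, "w") as f:
        for label, features in examples:
            f.write(format_libsvm_line(label, features) + "\n")
    with open(path) as f:
        return [parse_libsvm_line(line, num_features) for line in f if line.strip()]
```

A file written this way should be loadable in Spark with MLUtils.loadLibSVMFile(sc, path), which returns an RDD of LabeledPoint with sparse feature vectors; for dense data the index prefixes add some overhead per value.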