There are some convenience methods you might consider including:

           MLUtils.loadLibSVMFile

and   MLUtils.loadLabeledPoint

2015-03-26 14:16 GMT-07:00 Ulanov, Alexander <alexander.ula...@hp.com>:

> Hi,
>
> Could you suggest what would be the reasonable file format to store
> feature vector data for machine learning in Spark MLlib? Are there any best
> practices for Spark?
>
> My data is dense feature vectors with labels. Some of the requirements are
> that the format should be easy loaded/serialized, randomly accessible, with
> a small footprint (binary). I am considering Parquet, hdf5, protocol buffer
> (protobuf), but I have little to no experience with them, so any
> suggestions would be really appreciated.
>
> Best regards, Alexander
>

Reply via email to