Hi,

Could you suggest a reasonable file format for storing feature vector data for 
machine learning in Spark MLlib? Are there any best practices for Spark?

My data consists of dense feature vectors with labels. The format should be 
easy to load and serialize, randomly accessible, and compact (binary). I am 
considering Parquet, HDF5, and Protocol Buffers (protobuf), but I have little 
to no experience with them, so any suggestions would be really appreciated.

Best regards, Alexander
