Hi,

Could you suggest a reasonable file format for storing feature vector data for machine learning in Spark MLlib? Are there any best practices for Spark?
My data consists of dense feature vectors with labels. My requirements are that the format should be easy to load and serialize, support random access, and have a small (binary) footprint. I am considering Parquet, HDF5, and Protocol Buffers (protobuf), but I have little to no experience with any of them, so any suggestions would be much appreciated.

Best regards,
Alexander