The dataset is in a fairly popular data format called the libsvm format, popularized by the LIBSVM library.
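To answer your format questions directly: each line has the form `<label> <index>:<value> <index>:<value> ...`. The first column is the label. Each `index:value` pair gives one feature's position (1-based) and its value. Features that are not listed are zero, which keeps the file compact for sparse data. Below is a minimal sketch of reading one such line into a label and a dense feature list. The helper name `parse_libsvm_line` and the `num_features` parameter are hypothetical, and the sketch assumes the total number of features is known in advance:

```python
def parse_libsvm_line(line, num_features):
    """Return (label, dense_features) for one libsvm-format line.

    Hypothetical helper for illustration. Each line looks like:
        <label> <index>:<value> <index>:<value> ...
    Indices are 1-based; any feature not listed is implicitly 0.0.
    Assumes num_features is known in advance.
    """
    tokens = line.split()
    label = float(tokens[0])          # first column is the label
    features = [0.0] * num_features   # unlisted features stay 0.0
    for token in tokens[1:]:
        index, value = token.split(":")
        features[int(index) - 1] = float(value)  # shift 1-based -> 0-based
    return label, features


if __name__ == "__main__":
    # Made-up example line: 5 features, only features 2 and 5 are non-zero.
    label, features = parse_libsvm_line("1 2:0.5 5:3.0", num_features=5)
    print(label)     # 1.0
    print(features)  # [0.0, 0.5, 0.0, 0.0, 3.0]
```

In Spark you would not normally parse the file by hand. The 1.5 decision tree example loads it with `MLUtils.loadLibSVMFile`, which reads the 1-based indices and converts them to 0-based ones internally.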
The "How to Use" section at http://svmlight.joachims.org describes the file format. XGBoost uses the same file format, and its documentation is here: https://xgboost.readthedocs.io/en/latest/tutorials/input_format.html

On Fri, Apr 5, 2019, 03:13 Serena S Yuan <supersia...@gmail.com> wrote:
> Hi,
> I am trying to use apache spark's decision tree classifier. I am
> trying to implement the method found in
> https://spark.apache.org/docs/1.5.1/ml-decision-tree.html 's
> classification example. I found the dataset at
>
> https://github.com/apache/spark/blob/master/data/mllib/sample_libsvm_data.txt
> and I have some trouble understanding its format. Is the first column
> the label? Why are there indices and a colon in front of other number
> values and what do the indices represent?
> Lastly, how do I print out a prediction given new test data?
>
> Thanks,
> Serena Sian Yuan
>
> --
> Sian Ees Super.