The dataset is in a fairly popular data format called the libsvm data format,
popularized by the libsvm library.

http://svmlight.joachims.org - The 'How to Use' section describes the file
format.

XGBoost uses the same file format, and its documentation is here -
https://xgboost.readthedocs.io/en/latest/tutorials/input_format.html
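
To answer the format question directly: each line is one example, written as
"<label> <index>:<value> <index>:<value> ...". The first token is the label,
and every following token pairs a feature index with that feature's value.
Features that are not listed are zero, which is why the format suits sparse
data. Indices in the file start at 1. Here is a minimal sketch of reading one
such line in plain Python; the function name parse_libsvm_line and the sample
line are made up for illustration, and this is not how Spark itself loads the
file.

```python
def parse_libsvm_line(line):
    """Parse one libsvm-format line into (label, {index: value})."""
    tokens = line.split()
    # The first token is the label (class or regression target).
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        # Each remaining token has the form "index:value".
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


# Hypothetical sample line: label 1, three non-zero features.
label, features = parse_libsvm_line("1 3:0.5 10:2.0 42:1.0")
print(label)
print(features)
```

In Spark you would normally not parse the file by hand; the "libsvm" data
source used in the linked classification example reads it into a DataFrame
with label and features columns.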


On Fri, Apr 5, 2019, 03:13 Serena S Yuan <supersia...@gmail.com> wrote:

> Hi,
>     I am trying to use apache spark's decision tree classifier. I am
> trying to implement the method found in
> https://spark.apache.org/docs/1.5.1/ml-decision-tree.html 's
> classification example. I found the dataset at
>
> https://github.com/apache/spark/blob/master/data/mllib/sample_libsvm_data.txt
> and I have some trouble understanding its format. Is the first column
> the label? Why are there indices and a colon in front of other number
> values and what do the indices represent?
>   Lastly, how do I print out a prediction given new test data?
>
> Thanks,
>    Serena Sian Yuan
>
> --
> Sian Ees Super.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
