In R, it's easy to split a data set into training, cross-validation, and test
sets. Is there something like this in spark.ml? I am using Python for now.
My real problem is that I want to randomly select a relatively small data set
to do some initial data exploration. It's not clear to me how, using Spark, I
could create a random sample from a large data set. I would prefer to sample
without replacement.
I have not tried SparkR yet. I assume I would not be able to use the
caret package with Spark ML.
Kind regards
Andy
```{R}
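library(caret)  # provides createDataPartition()
# Stratified 70/30 split on the outcome column `classe`; returns row indices.
inTrain <- createDataPartition(y = csv$classe, p = 0.7, list = FALSE)
trainSetDF <- csv[inTrain, ]   # 70% of rows for training
testSetDF <- csv[-inTrain, ]   # remaining 30% held out for testing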
```