Done here: FLINK-4865 <https://issues.apache.org/jira/browse/FLINK-4865>
2016-10-20 14:07 GMT+02:00 Thomas FOURNIER <thomasfournier...@gmail.com>:

> Ok, thanks.
>
> I'm going to create a specific JIRA for this, OK?
>
> 2016-10-20 12:54 GMT+02:00 Theodore Vasiloudis <
> theodoros.vasilou...@gmail.com>:
>
>> I think this might be problematic with the current way we define the
>> predict operations, because they require that both the Testing and
>> PredictionValue types are available.
>>
>> Here's what I had to do to get it to work (in ml/pipeline/Predictor.scala):
>>
>> import org.apache.flink.ml.math.{Vector => FlinkVector}
>>
>> implicit def labeledVectorEvaluateDataSetOperation[
>>     Instance <: Estimator[Instance],
>>     Model,
>>     FlinkVector,
>>     Double](
>>     implicit predictOperation: PredictOperation[Instance, Model, FlinkVector, Double],
>>     testingTypeInformation: TypeInformation[FlinkVector],
>>     predictionValueTypeInformation: TypeInformation[Double])
>>   : EvaluateDataSetOperation[Instance, LabeledVector, Double] = {
>>   new EvaluateDataSetOperation[Instance, LabeledVector, Double] {
>>     override def evaluateDataSet(
>>         instance: Instance,
>>         evaluateParameters: ParameterMap,
>>         testing: DataSet[LabeledVector])
>>       : DataSet[(Double, Double)] = {
>>       val resultingParameters = instance.parameters ++ evaluateParameters
>>       val model = predictOperation.getModel(instance, resultingParameters)
>>
>>       implicit val resultTypeInformation =
>>         createTypeInformation[(FlinkVector, Double)]
>>
>>       testing.mapWithBcVariable(model) { (element, model) =>
>>         (element.label.asInstanceOf[Double],
>>           predictOperation.predict(element.vector.asInstanceOf[FlinkVector], model))
>>       }
>>     }
>>   }
>> }
>>
>> I'm not a fan of casting objects, but the compiler complains here otherwise.
>>
>> Maybe someone has some input as to why the casting is necessary here,
>> given that the underlying types are correct? It probably has to do with
>> some type erasure I'm not seeing.
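A possible explanation for the casts, sketched outside Flink: in the snippet above, `FlinkVector` and `Double` appear in the type-parameter list of `labeledVectorEvaluateDataSetOperation`, so inside the method body they are fresh abstract types that shadow the imported `FlinkVector` alias and `scala.Double`; concrete values then need `asInstanceOf` to be seen as those parameters. The names below (`Vec`, `Labeled`, `shadowed`, `unshadowed`) are stand-ins for illustration, not Flink API:

```scala
object ShadowingSketch {
  // Stand-ins for FlinkVector and LabeledVector (hypothetical names).
  case class Vec(xs: List[Double])
  case class Labeled(label: Double, vector: Vec)

  // `Vec` and `Double` here are TYPE PARAMETERS that shadow the concrete
  // types, as in the snippet above: the body cannot prove the concrete
  // label/vector have those (abstract) types, so it must cast.
  def shadowed[Vec, Double](l: Labeled)(predict: Vec => Double): (Double, Double) =
    (l.label.asInstanceOf[Double], predict(l.vector.asInstanceOf[Vec]))

  // Same logic without the shadowing parameters: the concrete types line
  // up and no cast is required.
  def unshadowed(l: Labeled)(predict: Vec => Double): (Double, Double) =
    (l.label, predict(l.vector))

  def main(args: Array[String]): Unit = {
    val point = Labeled(1.0, Vec(List(2.0, 3.0)))
    println(unshadowed(point)(v => v.xs.sum))             // (1.0,5.0)
    println(shadowed[Vec, Double](point)(v => v.xs.sum))  // (1.0,5.0)
  }
}
```

If this diagnosis is right, dropping `FlinkVector` and `Double` from the type-parameter list (keeping only `Instance` and `Model`) and letting the imported concrete types flow through may remove the need for the casts.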
>>
>> --Theo
>>
>> On Wed, Oct 19, 2016 at 10:30 PM, Thomas FOURNIER <
>> thomasfournier...@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > Two questions:
>> >
>> > 1. I was thinking of doing this:
>> >
>> > implicit def evaluateLabeledVector[T <: LabeledVector] = {
>> >   new EvaluateDataSetOperation[SVM, T, Double]() {
>> >     override def evaluateDataSet(
>> >         instance: SVM,
>> >         evaluateParameters: ParameterMap,
>> >         testing: DataSet[T]): DataSet[(Double, Double)] = {
>> >       val predictor = ...
>> >       testing.map(l => (l.label, predictor.predict(l.vector)))
>> >     }
>> >   }
>> > }
>> >
>> > How can I access my predictor object (predictor has type
>> > PredictOperation[SVM, DenseVector, T, Double])?
>> >
>> > 2. My first idea was to develop a predictOperation[T <: LabeledVector]
>> > so that I could use the implicit def defaultEvaluateDataSetOperation
>> > to get an EvaluateDataSetOperation object. Is that also valid or not?
>> >
>> > Thanks,
>> > Regards,
>> >
>> > Thomas
>> >
>> > 2016-10-19 16:26 GMT+02:00 Theodore Vasiloudis <
>> > theodoros.vasilou...@gmail.com>:
>> >
>> > > Hello Thomas,
>> > >
>> > > Since you are calling evaluate here, you should be creating an
>> > > EvaluateDataSetOperation that works with LabeledVector, yet I see
>> > > you are creating a new PredictOperation.
>> > >
>> > > On Wed, Oct 19, 2016 at 3:05 PM, Thomas FOURNIER <
>> > > thomasfournier...@gmail.com> wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > I'd like to improve the SVM evaluate function so that it can use
>> > > > LabeledVector (and not only Vector).
>> > > > Indeed, what is done in the tests is the following (data is a
>> > > > DataSet[LabeledVector]):
>> > > >
>> > > > val test = data.map(l => (l.vector, l.label))
>> > > > svm.evaluate(test)
>> > > >
>> > > > We would like to be able to do:
>> > > >
>> > > > svm.evaluate(data)
>> > > >
>> > > > Adding this "new" code:
>> > > >
>> > > > implicit def predictLabeledPoint[T <: LabeledVector] = {
>> > > >   new PredictOperation ...
>> > > > }
>> > > >
>> > > > gives me a predictOperation that should be usable with
>> > > > defaultEvaluateDataSetOperation at the correct signature (i.e.
>> > > > with T <: LabeledVector and not T <: Vector).
>> > > >
>> > > > Nonetheless, the tests are failing:
>> > > >
>> > > > it should "predict with LabeledDataPoint" in {
>> > > >
>> > > >   val env = ExecutionEnvironment.getExecutionEnvironment
>> > > >
>> > > >   val svm = SVM().
>> > > >     setBlocks(env.getParallelism).
>> > > >     setIterations(100).
>> > > >     setLocalIterations(100).
>> > > >     setRegularization(0.002).
>> > > >     setStepsize(0.1).
>> > > >     setSeed(0)
>> > > >
>> > > >   val trainingDS = env.fromCollection(Classification.trainingData)
>> > > >   svm.fit(trainingDS)
>> > > >   val predictionPairs = svm.evaluate(trainingDS)
>> > > >
>> > > >   ....
>> > > > }
>> > > >
>> > > > java.lang.RuntimeException: There is no PredictOperation defined for
>> > > > org.apache.flink.ml.classification.SVM which takes a
>> > > > DataSet[org.apache.flink.ml.common.LabeledVector] as input.
>> > > >
>> > > > Thanks,
>> > > > Regards,
>> > > > Thomas
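On Thomas's first question ("How can I access my predictor object?"): the usual typeclass pattern is to declare the PredictOperation as an implicit parameter of the implicit def, so the compiler injects whichever instance is in scope rather than the evaluate operation building one itself. A minimal, Flink-free sketch; the two traits and the `SVM`/`Vec`/`Labeled` types here are simplified stand-ins, not Flink's actual classes:

```scala
object ImplicitLookupSketch {
  // Simplified stand-ins for Flink ML's typeclasses (not the real API).
  trait PredictOperation[Instance, Testing, Prediction] {
    def predict(instance: Instance, testing: Testing): Prediction
  }
  trait EvaluateDataSetOperation[Instance, Testing, Prediction] {
    def evaluateDataSet(instance: Instance, testing: List[Testing]): List[(Prediction, Prediction)]
  }

  case class Vec(xs: List[Double])
  case class Labeled(label: Double, vector: Vec)
  case class SVM(weights: Vec)

  // A concrete predict operation for SVM on plain vectors.
  implicit val svmPredict: PredictOperation[SVM, Vec, Double] =
    new PredictOperation[SVM, Vec, Double] {
      def predict(svm: SVM, v: Vec): Double =
        svm.weights.xs.zip(v.xs).map { case (w, x) => w * x }.sum
    }

  // The evaluate operation never constructs a predictor: it asks for one
  // as an implicit parameter, and the compiler supplies `svmPredict`.
  implicit def evaluateLabeled(
      implicit predictor: PredictOperation[SVM, Vec, Double])
    : EvaluateDataSetOperation[SVM, Labeled, Double] =
    new EvaluateDataSetOperation[SVM, Labeled, Double] {
      def evaluateDataSet(svm: SVM, testing: List[Labeled]): List[(Double, Double)] =
        testing.map(l => (l.label, predictor.predict(svm, l.vector)))
    }

  def main(args: Array[String]): Unit = {
    val op = implicitly[EvaluateDataSetOperation[SVM, Labeled, Double]]
    println(op.evaluateDataSet(SVM(Vec(List(1.0, 1.0))), List(Labeled(1.0, Vec(List(0.5, 0.5))))))
    // List((1.0,1.0))
  }
}
```

This mirrors what Theo's `labeledVectorEvaluateDataSetOperation` does with its `implicit predictOperation` parameter, just stripped of the Flink machinery.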
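As for the RuntimeException in the failing test: as far as I can tell, Flink ML resolves predict operations through a low-priority fallback implicit that matches any type combination at compile time and only fails at runtime with the "There is no PredictOperation defined ..." message, which would mean Thomas's new implicit was simply not found at the LabeledVector types and the fallback was selected instead. A self-contained model of that pattern, with stand-in names throughout:

```scala
object FallbackSketch {
  trait PredictOperation[Instance, Testing] {
    def predict(instance: Instance, testing: Testing): Double
  }

  case class Vec(xs: List[Double])
  case class Labeled(label: Double, vector: Vec)
  case class SVM(weights: Vec)

  // Low-priority fallback: matches ANY (Instance, Testing) pair, compiles
  // fine, and only blows up when `predict` is actually called.
  trait LowPriorityOps {
    implicit def noPredictOperation[Instance, Testing]: PredictOperation[Instance, Testing] =
      new PredictOperation[Instance, Testing] {
        def predict(instance: Instance, testing: Testing): Double =
          throw new RuntimeException("There is no PredictOperation defined for these types")
      }
  }

  object Ops extends LowPriorityOps {
    // The only real operation: SVM on Vec. Nothing is defined for Labeled,
    // so a lookup at Labeled silently resolves to the fallback above.
    implicit val svmVec: PredictOperation[SVM, Vec] =
      new PredictOperation[SVM, Vec] {
        def predict(svm: SVM, v: Vec): Double =
          svm.weights.xs.zip(v.xs).map { case (w, x) => w * x }.sum
      }
  }

  def main(args: Array[String]): Unit = {
    import Ops._
    // The specific instance wins over the inherited fallback.
    println(implicitly[PredictOperation[SVM, Vec]].predict(SVM(Vec(List(2.0))), Vec(List(3.0))))
    // No instance for Labeled, so this compiles but fails at runtime.
    val labeledOp = implicitly[PredictOperation[SVM, Labeled]]
    try labeledOp.predict(SVM(Vec(Nil)), Labeled(0.0, Vec(Nil)))
    catch { case e: RuntimeException => println("runtime failure: " + e.getMessage) }
  }
}
```

This is why the error surfaces only when the test runs, not at compile time: the fallback satisfies the implicit search, so the compiler never reports that the intended instance is missing.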