Left outer join

2016-11-08 Thread Thomas FOURNIER
Hello, I'm facing an issue with leftOuterJoin: input is a DataSet[String] and metrics is a DataSet[(String,Long)]. I'm doing a leftOuterJoin like this: input.leftOuterJoin(metrics).where(0).equalTo(0) { (left, right) => ... } But I encounter the following error: Specifying keys via
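A minimal sketch of one way to write the join in this preview, assuming the truncated error is Flink's complaint that field positions such as where(0) are only valid on tuple types (input is a DataSet[String], not a tuple). The object name, the helper joinWithMetrics, the sample data and the default value 0L are hypothetical; only leftOuterJoin, where and equalTo come from the original message.

```scala
import org.apache.flink.api.scala._

object LeftOuterJoinSketch {

  // Hypothetical helper: joins every input string with its metric, if any.
  def joinWithMetrics(
      input: DataSet[String],
      metrics: DataSet[(String, Long)]): DataSet[(String, Long)] =
    input
      .leftOuterJoin(metrics)
      // Key selector functions instead of field positions, because the
      // left side is a plain String and has no tuple field 0.
      .where((s: String) => s)
      .equalTo((m: (String, Long)) => m._1)
      .apply { (left, right) =>
        // In a left outer join the right side is null when nothing matches.
        // Falling back to 0L is an assumption made for this sketch.
        if (right == null) (left, 0L) else (left, right._2)
      }

  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val input = env.fromElements("a", "b")
    val metrics = env.fromElements(("a", 3L))
    joinWithMetrics(input, metrics).print()
  }
}
```

Using key selectors on both sides keeps the two key types obviously identical (String on each side).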

"Cannot resolve map"

2016-11-04 Thread Thomas FOURNIER
Hello, In the following code, map { case (id, (label, count)) => (label, id) } is not resolved. Is it related to the zipWithIndex operation (org.apache.flink.api.scala)? My input is a DataSet[String] and I'd like to output a DataSet[(String,Long)]: val mapping = input.map(s => (s, 1)).gro

Assign a unique id to each line of a dataset

2016-11-02 Thread Thomas FOURNIER
Hello, Is it possible with the current Flink API to give a unique id to each line of a DataSet? More precisely, I've globally sorted my DataSet with partitionByRange and I'd like to perform a kind of "zipWithIndex" operation, so that I can retrieve a Map (such as with collectAsMap in Spark). Tha
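A minimal sketch of the kind of "zipWithIndex" operation asked about here, using the zipWithIndex method that the Scala DataSet utilities (org.apache.flink.api.scala.utils) add to DataSet. The object name, the helper indexLines and the sample data are hypothetical. The sketch only shows that each line receives a unique, consecutive Long index; it makes no claim about how those indices relate to an earlier partitionByRange sort.

```scala
import org.apache.flink.api.scala._
import org.apache.flink.api.scala.utils._

object ZipWithIndexSketch {

  // Hypothetical helper: pairs every line with a unique index.
  def indexLines(lines: DataSet[String]): DataSet[(String, Long)] =
    lines.zipWithIndex                 // DataSet[(Long, String)]
      .map { case (id, line) => (line, id) }

  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val input = env.fromElements("x", "y", "z")

    // collect() brings the result to the client, where it can be turned
    // into a Map, similar in spirit to Spark's collectAsMap.
    val mapping: Map[String, Long] = indexLines(input).collect().toMap
    println(mapping)
  }
}
```

Collecting to the client is only reasonable when the indexed DataSet is small enough to fit in memory there.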

[jira] [Created] (FLINK-4964) FlinkML - Add StringIndexer

2016-10-29 Thread Thomas FOURNIER (JIRA)
Thomas FOURNIER created FLINK-4964: -- Summary: FlinkML - Add StringIndexer Key: FLINK-4964 URL: https://issues.apache.org/jira/browse/FLINK-4964 Project: Flink Issue Type: New Feature

[jira] [Created] (FLINK-4880) FlinkML - Implement Feature hashing (Data pre-processing)

2016-10-21 Thread Thomas FOURNIER (JIRA)
Thomas FOURNIER created FLINK-4880: -- Summary: FlinkML - Implement Feature hashing (Data pre-processing) Key: FLINK-4880 URL: https://issues.apache.org/jira/browse/FLINK-4880 Project: Flink

Re: Implicit class RichExecutionEnvironment - Can't use MLUtils.readLibSVM(path) in QuickStart guide

2016-10-20 Thread Thomas FOURNIER
tp://apache-flink-mailing-list-archive.1008284.n3.nabble.com/jira-Created-FLINK-4792-Update-documentation-QuickStart-FlinkML-td13936.html > > -- > Sent from a mobile device. May contain autocorrect errors. > > On Oct 20, 2016 2:06 PM, "Thomas FOURNIER" > wrot

Re: FlinkML - Evaluate function should manage LabeledVector

2016-10-20 Thread Thomas FOURNIER
Done here: FLINK-4865 <https://issues.apache.org/jira/browse/FLINK-4865> 2016-10-20 14:07 GMT+02:00 Thomas FOURNIER : > Ok thanks. > > I'm going to create a specific JIRA on this. Ok ? > > 2016-10-20 12:54 GMT+02:00 Theodore Vasiloudis < > theodoros.vasilou...@gma

[jira] [Created] (FLINK-4865) FlinkML - Add EvaluateDataSet operation for LabeledVector

2016-10-20 Thread Thomas FOURNIER (JIRA)
Thomas FOURNIER created FLINK-4865: -- Summary: FlinkML - Add EvaluateDataSet operation for LabeledVector Key: FLINK-4865 URL: https://issues.apache.org/jira/browse/FLINK-4865 Project: Flink

Re: FlinkML - Evaluate function should manage LabeledVector

2016-10-20 Thread Thomas FOURNIER
} > } > } > } > > I'm not a fan of casting objects, but the compiler complains here > otherwise. > > Maybe someone has some input as to why the casting is necessary here, given > that the underlying types are correct? Probably has to do with some type

Implicit class RichExecutionEnvironment - Can't use MLUtils.readLibSVM(path) in QuickStart guide

2016-10-20 Thread Thomas FOURNIER
Hello, Following the FlinkML QuickStart guide, I have to do the following: val astroTrain: DataSet[LabeledVector] = MLUtils.readLibSVM(env, "src/main/resources/svmguide1") instead of: val astroTrain: DataSet[LabeledVector] = MLUtils.readLibSVM("src/main/resources/svmguide1") Nonetheless, this i
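A minimal sketch of the two call styles this thread's subject refers to, assuming a FlinkML version in which the org.apache.flink.ml package object provides the implicit class RichExecutionEnvironment with a readLibSVM(path) method. The object and method names below are hypothetical, and the path is the one from the message.

```scala
import org.apache.flink.api.scala._
import org.apache.flink.ml._
import org.apache.flink.ml.common.LabeledVector

object ReadLibSVMSketch {

  // Explicit form: the ExecutionEnvironment is passed as the first argument.
  def viaMLUtils(env: ExecutionEnvironment, path: String): DataSet[LabeledVector] =
    MLUtils.readLibSVM(env, path)

  // Implicit-class form: importing org.apache.flink.ml._ brings
  // RichExecutionEnvironment into scope, which adds readLibSVM to env.
  def viaImplicit(env: ExecutionEnvironment, path: String): DataSet[LabeledVector] =
    env.readLibSVM(path)

  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val astroTrain = viaImplicit(env, "src/main/resources/svmguide1")
    println(astroTrain.count())
  }
}
```

Both helpers read the same file format; the second only changes where the environment comes from.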

Re: FlinkML - Evaluate function should manage LabeledVector

2016-10-19 Thread Thomas FOURNIER
valuate here, you should be creating an > EvaluateDataSet operation that works with LabeledVector, I see you are > creating a new PredictOperation. > > On Wed, Oct 19, 2016 at 3:05 PM, Thomas FOURNIER < > thomasfournier...@gmail.com> wrote: > > > Hi, > > >

FlinkML - Evaluate function should manage LabeledVector

2016-10-19 Thread Thomas FOURNIER
Hi, I'd like to improve the SVM evaluate function so that it can use LabeledVector (and not only Vector). Indeed, what is done in the test is the following (data is a DataSet[LabeledVector]): val test = data.map(l => (l.vector, l.label)) svm.evaluate(test) We would like to do: svm.evaluate(data) Adding

[jira] [Created] (FLINK-4850) FlinkML - SVM predict Operation for Vector and not LabeledVector

2016-10-18 Thread Thomas FOURNIER (JIRA)
Thomas FOURNIER created FLINK-4850: -- Summary: FlinkML - SVM predict Operation for Vector and not LabeledVector Key: FLINK-4850 URL: https://issues.apache.org/jira/browse/FLINK-4850 Project: Flink

[jira] [Created] (FLINK-4846) FlinkML - Pass "env" as an implicit parameter in MLUtils.readLibSVM

2016-10-17 Thread Thomas FOURNIER (JIRA)
Thomas FOURNIER created FLINK-4846: -- Summary: FlinkML - Pass "env" as an implicit parameter in MLUtils.readLibSVM Key: FLINK-4846 URL: https://issues.apache.org/jira/browse/FLINK-4846

[jira] [Created] (FLINK-4792) Update documentation - QuickStart - FlinkML

2016-10-10 Thread Thomas FOURNIER (JIRA)
Thomas FOURNIER created FLINK-4792: -- Summary: Update documentation - QuickStart - FlinkML Key: FLINK-4792 URL: https://issues.apache.org/jira/browse/FLINK-4792 Project: Flink Issue Type

[jira] [Created] (FLINK-4790) FlinkML - Error of type while using readCsvFile

2016-10-10 Thread Thomas FOURNIER (JIRA)
Thomas FOURNIER created FLINK-4790: -- Summary: FlinkML - Error of type while using readCsvFile Key: FLINK-4790 URL: https://issues.apache.org/jira/browse/FLINK-4790 Project: Flink Issue Type