[ https://issues.apache.org/jira/browse/FLINK-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15636993#comment-15636993 ]
ASF GitHub Bot commented on FLINK-4964:
---------------------------------------

Github user tfournier314 commented on the issue:

    https://github.com/apache/flink/pull/2740

    I've changed my code so that I now have mapping: DataSet[(String, Long)]

        val mapping = input
          .mapWith( s => (s, 1) )
          .groupBy( 0 )
          .reduce( (a, b) => (a._1, a._2 + b._2) )
          .partitionByRange( 1 )
          .zipWithIndex
          .mapWith { case (id, (label, count)) => (label, id) }

    Given a new DataSet[String] called rawInput, I'd like to use this mapping to associate each "label" in rawInput with an ID (the Long value in mapping).

    Is this possible with a streaming approach (using a join, for example)?


> FlinkML - Add StringIndexer
> ---------------------------
>
>                 Key: FLINK-4964
>                 URL: https://issues.apache.org/jira/browse/FLINK-4964
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: Thomas FOURNIER
>            Priority: Minor
>
> Add StringIndexer as described here:
> http://spark.apache.org/docs/latest/ml-features.html#stringindexer
> This will be added in package preprocessing of FlinkML
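
For the lookup asked about in the comment, one option that avoids collecting the mapping on the client is a plain DataSet join on the label. The following is only a minimal sketch under assumptions, not the code from the pull request: the object name StringIndexerJoinSketch, the helper indexLabels, and the sample elements are hypothetical. It wraps each raw label in a Tuple1 so that field position 0 can serve as the join key on both sides. Because this is an inner join, labels that do not appear in mapping are dropped from the result.

```scala
import org.apache.flink.api.scala._

object StringIndexerJoinSketch {

  // Hypothetical helper: pairs each raw label with the ID stored in `mapping`.
  def indexLabels(
      rawInput: DataSet[String],
      mapping: DataSet[(String, Long)]): DataSet[(String, Long)] =
    rawInput
      .map(label => Tuple1(label)) // wrap so field position 0 can be used as the join key
      .join(mapping)
      .where(0)                    // label on the rawInput side
      .equalTo(0) {                // label on the mapping side
        (raw, entry) => (raw._1, entry._2)
      }

  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Sample data standing in for the real mapping and rawInput (assumed values).
    val mapping = env.fromElements(("a", 0L), ("b", 1L))
    val rawInput = env.fromElements("a", "b", "a", "c")

    // "c" has no entry in mapping, so the inner join emits nothing for it.
    indexLabels(rawInput, mapping).print()
  }
}
```

If the mapping is small enough to fit in memory on every task, a broadcast variable read inside a rich map function may be an alternative to the join; which of the two is preferable depends on the number of distinct labels.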