[jira] [Commented] (FLINK-4964) FlinkML - Add StringIndexer

ASF GitHub Bot (JIRA) Tue, 15 Nov 2016 09:06:35 -0800

    [ 
https://issues.apache.org/jira/browse/FLINK-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667671#comment-15667671
 ]


ASF GitHub Bot commented on FLINK-4964:
---------------------------------------

Github user thvasilo commented on the issue:

    https://github.com/apache/flink/pull/2740
  
    @greghogan Excuse my ignorance, I'm only now learning about Flink
    internals :)
    It seems the issue here is that `partitionByRange` partitions keys in
    ascending order, but we want the end result in descending order.
    
    @tfournier314 I think the following should work. Here I use a key
    extractor that negates the count, so the range partitioner places the
    largest counts in the first partition:
    
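    ```Scala
    import org.apache.flink.api.common.operators.Order
    import org.apache.flink.api.scala._
    import org.apache.flink.api.scala.utils._ // provides zipWithIndex

    itData.map(s => (s, 1))
          .groupBy(0)                         // group by the string value
          .sum(1)                             // count occurrences per string
          .partitionByRange(x => -x._2)       // Take the negative count as the key
          .sortPartition(1, Order.DESCENDING) // highest counts first within each partition
          .zipWithIndex
    ```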
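
To make the ordering argument easier to check without a cluster, here is a minimal sketch of the same idea on plain Scala collections. It is not Flink code: `NegatedKeyIndexing`, `rangePartition`, and `index` are hypothetical names introduced for illustration, and `rangePartition` only imitates range partitioning by sorting on the extracted key in ascending order and cutting the result into contiguous chunks.

```scala
object NegatedKeyIndexing {
  // Count occurrences of each label (stands in for map / groupBy / sum).
  def counts(labels: Seq[String]): Seq[(String, Int)] =
    labels.groupBy(identity).map { case (s, xs) => (s, xs.size) }.toSeq

  // Hypothetical stand-in for range partitioning: order elements by the
  // extracted key in ASCENDING order, then cut into contiguous chunks.
  def rangePartition[T](data: Seq[T], parts: Int)(key: T => Int): Seq[Seq[T]] = {
    val sorted = data.sortBy(key)
    val size = math.max(1, math.ceil(sorted.size.toDouble / parts).toInt)
    sorted.grouped(size).toSeq
  }

  // Negating the count makes the ascending partitioner put the largest
  // counts into the first chunk; each chunk is then sorted by descending
  // count (ties broken by label), and indices are assigned chunk by chunk.
  def index(labels: Seq[String], parts: Int): Seq[(Long, (String, Int))] =
    rangePartition(counts(labels), parts)(x => -x._2)
      .flatMap(_.sortBy(x => (-x._2, x._1)))
      .zipWithIndex
      .map { case (v, i) => (i.toLong, v) }
}
```

Calling `NegatedKeyIndexing.index(Seq("a", "b", "a", "c", "a", "b"), 2)` assigns the smallest index to the most frequent label. Without the negation, the first chunk would hold the least frequent labels, so a descending sort inside each chunk alone would not give a global descending order.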


> FlinkML - Add StringIndexer
> ---------------------------
>
>                 Key: FLINK-4964
>                 URL: https://issues.apache.org/jira/browse/FLINK-4964
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: Thomas FOURNIER
>            Priority: Minor
>
> Add StringIndexer as described here:
> http://spark.apache.org/docs/latest/ml-features.html#stringindexer
> This will be added in package preprocessing of FlinkML



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
