[ https://issues.apache.org/jira/browse/FLINK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559938#comment-14559938 ]
ASF GitHub Bot commented on FLINK-1999:
---------------------------------------

GitHub user rbraeunlich opened a pull request:

    https://github.com/apache/flink/pull/730

    basic TfidfTransformer

    Hi everybody,
    due to [Flink-1999](https://issues.apache.org/jira/browse/FLINK-1999) we created a first implementation of a TfIdfTransformer.
    One problem remains: applying modulo to the hash values causes collisions.
    Nevertheless, we would be glad to receive some comments on our implementation.
    Cheers,
    Ronny

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rbraeunlich/flink tfidf

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/730.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #730

----
commit 9e9ac219b619ddfbab4f616165d038900b7726db
Author: Ronny Bräunlich <r.braeunl...@gmail.com>
Date:   2015-05-15T09:18:00Z

    create TfIdfTransformer

commit 42ef7c00a832e21d7391e1011031bda162d930f1
Author: Ronny Bräunlich <r.braeunl...@gmail.com>
Date:   2015-05-16T14:38:28Z

    fix import in TfIdfTranformer and add first basic test case

commit 82385b764f45f955cd88590b7657467689d096ed
Author: Ronny Bräunlich <r.braeunl...@gmail.com>
Date:   2015-05-15T09:18:00Z

    create TfIdfTransformer and add first basic test case

commit 7242728b1c24027203f1ff91476de9acb9bbf3a7
Author: diva1012 <vsldi...@gmail.com>
Date:   2015-05-17T11:42:40Z

    Changes merged

    Merge remote-tracking branch 'rbraeunlich/tfidf' into tfidf

    Conflicts:
        flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/feature/TfIdfTransformer.scala

commit 9c2c181624bb81f3ed83a4a774339251508644f1
Author: diva1012 <vsldi...@gmail.com>
Date:   2015-05-17T17:40:00Z

    Small fix of the test class.
    (The Sparse vector contains index -> value tuples, so we have to take only the value and not the whole tuple for the comparison)

commit 8b17385e34b7f139a2649f80edc81744277fcfae
Author: diva1012 <vsldi...@gmail.com>
Date:   2015-05-18T06:41:58Z

    Word count implementation simplified.

commit 229fac5f835ce05dd03544f7dd7c0df7952f18e9
Author: diva1012 <vsldi...@gmail.com>
Date:   2015-05-18T11:35:43Z

    TF calculation fixed

commit e1ea4437e42860d8ed7820c32e08d7a2d1152b08
Author: diva1012 <vsldi...@gmail.com>
Date:   2015-05-19T20:44:31Z

    Transformer improved: now we get SparseVector for each document that contains all words.

----

> TF-IDF transformer
> ------------------
>
>                 Key: FLINK-1999
>                 URL: https://issues.apache.org/jira/browse/FLINK-1999
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Ronny Bräunlich
>            Assignee: Alexander Alexandrov
>            Priority: Minor
>              Labels: ML
>
> Hello everybody,
> we are a group of three students from TU Berlin (I guess we're not the first
> group creating an issue) and we want to/have to implement a tf-idf transformer
> for Flink.
> Our lecturer Alexander told us that we could get some guidance here and that
> you could point us to an old version of a similar transformer.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
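
Note on the collision problem mentioned in the pull request description: when a term's hash is reduced modulo the vector size, distinct terms can land on the same index and their counts get added together. The sketch below only illustrates that effect; it is not the code from the pull request. It is written in Python rather than Scala, and the names `hashed_index`, `tf_vector`, `find_collisions` and the choice of CRC32 as the hash function are assumptions made for this example.

```python
import zlib
from collections import Counter


def hashed_index(term, num_features):
    """Map a term to a vector index by hashing and taking the result modulo the size.

    CRC32 is used here only because it is deterministic across runs;
    the hash function in the actual transformer may differ.
    """
    return zlib.crc32(term.encode("utf-8")) % num_features


def tf_vector(tokens, num_features):
    """Build a sparse term-frequency vector as {index: count}.

    Terms that share an index after the modulo step have their counts merged,
    which is the collision effect described in the pull request.
    """
    counts = Counter()
    for token in tokens:
        counts[hashed_index(token, num_features)] += 1
    return dict(counts)


def find_collisions(vocabulary, num_features):
    """Return {index: [terms]} for every index shared by more than one distinct term."""
    buckets = {}
    for term in set(vocabulary):
        buckets.setdefault(hashed_index(term, num_features), []).append(term)
    return {index: sorted(terms) for index, terms in buckets.items() if len(terms) > 1}
```

Calling `find_collisions(vocabulary, num_features)` on the corpus vocabulary shows which terms would be merged for a given vector size. A larger `num_features` makes collisions less likely but does not rule them out; avoiding them entirely would require an explicit term-to-index dictionary instead of hashing.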