[ 
https://issues.apache.org/jira/browse/FLINK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559938#comment-14559938
 ] 

ASF GitHub Bot commented on FLINK-1999:
---------------------------------------

GitHub user rbraeunlich opened a pull request:

    https://github.com/apache/flink/pull/730

    basic TfidfTransformer

    Hi everybody,
    
    for [FLINK-1999](https://issues.apache.org/jira/browse/FLINK-1999) we created a first implementation of a TfIdfTransformer.
    One problem remains: applying modulo after the hashing causes collisions.
    Nevertheless, we would be glad to receive some comments on our implementation.
    
    Cheers,
    Ronny
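
The collision problem mentioned in the description can be illustrated with a minimal sketch. It assumes terms are mapped to vector indices via `hashCode` modulo a fixed feature count; the names `HashingSketch`, `bucket`, `collisions`, and `numFeatures` are hypothetical and not taken from the pull request.

```scala
object HashingSketch {
  // Map a term to an index in [0, numFeatures). Math.floorMod keeps the
  // result non-negative even when hashCode is negative.
  // Assumption: the transformer hashes terms in a comparable way.
  def bucket(term: String, numFeatures: Int): Int =
    Math.floorMod(term.hashCode, numFeatures)

  // Group distinct terms by index and keep only indices shared by
  // more than one term, i.e. the collisions.
  def collisions(terms: Seq[String], numFeatures: Int): Map[Int, Seq[String]] =
    terms.distinct
      .groupBy(term => bucket(term, numFeatures))
      .filter { case (_, sharing) => sharing.size > 1 }
}
```

With only two indices, `HashingSketch.collisions(Seq("a", "b", "c"), 2)` reports that "a" and "c" share index 1, so their weights would be merged into one vector entry. A larger `numFeatures` makes collisions rarer but cannot rule them out.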

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rbraeunlich/flink tfidf

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/730.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #730
    
----
commit 9e9ac219b619ddfbab4f616165d038900b7726db
Author: Ronny Bräunlich <r.braeunl...@gmail.com>
Date:   2015-05-15T09:18:00Z

    create TfIdfTransformer

commit 42ef7c00a832e21d7391e1011031bda162d930f1
Author: Ronny Bräunlich <r.braeunl...@gmail.com>
Date:   2015-05-16T14:38:28Z

    fix import in TfIdfTransformer and add first basic test case

commit 82385b764f45f955cd88590b7657467689d096ed
Author: Ronny Bräunlich <r.braeunl...@gmail.com>
Date:   2015-05-15T09:18:00Z

    create TfIdfTransformer and add first basic test case

commit 7242728b1c24027203f1ff91476de9acb9bbf3a7
Author: diva1012 <vsldi...@gmail.com>
Date:   2015-05-17T11:42:40Z

    Changes merged
    
    Merge remote-tracking branch 'rbraeunlich/tfidf' into tfidf
    
    Conflicts:
        flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/feature/TfIdfTransformer.scala

commit 9c2c181624bb81f3ed83a4a774339251508644f1
Author: diva1012 <vsldi...@gmail.com>
Date:   2015-05-17T17:40:00Z

    Small fix of the test class. (The sparse vector contains index -> value tuples, so we have to take only the value and not the whole tuple for the comparison)

commit 8b17385e34b7f139a2649f80edc81744277fcfae
Author: diva1012 <vsldi...@gmail.com>
Date:   2015-05-18T06:41:58Z

    Word count implementation simplified.

commit 229fac5f835ce05dd03544f7dd7c0df7952f18e9
Author: diva1012 <vsldi...@gmail.com>
Date:   2015-05-18T11:35:43Z

    TF calculation fixed

commit e1ea4437e42860d8ed7820c32e08d7a2d1152b08
Author: diva1012 <vsldi...@gmail.com>
Date:   2015-05-19T20:44:31Z

    Transformer improved: now we get a SparseVector for each document that contains all words.

----
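
For readers unfamiliar with the weighting that these commits implement, the following is a minimal local sketch of TF-IDF, not the code from the pull request. It assumes raw term counts for tf and idf(t) = ln(N / df(t)); the pull request may use a different variant. Plain Scala maps stand in for Flink's SparseVector, and all names (`TfIdfSketch`, `termFrequencies`, `inverseDocumentFrequencies`, `tfIdf`) are hypothetical.

```scala
object TfIdfSketch {
  // Raw term counts for one tokenized document.
  def termFrequencies(doc: Seq[String]): Map[String, Int] =
    doc.groupBy(identity).map { case (term, occurrences) => term -> occurrences.size }

  // idf(t) = ln(N / df(t)), where df(t) is the number of documents containing t.
  // Assumption: unsmoothed natural-log idf.
  def inverseDocumentFrequencies(docs: Seq[Seq[String]]): Map[String, Double] = {
    val n = docs.size.toDouble
    docs.flatMap(_.distinct)
      .groupBy(identity)
      .map { case (term, containing) => term -> math.log(n / containing.size) }
  }

  // One sparse (term -> weight) map per document.
  def tfIdf(docs: Seq[Seq[String]]): Seq[Map[String, Double]] = {
    val idf = inverseDocumentFrequencies(docs)
    docs.map { doc =>
      termFrequencies(doc).map { case (term, tf) => term -> tf * idf(term) }
    }
  }
}
```

Under these assumptions, a term that occurs in every document gets weight 0, because ln(N / N) = 0.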


> TF-IDF transformer
> ------------------
>
>                 Key: FLINK-1999
>                 URL: https://issues.apache.org/jira/browse/FLINK-1999
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Ronny Bräunlich
>            Assignee: Alexander Alexandrov
>            Priority: Minor
>              Labels: ML
>
> Hello everybody,
> we are a group of three students from TU Berlin (I guess we're not the first 
> group creating an issue) and we want to/have to implement a tf-idf transformer 
> for Flink.
> Our lecturer Alexander told us that we could get some guidance here and that 
> you could point us to an old version of a similar transformer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)