[jira] [Commented] (BEAM-1439) Beam Example(s) exploring public document datasets

khalid bin huda (JIRA) Thu, 16 Mar 2017 14:28:06 -0700

    [ 
https://issues.apache.org/jira/browse/BEAM-1439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928932#comment-15928932
 ]


khalid bin huda commented on BEAM-1439:
---------------------------------------

Hi, I'm Khalid Bin Huda, a final-year undergraduate in the Department of 
Computer Science (University of Karachi). I have programming experience with C, 
Java, and R, and I love working on projects related to data mining or machine 
learning. I would like to take on this project for GSoC 2017 and contribute to 
it.

> Beam Example(s) exploring public document datasets
> --------------------------------------------------
>
>                 Key: BEAM-1439
>                 URL: https://issues.apache.org/jira/browse/BEAM-1439
>             Project: Beam
>          Issue Type: Wish
>          Components: examples-java
>            Reporter: Kenneth Knowles
>            Assignee: Kenneth Knowles
>            Priority: Minor
>              Labels: gsoc2017, java, mentor, python
>
> In Beam, we have examples illustrating counting the occurrences of words and 
> performing a basic TF-IDF analysis on the works of Shakespeare (or whatever 
> you point it at). It would be even cooler to do these analyses, and more, on 
> a much larger data set that is really the subject of current investigations.
> In chatting with professors at the University of Washington, I've learned 
> that scholars of many fields would really like to explore new and highly 
> customized ways of processing the growing body of publicly available 
> scholarly documents, such as PubMed Central. One example is a query like 
> "show me documents where chemical compounds X and Y were both used in the 
> 'method' section."
> So I propose a Google Summer of Code project wherein a student writes some 
> large-scale Beam pipelines to perform analyses such as term frequency, bigram 
> frequency, etc.
> Skills required:
>  - Java or Python
>  - (nice to have) Working through the Beam getting started materials
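
For readers unfamiliar with the analyses named in the issue, the following is a minimal plain-Python sketch of term frequency and bigram frequency over a small in-memory collection. It is not a Beam pipeline and is not taken from the Beam examples; the function names (tokenize, term_frequency, bigram_frequency) and the simple regex tokenizer are assumptions made for illustration. A real pipeline would presumably express the same per-document counting as distributed transforms over a much larger corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters/apostrophes as tokens.

    Hypothetical helper: a real analysis of scholarly documents would
    likely need a more careful tokenizer.
    """
    return re.findall(r"[a-z']+", text.lower())


def term_frequency(documents):
    """Count how often each token occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


def bigram_frequency(documents):
    """Count adjacent token pairs; pairs never span two documents."""
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        # zip pairs each token with its successor within the same document
        counts.update(zip(tokens, tokens[1:]))
    return counts


if __name__ == "__main__":
    docs = ["To be or not to be", "To be sure"]
    print(term_frequency(docs).most_common(2))
    print(bigram_frequency(docs).most_common(1))
```

Keeping tokenization separate from counting means the same tokenizer could be reused for both analyses, and the counting step is the part that would be parallelized in a large-scale setting.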



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
