[ https://issues.apache.org/jira/browse/FLINK-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581612#comment-14581612 ]
ASF GitHub Bot commented on FLINK-2072:
---------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/792#discussion_r32197018

    --- Diff: docs/libs/ml/quickstart.md ---
    @@ -24,4 +25,214 @@ under the License.
     * This will be replaced by the TOC
     {:toc}

    -Coming soon.
    +## Introduction
    +
    +FlinkML is designed to make learning from your data a straight-forward process, abstracting away
    +the complexities that usually come with having to deal with big data learning tasks. In this
    +quick-start guide we will show just how easy it is to solve a simple supervised learning problem
    +using FlinkML. But first some basics, feel free to skip the next few lines if you're already
    +familiar with Machine Learning (ML).
    +
    +As defined by Murphy [1] ML deals with detecting patterns in data, and using those
    +learned patterns to make predictions about the future. We can categorize most ML algorithms into
    +two major categories: Supervised and Unsupervised Learning.
    +
    +* **Supervised Learning** deals with learning a function (mapping) from a set of inputs
    +(features) to a set of outputs. The learning is done using a *training set* of (input,
    +output) pairs that we use to approximate the mapping function. Supervised learning problems are
    +further divided into classification and regression problems. In classification problems we try to
    +predict the *class* that an example belongs to, for example whether a user is going to click on
    +an ad or not. Regression problems, on the other hand, are about predicting (real) numerical
    +values, often called the dependent variable, for example what the temperature will be tomorrow.
    +
    +* **Unsupervised Learning** deals with discovering patterns and regularities in the data. An example
    +of this would be *clustering*, where we try to discover groupings of the data from the
    +descriptive features. Unsupervised learning can also be used for feature selection, for example
    +through [principal components analysis](https://en.wikipedia.org/wiki/Principal_component_analysis).
    +
    +## Linking with FlinkML
    +
    +In order to use FlinkML in your project, first you have to
    +[set up a Flink program](http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#linking-with-flink).
    +Next, you have to add the FlinkML dependency to the `pom.xml` of your project:
    +
    +{% highlight xml %}
    +<dependency>
    +  <groupId>org.apache.flink</groupId>
    +  <artifactId>flink-ml</artifactId>
    +  <version>{{site.version }}</version>
    --- End diff --

    Nicely done with the site version :+1:

> Add a quickstart guide for FlinkML
> ----------------------------------
>
>                 Key: FLINK-2072
>                 URL: https://issues.apache.org/jira/browse/FLINK-2072
>             Project: Flink
>          Issue Type: New Feature
>          Components: Documentation, Machine Learning Library
>            Reporter: Theodore Vasiloudis
>            Assignee: Theodore Vasiloudis
>             Fix For: 0.9
>
>
> We need a quickstart guide that introduces users to the core concepts of
> FlinkML to get them up and running quickly.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)