[ https://issues.apache.org/jira/browse/FLINK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301867#comment-14301867 ]

ASF GitHub Bot commented on FLINK-377:
--------------------------------------

Github user zentol commented on a diff in the pull request:

    https://github.com/apache/flink/pull/202#discussion_r23955578

    --- Diff: docs/python_programming_guide.md ---
    @@ -0,0 +1,600 @@
    +---
    +title: "Python Programming Guide"
    +---
    +<!--
    +Licensed to the Apache Software Foundation (ASF) under one
    +or more contributor license agreements. See the NOTICE file
    +distributed with this work for additional information
    +regarding copyright ownership. The ASF licenses this file
    +to you under the Apache License, Version 2.0 (the
    +"License"); you may not use this file except in compliance
    +with the License. You may obtain a copy of the License at
    +
    +  http://www.apache.org/licenses/LICENSE-2.0
    +
    +Unless required by applicable law or agreed to in writing,
    +software distributed under the License is distributed on an
    +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +KIND, either express or implied. See the License for the
    +specific language governing permissions and limitations
    +under the License.
    +-->
    +
    +* This will be replaced by the TOC
    +{:toc}
    +
    +
    +<a href="#top"></a>
    +
    +Introduction
    +------------
    +
    +Analysis programs in Flink are regular programs that implement transformations on data sets
    +(e.g., filtering, mapping, joining, grouping). The data sets are initially created from certain
    +sources (e.g., by reading files, or from collections). Results are returned via sinks, which may for
    +example write the data to (distributed) files, or to standard output (for example the command line
    +terminal). Flink programs run in a variety of contexts, standalone, or embedded in other programs.
    +The execution can happen in a local JVM, or on clusters of many machines.
    +
    +In order to create your own Flink program, we encourage you to start with the
    +[program skeleton](#program-skeleton) and gradually add your own
    +[transformations](#transformations). The remaining sections act as references for additional
    +operations and advanced features.
    +
    +
    +Example Program
    +---------------
    +
    +The following program is a complete, working example of WordCount. You can copy & paste the code
    +to run it locally.
    +
    +{% highlight python %}
    +from flink.plan.Environment import get_environment
    +from flink.plan.Constants import INT, STRING
    +from flink.functions.GroupReduceFunction import GroupReduceFunction
    +
    +class Adder(GroupReduceFunction):
    +    def reduce(self, iterator, collector):
    +        count, word = iterator.next()
    +        count += sum([x[0] for x in iterator])
    +        collector.collect((count, word))
    +
    +if __name__ == "__main__":
    +    env = get_environment()
    +    data = env.from_elements("Who's there?",
    +        "I think I hear them. Stand, ho! Who's there?")
    +
    +    data \
    +        .flat_map(lambda x: x.lower().split(), (INT, STRING)) \
    +        .group_by(1) \
    +        .reduce_group(Adder(), (INT, STRING), combinable=True) \
    +        .output()
    +
    +    env.execute()
    +}
    --- End diff --

    fixed


> Create a general purpose framework for language bindings
> --------------------------------------------------------
>
>                 Key: FLINK-377
>                 URL: https://issues.apache.org/jira/browse/FLINK-377
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: GitHub Import
>            Assignee: Chesnay Schepler
>              Labels: github-import
>             Fix For: pre-apache
>
>
> A general purpose API to run operators with arbitrary binaries.
> This will allow to run Stratosphere programs written in Python, JavaScript,
> Ruby, Go or whatever you like.
> We suggest using Google Protocol Buffers for data serialization. This is the
> list of languages that currently support ProtoBuf:
> https://code.google.com/p/protobuf/wiki/ThirdPartyAddOns
> Very early prototype with python:
> https://github.com/rmetzger/scratch/tree/learn-protobuf (basically testing
> protobuf)
> For Ruby: https://github.com/infochimps-labs/wukong
> Two new students working at Stratosphere (@skunert and @filiphaase) are
> working on this.
> The reference binding language will be for Python, but other bindings are
> very welcome.
> The best name for this so far is "stratosphere-lang-bindings".
> I created this issue to track the progress (and give everybody a chance to
> comment on this)
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/377
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: enhancement,
> Assignee: [filiphaase|https://github.com/filiphaase]
> Created at: Tue Jan 07 19:47:20 CET 2014
> State: open

--
This message was sent by Atlassian JIRA (v6.3.4#6332)