[ https://issues.apache.org/jira/browse/FLINK-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728745#comment-14728745 ]
ASF GitHub Bot commented on FLINK-2583: --------------------------------------- Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/1084#discussion_r38627730 --- Diff: docs/apis/streaming_guide.md --- @@ -1836,6 +1837,110 @@ More about information about Elasticsearch can be found [here](https://elastic.c [Back to top](#top) +### HDFS + +This connector provides a Sink that writes rolling files to HDFS. To use this connector, add the +following dependency to your project: + +{% highlight xml %} +<dependency> + <groupId>org.apache.flink</groupId> + <artifactId>flink-connector-hdfs</artifactId> + <version>{{site.version}}</version> +</dependency> +{% endhighlight %} + +Note that the streaming connectors are currently not part of the binary +distribution. See +[here](cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution) +for information about how to package the program with the libraries for +cluster execution. + +#### HDFS Rolling File Sink + +The rolling behaviour as well as the writing can be configured but we will get to that later. +This is how you can create a default rolling sink: + +<div class="codetabs" markdown="1"> +<div data-lang="java" markdown="1"> +{% highlight java %} +DataStream<String> input = ...; + +input.addSink(new RollingHDFSSink<String>("/base/path")); + +{% endhighlight %} +</div> +<div data-lang="scala" markdown="1"> +{% highlight scala %} +val input: DataStream[String] = ... + +input.addSink(new RollingHDFSSink("/base/path")) + +{% endhighlight %} +</div> +</div> + +The only required parameter is the base path in HDFS where the rolling files (buckets) will be +stored. The sink can be configured by specifying a custom bucketer, HDFS writer and batch size. + +By default the rolling sink will use the pattern `"yyyy-MM-dd--HH"` to name the rolling buckets. --- End diff -- Can you make it a bit more explicit that a new directory is created when the pattern changes? > Add Stream Sink For Rolling HDFS Files > -------------------------------------- > > Key: FLINK-2583 > URL: https://issues.apache.org/jira/browse/FLINK-2583 > Project: Flink > Issue Type: New Feature > Components: Streaming > Reporter: Aljoscha Krettek > Assignee: Aljoscha Krettek > Fix For: 0.10 > > > In addition to having configurable file-rolling behavior the Sink should also > integrate with checkpointing to make it possible to have exactly-once > semantics throughout the topology. -- This message was sent by Atlassian JIRA (v6.3.4#6332)