Re: Using Flink Streaming to write to multiple output files in HDFS

Fabian Hueske Wed, 21 Oct 2015 05:38:43 -0700

There are also training slides and programming exercises (incl. reference
solutions) for the DataStream API at


--> http://dataartisans.github.io/flink-training/

Cheers, Fabian

2015-10-21 14:03 GMT+02:00 Aljoscha Krettek <[email protected]>:

> Hi,
> the documentation has a guide about the Streaming API:
>
> https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html
>
> This also contains a section about the rolling (HDFS) FileSystem sink:
>
> https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#hadoop-filesystem
>
> For blog entries I would suggest these:
>  -
> http://data-artisans.com/real-time-stream-processing-the-next-step-for-apache-flink/
>  -
> http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
>  - http://data-artisans.com/kafka-flink-a-practical-how-to/
>
> I don’t think we have any easy starter issues right now on the Streaming
> API. But some might come up in the future. :D
>
> Cheers,
> Aljoscha
> > On 21 Oct 2015, at 11:40, Andra Lungu <[email protected]> wrote:
> >
> > Hey guys,
> >
> > Long time, no see :). I recently started a new job and it involves
> > performing a set of real-time data analytics using Apache Kafka, Storm
> > and Flume.
> >
> > What happens, on a very high level, is that a set of signals is
> > collected, stored into a Kafka topic and then Storm is used to filter
> > certain fields out or to enrich the fields with other
> > meta-information. Finally, Flume writes the output into multiple HDFS
> > files depending on the date, hour etc.
> >
> > Now, I saw that Flink can play with a similar pipeline, but without
> > needing Flume for the writing to HDFS part (see
> > http://data-artisans.com/kafka-flink-a-practical-how-to/). Which
> > brings me to my question: how does Flink handle writing to multiple
> > files in a streaming fashion? (Until now, I was playing with batch, and
> > writeAsCsv just took one file as a parameter.)
> >
> > Next question: What are the prerequisites to deploy a Flink Streaming
> > job on a cluster? Yarn, HDFS, anything else?
> >
> > Final question, more of a request: I'd like to play around with Flink
> > Streaming to assess whether it can substitute Storm in this use case
> > and whether it can outrun it :P. To this end, I'll need some starting
> > points: docs, blog posts, examples to read. Any input would be useful.
> >
> > I wanted to dig for a newbie task in the streaming area, but I could
> > not find one... can we think of something easy to get me started?
> >
> > Thanks! Hope you guys had fun at Flink Forward!
> > Andra
>
>
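
On the question of writing to multiple files depending on the date and hour: the general idea behind a rolling file sink is that each record is routed to a bucket directory derived from a timestamp, so a new directory starts whenever the date or hour changes. The following is only a minimal plain-Java sketch of that bucketing idea, not Flink's API. The class name BucketPathSketch, the method bucketPath, the base path hdfs:///signals and the pattern "yyyy-MM-dd/HH" are all assumptions made for illustration; the actual sink and its configuration are described in the streaming guide linked above.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration only: this is NOT part of Flink.
public class BucketPathSketch {

    // Assumed layout: one directory per date, one sub-directory per hour.
    private static final DateTimeFormatter BUCKET_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd/HH");

    // Derives the bucket directory for a record from its timestamp.
    public static String bucketPath(String basePath, LocalDateTime timestamp) {
        return basePath + "/" + timestamp.format(BUCKET_FORMAT);
    }

    public static void main(String[] args) {
        // Two records in the same hour share a bucket; the next hour gets a new one.
        System.out.println(bucketPath("hdfs:///signals", LocalDateTime.of(2015, 10, 21, 11, 40)));
        System.out.println(bucketPath("hdfs:///signals", LocalDateTime.of(2015, 10, 21, 11, 59)));
        System.out.println(bucketPath("hdfs:///signals", LocalDateTime.of(2015, 10, 21, 12, 0)));
    }
}
```

A sink built on this idea would keep one open file per active bucket and close it once records stop arriving for that bucket; how that is handled in Flink itself is best checked in the documentation rather than inferred from this sketch.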
