Re: Adding a part suffix setter to the BucketingSink

2018-03-29 Thread lrao
Sorry, I meant "I don't see a way of doing this apart from setting a part file *suffix* with the required file extension. " On 2018/03/29 14:55:43, l...@lyft.com wrote: > Currently the BucketingSink allows addition of part prefix, pending > prefix/suffix and in-progress prefix/suffix via sett

Adding a part suffix setter to the BucketingSink

2018-03-29 Thread lrao
Currently the BucketingSink allows addition of part prefix, pending prefix/suffix and in-progress prefix/suffix via setter methods. Can we also support setting part suffixes? An instance where this maybe useful: I am currently writing GZIP compressed output to S3 using the BucketingSink and I wo

Re: Compressing files with the Bucketing Sink

2018-03-29 Thread lrao
Thanks a lot for the suggestion Till! I ended up using your suggestion of extending StreamWriterBase and wrapping the FSDataOutputStream with GZIPOutputStream. On 2018/03/28 09:44:26, Till Rohrmann wrote: > Hi, > > the SequenceFileWriter and the AvroKeyValueSinkWriter both support > co

Compressing files with the Bucketing Sink

2018-03-27 Thread lrao
I want to upload a compressed file (gzip preferrably) using the Bucketing Sink. What is the best way to do this? Would I have to implement my own Writer that does the compression? Has anyone done something similar?

Re: Correct way to reference Hadoop dependencies in Flink 1.4.0

2018-03-14 Thread lrao
Hi, I am running this on a Hadoop-free cluster (i.e. no YARN etc.). I have the following dependencies packaged in my user application JAR: aws-java-sdk 1.7.4 flink-hadoop-fs 1.4.0 flink-shaded-hadoop2 1.4.0 flink-connector-filesystem_2.11 1.4.0 hadoop-common 2.7.4 hadoop-aws 2.7.4 I have also t

Correct way to reference Hadoop dependencies in Flink 1.4.0

2018-03-14 Thread lrao
I'm trying to use a BucketingSink to write files to S3 in my Flink job. I have the Hadoop dependencies I need packaged in my user application jar. However, on running the job I get the following error (from the taskmanager): java.lang.RuntimeException: Error while creating FileSystem when initia

Re: Using the BucketingSink with Flink 1.4.0

2018-03-09 Thread lrao
Thank you for your responses Stephan and Piotrek! It's great to know that the hadoop-free Bucketing Sink might be available as early as 1.5.x! In the meantime, I have been trying workarounds but I am currently facing issues making it work. I tried including my Hadoop dependencies only in my use

Using the BucketingSink with Flink 1.4.0

2018-03-08 Thread lrao
I want to use the BucketingSink in the hadoop-free Flink system (i.e. 1.4.0) but currently I am kind of blocked because of its dependency on the Hadoop file system. 1. Is this something that's going to be fixed in the next version of Flink? 2. In the meantime, to unblock myself, what is the bes