Hi! Yes, the BucketingSink is unfortunately still tied to specific Hadoop file systems, due to the special way it uses truncate() and append() (a rough sketch of those calls is below).
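Illustrative sketch only, assuming Hadoop's org.apache.hadoop.fs.FileSystem API; the path, length, and surrounding code are hypothetical and this is not the actual BucketingSink implementation:

```
// Illustrative sketch: the Hadoop-specific calls (truncate/append) the current
// sink relies on when recovering an in-progress part file after a failure.
// NOT the actual BucketingSink code; path and length are hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TruncateAppendSketch {

    public static void main(String[] args) throws Exception {
        Path partFile = new Path("hdfs:///data/bucket/part-0-0"); // hypothetical part file
        long validLength = 1024L;                                 // hypothetical checkpointed length

        FileSystem fs = partFile.getFileSystem(new Configuration());

        // On restore, bytes written after the last checkpoint must be discarded.
        // truncate() only exists on Hadoop >= 2.7 and may complete asynchronously.
        boolean done = fs.truncate(partFile, validLength);
        if (!done) {
            System.out.println("truncate still in progress; a real implementation would wait/retry");
        }

        // Resuming writes into the shortened file requires append() support,
        // which not every Hadoop FileSystem implementation provides.
        fs.append(partFile).close();
    }
}
```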
This is very high up our list post the 1.5 release, possibly even backportable to 1.5.x. The plan is to create a new BucketingSink based on Flink's own file systems, meaning it can also work with Hadoop-free Flink when using file://, s3://, and so on.

Best,
Stephan

On Fri, Mar 9, 2018 at 9:43 AM, Piotr Nowojski <pi...@data-artisans.com> wrote:
> Hi,
>
> There is a quite old ticket about this issue. Feel free to bump it in a
> comment to raise its priority:
>
> https://issues.apache.org/jira/browse/FLINK-5789
>
> Regarding a workaround, maybe someone else will know more. There was a
> similar discussion on this topic here:
>
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/hadoop-free-hdfs-config-td17693.html
>
> Piotrek
>
> > On 9 Mar 2018, at 02:11, l...@lyft.com wrote:
> >
> > I want to use the BucketingSink in the Hadoop-free Flink system (i.e.
> > 1.4.0), but currently I am kind of blocked because of its dependency on
> > the Hadoop file system.
> > 1. Is this something that's going to be fixed in the next version of
> > Flink?
> > 2. In the meantime, to unblock myself, what is the best way forward? I
> > have tried packaging the Hadoop dependencies I need in my user jar, but
> > I run into problems while running the job. Stacktrace below:
> > ```
> > 21:26:09.654 INFO o.a.f.r.t.Task - Source: source -> Sink: S3-Sink (1/1) (9ac2cb1fc2b913c3b9d75aace08bcd37) switched from RUNNING to FAILED.
> > java.lang.RuntimeException: Error while creating FileSystem when initializing the state of the BucketingSink.
> >     at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initializeState(BucketingSink.java:358)
> >     at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
> >     at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
> >     at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
> >     at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:259)
> >     at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:694)
> >     at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:682)
> >     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253)
> >     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
> >     at java.lang.Thread.run(Thread.java:748)
> > Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'hdfs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
> >     at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:405)
> >     at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.createHadoopFileSystem(BucketingSink.java:1154)
> >     at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initFileSystem(BucketingSink.java:411)
> >     at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initializeState(BucketingSink.java:355)
> >     ... 9 common frames omitted
> > Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies.
> >     at org.apache.flink.core.fs.UnsupportedSchemeFactory.create(UnsupportedSchemeFactory.java:64)
> >     at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:401)
> >     ... 12 common frames omitted
> > ```
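For reference, the exception in the trace can be reproduced outside the job with a minimal sketch against Flink's own org.apache.flink.core.fs.FileSystem API; the paths below are hypothetical:

```
// Minimal sketch, assuming Flink 1.4's org.apache.flink.core.fs.FileSystem API.
// In a Hadoop-free distribution, asking for an hdfs:// path fails with the same
// UnsupportedFileSystemSchemeException seen in the stack trace above, while
// file:// paths resolve through Flink's built-in local file system.
import java.net.URI;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.UnsupportedFileSystemSchemeException;

public class SchemeCheck {

    public static void main(String[] args) throws Exception {
        // Built-in local file system: works without any Hadoop on the classpath.
        FileSystem local = FileSystem.get(URI.create("file:///tmp")); // hypothetical path
        System.out.println("file:// resolved to " + local.getClass().getName());

        try {
            // Without Hadoop on the classpath, this is expected to fail just like
            // the BucketingSink's initFileSystem() in the trace above.
            FileSystem.get(URI.create("hdfs:///tmp")); // hypothetical path
        } catch (UnsupportedFileSystemSchemeException e) {
            System.out.println("hdfs:// not available: " + e.getMessage());
        }
    }
}
```

On a Hadoop-free 1.4 distribution this should print the resolved local file system class and then the same "Hadoop is not in the classpath/dependencies" message as in the trace; which schemes resolve depends entirely on what is available on the classpath.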