Hi! Yes, the BucketingSink is unfortunately still tied to specific Hadoop file systems, due to the special way it uses truncate() and append() (a rough sketch of those calls is below).
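Illustrative sketch only, assuming Hadoop's org.apache.hadoop.fs.FileSystem API; the path, length, and surrounding code are hypothetical and this is not the actual BucketingSink implementation:

```
// Illustrative sketch: the Hadoop-specific calls (truncate/append) the current
// sink relies on when recovering an in-progress part file after a failure.
// NOT the actual BucketingSink code; path and length are hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TruncateAppendSketch {

    public static void main(String[] args) throws Exception {
        Path partFile = new Path("hdfs:///data/bucket/part-0-0"); // hypothetical part file
        long validLength = 1024L;                                 // hypothetical checkpointed length

        FileSystem fs = partFile.getFileSystem(new Configuration());

        // On restore, bytes written after the last checkpoint must be discarded.
        // truncate() only exists on Hadoop >= 2.7 and may complete asynchronously.
        boolean done = fs.truncate(partFile, validLength);
        if (!done) {
            System.out.println("truncate still in progress; a real implementation would wait/retry");
        }

        // Resuming writes into the shortened file requires append() support,
        // which not every Hadoop FileSystem implementation provides.
        fs.append(partFile).close();
    }
}
```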
This is very high up our list post the 1.5 release, possibly even backportable to 1.5.x. The plan is to create a new BucketingSink based on Flink's own file systems, meaning it can also work with Hadoop-free Flink when using file://, s3://, and so on.

Best,
Stephan

On Fri, Mar 9, 2018 at 9:43 AM, Piotr Nowojski <pi...@data-artisans.com> wrote:
> Hi,
>
> There is a quite old ticket about this issue. Feel free to bump it in a
> comment to raise its priority:
>
> https://issues.apache.org/jira/browse/FLINK-5789
>
> Regarding a workaround, maybe someone else will know more. There was a
> similar discussion on this topic here:
>
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/hadoop-free-hdfs-config-td17693.html
>
> Piotrek
>
> > On 9 Mar 2018, at 02:11, l...@lyft.com wrote:
> >
> > I want to use the BucketingSink in the Hadoop-free Flink system (i.e.
> > 1.4.0), but currently I am kind of blocked because of its dependency on
> > the Hadoop file system.
> > 1. Is this something that's going to be fixed in the next version of
> > Flink?
> > 2. In the meantime, to unblock myself, what is the best way forward? I
> > have tried packaging the Hadoop dependencies I need in my user jar, but
> > I run into problems while running the job. Stacktrace below:
> > ```
> > 21:26:09.654 INFO o.a.f.r.t.Task - Source: source -> Sink: S3-Sink (1/1) (9ac2cb1fc2b913c3b9d75aace08bcd37) switched from RUNNING to FAILED.
> > java.lang.RuntimeException: Error while creating FileSystem when initializing the state of the BucketingSink.
> >     at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initializeState(BucketingSink.java:358)
> >     at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
> >     at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
> >     at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
> >     at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:259)
> >     at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:694)
> >     at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:682)
> >     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253)
> >     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
> >     at java.lang.Thread.run(Thread.java:748)
> > Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Could not find a file system implementation for scheme 'hdfs'. The scheme is not directly supported by Flink and no Hadoop file system to support this scheme could be loaded.
> >     at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:405)
> >     at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.createHadoopFileSystem(BucketingSink.java:1154)
> >     at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initFileSystem(BucketingSink.java:411)
> >     at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initializeState(BucketingSink.java:355)
> >     ... 9 common frames omitted
> > Caused by: org.apache.flink.core.fs.UnsupportedFileSystemSchemeException: Hadoop is not in the classpath/dependencies.
> >     at org.apache.flink.core.fs.UnsupportedSchemeFactory.create(UnsupportedSchemeFactory.java:64)
> >     at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:401)
> >     ... 12 common frames omitted
> > ```
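For reference, the exception in the trace can be reproduced outside the job with a minimal sketch against Flink's own org.apache.flink.core.fs.FileSystem API; the paths below are hypothetical:

```
// Minimal sketch, assuming Flink 1.4's org.apache.flink.core.fs.FileSystem API.
// In a Hadoop-free distribution, asking for an hdfs:// path fails with the same
// UnsupportedFileSystemSchemeException seen in the stack trace above, while
// file:// paths resolve through Flink's built-in local file system.
import java.net.URI;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.UnsupportedFileSystemSchemeException;

public class SchemeCheck {

    public static void main(String[] args) throws Exception {
        // Built-in local file system: works without any Hadoop on the classpath.
        FileSystem local = FileSystem.get(URI.create("file:///tmp")); // hypothetical path
        System.out.println("file:// resolved to " + local.getClass().getName());

        try {
            // Without Hadoop on the classpath, this is expected to fail just like
            // the BucketingSink's initFileSystem() in the trace above.
            FileSystem.get(URI.create("hdfs:///tmp")); // hypothetical path
        } catch (UnsupportedFileSystemSchemeException e) {
            System.out.println("hdfs:// not available: " + e.getMessage());
        }
    }
}
```

On a Hadoop-free 1.4 distribution this should print the resolved local file system class and then the same "Hadoop is not in the classpath/dependencies" message as in the trace; which schemes resolve depends entirely on what is available on the classpath.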