Hi All,

Poorvank (cc'ed) and I are writing to start a discussion about a potential
improvement for Flink: creating a new, native S3 filesystem independent of
Hadoop/Presto.

The goal of this proposal is to address several challenges in Flink's S3
integration and to simplify its S3 filesystem modules. If this discussion
gains positive traction, the next step would be to move forward with a
formalised FLIP.

*The Challenges with the Current S3 Connectors*
Currently, Flink offers two primary S3 filesystems, flink-s3-fs-hadoop[1]
and flink-s3-fs-presto[2]. While functional, this dual-connector approach
has a few issues:

1. The flink-s3-fs-hadoop connector pulls in the Hadoop dependency tree,
which is an additional burden to manage. Upgrades such as moving to AWS SDK
v2 first need to be supported in Hadoop/Presto before Flink can leverage
them in its S3 filesystems, which makes it restrictive to adopt features
directly from the AWS SDK.

2. The flink-s3-fs-presto connector was introduced to mitigate the
performance issues of the Hadoop connector, especially for checkpointing.
However, it lacks a RecoverableWriter implementation.
Having two connectors with different capabilities is confusing for Flink
users, and highlights the need for a single, unified solution.

*Proposed Solution: A Native, Hadoop-Free S3 Filesystem*

I propose we develop a new filesystem, let's call it flink-s3-fs-native,
built directly on the modern AWS SDK for Java v2. This approach would be
free of any Hadoop or Presto dependencies. I have put together a small
prototype to validate the approach [3].

This is motivated by Trino's native S3 filesystem [4]. The Trino project
successfully undertook a similar migration, moving from Hadoop-based object
storage clients to its own native implementations.

The new Flink S3 filesystem would:

1. Provide a single, unified connector for all S3 interactions, from state
backends to sinks.

2. Implement a high-performance S3RecoverableWriter using S3's Multipart
Upload feature, ensuring exactly-once sink semantics.

3. Offer a clean, self-contained dependency, drastically simplifying setup
and eliminating external dependencies.
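To make point 2 a bit more concrete, below is a minimal sketch of the
bookkeeping such an S3RecoverableWriter would need to snapshot so that an
in-flight Multipart Upload can be resumed after a failover: the object's
bucket/key, the uploadId returned by CreateMultipartUpload, and the ETag of
every part uploaded so far. All names here (S3RecoverableState,
nextPartNumber, etc.) are illustrative and not taken from the prototype:

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of the recoverable state an S3 RecoverableWriter
 * could persist in a checkpoint. S3 Multipart Upload identifies an
 * in-flight upload by its uploadId, and completing it requires the ETag
 * of every uploaded part, so both must survive a failover.
 */
class S3RecoverableState implements Serializable {
    private final String bucket;
    private final String key;
    private final String uploadId;                   // from CreateMultipartUpload
    private final List<String> partETags = new ArrayList<>(); // one ETag per part

    S3RecoverableState(String bucket, String key, String uploadId) {
        this.bucket = bucket;
        this.key = key;
        this.uploadId = uploadId;
    }

    /** Record a part that was successfully uploaded via UploadPart. */
    void addCompletedPart(String eTag) {
        partETags.add(eTag);
    }

    /** S3 part numbers are 1-based; after n parts, recovery resumes with part n + 1. */
    int nextPartNumber() {
        return partETags.size() + 1;
    }

    List<String> partETags() {
        return Collections.unmodifiableList(partETags);
    }

    String uploadId() {
        return uploadId;
    }
}
```

On commit, the recorded ETags would feed a CompleteMultipartUpload call;
on abort, the uploadId allows an AbortMultipartUpload. This is only a
sketch of the recovery state, assuming the standard AWS SDK v2 multipart
API; the actual design would be part of the FLIP.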

*A Phased Migration Path*
To ensure a smooth transition, we could adopt a phased approach, at a high
level:

Phase 1:
Introduce the new native S3 filesystem as an optional, parallel plugin.
This would allow for community testing and adoption without breaking
existing setups.

Phase 2:
Once the native connector achieves feature parity and proven stability, we
would update the documentation to recommend it as the default choice for
all S3 use cases.

Phase 3:
In a future major release, the legacy flink-s3-fs-hadoop and
flink-s3-fs-presto connectors could be formally deprecated, with clear
migration guides provided for users.

I would love to hear the community's thoughts on this.

A few questions to start the discussion:

1. What are the biggest pain points with the current S3 filesystem?

2. Are there any critical features from the Hadoop S3A client that are
essential to replicate in a native implementation?

3. Would a simplified, dependency-free S3 experience be a valuable
improvement for your Flink use cases?


Cheers,
Samrat


[1]
https://github.com/apache/flink/tree/master/flink-filesystems/flink-s3-fs-hadoop
[2]
https://github.com/apache/flink/tree/master/flink-filesystems/flink-s3-fs-presto
[3] https://github.com/Samrat002/flink/pull/4
[4] https://github.com/trinodb/trino/tree/master/lib/trino-filesystem-s3