Re: [PR] Native s3 Filesystem Blog [flink-web]

via GitHub Mon, 15 Jun 2026 01:24:58 -0700


davidradl commented on code in PR #860:
URL: https://github.com/apache/flink-web/pull/860#discussion_r3412013090



##########
docs/content/posts/2026-06-14-announcing-native-s3-fs.md:
##########
@@ -0,0 +1,200 @@
+---
+title:  "Introducing Flink's Native S3 FileSystem: Built for Performance, Designed for Production"
+date: "2026-06-14T08:00:00.000Z"
+slug: "announcing-native-s3-fs"
+url: "/2026/06/14/announcing-native-s3-fs/"
+authors:
+- gabor:
+  name: "Gabor Somogyi"
+- samrat:
+  name: "Samrat Deb"
+aliases:
+- /news/2026/06/14/announcing-native-s3-fs.html
+---
+
+Apache Flink relies on the underlying filesystem for much of its work: reading 
and writing application data, materializing streaming sinks, and storing 
checkpoints and savepoints for recovery. For years, S3 support in Flink meant 
choosing between two Hadoop-based plugins, each with its own trade-offs and 
configuration quirks. With Flink 2.3, there is a better option.
+
+Today we're introducing `flink-s3-fs-native`, a ground-up, Hadoop-free S3 
filesystem built specifically for Flink. It ships as an experimental opt-in 
plugin in Flink 2.3, is already running in production at scale at major 
technology companies, and delivers measurable, reproducible performance gains.
+
+
+**At a glance**
+
+| | |
+|---|---|
+| **~2x faster checkpoints** | 48.8 s average vs 90.1 s with the Presto plugin; up to 4.5x at small state sizes |
+| **Drop-in replacement** | Swap the JAR, keep your existing `config.yaml`, restart your cluster |
+| **No Hadoop dependency** | ~13 MB JAR vs ~30–93 MB; no CVE triage on Hadoop transitive dependencies |
+| **AWS SDK v2** | Async-first I/O; AWS SDK v1 reached end-of-support in December 2025 |
+| **One plugin for everything** | Exactly-once sinks and fast checkpoints: no trade-offs, no compromises |
+
+
+## Two Plugins, One Filesystem, and No Good Answer
+
+If you've configured S3 for Flink before, you likely know that Flink ships two 
S3 filesystem plugins, and both register on the same `s3://` scheme. Only one 
can be active at a time. Choosing between them has been a source of confusion 
for years.
+
+The **Hadoop plugin** wraps Hadoop's S3A client. It supports 
`RecoverableWriter`, which enables exactly-once sinks. Unfortunately, it pulls 
in the full `hadoop-common` dependency tree and AWS SDK v1. Configuration relies 
on Hadoop-native keys (`fs.s3a.*`), with Flink-style keys (`s3.*`) translated 
to them through a compatibility layer.
+
+The **Presto plugin** was historically recommended for checkpointing because 
of its faster read path. But it does not support `RecoverableWriter`, which 
means exactly-once file sinks don't work with it. It carries known [bugs around 
directory deletion](https://github.com/prestodb/presto/issues/17416) that 
require Flink-side workarounds. It also depends on `hadoop-common` and AWS SDK 
v1 under the hood.
+
+Both share a common base layer that adapts a Hadoop `FileSystem` into a Flink 
`FileSystem`. This adaptation layer adds indirection, limits Flink-specific 
optimizations, and ties the implementation to Hadoop's configuration model and 
SDK lifecycle.
+
+As a result, you could have exactly-once sinks or a lighter read path, but not 
both. Either way, you were stuck in Hadoop dependency hell.

Review Comment:
   nit: hell -> challenges



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

