I am happy to announce that the 2.52.0 release of Beam has been finalized.
This release includes both improvements and new functionality.

For more information on changes in 2.52.0, check out the detailed release
notes - https://github.com/apache/beam/milestone/16. Here is an overview of
the changes in the release.

Highlights

* Previously deprecated Avro-dependent code (deprecated in Beam 2.46.0) has
finally been removed from the Java SDK "core" package. Please use
`beam-sdks-java-extensions-avro` instead. This makes it easy to update the
Avro version in user code without risking breaking changes in Beam "core",
since the Beam Avro extension already supports the latest Avro versions and
should handle such updates (https://github.com/apache/beam/issues/25252).
* Publishing Java 21 SDK container images is now supported as part of the
Apache Beam release process. (https://github.com/apache/beam/issues/28120)
  * The Direct Runner and Dataflow Runner support running pipelines on Java
21 (experimental until tests are fully set up). For other runners (Flink,
Spark, Samza, etc.), support status depends on the runner projects.

New Features / Improvements

* Add the `UseDataStreamForBatch` pipeline option to the Flink runner. When
it is set to true, the Flink runner runs batch jobs using the DataStream
API. By default the option is set to false, so batch jobs are still
executed using the DataSet API.
* `upload_graph` is no longer required as one of the Experiments options
for DataflowRunner when the graph is larger than 10MB for the Java SDK
(https://github.com/apache/beam/pull/28621).
* The state and side input cache is now enabled by default with a size of
100 MB. Use `--max_cache_memory_usage_mb=X` to set the cache size for the
user state API and side inputs. (Python)
(https://github.com/apache/beam/issues/28770).
* Beam YAML stable release. Beam pipelines can now be written using YAML
and leverage the Beam YAML framework, which includes a preliminary set of
IOs and turnkey transforms. More information can be found in the YAML root
folder and in its README
(https://github.com/apache/beam/blob/master/sdks/python/apache_beam/yaml/README.md).
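
As a rough illustration of what a size-capped cache does, here is a
minimal Python sketch of a least-recently-used (LRU) cache bounded by a
byte budget, with the budget given in megabytes like
`--max_cache_memory_usage_mb`. It is not Beam's implementation: the class
`WeightedLruCache` and its methods are hypothetical, and the caller supplies
each entry's size, whereas a real cache has to estimate it.

```python
from collections import OrderedDict


class WeightedLruCache:
    """Hypothetical LRU cache capped by a total byte budget (illustrative only)."""

    def __init__(self, max_cache_memory_usage_mb: int = 100):
        # Budget expressed in MB, converted to bytes.
        self._max_bytes = max_cache_memory_usage_mb * 1024 * 1024
        self._current_bytes = 0
        # key -> (value, size_in_bytes); order tracks recency, oldest first.
        self._entries = OrderedDict()

    @property
    def current_bytes(self) -> int:
        return self._current_bytes

    def __len__(self) -> int:
        return len(self._entries)

    def put(self, key, value, size_bytes: int) -> None:
        # Replacing an existing key releases its old size first.
        if key in self._entries:
            _, old_size = self._entries.pop(key)
            self._current_bytes -= old_size
        self._entries[key] = (value, size_bytes)
        self._current_bytes += size_bytes
        # Evict least-recently-used entries until back under budget.
        # An entry larger than the whole budget ends up evicted as well.
        while self._current_bytes > self._max_bytes and self._entries:
            _, (_, evicted_size) = self._entries.popitem(last=False)
            self._current_bytes -= evicted_size

    def get(self, key):
        # A hit marks the entry as most recently used.
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)
        return self._entries[key][0]
```

With a 1 MB budget, inserting a third 0.5 MB entry evicts whichever of the
two earlier entries was used least recently.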

Breaking Changes

* `org.apache.beam.sdk.io.CountingSource.CounterMark` uses a custom
`CounterMarkCoder` as its default coder, since all Avro-dependent classes
have finally moved to `extensions/avro`. If you still need to use
`AvroCoder` for `CounterMark`, then, as a workaround, place a copy of the
"old" `CountingSource` class into your project code and use it directly
(https://github.com/apache/beam/issues/25252).
* Renamed `host` to `firestoreHost` in `FirestoreOptions` to avoid
potential conflicts between command line arguments (Java)
(https://github.com/apache/beam/pull/29201).
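
The kind of conflict that motivated the `firestoreHost` rename can be
illustrated with Python's `argparse` as a stand-in for Beam Java's option
parsing (an analogy, not how Beam parses options): when several components
register flags in one flat namespace, a generic name like `--host` can be
claimed twice. Depending on the framework, a duplicate is either rejected
or silently shared; `argparse` rejects it. The function `build_parser` and
the other `--host` owner are hypothetical.

```python
import argparse


def build_parser(firestore_flag: str) -> argparse.ArgumentParser:
    """Registers two components' options in one flat namespace (illustrative)."""
    parser = argparse.ArgumentParser()
    # Option owned by some other (hypothetical) component.
    parser.add_argument("--host")
    # Option owned by the Firestore connector, under the name being tried.
    # If firestore_flag == "host", argparse raises ArgumentError here.
    parser.add_argument(f"--{firestore_flag}")
    return parser
```

`build_parser("host")` raises `argparse.ArgumentError`, while
`build_parser("firestoreHost")` accepts both flags side by side.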

Bugfixes

* Fixed "Desired bundle size 0 bytes must be greater than 0" in Java SDK's
BigtableIO.BigtableSource when you have more cores than bytes to read
(Java) (https://github.com/apache/beam/issues/28793).
* The `watch_file_pattern` argument of RunInference had no effect prior to
2.52.0. To get the behavior of `watch_file_pattern` in versions prior to
2.52.0, follow the documentation at
https://beam.apache.org/documentation/ml/side-input-updates/ and use the
`WatchFilePattern` PTransform as a side input
(https://github.com/apache/beam/pull/28948).
* `MLTransform` doesn't output artifacts such as min, max and quantiles.
Instead, a feature will be added to `MLTransform` to output these artifacts
in a human-readable format (https://github.com/apache/beam/issues/29017).
For now, to use artifacts such as min and max that were produced by an
earlier `MLTransform`, use the `read_artifact_location` argument of
`MLTransform`, which reads artifacts produced earlier by a different
`MLTransform` (https://github.com/apache/beam/pull/29016/).
* Fixed a memory leak that affected some long-running Python pipelines
(https://github.com/apache/beam/issues/28246).
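
The Bigtable error above is easy to reproduce with integer arithmetic. The
sketch below assumes the desired bundle size is computed by dividing the
estimated input size by the available parallelism: with 10 bytes to read
on 16 cores, integer division yields 0, which trips a "must be greater than
0" check. Clamping to a minimum of 1 byte is one way to avoid it; the
function names are hypothetical, and this is not necessarily how the fix in
the linked issue was implemented.

```python
def naive_bundle_size(estimated_bytes: int, parallelism: int) -> int:
    # Split the estimated input evenly across cores; integer division
    # rounds down, so fewer bytes than cores gives 0.
    return estimated_bytes // parallelism


def clamped_bundle_size(estimated_bytes: int, parallelism: int) -> int:
    # Same split, but never request a bundle smaller than 1 byte.
    return max(1, estimated_bytes // parallelism)


def check_bundle_size(desired_bytes: int) -> int:
    # Mirrors the precondition named in the error message.
    if desired_bytes <= 0:
        raise ValueError(
            f"Desired bundle size {desired_bytes} bytes must be greater than 0")
    return desired_bytes
```

For example, `check_bundle_size(naive_bundle_size(10, 16))` raises
`ValueError`, while `check_bundle_size(clamped_bundle_size(10, 16))`
returns 1.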

Security Fixes

* Fixed CVE-2023-39325 (https://www.cve.org/CVERecord?id=CVE-2023-39325)
(Java/Python/Go) (https://github.com/apache/beam/issues/29118).
* Mitigated CVE-2023-47248 (https://nvd.nist.gov/vuln/detail/CVE-2023-47248)
(Python) (https://github.com/apache/beam/issues/29392).

Thanks,
Danny
