[ https://issues.apache.org/jira/browse/FLINK-7301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16112486#comment-16112486 ]
ASF GitHub Bot commented on FLINK-7301:
---------------------------------------

Github user alpinegizmo commented on a diff in the pull request:

https://github.com/apache/flink/pull/4441#discussion_r131086039

--- Diff: docs/dev/stream/state/custom_serialization.md ---
@@ -0,0 +1,188 @@
+---
+title: "Custom Serialization for Managed State"
+nav-title: "Custom Serialization"
+nav-parent_id: streaming_state
+nav-pos: 10
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+If your application uses Flink's managed state, it might be necessary to implement a custom serialization logic for special use cases.
+
+This page is targeted as a guideline for users who require the use of custom serialization for their state, covering how
+to provide a custom serializer and how to handle upgrades to the serializer for compatibility. If you're simply using
+Flink's own serializers, this page is irrelevant and can be skipped.
+
+### Using custom serializers
+
+As demonstrated in the above examples, when registering a managed operator or keyed state, a `StateDescriptor` is required
+to specify the state's name, as well as information about the type of the state.
The type information is used by Flink's
+[type serialization framework](../../types_serialization.html) to create appropriate serializers for the state.
+
+It is also possible to completely bypass this and let Flink use your own custom serializer to serialize managed states,
+simply by directly instantiating the `StateDescriptor` with your own `TypeSerializer` implementation:
+
+<div class="codetabs" markdown="1">
+<div data-lang="java" markdown="1">
+{% highlight java %}
+public class CustomTypeSerializer extends TypeSerializer<Tuple2<String, Integer>> {...};
+
+ListStateDescriptor<Tuple2<String, Integer>> descriptor =
+    new ListStateDescriptor<>(
+        "state-name",
+        new CustomTypeSerializer());
+
+checkpointedState = getRuntimeContext().getListState(descriptor);
+{% endhighlight %}
+</div>
+
+<div data-lang="scala" markdown="1">
+{% highlight scala %}
+class CustomTypeSerializer extends TypeSerializer[(String, Integer)] {...}
+
+val descriptor = new ListStateDescriptor[(String, Integer)](
+    "state-name",
+    new CustomTypeSerializer)
+)
+
+checkpointedState = getRuntimeContext.getListState(descriptor);
+{% endhighlight %}
+</div>
+</div>
+
+Note that Flink writes state serializers along with the state as metadata. In certain cases on restore (see following
+subsections), the written serializer needs to be deserialized and used. Therefore, it is recommended to avoid using
+anonymous classes as your state serializers.
Anonymous classes do not have a guarantee on the generated classname,
+varying across compilers and depends on the order that they are instantiated within the enclosing class, which can
--- End diff --

"varying across compilers and depends" ==> "which varies across compilers and depends"

> Rework state documentation
> --------------------------
>
> Key: FLINK-7301
> URL: https://issues.apache.org/jira/browse/FLINK-7301
> Project: Flink
> Issue Type: Improvement
> Components: Documentation
> Reporter: Timo Walther
> Assignee: Timo Walther
>
> The documentation about state is spread across different pages, but this is
> not consistent and it is hard to find what you need. I propose:
> Mention State Backends and link to them in "Streaming/Working with State".
> Create category "State & Fault Tolerance" under "Streaming". Move
> "Working with State", "Checkpointing" and "Queryable State".
> Move API related parts (90%) of "Deployment/State & Fault Tolerance/State
> Backends" to "Streaming/State & Fault Tolerance/State Backends".
> Move all tuning things from "Debugging/Large State" to "Deployment/State &
> Fault Tolerance/State Backends".
> Move "Streaming/Working with State/Custom Serialization for Managed State"
> to "Streaming/State & Fault Tolerance/Custom Serialization" (Add a link
> from previous position, also link from "Data Types & Serialization").

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)