[GitHub] [flink] ranqiqiang commented on a change in pull request #16478: [FLINK-23228][docs-zh] Translate "Stateful Stream Processing" page into Chinese

GitBox Wed, 14 Jul 2021 20:05:34 -0700


ranqiqiang commented on a change in pull request #16478:
URL: https://github.com/apache/flink/pull/16478#discussion_r670099338




##########
File path: docs/content.zh/docs/concepts/stateful-stream-processing.md
##########
@@ -24,342 +24,227 @@ under the License.
 
 # 有状态流处理
 
-## What is State?
+## 什么是状态？
 
-While many operations in a dataflow simply look at one individual *event at a
-time* (for example an event parser), some operations remember information
-across multiple events (for example window operators). These operations are
-called **stateful**.
+虽然数据流中的很多操作一次只着眼于一个单独的事件（例如事件解析器），但有些操作会记住多个事件的信息（例如窗口算子）。
+这些操作称为**有状态的**（stateful）。
 
-Some examples of stateful operations:
+有状态操作的一些示例：
 
-  - When an application searches for certain event patterns, the state will
-    store the sequence of events encountered so far.
-  - When aggregating events per minute/hour/day, the state holds the pending
-    aggregates.
-  - When training a machine learning model over a stream of data points, the
-    state holds the current version of the model parameters.
-  - When historic data needs to be managed, the state allows efficient access
-    to events that occurred in the past.
+  - 当应用程序搜索某些事件模式时，状态将存储到目前为止遇到的事件序列。
+  - 当每分钟/每小时/每天聚合事件时，状态会持有待处理的聚合。
+  - 当在数据点的流上训练一个机器学习模型时，状态会保存模型参数的当前版本。
+  - 当需要管理历史数据时，状态允许有效访问过去发生的事件。
 
-Flink needs to be aware of the state in order to make it fault tolerant using
+Flink 需要知道状态以便使用
 [checkpoints]({{< ref "docs/dev/datastream/fault-tolerance/checkpointing" >}})
-and [savepoints]({{< ref "docs/ops/state/savepoints" >}}).
+和 [savepoints]({{< ref "docs/ops/state/savepoints" >}}) 进行容错。
 
-Knowledge about the state also allows for rescaling Flink applications, meaning
-that Flink takes care of redistributing state across parallel instances.
+关于状态的知识也允许我们重新调节 Flink 应用程序，这意味着 Flink 负责跨并行实例重新分布状态。
 
-[Queryable state]({{< ref "docs/dev/datastream/fault-tolerance/queryable_state" >}}) allows you to access state from outside of Flink during runtime.
+[可查询的状态]({{< ref "docs/dev/datastream/fault-tolerance/queryable_state" >}})允许你在运行时从 Flink 外部访问状态。
 
-When working with state, it might also be useful to read about [Flink's state
-backends]({{< ref "docs/ops/state/state_backends" >}}). Flink
-provides different state backends that specify how and where state is stored.
+在使用状态时，阅读 [Flink 的状态后端]({{< ref "docs/ops/state/state_backends" >}})可能也很有用。 
+Flink 提供了不同的状态后端，用于指定状态存储的方式和位置。
 
 {{< top >}}
 
-## Keyed State
-
-Keyed state is maintained in what can be thought of as an embedded key/value
-store.  The state is partitioned and distributed strictly together with the
-streams that are read by the stateful operators. Hence, access to the key/value
-state is only possible on *keyed streams*, i.e. after a keyed/partitioned data
-exchange, and is restricted to the values associated with the current event's
-key. Aligning the keys of streams and state makes sure that all state updates
-are local operations, guaranteeing consistency without transaction overhead.
-This alignment also allows Flink to redistribute the state and adjust the
-stream partitioning transparently.
-
-{{< img src="/fig/state_partitioning.svg" alt="State and Partitioning" class="offset" width="50%" >}}
-
-Keyed State is further organized into so-called *Key Groups*. Key Groups are
-the atomic unit by which Flink can redistribute Keyed State; there are exactly
-as many Key Groups as the defined maximum parallelism.  During execution each
-parallel instance of a keyed operator works with the keys for one or more Key
-Groups.
-
-## State Persistence
-
-Flink implements fault tolerance using a combination of **stream replay** and
-**checkpointing**. A checkpoint marks a specific point in each of the
-input streams along with the corresponding state for each of the operators. A
-streaming dataflow can be resumed from a checkpoint while maintaining
-consistency *(exactly-once processing semantics)* by restoring the state of the
-operators and replaying the records from the point of the checkpoint.
-
-The checkpoint interval is a means of trading off the overhead of fault
-tolerance during execution with the recovery time (the number of records that
-need to be replayed).
-
-The fault tolerance mechanism continuously draws snapshots of the distributed
-streaming data flow. For streaming applications with small state, these
-snapshots are very light-weight and can be drawn frequently without much impact
-on performance.  The state of the streaming applications is stored at a
-configurable place, usually in a distributed file system.
-
-In case of a program failure (due to machine-, network-, or software failure),
-Flink stops the distributed streaming dataflow.  The system then restarts the
-operators and resets them to the latest successful checkpoint. The input
-streams are reset to the point of the state snapshot. Any records that are
-processed as part of the restarted parallel dataflow are guaranteed to not have
-affected the previously checkpointed state.
+## 键控状态（Keyed State）
+
+键控状态被维护在一个可以认为是键/值存储的地方。状态和有状态算子读取的流一起被严格地分区和分布。

Review comment:
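       I think "keyed state" can be left untranslated; the other translations I have seen all keep this keyword.
   "keyed" is a modifier on "state": it denotes a type of state that is stored and looked up by key.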
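
As an illustration of the reading above ("keyed" as a qualifier on state, i.e. state stored and looked up by key), here is a minimal sketch in plain Python. It does not use Flink's API: the class `KeyedCounter` and its methods are hypothetical names, and an in-memory dictionary stands in for a state backend.

```python
from collections import defaultdict


class KeyedCounter:
    """Toy stand-in for a keyed operator: one state value per key."""

    def __init__(self):
        # Hypothetical in-memory "state backend": key -> running sum.
        self._state = defaultdict(int)

    def process(self, key, value):
        # Only the state belonging to the current event's key is
        # read and updated; state for other keys is never touched.
        self._state[key] += value
        return self._state[key]

    def snapshot(self):
        # Copy of the per-key state, e.g. for inspection.
        return dict(self._state)


counter = KeyedCounter()
events = [("a", 1), ("b", 5), ("a", 2)]
outputs = [counter.process(key, value) for key, value in events]
```

Each event only sees the value associated with its own key, which is the property the term "keyed state" is meant to convey.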




