[
https://issues.apache.org/jira/browse/KAFKA-12370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Guozhang Wang updated KAFKA-12370:
----------------------------------
Description:
Currently in KafkaStreams we have two groups of metadata getter:
1.
{code}
allMetadata
allMetadataForStore
{code}
Return collection of {{StreamsMetadata}}, which only contains the partitions as
active/standby, plus the hostInfo, but not exposing any task info.
2.
{code}
queryMetadataForKey
{code}
Returns {{KeyQueryMetadata}} that includes the hostInfos of active and
standbys, plus the partition id.
3.
{code}
localThreadsMetadata
{code}
Returns {{ThreadMetadata}}, that includes a collection of {{TaskMetadata}} for
active and standby tasks.
All the above functions are used for interactive queries, but their exposed
metadata are very different, and some use cases would need to have all client,
thread, and task metadata to fulfill the feature development. At the same time,
we may have a more dynamic "task -> thread" mapping in the future and also the
embedded clients like consumers would not be per thread, but per client.
---------------
Rethinking about the metadata, I feel we can have a more consistent hierarchy
as the following:
* {{StreamsMetadata}} represent the metadata for the client, which includes the
set of {{ThreadMetadata}} for its existing thread and the set of
{{TaskMetadata}} for active and standby tasks assigned to this client, plus
client metadata including hostInfo, embedded client ids.
* {{ThreadMetadata}} includes name, state, the set of {{TaskMetadata}} for
currently assigned tasks. Also after we removed the deprecated EOSv1, it should
always return a single producer client id since each thread would only have one
client.
* {{TaskMetadata}} includes the name (including the sub-topology id and the
partition id), the state, the corresponding sub-topology description (including
the state store names, source topic names).
* {{allMetadata}}, {{allMetadataForStore}}, {{allMetadataForKey}} (renamed from
queryMetadataForKey) returns the set of {{StreamsMetadata}}, and
{{localMetadata}} (renamed from localThreadMetadata) returns a single
{{StreamsMetadata}}.
* {{KeyQueryMetadata}} Class would be deprecated and replaced by
{{TaskMetadata}}.
To illustrate as an example, to find out who are the current active host /
standby hosts of a specific store, we would call {{allMetadataForStore}}, and
for each returned {{StreamsMetadata}} we loop over their contained
{{TaskMetadata}} for active / standby, and filter by its corresponding
sub-topology's description's contained store name.
was:
Currently in KafkaStreams we have two groups of metadata getter:
1.
{code}
allMetadata
allMetadataForStore
{code}
Return collection of {{StreamsMetadata}}, which only contains the partitions as
active/standby, plus the hostInfo, but not exposing any task info.
2.
{code}
queryMetadataForKey
{code}
Returns {{KeyQueryMetadata}} that includes the hostInfos of active and
standbys, plus the partition id.
3.
{code}
localThreadsMetadata
{code}
Returns {{ThreadMetadata}}, that includes a collection of {{TaskMetadata}} for
active and standby tasks.
All the above functions are used for interactive queries, but their exposed
metadata are very different, and some use cases would need to have all client,
thread, and task metadata to fulfill the feature development. At the same time,
we may have a more dynamic "task -> thread" mapping in the future and also the
embedded clients like consumers would not be per thread, but per client.
---------------
Rethinking about the metadata, I feel we can have a more consistent hierarchy
as the following:
* {{StreamsMetadata}} represent the metadata for the client, which includes the
set of {{ThreadMetadata}} for its existing thread and the set of
{{TaskMetadata}} for active and standby tasks assigned to this client, plus
client metadata including hostInfo, embedded client ids.
* {{ThreadMetadata}} includes name, state, the set of {{TaskMetadata}} for
currently assigned tasks.
* {{TaskMetadata}} includes the name (including the sub-topology id and the
partition id), the state, the corresponding sub-topology description (including
the state store names, source topic names).
* {{allMetadata}}, {{allMetadataForStore}}, {{allMetadataForKey}} (renamed from
queryMetadataForKey) returns the set of {{StreamsMetadata}}, and
{{localMetadata}} (renamed from localThreadMetadata) returns a single
{{StreamsMetadata}}.
* {{KeyQueryMetadata}} Class would be deprecated and replaced by
{{TaskMetadata}}.
To illustrate as an example, to find out who are the current active host /
standby hosts of a specific store, we would call {{allMetadataForStore}}, and
for each returned {{StreamsMetadata}} we loop over their contained
{{TaskMetadata}} for active / standby, and filter by its corresponding
sub-topology's description's contained store name.
> Refactor KafkaStreams exposed metadata hierarchy
> ------------------------------------------------
>
> Key: KAFKA-12370
> URL: https://issues.apache.org/jira/browse/KAFKA-12370
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Guozhang Wang
> Assignee: Josep Prat
> Priority: Major
> Labels: needs-kip, new-streams-runtime-should-fix
>
> Currently in KafkaStreams we have two groups of metadata getter:
> 1.
> {code}
> allMetadata
> allMetadataForStore
> {code}
> Return collection of {{StreamsMetadata}}, which only contains the partitions
> as active/standby, plus the hostInfo, but not exposing any task info.
> 2.
> {code}
> queryMetadataForKey
> {code}
> Returns {{KeyQueryMetadata}} that includes the hostInfos of active and
> standbys, plus the partition id.
> 3.
> {code}
> localThreadsMetadata
> {code}
> Returns {{ThreadMetadata}}, that includes a collection of {{TaskMetadata}}
> for active and standby tasks.
> All the above functions are used for interactive queries, but their exposed
> metadata are very different, and some use cases would need to have all
> client, thread, and task metadata to fulfill the feature development. At the
> same time, we may have a more dynamic "task -> thread" mapping in the future
> and also the embedded clients like consumers would not be per thread, but per
> client.
> ---------------
> Rethinking about the metadata, I feel we can have a more consistent hierarchy
> as the following:
> * {{StreamsMetadata}} represent the metadata for the client, which includes
> the set of {{ThreadMetadata}} for its existing thread and the set of
> {{TaskMetadata}} for active and standby tasks assigned to this client, plus
> client metadata including hostInfo, embedded client ids.
> * {{ThreadMetadata}} includes name, state, the set of {{TaskMetadata}} for
> currently assigned tasks. Also after we removed the deprecated EOSv1, it
> should always return a single producer client id since each thread would only
> have one client.
> * {{TaskMetadata}} includes the name (including the sub-topology id and the
> partition id), the state, the corresponding sub-topology description
> (including the state store names, source topic names).
> * {{allMetadata}}, {{allMetadataForStore}}, {{allMetadataForKey}} (renamed
> from queryMetadataForKey) returns the set of {{StreamsMetadata}}, and
> {{localMetadata}} (renamed from localThreadMetadata) returns a single
> {{StreamsMetadata}}.
> * {{KeyQueryMetadata}} Class would be deprecated and replaced by
> {{TaskMetadata}}.
> To illustrate as an example, to find out who are the current active host /
> standby hosts of a specific store, we would call {{allMetadataForStore}}, and
> for each returned {{StreamsMetadata}} we loop over their contained
> {{TaskMetadata}} for active / standby, and filter by its corresponding
> sub-topology's description's contained store name.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)