[jira] [Updated] (KAFKA-12370) Refactor KafkaStreams exposed metadata hierarchy

Guozhang Wang (Jira) Mon, 26 Sep 2022 13:51:05 -0700


     [ 
https://issues.apache.org/jira/browse/KAFKA-12370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Guozhang Wang updated KAFKA-12370:
----------------------------------
    Description: 
Currently in KafkaStreams we have two groups of metadata getter:

1.
{code}
allMetadata
allMetadataForStore
{code}

Return collection of {{StreamsMetadata}}, which only contains the partitions as 
active/standby, plus the hostInfo, but not exposing any task info.

2.
{code}
queryMetadataForKey
{code}

Returns {{KeyQueryMetadata}} that includes the hostInfos of active and 
standbys, plus the partition id.

3.
{code}
localThreadsMetadata
{code}

Returns {{ThreadMetadata}}, that includes a collection of {{TaskMetadata}} for 
active and standby tasks.

All the above functions are used for interactive queries, but their exposed 
metadata are very different, and some use cases would need to have all client, 
thread, and task metadata to fulfill the feature development. At the same time, 
we may have a more dynamic "task -> thread" mapping in the future and also the 
embedded clients like consumers would not be per thread, but per client.

---------------

Rethinking about the metadata, I feel we can have a more consistent hierarchy 
as the following:

* {{StreamsMetadata}} represent the metadata for the client, which includes the 
set of {{ThreadMetadata}} for its existing thread and the set of 
{{TaskMetadata}} for active and standby tasks assigned to this client, plus 
client metadata including hostInfo, embedded client ids.

* {{ThreadMetadata}} includes name, state, the set of {{TaskMetadata}} for 
currently assigned tasks. Also after we removed the deprecated EOSv1, it should 
always return a single producer client id since each thread would only have one 
client.

* {{TaskMetadata}} includes the name (including the sub-topology id and the 
partition id), the state, the corresponding sub-topology description (including 
the state store names, source topic names).

* {{allMetadata}}, {{allMetadataForStore}}, {{allMetadataForKey}} (renamed from 
queryMetadataForKey) returns the set of {{StreamsMetadata}}, and 
{{localMetadata}} (renamed from localThreadMetadata) returns a single 
{{StreamsMetadata}}.

* {{KeyQueryMetadata}} Class would be deprecated and replaced by 
{{TaskMetadata}}.

To illustrate as an example, to find out who are the current active host / 
standby hosts of a specific store, we would call {{allMetadataForStore}}, and 
for each returned {{StreamsMetadata}} we loop over their contained 
{{TaskMetadata}} for active / standby, and filter by its corresponding 
sub-topology's description's contained store name. 

  was:
Currently in KafkaStreams we have two groups of metadata getter:

1.
{code}
allMetadata
allMetadataForStore
{code}

Return collection of {{StreamsMetadata}}, which only contains the partitions as 
active/standby, plus the hostInfo, but not exposing any task info.

2.
{code}
queryMetadataForKey
{code}

Returns {{KeyQueryMetadata}} that includes the hostInfos of active and 
standbys, plus the partition id.

3.
{code}
localThreadsMetadata
{code}

Returns {{ThreadMetadata}}, that includes a collection of {{TaskMetadata}} for 
active and standby tasks.

All the above functions are used for interactive queries, but their exposed 
metadata are very different, and some use cases would need to have all client, 
thread, and task metadata to fulfill the feature development. At the same time, 
we may have a more dynamic "task -> thread" mapping in the future and also the 
embedded clients like consumers would not be per thread, but per client.

---------------

Rethinking about the metadata, I feel we can have a more consistent hierarchy 
as the following:

* {{StreamsMetadata}} represent the metadata for the client, which includes the 
set of {{ThreadMetadata}} for its existing thread and the set of 
{{TaskMetadata}} for active and standby tasks assigned to this client, plus 
client metadata including hostInfo, embedded client ids.

* {{ThreadMetadata}} includes name, state, the set of {{TaskMetadata}} for 
currently assigned tasks.

* {{TaskMetadata}} includes the name (including the sub-topology id and the 
partition id), the state, the corresponding sub-topology description (including 
the state store names, source topic names).

* {{allMetadata}}, {{allMetadataForStore}}, {{allMetadataForKey}} (renamed from 
queryMetadataForKey) returns the set of {{StreamsMetadata}}, and 
{{localMetadata}} (renamed from localThreadMetadata) returns a single 
{{StreamsMetadata}}.

* {{KeyQueryMetadata}} Class would be deprecated and replaced by 
{{TaskMetadata}}.

To illustrate as an example, to find out who are the current active host / 
standby hosts of a specific store, we would call {{allMetadataForStore}}, and 
for each returned {{StreamsMetadata}} we loop over their contained 
{{TaskMetadata}} for active / standby, and filter by its corresponding 
sub-topology's description's contained store name. 


> Refactor KafkaStreams exposed metadata hierarchy
> ------------------------------------------------
>
>                 Key: KAFKA-12370
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12370
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: Josep Prat
>            Priority: Major
>              Labels: needs-kip, new-streams-runtime-should-fix
>
> Currently in KafkaStreams we have two groups of metadata getter:
> 1.
> {code}
> allMetadata
> allMetadataForStore
> {code}
> Return collection of {{StreamsMetadata}}, which only contains the partitions 
> as active/standby, plus the hostInfo, but not exposing any task info.
> 2.
> {code}
> queryMetadataForKey
> {code}
> Returns {{KeyQueryMetadata}} that includes the hostInfos of active and 
> standbys, plus the partition id.
> 3.
> {code}
> localThreadsMetadata
> {code}
> Returns {{ThreadMetadata}}, that includes a collection of {{TaskMetadata}} 
> for active and standby tasks.
> All the above functions are used for interactive queries, but their exposed 
> metadata are very different, and some use cases would need to have all 
> client, thread, and task metadata to fulfill the feature development. At the 
> same time, we may have a more dynamic "task -> thread" mapping in the future 
> and also the embedded clients like consumers would not be per thread, but per 
> client.
> ---------------
> Rethinking about the metadata, I feel we can have a more consistent hierarchy 
> as the following:
> * {{StreamsMetadata}} represent the metadata for the client, which includes 
> the set of {{ThreadMetadata}} for its existing thread and the set of 
> {{TaskMetadata}} for active and standby tasks assigned to this client, plus 
> client metadata including hostInfo, embedded client ids.
> * {{ThreadMetadata}} includes name, state, the set of {{TaskMetadata}} for 
> currently assigned tasks. Also after we removed the deprecated EOSv1, it 
> should always return a single producer client id since each thread would only 
> have one client.
> * {{TaskMetadata}} includes the name (including the sub-topology id and the 
> partition id), the state, the corresponding sub-topology description 
> (including the state store names, source topic names).
> * {{allMetadata}}, {{allMetadataForStore}}, {{allMetadataForKey}} (renamed 
> from queryMetadataForKey) returns the set of {{StreamsMetadata}}, and 
> {{localMetadata}} (renamed from localThreadMetadata) returns a single 
> {{StreamsMetadata}}.
> * {{KeyQueryMetadata}} Class would be deprecated and replaced by 
> {{TaskMetadata}}.
> To illustrate as an example, to find out who are the current active host / 
> standby hosts of a specific store, we would call {{allMetadataForStore}}, and 
> for each returned {{StreamsMetadata}} we loop over their contained 
> {{TaskMetadata}} for active / standby, and filter by its corresponding 
> sub-topology's description's contained store name. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (KAFKA-12370) Refactor KafkaStreams exposed metadata hierarchy

Reply via email to