[ 
https://issues.apache.org/jira/browse/BEAM-10019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17548687#comment-17548687
 ] 

Danny McCormick commented on BEAM-10019:
----------------------------------------

This issue has been migrated to https://github.com/apache/beam/issues/20310

> Keeping keys in a state for a very long time (keys expiry unknown)
> ------------------------------------------------------------------
>
>                 Key: BEAM-10019
>                 URL: https://issues.apache.org/jira/browse/BEAM-10019
>             Project: Beam
>          Issue Type: Improvement
>          Components: website
>            Reporter: MOHIL
>            Priority: P3
>              Labels: document, pipeline-patterns
>
> I have a use case that I think might be a good addition to the pipeline 
> patterns:
>  
> Beam (Java SDK) reads two kinds of records from a data stream such as Kafka:
>  
> 1. Records of type A, containing a key and its corresponding metadata. 
> 2. Records of type B, containing the same key but no metadata. 
>  
> Beam then needs to fill in the metadata for records of type B by looking it 
> up with the keys received in records of type A. 
>  
> The idea is to save the metadata as state for the keys received in records 
> of type A, and then look it up when records of type B arrive.
> Beam's "@State" construct can be used here. The problem is that we don't 
> know when keys should expire. I don't think a global window is a good idea, 
> as there could be many keys (maybe millions over a period of time) held in 
> state.
>  
> One possible solution, suggested by Reza Ardeshir Rokni ([email protected]):
>  
> We can maintain state in a large fixed window (1 day or so), so that GC can 
> happen at the window boundary. After the window expires, save the metadata 
> values in an external DB like BigQuery. If a record with the same key arrives 
> in a new window and needs this metadata, fetch the metadata for that key from 
> the external DB and save it in the new window's state again.
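
The suggested approach can be sketched as a plain-Java simulation. This is not Beam code and not the reporter's or Reza's implementation: the class name `WindowedMetadataCache`, its methods, and the in-memory `externalStore` map (standing in for an external DB such as BigQuery) are all hypothetical. It assumes timestamps arrive in order and flushes the old window lazily when the first record of a new window arrives, whereas a real pipeline would more likely flush at window expiry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Beam-free simulation of window-scoped state with an external-store fallback. */
class WindowedMetadataCache {
    // Hypothetical stand-in for the external DB (e.g. BigQuery); in-memory only.
    private final Map<String, String> externalStore = new HashMap<>();
    // Stand-in for per-key state scoped to the current fixed window.
    private final Map<String, String> windowState = new HashMap<>();
    private final long windowSizeMillis;
    private long currentWindowStart = Long.MIN_VALUE;
    // Counts fallbacks to the external store, for illustration.
    int externalLookups = 0;

    WindowedMetadataCache(long windowSizeMillis) {
        this.windowSizeMillis = windowSizeMillis;
    }

    // When a record falls into a new window, persist the old window's state
    // to the external store and clear it (the "GC at the window boundary").
    private void rollWindowIfNeeded(long timestampMillis) {
        long start = Math.floorDiv(timestampMillis, windowSizeMillis) * windowSizeMillis;
        if (start != currentWindowStart) {
            externalStore.putAll(windowState);
            windowState.clear();
            currentWindowStart = start;
        }
    }

    /** Type A record: remember the metadata for this key in window state. */
    void onRecordA(String key, String metadata, long timestampMillis) {
        rollWindowIfNeeded(timestampMillis);
        windowState.put(key, metadata);
    }

    /** Type B record: look up metadata in window state, else in the external store. */
    Optional<String> onRecordB(String key, long timestampMillis) {
        rollWindowIfNeeded(timestampMillis);
        String cached = windowState.get(key);
        if (cached != null) {
            return Optional.of(cached);
        }
        externalLookups++;
        String stored = externalStore.get(key);
        if (stored != null) {
            // Re-populate the new window's state so later lookups stay local.
            windowState.put(key, stored);
        }
        return Optional.ofNullable(stored);
    }
}
```

With a one-day window, a type B record in the same window as its type A record is served from window state. The first type B record for that key in the next window triggers one external lookup, and later records in that window are served locally again.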

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
