Dear All,

I have questions regarding the keys. In general, the questions are:
what happens if I am doing keyBy based on unlimited number of keys? How Flink 
is managing each KeyedStream under the hood? Will I get memory overflow, for 
example, if every KeyStream associated with a specific key is taking certain 
amount of memory?
BTW, I think it is fare to say that, I have to clear my KeyedState so that the 
memory used by these State are cleaned up regularly. But still, I am wondering, 
even though I am regularly cleaning up State memory, what happened to memory 
used by the KeyedStream itself, if there is? And will they be exploding?

Let me give an example for understanding it clearly.  Let’s say we have a

        val requestStream: DataStream[HttpRequest]

which is a stream of HTTP requests. And by using the session ID as the key, we 
can obtain a KeyedStream per single session, as following:

        val streamPerSession: KeyedStream[HttpRequest] = 
requestStream.keyBy(_.sessionId)

However, the session IDs are actually a hashcode generated randomly by the Web 
service/application, so that means, the number of sessions are unlimited (which 
is reasonable, because every time a user open the application or login, he/she 
will get a new unique session). 

Then, the question is: will Flink eventually run out of memory because the 
number of sessions are unlimited (and because we are keying by the session ID)?
If so, how can we properly manage this situation?
If not, could you help me understand WHY?
Let’s also assume that, we are regularly clearing the KeyedState, so the memory 
used by the State will not explode. 


Many Thanks and Looking forward to your reply :)

Best regards/祝好,

Chang Liu 刘畅


Reply via email to