[ 
https://issues.apache.org/jira/browse/FLINK-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530933#comment-16530933
 ] 

buptljy commented on FLINK-9178:
--------------------------------

[~tzulitai] I'd like to add something about this issue, which is very similar
to a problem I ran into recently.
Our program receives real-time data, counts distinct IPs within a 10-minute
event-time window, and sinks the aggregated data into HBase. Something went
wrong and we now want to re-consume all data from Kafka's earliest offset, but
this does not work well because too many event-time windows are held in memory
at once.
I think it would be fine with processing time instead, because only a single
window is open at a time even when consuming from the earliest offset. So I
wonder whether we could add a parameter to control the rate of receiving data,
such as an upper bound on the consumption rate?
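
To make the idea of an upper bound on the consumption rate more concrete, here
is a minimal sketch of a limiter that a source could consult before emitting
each batch of records. This is only an illustration under assumptions: the
class RecordRateLimiter and its methods reserve/acquire are hypothetical names,
not part of Flink or the Kafka connector, and the injected clock exists only so
the behaviour can be checked deterministically.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/**
 * Hypothetical helper, not an existing Flink or Kafka class.
 * Caps throughput at a fixed number of records per second by tracking
 * the earliest time at which the next batch may be emitted.
 */
final class RecordRateLimiter {
    private final long nanosPerRecord;   // time budget charged per record
    private final LongSupplier nanoClock; // e.g. System::nanoTime
    private long nextFreeNanos;           // earliest time the next batch may start

    RecordRateLimiter(long recordsPerSecond, LongSupplier nanoClock) {
        if (recordsPerSecond <= 0 || recordsPerSecond > 1_000_000_000L) {
            throw new IllegalArgumentException("recordsPerSecond must be in [1, 1e9]");
        }
        this.nanosPerRecord = 1_000_000_000L / recordsPerSecond;
        this.nanoClock = nanoClock;
        this.nextFreeNanos = nanoClock.getAsLong();
    }

    /**
     * Books `count` records and returns how many nanoseconds the caller
     * should wait before emitting them (0 if it may proceed right away).
     */
    synchronized long reserve(int count) {
        long now = nanoClock.getAsLong();
        // Overflow-safe comparison of two nanosecond timestamps.
        long start = (nextFreeNanos - now > 0) ? nextFreeNanos : now;
        nextFreeNanos = start + count * nanosPerRecord;
        return start - now;
    }

    /** Blocking variant: sleeps until the booked batch may be emitted. */
    void acquire(int count) throws InterruptedException {
        long waitNanos = reserve(count);
        if (waitNanos > 0) {
            TimeUnit.NANOSECONDS.sleep(waitNanos);
        }
    }
}
```

In this sketch a fetch loop would call acquire(batch.size()) with a limiter
built from System::nanoTime before handing each polled batch downstream. The
first batch passes immediately and later batches are delayed until the budget
of the earlier ones has been paid, so a replay from the earliest offset cannot
outrun the configured rate.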

> Add rate control for kafka source
> ---------------------------------
>
>                 Key: FLINK-9178
>                 URL: https://issues.apache.org/jira/browse/FLINK-9178
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kafka Connector
>            Reporter: buptljy
>            Assignee: Tarush Grover
>            Priority: Major
>
> When I run a Flink program from the earliest offset in Kafka, it can easily
> cause an OOM if there is too much data, because of too many
> HeapMemorySegment instances in NetworkBufferPool.
> Maybe we should have some settings to control the rate of receiving data?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
