[ 
https://issues.apache.org/jira/browse/KAFKA-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyao Hu updated KAFKA-1404:
-----------------------------

    Description: 
This is somewhat related to KAFKA-1403. 

One way to work around KAFKA-1403 is to roll a new log file over a short period 
of time. However, this results in many open file descriptors. Take our 
application for example: each server hosts about 5k topic-partitions, so if we 
roll a new file every hour we add ~100k file descriptors per day (about 5k 
partitions x 24 hourly rolls; I checked that only the .log file stays open, not 
the .index, which might be pinned in memory). We would exhaust a 1M file 
descriptor limit in about a week, even though our disks can hold the data for 
much longer. 

In reality very few of these file descriptors are actively used: only the fd of 
the most recent segment is needed for appends, while older fds are touched only 
by occasional reads. We should provide a parameter such as max.num.fds and use 
an LRU policy to decide which fds stay open. 
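
As an illustration only (a minimal sketch; the FdCache name, the max.num.fds 
wiring and the segment-path keying are assumptions, not existing Kafka code), 
the LRU could be as simple as an access-ordered LinkedHashMap that closes the 
least recently used channel once the configured limit is exceeded:

{code:java}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an LRU cache for open log-segment file descriptors.
public class FdCache {
    private final LinkedHashMap<Path, FileChannel> open;

    public FdCache(final int maxOpenFds) {  // would be fed by a config such as max.num.fds
        // accessOrder=true makes the map LRU-ordered: get() moves an entry to the tail
        this.open = new LinkedHashMap<Path, FileChannel>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Path, FileChannel> eldest) {
                if (size() > maxOpenFds) {
                    try {
                        eldest.getValue().close();   // close the least recently used fd
                    } catch (IOException ignored) { }
                    return true;                     // and evict it from the cache
                }
                return false;
            }
        };
    }

    /** Return an open channel for the given segment, reopening it if it was evicted. */
    public synchronized FileChannel channelFor(Path segment) throws IOException {
        FileChannel ch = open.get(segment);
        if (ch == null || !ch.isOpen()) {
            ch = new RandomAccessFile(segment.toFile(), "rw").getChannel();
            open.put(segment, ch);
        }
        return ch;
    }
}
{code}

In such a scheme the append path always hits the most recently used entry, so 
only rarely queried old segments would pay the cost of a reopen.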

  was:
This is somewhat related to KAFKA-1403. 

One way to work around KAFKA-1403 is to roll a new file over a short period of 
time. However, this results in many open file descriptors. Take our application 
for example: each server hosts about 5k topic-partitions, so if we roll a new 
file every hour we end up with over 100k open file descriptors (I checked that 
only the .log file stays open, not the .index, which might be pinned in memory). 

In most applications very few of these file descriptors are actually read. We 
should provide a parameter such as max.num.fds and use an LRU policy inside 
Kafka to decide which fds stay open. 


> Close unused log file
> ---------------------
>
>                 Key: KAFKA-1404
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1404
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 0.8.1
>            Reporter: Xinyao Hu
>            Priority: Critical
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> This is somewhat related to KAFKA-1403. 
> One way to work around KAFKA-1403 is to roll a new log file over a short 
> period of time. However, this results in many open file descriptors. Take our 
> application for example: each server hosts about 5k topic-partitions, so if 
> we roll a new file every hour we add ~100k file descriptors per day (about 5k 
> partitions x 24 hourly rolls; I checked that only the .log file stays open, 
> not the .index, which might be pinned in memory). We would exhaust a 1M file 
> descriptor limit in about a week, even though our disks can hold the data for 
> much longer. 
>  
> In reality very few of these file descriptors are actively used: only the fd 
> of the most recent segment is needed for appends, while older fds are touched 
> only by occasional reads. We should provide a parameter such as max.num.fds 
> and use an LRU policy to decide which fds stay open. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
