Hello there,

We ran into a situation on our dev Kafka cluster (3 nodes, v0.8.2) where 
we ran out of disk space on one of the nodes. To free up disk space, we 
reduced log.retention.hours to something more manageable (from 72 hours 
to 52 hours) and moved the log directory to a 200GB disk. We did this on 
all 3 nodes.
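For reference, the relevant broker settings after that change looked roughly like the following (a sketch of our dev setup, with the path as described above):

```properties
# server.properties excerpt (sketch)
log.retention.hours=52
log.dirs=/kafkastore1
```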

Now, we are preparing for production and would like to understand how 
this works in Kafka. As our data grows over time, we would like to 
mount more storage (in 200GB chunks) and have our topic's storage 
expand into the newly mounted directories. Kafka, by design, supports 
adding more storage space on an as-needed basis; that is, for example, 
if we have a topic "myTopic" and we estimate that 200GB is reasonable 
storage to start with (mounted at /kafkastore1), and later we realize 
it is not sufficient, we can add another 200GB chunk and mount it at 
/kafkastore2.

How do I configure my data to expand into additional directories as I 
add more and more space?

To be specific, say we configure log.dirs as the following:

log.dirs=/kafkastore1,/kafkastore2 (comma-separated)

So, when I create my topic "myTopic", can I just create it with 
partitions = 1 (kafka-topics.sh --create --topic myTopic --partitions 1)?

In this case, my questions:

(1) Would the data belonging to myTopic automatically expand into 
/kafkastore2 once /kafkastore1 is completely full? 
(2) If not, do we have to create the topic with multiple partitions? If 
so, how can we ensure the ordering of messages for the consumer (first 
published, first consumed)? We need to consume the data in the same 
order it was published.


Thanks,
avi lele
