Henry Cai created KAFKA-19225:
---------------------------------

             Summary: Tiered Storage Support for Active Log Segment
                 Key: KAFKA-19225
                 URL: https://issues.apache.org/jira/browse/KAFKA-19225
             Project: Kafka
          Issue Type: New Feature
          Components: Tiered-Storage
    Affects Versions: 4.0.0
            Reporter: Henry Cai
            Assignee: Henry Cai
             Fix For: 4.0.1


This is the Jira for [KIP-1176: Tiered Storage Support for Active Log 
Segment|https://cwiki.apache.org/confluence/display/KAFKA/KIP-1176%3A+Tiered+Storage+for+Active+Log+Segment].

In KIP-405, the community proposed and implemented tiered storage for older 
Kafka log segments: once a segment is rolled, it becomes eligible to be uploaded 
to cloud object storage, and its local copy is removed once it is older than 
{_}local.retention.ms{_}, which reduces local storage cost. 
KIP-405 uploads only these older segments, not the most recent active log 
segments, which act as the write-ahead log. As a result, in a typical Kafka 
cluster with a replication factor of 3, the two follower brokers still need to 
replicate the active log segments from the leader broker. It is common practice 
to place the three brokers in three different availability zones (AZs) to 
improve the cluster's availability, so replication between leader and follower 
brokers crosses AZs, which is a significant cost 
([various studies|https://www.confluent.io/blog/understanding-and-optimizing-your-kafka-costs-part-1-infrastructure/] 
show that cross-AZ transfer typically accounts for 50%-60% of total cluster 
cost). Because the active log segments are physically present on all three 
brokers, they also consume significant broker resources, and each broker's 
local state remains large, which lengthens node replacement. 

[KIP-1150|https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics] 
recently proposed diskless Kafka topics, but that approach increases latency and 
requires a significant redesign. In comparison, this KIP keeps performance 
identical for the acks=1 producer path, minimizes design changes to Kafka, and 
still cuts cost by an estimated 43%.
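
To make the cross-AZ argument above more concrete, here is a back-of-the-envelope Python model of daily cross-AZ bytes in a cluster with one replica per AZ. It is only an illustrative sketch and not part of KIP-1176: `ClusterTraffic`, `cross_az_gb_per_day` and `replication_share` are hypothetical names, and the model assumes that clients and partition leaders are spread uniformly across AZs and that consumers may use rack-aware follower fetching (KIP-392). It counts bytes only; multiplying by a provider's per-GB inter-AZ price would turn it into a rough cost.

```python
from dataclasses import dataclass


@dataclass
class ClusterTraffic:
    """Hypothetical daily traffic profile for one Kafka cluster (illustrative only)."""
    produce_gb_per_day: float     # bytes written by producers, before replication
    replication_factor: int = 3   # assumed: at most one replica per AZ
    num_azs: int = 3
    consumer_fanout: float = 1.0  # GB read by consumers per GB produced


def cross_az_gb_per_day(t: ClusterTraffic, follower_fetching: bool = False) -> dict:
    """Rough cross-AZ byte estimate under the assumptions stated in the comments."""
    if t.replication_factor > t.num_azs:
        raise ValueError("model assumes at most one replica per AZ")
    # Every follower lives in a different AZ from the leader, so each produced
    # byte crosses an AZ boundary once per follower.
    replication = (t.replication_factor - 1) * t.produce_gb_per_day
    # Assumed: clients and partition leaders are spread uniformly over AZs, so a
    # client reaches a leader in another AZ with probability (n - 1) / n.
    remote_fraction = (t.num_azs - 1) / t.num_azs
    produce = remote_fraction * t.produce_gb_per_day
    # Assumed: with follower fetching and a replica in every AZ, consumers read
    # from a same-AZ replica, so consume traffic stays inside the AZ.
    if follower_fetching and t.replication_factor == t.num_azs:
        consume = 0.0
    else:
        consume = remote_fraction * t.consumer_fanout * t.produce_gb_per_day
    return {"produce": produce, "replication": replication, "consume": consume}


def replication_share(t: ClusterTraffic, follower_fetching: bool = False) -> float:
    """Fraction of modeled cross-AZ bytes caused by leader-to-follower replication."""
    traffic = cross_az_gb_per_day(t, follower_fetching)
    return traffic["replication"] / sum(traffic.values())


if __name__ == "__main__":
    cluster = ClusterTraffic(produce_gb_per_day=1000.0)
    print(cross_az_gb_per_day(cluster, follower_fetching=True))
    print(f"replication share: {replication_share(cluster, follower_fetching=True):.2f}")
```

With 1000 GB/day produced and follower fetching enabled, this toy model attributes 2000 of roughly 2667 cross-AZ GB (about 75%) to replication; without follower fetching the share is 60%. These figures depend entirely on the stated assumptions and are not meant to reproduce the 43% estimate.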



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
