[ https://issues.apache.org/jira/browse/KAFKA-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089449#comment-15089449 ]

Michael Graff commented on KAFKA-3064:
--------------------------------------

Example log messages indicating this issue is occurring:

Jan  8 16:22:25 broker2 kafka [ReplicaFetcherThread-3-1] WARN 
[ReplicaFetcherThread-3-1], Replica 2 for partition [nom-dns-base-text,3] reset 
its fetch offset from 372718324 to current leader 1's start offset 372718324 
(kafka.server.ReplicaFetcherThread)
Jan  8 16:22:25 broker2 kafka [ReplicaFetcherThread-3-1] ERROR 
[ReplicaFetcherThread-3-1], Current offset 372712344 for partition 
[nom-dns-base-text,3] out of range; reset offset to 372718324 
(kafka.server.ReplicaFetcherThread)


> Improve resync method to waste less time and data transfer
> ----------------------------------------------------------
>
>                 Key: KAFKA-3064
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3064
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller, network
>            Reporter: Michael Graff
>            Assignee: Neha Narkhede
>
> We have several topics which are large (65 GB per partition) with 12 
> partitions.  Data rates into each topic vary, but in general each one has its 
> own rate.
> After a RAID rebuild, we are pulling all the data over to the newly rebuilt 
> RAID.  This takes forever, and has yet to complete after nearly 8 hours.
> Here are my observations:
> (1)  The Kafka broker seems to pull from all topics on all partitions at the 
> same time, starting at the oldest message.
> (2)  When total disk bandwidth is divided across all partitions (really, only 
> 48 of which hold significant amounts of data, about 65 GB * 12 = 780 GB per 
> topic), the ingest rate of 36 of those 48 is higher than their share of the 
> available bandwidth.
> (3)  The effect of (2) is that one topic SLOWLY catches up, while the other 4 
> topics continue to retrieve data at 75% of the bandwidth, only to toss it 
> away because the source broker has already discarded it.
> (4)  Eventually that one topic catches up, and the remaining bandwidth is 
> then divided among the remaining 36 partitions, one group of which starts to 
> catch up again.
> What I want to see is a way to say “don’t transfer more than X partitions at 
> the same time” and ideally a priority rule that says, “Transfer partitions 
> you are responsible for first, then transfer ones you are not.  Also, 
> transfer these first, then those, but no more than 1 topic at a time”
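The requested knobs could look roughly like the following broker settings. These property names are hypothetical and do not exist in Kafka; they only illustrate the shape of the feature being asked for:

```properties
# Hypothetical settings sketching the request above -- NOT real Kafka configs.
replica.resync.max.concurrent.partitions=4   # "don't transfer more than X partitions at a time"
replica.resync.prefer.led.partitions=true    # resync partitions this broker leads before others
replica.resync.max.concurrent.topics=1       # "no more than 1 topic at a time"
```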
> What I REALLY want is for Kafka to track the new data (track the head of the 
> log) and then ask for the tail in chunks.  Ideally this would request from 
> the source, “what is the next logical older starting point?” and then start 
> there.  This way, the transfer basically becomes a file transfer of the log 
> stored on the source’s disk. Once that block is retrieved, it moves on to the 
> next oldest.  In this scheme there is almost zero waste as both the head and 
> tail grow; only the final (oldest) chunk of the tail risks being discarded 
> before it is fetched, so bandwidth is not significantly wasted.
> All this changes the ISR check to “am I caught up on both head AND tail?”, 
> whereas today the tail part is only implied.
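The chunked tail-fetch idea above can be sketched as follows. This is a minimal illustration, not Kafka code: `nextOlderChunkStart` and the set of segment base offsets stand in for a hypothetical leader-side answer to “what is the next logical older starting point?”, with the follower backfilling newest-to-oldest while it separately tracks the head.

```java
import java.util.*;

// Sketch of the proposed chunked tail resync (hypothetical, not Kafka's API).
// The follower keeps up with the head (live traffic) and backfills the tail
// one segment-sized chunk at a time, newest-to-oldest, so that only the
// final (oldest) chunk can be lost if the leader's retention deletes it.
public class ChunkedResyncSketch {
    // Leader-side: given the base offsets of the segments on disk, return the
    // start of the next older chunk before `currentStart`, if any remains.
    static OptionalLong nextOlderChunkStart(NavigableSet<Long> segmentBases,
                                            long currentStart) {
        Long lower = segmentBases.lower(currentStart);
        return lower == null ? OptionalLong.empty() : OptionalLong.of(lower);
    }

    public static void main(String[] args) {
        // Segments on the leader's disk start at these offsets.
        NavigableSet<Long> bases = new TreeSet<>(List.of(0L, 100L, 200L, 300L));
        List<Long> fetchOrder = new ArrayList<>();
        long cursor = 400L;              // head position, already being tracked
        OptionalLong next;
        while ((next = nextOlderChunkStart(bases, cursor)).isPresent()) {
            cursor = next.getAsLong();
            fetchOrder.add(cursor);      // fetch chunk [cursor, previous cursor)
        }
        System.out.println(fetchOrder);  // newest tail chunk first, oldest last
    }
}
```

Because each chunk maps onto a log segment already stored on the leader's disk, the transfer behaves like a file copy, and the only chunk at risk from retention is the oldest one still pending.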



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
