Hi Luke,

After moving all data from one disk to another on every Kafka broker, I ran into some weird behaviour I wanted to highlight here.
Basically, the disk storage kept increasing even though there was no change in the `bytes in` metric per broker. After investigating, I saw that all segment log files in the new log.dir had a modification time set to the moment the copy was done. So I guess the process applying the retention policy (the log cleaner?) uses that timestamp to check whether a segment file should be deleted or not. I ended up with a lot more data than we were supposed to store, since we are effectively doubling the retention time of all the freshly moved data. It seems a bit off that the `kafka-reassign-partitions` command doesn't handle that somehow when moving the data.
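For what it's worth, this is roughly how I spotted it. The paths and topic name below are placeholders from my test setup, and the grep is only a rough sketch against kafka-dump-log's batch output, so adjust as needed:

  # Filesystem mtimes of the moved segments: they all show the copy time
  stat -c '%y %n' /data/kafka-new/my-topic-0/*.log

  # The record timestamps stored inside a segment tell a different story
  bin/kafka-dump-log.sh \
    --files /data/kafka-new/my-topic-0/00000000000000000000.log \
    | grep -oE '(maxTimestamp|CreateTime): [0-9-]+' | tail -n 1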
Can you think of a way to avoid this issue? And can you confirm that the log cleaner uses the modification date of the segment log files on the filesystem to check whether to delete the data?

Thanks a lot
Fares

On Wed, Apr 13, 2022 at 15:01, Luke Chen <show...@gmail.com> wrote:

> Hi Fares,
>
> Thanks for sharing the information.
>
> > I'm considering throttling "manually" by moving a small set of topics
> > at a time and separating large topics.
>
> Yes, this is also the option I can think of.
> Not sure if there are other suggestions from the community.
>
> Thank you.
> Luke
>
> On Wed, Apr 13, 2022 at 4:09 PM Fares Oueslati <oueslati.fa...@gmail.com> wrote:
>
> > To be more specific, here is the detail of the producer latency during
> > and after the move on the test cluster:
> >
> > 50 records sent, 10.0 records/sec (0.01 MB/sec), 1.8 ms avg latency, 3.0 ms max latency.
> > 50 records sent, 10.0 records/sec (0.01 MB/sec), 2.1 ms avg latency, 8.0 ms max latency.
> > 50 records sent, 10.0 records/sec (0.01 MB/sec), 1.8 ms avg latency, 3.0 ms max latency.
> > 50 records sent, 10.0 records/sec (0.01 MB/sec), 2.1 ms avg latency, 10.0 ms max latency.
> > 50 records sent, 10.0 records/sec (0.01 MB/sec), 1.7 ms avg latency, 2.0 ms max latency.
> > 50 records sent, 10.0 records/sec (0.01 MB/sec), 278.9 ms avg latency, 1756.0 ms max latency.
> > 51 records sent, 10.2 records/sec (0.01 MB/sec), 5.3 ms avg latency, 28.0 ms max latency.
> > 50 records sent, 10.0 records/sec (0.01 MB/sec), 8.0 ms avg latency, 81.0 ms max latency.
> > 50 records sent, 10.0 records/sec (0.01 MB/sec), 6.6 ms avg latency, 56.0 ms max latency.
> > 50 records sent, 10.0 records/sec (0.01 MB/sec), 8.2 ms avg latency, 90.0 ms max latency.
> > 51 records sent, 10.0 records/sec (0.01 MB/sec), 8.4 ms avg latency, 78.0 ms max latency.
> > 51 records sent, 10.0 records/sec (0.01 MB/sec), 5.9 ms avg latency, 30.0 ms max latency.
> > 50 records sent, 9.9 records/sec (0.01 MB/sec), 5.5 ms avg latency, 33.0 ms max latency.
> > 51 records sent, 10.0 records/sec (0.01 MB/sec), 12.2 ms avg latency, 263.0 ms max latency.
> > 47 records sent, 9.2 records/sec (0.01 MB/sec), 169.4 ms avg latency, 1173.0 ms max latency.
> > 54 records sent, 10.7 records/sec (0.01 MB/sec), 82.3 ms avg latency, 739.0 ms max latency.
> > 51 records sent, 10.0 records/sec (0.01 MB/sec), 4.7 ms avg latency, 52.0 ms max latency.
> >
> > On Wed, Apr 13, 2022 at 08:54, Fares Oueslati <oueslati.fa...@gmail.com> wrote:
> >
> > > Hi Luke,
> > >
> > > For now I'm validating the operation on a 3-broker test cluster with
> > > ~50 GB of data on a single topic (generated using
> > > kafka-producer-perf-test.sh), with one test producer that is reporting
> > > a latency spike during the operation, going from 2 ms on average to
> > > 2 seconds. I'm using SSD disks on GCP, FYI.
> > >
> > > The real clusters where I need to move all the partitions have ~2 TB
> > > of data with several active clients.
> > >
> > > I'm considering throttling "manually" by moving a small set of topics
> > > at a time and separating large topics.
> > >
> > > Wdyt?
> > >
> > > Thank you
> > >
> > > On Wed, Apr 13, 2022 at 04:56, Luke Chen <show...@gmail.com> wrote:
> > >
> > > > Hi Fares,
> > > >
> > > > > Are you aware of any way to throttle the movement of data between
> > > > > disks?
> > > >
> > > > Interesting question! We've never considered throttling disk IO.
> > > > Does it impact the normal throughput a lot?
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > > On Wed, Apr 13, 2022 at 1:50 AM Fares Oueslati <oueslati.fa...@gmail.com> wrote:
> > > >
> > > > > Thanks Luke for your answer.
> > > > >
> > > > > Are you aware of any way to throttle the movement of data between
> > > > > disks? The `--throttle` option of `kafka-reassign-partitions` only
> > > > > throttles inter-broker throughput.
> > > > >
> > > > > On Thu, Apr 7, 2022 at 05:11, Luke Chen <show...@gmail.com> wrote:
> > > > >
> > > > > > Hi Fares,
> > > > > >
> > > > > > I don't know if there is another, simpler solution, but I think
> > > > > > the `kafka-reassign-partitions` command is the safest way.
> > > > > >
> > > > > > Thank you.
> > > > > > Luke
> > > > > >
> > > > > > On Wed, Apr 6, 2022 at 11:32 PM Fares Oueslati <oueslati.fa...@gmail.com> wrote:
> > > > > >
> > > > > > > Hey 👋
> > > > > > > I am using a JBOD setup in a 2.8 Kafka cluster.
> > > > > > >
> > > > > > > I started with only one disk in my JBOD, so all partitions are
> > > > > > > on one volume (one log.dir).
> > > > > > >
> > > > > > > I have added a disk with the right log.dir, and the brokers are
> > > > > > > correctly configured. I would like to move all replicas of all
> > > > > > > partitions, without exception, from the first volume to the
> > > > > > > new one.
> > > > > > >
> > > > > > > With the `kafka-reassign-partitions` command it seems to be a
> > > > > > > bit too much trouble. I need to generate a `Proposed partition
> > > > > > > reassignment configuration` and then modify the paths in the
> > > > > > > log_dirs dynamically, according to what is in the `replicas`
> > > > > > > list (see the sketch at the very end of this message).
> > > > > > >
> > > > > > > It can be automated, but I wonder if there is a simpler
> > > > > > > solution for my relatively simple need.
> > > > > > >
> > > > > > > Thanks
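PS: in case it's useful to anyone finding this thread, here is roughly how the log_dirs rewrite from my very first message (at the bottom of the quotes) can be scripted. This is an untested sketch: it assumes jq, a single new volume mounted at /data/kafka-new, and that the "Proposed partition reassignment configuration" JSON printed by --generate has been saved to proposed.json. Adjust paths and broker ids to your layout.

  # 1. Generate a proposal (topics.json lists the topics to move)
  bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
    --topics-to-move-json-file topics.json --broker-list "1,2,3" --generate

  # 2. Point every replica at the new volume: one log_dirs entry per
  #    replica, mirroring the replicas list
  jq '.partitions |= map(.log_dirs = (.replicas | map("/data/kafka-new")))' \
    proposed.json > reassignment.json

  # 3. Execute the move. Note that --throttle only limits inter-broker
  #    traffic; recent versions also expose --replica-alter-log-dirs-throttle,
  #    which looks relevant for intra-broker moves, but I haven't verified it.
  bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
    --reassignment-json-file reassignment.json --execute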
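PPS: the latency numbers quoted above are the standard kafka-producer-perf-test.sh output; the invocation was roughly the following (topic name and bootstrap server are placeholders; --throughput 10 and --record-size 1000 match the ~10 records/sec at 0.01 MB/sec in the output):

  bin/kafka-producer-perf-test.sh --topic my-topic \
    --num-records 100000 --record-size 1000 --throughput 10 \
    --producer-props bootstrap.servers=localhost:9092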