Hi Jay,

Thanks for the response. I feel this needs to be documented as a limitation of the new Java producer: the interaction of batch size and buffer size when increasing partitions. I agree that you get fine-grained control (which is great), but you ultimately lose the ability to increase partitions for scalability, which I think matters more, without impacting producers already running in a live production environment. From the customer's perspective, I would argue for a flag called *auto.update.batch.size* that recalculates the batch size with respect to the buffer size when a partition increase is detected, so that running code neither throws this exception nor blocks application threads waiting for memory it will never get. (This is just my suggestion.)
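[Editor's note: the proposed `auto.update.batch.size` flag does not exist in Kafka; the sketch below only illustrates the recalculation Bhavesh is asking for. The class, method, and the "two batches per partition" heuristic (one filling, one in flight) are assumptions, not Kafka APIs.]

```java
// Hypothetical sketch of the proposed auto.update.batch.size behavior:
// when the partition count grows, shrink the batch size so all partitions
// still fit within the fixed buffer.memory. Not a real Kafka feature.
public class AutoBatchSize {

    /** Pick a batch size that keeps two batches per partition within the buffer. */
    static long recalcBatchSize(long bufferMemory, int partitions, long configuredBatchSize) {
        // Reserve room for two batches per partition: one filling, one sending.
        long fitting = bufferMemory / (2L * partitions);
        // Never grow beyond what the operator originally configured.
        return Math.min(configuredBatchSize, fitting);
    }

    public static void main(String[] args) {
        long bufferMemory = 64L * 1024 * 1024; // 64 MiB, as in the thread
        long batchSize = 1L * 1024 * 1024;     // 1 MiB, as in the thread
        // 32 partitions: the configured 1 MiB still fits (2 * 32 * 1 MiB = 64 MiB).
        System.out.println(recalcBatchSize(bufferMemory, 32, batchSize));  // 1048576
        // 128 partitions: shrink to 256 KiB so 2 * 128 * 256 KiB = 64 MiB.
        System.out.println(recalcBatchSize(bufferMemory, 128, batchSize)); // 262144
    }
}
```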
Do you agree? Should I file a Jira for this? I am sure others will run into this problem sooner or later.

Thanks,
Bhavesh

On Wed, Nov 5, 2014 at 4:44 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> Bhavesh,
>
> Wouldn't using the default batch size of 16k have avoided this problem
> entirely? I think the best solution now is just to change the
> configuration. What I am saying is that it is unlikely you will need to
> do this again; the problem is just that 1MB partition batches are quite
> large, so you run out of memory very quickly with that configuration.
>
> I agree that the Scala producer doesn't have this problem, but it
> actually doesn't let you control the memory use or the request size very
> effectively, which I would argue is a much bigger problem. Once you
> introduce those controls you have to configure how to make use of them,
> which is what this is about.
>
> -Jay
>
> On Wed, Nov 5, 2014 at 3:45 PM, Bhavesh Mistry <mistry.p.bhav...@gmail.com> wrote:
> > Hi Jay or Kafka Dev Team,
> >
> > Any suggestions on how I can deal with this situation of expanding
> > partitions for the new Java producer for scalability (consumer side)?
> >
> > Thanks,
> > Bhavesh
> >
> > On Tue, Nov 4, 2014 at 7:08 PM, Bhavesh Mistry <mistry.p.bhav...@gmail.com> wrote:
> > > Also, to add to this: the old (Scala-based) producer is not impacted
> > > by partition changes. So an important scalability feature is taken
> > > away if you do not plan for expansion from the beginning with the new
> > > Java producer.
> > >
> > > So the new Java producer takes away this critical feature (unless you
> > > plan ahead).
> > >
> > > Thanks,
> > > Bhavesh
> > >
> > > On Tue, Nov 4, 2014 at 4:56 PM, Bhavesh Mistry <mistry.p.bhav...@gmail.com> wrote:
> > > > Hi Jay,
> > > >
> > > > Fundamentally, the problem is that the batch size is already
> > > > configured and producers are running in production with that
> > > > configuration.
> > > > (The previous values were just samples.) How do we increase
> > > > partitions for topics when the batch size would exceed the
> > > > configured buffer limit? Yes, had we planned for a smaller batch
> > > > size we could do this, but we cannot if producers are already
> > > > running. Have you faced this problem at LinkedIn or anywhere else?
> > > >
> > > > Thanks,
> > > > Bhavesh
> > > >
> > > > On Tue, Nov 4, 2014 at 4:25 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > > > > Hey Bhavesh,
> > > > >
> > > > > No, there isn't such a setting. But what I am saying is that I
> > > > > don't think you really need that feature. I think instead you can
> > > > > use a 32k batch size with your 64M memory limit. This should mean
> > > > > you can have up to 2048 batches in flight. Assuming one batch in
> > > > > flight and one being added to at any given time, this should work
> > > > > well for up to ~1000 partitions, so there is no need to do
> > > > > anything dynamic. Assuming each producer sends to just one topic,
> > > > > you would be fine as long as that topic had fewer than 1000
> > > > > partitions. If you wanted to add more you would need to add memory
> > > > > on the producers.
> > > > >
> > > > > -Jay
> > > > >
> > > > > On Tue, Nov 4, 2014 at 4:04 PM, Bhavesh Mistry <mistry.p.bhav...@gmail.com> wrote:
> > > > > > Hi Jay,
> > > > > >
> > > > > > I agree with and understood what you mentioned in the previous
> > > > > > email. But when you have 5000+ producers running in the cloud (I
> > > > > > am sure LinkedIn has many more and needs to increase partitions
> > > > > > for scalability), then all running producers will stop sending
> > > > > > data. So is there any feature or setting that shrinks the batch
> > > > > > size to fit the increase? I am sure others will face the same
> > > > > > issue.
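[Editor's note: Jay's arithmetic above (32k batches under a 64M limit, two batches per partition) can be worked through in a few lines. The figures are from the thread; the helper class and method names are illustrative only.]

```java
// Worked version of Jay's sizing math: how many batches a given
// buffer.memory can hold, and how many partitions that supports if each
// partition needs roughly two batches (one filling, one in flight).
public class BatchMath {

    static long maxBatches(long bufferMemory, long batchSize) {
        return bufferMemory / batchSize;
    }

    static long supportedPartitions(long bufferMemory, long batchSize) {
        // Two batches per partition, per Jay's rule of thumb.
        return maxBatches(bufferMemory, batchSize) / 2;
    }

    public static void main(String[] args) {
        long bufferMemory = 64L * 1024 * 1024; // 64 MB limit
        long batchSize = 32L * 1024;           // 32 KB batches
        System.out.println(maxBatches(bufferMemory, batchSize));          // 2048
        System.out.println(supportedPartitions(bufferMemory, batchSize)); // 1024, i.e. ~1000 partitions
    }
}
```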
> > > > > > Had I configured block.on.buffer.full=true it would be even
> > > > > > worse and would block application threads. In our use case the
> > > > > > *logger.log(msg)* method cannot block, which is why we set that
> > > > > > configuration to false.
> > > > > >
> > > > > > So I am sure others will run into these same issues. Please try
> > > > > > to find the optimal solution and a recommendation from the Kafka
> > > > > > dev team for this particular use case (which may become common).
> > > > > >
> > > > > > Thanks,
> > > > > > Bhavesh
> > > > > >
> > > > > > On Tue, Nov 4, 2014 at 3:12 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > > > > > > Hey Bhavesh,
> > > > > > >
> > > > > > > Here is what your configuration means:
> > > > > > > buffer.memory=64MB # don't use more than 64MB of memory
> > > > > > > batch.size=1MB # allocate a 1MB buffer for each partition with data
> > > > > > > block.on.buffer.full=false # immediately throw an exception if
> > > > > > > there is not enough memory to create a new buffer
> > > > > > >
> > > > > > > Not sure what linger time you have set.
> > > > > > >
> > > > > > > So what you see makes sense. If you have 1MB buffers and 32
> > > > > > > partitions then you will have approximately 32MB of memory in
> > > > > > > use (actually a bit more than this, since one buffer will be
> > > > > > > filling while another is sending). If you have 128 partitions
> > > > > > > then you will try to use 128MB, and since you have configured
> > > > > > > the producer to fail when you reach 64MB (rather than waiting
> > > > > > > for memory to become available), that is what happens.
> > > > > > >
> > > > > > > I suspect you want a smaller batch size. More than 64k is
> > > > > > > usually not going to help throughput.
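[Editor's note: Jay's memory accounting above can be checked directly. This is a minimal sketch of the arithmetic, not Kafka code; the only Kafka facts assumed are those stated in the thread, i.e. one batch-sized buffer per partition with data and an exception when block.on.buffer.full=false and the buffer is exhausted.]

```java
// Why the configuration in the thread fails after the partition increase:
// each partition with data gets a batch.size buffer, so total demand is
// partitions * batch.size, which must stay within buffer.memory.
public class BufferCheck {

    static boolean exhausted(long bufferMemory, long batchSize, int partitionsWithData) {
        return (long) partitionsWithData * batchSize > bufferMemory;
    }

    public static void main(String[] args) {
        long bufferMemory = 64L * 1024 * 1024; // 64 MB
        long batchSize = 1L * 1024 * 1024;     // 1 MB
        // 32 partitions: 32 MB demanded, within the 64 MB limit.
        System.out.println(exhausted(bufferMemory, batchSize, 32));  // false
        // 128 partitions: 128 MB demanded, over the limit -> BufferExhaustedException.
        System.out.println(exhausted(bufferMemory, batchSize, 128)); // true
    }
}
```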
> > > > > > > -Jay
> > > > > > >
> > > > > > > On Tue, Nov 4, 2014 at 11:39 AM, Bhavesh Mistry <mistry.p.bhav...@gmail.com> wrote:
> > > > > > > > Hi Kafka Dev,
> > > > > > > >
> > > > > > > > With the new producer, we had to change the number of
> > > > > > > > partitions for a topic, and we ran into
> > > > > > > > BufferExhaustedException.
> > > > > > > >
> > > > > > > > Here is an example: we have set 64MiB of buffer memory, 32
> > > > > > > > partitions, and a 1MiB batch size. But when we increase the
> > > > > > > > partition count to 128, it throws BufferExhaustedException
> > > > > > > > right away (non-key-based messages). The buffer is allocated
> > > > > > > > based on batch.size. There is a very common need to
> > > > > > > > auto-calculate the batch size when partitions increase,
> > > > > > > > because we have about ~5000 boxes and it is not practical to
> > > > > > > > deploy code to all machines before expanding partitions for
> > > > > > > > scalability. What options are available while the new
> > > > > > > > producer is running, partitions need to increase, and there
> > > > > > > > is not enough buffer to allocate batches for the additional
> > > > > > > > partitions?
> > > > > > > >
> > > > > > > > buffer.memory=64MiB
> > > > > > > > batch.size=1MiB
> > > > > > > > block.on.buffer.full=false
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Bhavesh
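[Editor's note: for reference, the configuration under discussion in this thread, expressed as producer properties. The config keys are as quoted in the thread (the new Java producer of that era); values are in bytes, since these settings take byte counts. Building and using an actual KafkaProducer would require the kafka-clients dependency and is omitted here.]

```java
import java.util.Properties;

// The thread's producer configuration as a Properties object. Only the
// key names and byte values come from the thread; the class and method
// are illustrative.
public class ProducerConfigSketch {

    static Properties threadConfig() {
        Properties props = new Properties();
        props.put("buffer.memory", String.valueOf(64L * 1024 * 1024)); // 64 MiB total
        props.put("batch.size", String.valueOf(1024 * 1024));          // 1 MiB per partition with data
        props.put("block.on.buffer.full", "false"); // throw instead of blocking when memory runs out
        return props;
    }

    public static void main(String[] args) {
        System.out.println(threadConfig().getProperty("buffer.memory")); // 67108864
    }
}
```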