What if we acquired the lock inside the "for (Destination d : map.values())" loop and immediately GCed that destination, rather than gathering them up to do at the end as the code you posted currently does? That allows the broker to service other requests in the meantime (especially if we Thread.yield() after releasing the lock at the end of the loop), but still ensure that a destination isn't allowed to become active between when we decide it's inactive and when we GC it. We'd probably need to make a defensive copy of the map to make sure that calling getRoot().removeDestination() doesn't cause a concurrent modification exception, but that's not expensive and I don't see anything else that requires the lock to be held from one destination to the next. Am I overlooking anything here?
All of that doesn't explicitly address the fact that we acquire the lock before we know there's work we have to do, but by releasing the lock between iterations the consequences of doing that should be negligible, so I think if we refactor when we hold the lock then we don't need to worry about it. Tim On Sat, Feb 21, 2015 at 9:54 PM, Kevin Burton <[email protected]> wrote: > Also, I realized there’s another bug here. A write lock is acquired even > in the situation where no queues actually need to be GCd. It’s not massive > in the grand scheme of things but it probably ends up with brief moments > where the broker isn’t serving any messages. > > On Sat, Feb 21, 2015 at 8:42 PM, Kevin Burton <[email protected]> wrote: > > > I’m still tracking down this bulk queue GC bug I’ve been dealing with. > > > > Right now I’m working on a test to duplicate the problem. > > > > However, in the mean time I noticed this: > > > > http://pastebin.com/PwPJQLNu > > > > … in regionBroker. > > > > Essentially, the code holds the write lock during the entire time of the > > GC for *all* the queues. > > > > In my experience, on disk, it takes 100ms to GC a queue (at least under > > load) > > > > It seems like a better strategy here would be to have the GC interval be > > very low, say 5 seconds, but then have a getMaxPurgedDestinationsPerSweep > > value low.. say 1-5 … > > > > The problem is that there is no way to change this value anywhere. There > > aren’t any configuration directives for this. From the source, it looks > > like it’s only used within testing. > > > > Perhaps this should become a config directive? > > > > Kevin > > > > -- > > > > Founder/CEO Spinn3r.com > > Location: *San Francisco, CA* > > blog: http://burtonator.wordpress.com > > … or check out my Google+ profile > > <https://plus.google.com/102718274791889610666/posts> > > <http://spinn3r.com> > > > > > > > -- > > Founder/CEO Spinn3r.com > Location: *San Francisco, CA* > blog: http://burtonator.wordpress.com > … or check out my Google+ profile > <https://plus.google.com/102718274791889610666/posts> > <http://spinn3r.com> >
