The broker learns about segments directly from historicals and tasks, even though a PR was recently merged to keep published segments in memory in brokers (https://github.com/apache/incubator-druid/pull/6901). It probably makes sense to have brokers also filter out segments that come from historicals but are not in the metadata store.
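A minimal sketch of the broker-side filter suggested above. It assumes the broker can tell whether an announcement came from a historical or an ingestion task, and that it has a set of published segment ids from the metadata store. `SegmentId`, `routable_segments`, the `"historical"`/`"task"` labels, and the example segment values are all hypothetical and are not Druid APIs.

```python
from typing import NamedTuple


class SegmentId(NamedTuple):
    # Hypothetical, simplified segment identity (no partition number).
    datasource: str
    interval: str
    version: str


def routable_segments(announcements, published_ids):
    """Return the segments a broker would route queries to.

    announcements: iterable of (segment_id, source) pairs, where source is
        "historical" or "task" (hypothetical labels for the announcing server).
    published_ids: set of segment ids the broker knows from the metadata store.
    """
    keep = set()
    for segment_id, source in announcements:
        if source == "task":
            # Realtime segments are not in the metadata store until they are
            # published, so announcements from ingestion tasks are always kept.
            keep.add(segment_id)
        elif segment_id in published_ids:
            keep.add(segment_id)
        # Otherwise a historical is serving a segment the metadata store does
        # not know about (e.g. a stale cache from another cluster): ignore it.
    return keep


if __name__ == "__main__":
    prod = SegmentId("wiki", "2019-02-01/2019-02-02", "2019-02-01T00:00:00.000Z")
    staging = SegmentId("wiki", "2019-02-01/2019-02-02", "2019-02-10T00:00:00.000Z")
    realtime = SegmentId("wiki", "2019-03-01/2019-03-02", "2019-03-01T00:00:00.000Z")
    announced = [(prod, "historical"), (staging, "historical"), (realtime, "task")]
    # Keeps prod and realtime; drops the unpublished staging segment.
    print(routable_segments(announced, {prod}))
```

Treating task announcements as always trusted mirrors the point in the thread that realtime segments only reach the metadata store once they are published.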
Jihoon

On Fri, Mar 1, 2019 at 1:24 PM David Glasser <glas...@apollographql.com> wrote:

> That makes sense. Do the coordinator's decisions about which segments are 'used' affect the broker's choices for routing queries, or does the broker just learn about segments directly from historicals/ingestion tasks (via... ZooKeeper?)
>
> --dave
>
> On Fri, Mar 1, 2019 at 1:15 PM Jihoon Son <ghoon...@gmail.com> wrote:
>
> > Hi Dave,
> >
> > I think the third option sounds most reasonable for fixing this issue, though the second option sounds useful in general.
> > And yes, it wouldn't be easy to make historicals refuse to announce unknown segments.
> > I think it makes more sense to check only in the coordinator, because it's the only node that directly accesses the metadata store (besides the overlord).
> > So the coordinator would not update the "used" flag if the overshadowing segments are not in the metadata store.
> > In stream ingestion, segments might not be in the metadata store until they are published. However, this shouldn't be a problem, because stream ingestion always appends segments.
> >
> > Jihoon
> >
> > On Fri, Mar 1, 2019 at 12:49 AM David Glasser <glas...@apollographql.com> wrote:
> >
> > > (I sent this message to druid-user last week and got no response. Since it proposes improvements to Druid, I thought it might be appropriate to resend it here. Hope that's OK.)
> > >
> > > We had a big outage in our Druid cluster last week. We run our Druid servers in Kubernetes, and our historicals use machine-local SSDs for their segment caches. We made the unfortunate choice to have our production and staging historicals share the same pool of machines, and today got bit by this for the first time.
> > >
> > > A production historical started up on a machine whose segment cache contained segments from our staging cluster. Our prod and staging clusters use the same names for data sources.
> > >
> > > This meant that these staging segments overshadowed production segments that happened to have lower versions. Worse, when DruidCoordinatorCleanupOvershadowed kicked in, all of the overshadowed production segments were set to used=false and were quickly dropped from historicals. This turned out to be the majority of our data. We eventually figured out what was going on and took a number of manual cleanup steps (shutting down and clearing the cache of the two historicals that had staging segments on them, manually setting used=true for all entries in druid_segments, and waiting a long, long time for the data to re-download), but diagnosing the problem was subtle. (I was very lucky that, just a few days before, I had happened to read a lot of the code about how the `used` column works and how versioned timelines are calculated!)
> > >
> > > (We were also lucky that we had turned off automatic killing in the coordinator literally that morning!)
> > >
> > > I feel like Druid should have been able to protect me from this to some degree. (Yes, we are going to address the root cause by making it impossible for prod and staging to reuse each other's disks.) Some thoughts on changes that could have helped:
> > >
> > > - Is it standard Druid practice to prepend the "cluster" name to the data source name, so that conflicts like this are never possible? We are certainly tempted to do this now, but nobody ever told us to. If that's the standard, should it be documented?
> > >
> > > - Should clusters have an optional name/namespace, with that namespace recorded in each DataSegment, and should clusters refuse to handle segments they find that belong to a different namespace? This would be like the common database setup where a single server/cluster has a set of databases, each of which has a set of tables.
> > >
> > > - Should historicals refuse to announce segments that don't exist in the druid_segments table, or should coordinators/brokers/etc. ignore segments announced *by historicals* that don't exist in the druid_segments table? I'm guessing this is difficult to do in the historical, because the historical probably doesn't talk to the SQL DB at all. But maybe it could be done by the coordinator and broker?
> > >
> > > --dave