Makes sense.

To elaborate a bit more on my "cluster name" concept, I actually think it
would be pretty straightforward:

- Add something like `druid.cluster.name=staging`.
- To be compatible with existing data, also add something like
`druid.cluster.allowSegmentsFromClusters=["", "dev"]`. Note that the empty
string is explicitly recognized here.
- Add a `clusterName` field to DataSegment. When creating a new segment,
set its clusterName field to the value of druid.cluster.name.
- Make the various places that see DataSegments ignore, and warn about,
segments whose cluster does not match druid.cluster.name or a value in
druid.cluster.allowSegmentsFromClusters. This would include
SegmentLoadDropHandler (which is what looks at the local cache in
historicals etc.), operations that publish new segments, etc. (Rough
sketches of what this could look like are below.)
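
For example (just a sketch of the proposed properties; neither key exists
in Druid today), the two clusters could be distinguished by something like:

  # staging common.runtime.properties
  druid.cluster.name=staging
  druid.cluster.allowSegmentsFromClusters=["", "dev"]

  # prod common.runtime.properties
  druid.cluster.name=prod
  druid.cluster.allowSegmentsFromClusters=[""]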

This might actually be simpler and more efficient than going to the
database each time, though the database approach could handle other related
issues I suppose.
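
The check itself would be tiny. A rough sketch in Java (names here are
hypothetical, including the ClusterNameFilter class and the clusterName
field on DataSegment; this is just the shape of the check, not actual Druid
code):

import java.util.Set;

public class ClusterNameFilter
{
  private final String clusterName;          // from druid.cluster.name
  private final Set<String> allowedClusters; // from druid.cluster.allowSegmentsFromClusters

  public ClusterNameFilter(String clusterName, Set<String> allowedClusters)
  {
    this.clusterName = clusterName;
    this.allowedClusters = allowedClusters;
  }

  // True if a segment stamped with segmentClusterName may be loaded,
  // announced, or published by this process. Segments created before the
  // field existed carry no cluster name, so null is treated as "" and the
  // "" entry in allowedClusters covers them.
  public boolean isAllowed(String segmentClusterName)
  {
    final String name = segmentClusterName == null ? "" : segmentClusterName;
    return name.equals(clusterName) || allowedClusters.contains(name);
  }

  public static void main(String[] args)
  {
    // druid.cluster.name=prod, druid.cluster.allowSegmentsFromClusters=[""]
    ClusterNameFilter filter = new ClusterNameFilter("prod", Set.of(""));
    System.out.println(filter.isAllowed("prod"));    // true
    System.out.println(filter.isAllowed(null));      // true: pre-existing segment with no cluster name
    System.out.println(filter.isAllowed("staging")); // false: warn and skip
  }
}

SegmentLoadDropHandler, segment announcement, and the publish paths would
call something like isAllowed() and log a warning instead of acting on the
segment when it returns false.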

On Fri, Mar 1, 2019 at 1:58 PM Jihoon Son <ghoon...@gmail.com> wrote:

> The broker learns about segments from historicals and tasks, even though a
> PR was recently merged to keep published segments in memory in brokers (
> https://github.com/apache/incubator-druid/pull/6901).
> It probably makes sense to filter out segments in brokers too if they are
> from historicals but not in the metadata store.
>
> Jihoon
>
> On Fri, Mar 1, 2019 at 1:24 PM David Glasser <glas...@apollographql.com>
> wrote:
>
> > That makes sense. Do the coordinator's decisions about what segments are
> > 'used' affect the broker's choices for routing queries, or does the broker
> > just learn about things directly from historicals/ingestion tasks (via...
> > zookeeper?)
> >
> > --dave
> >
> > On Fri, Mar 1, 2019 at 1:15 PM Jihoon Son <ghoon...@gmail.com> wrote:
> >
> > > Hi Dave,
> > >
> > > I think the third option sounds most reasonable to fix this issue,
> > > though the second option sounds useful in general.
> > > And yes, it wouldn't be easy to refuse to announce unknown segments in
> > > historicals.
> > > I think it makes more sense to check only in the coordinator, because
> > > it's the only node that directly accesses the metadata store (except
> > > the overlord).
> > > So, the coordinator could skip updating the "used" flag if the
> > > overshadowing segments are not in the metadata store.
> > > In stream ingestion, segments might not be in the metadata store until
> > > they are published. However, this shouldn't be a problem because
> > > segments are always appended in stream ingestion.
> > >
> > > Jihoon
> > >
> > > On Fri, Mar 1, 2019 at 12:49 AM David Glasser <glas...@apollographql.com>
> > > wrote:
> > >
> > > > (I sent this message to druid-user last week and got no response.
> > > > Since it proposes improvements to Druid, I thought maybe it would be
> > > > appropriate to resend here. Hope that's OK.)
> > > >
> > > > We had a big outage in our Druid cluster last week.  We run our Druid
> > > > servers in Kubernetes, and our historicals use machine local SSDs for
> > > > their segment caches.  We made the unfortunate choice to have our
> > > > production and staging historicals share the same pool of machines,
> > > > and today got bit by this for the first time.
> > > >
> > > > A production historical started up on a machine whose segment cache
> > > > contained segments from our staging cluster.  Our prod and staging
> > > > clusters use the same names for data sources.
> > > >
> > > > This meant that these segments overshadowed production segments which
> > > > happened to have lower versions.  Worse, when
> > > > DruidCoordinatorCleanupOvershadowed kicked in, all of the production
> > > > segments that were overshadowed got used=false set, and quickly got
> > > > dropped from historicals. This ended up being the majority of our
> > > > data.  We eventually figured out what was going on and did a bunch of
> > > > manual steps to clean up (turning off and clearing the cache of the
> > > > two historicals that had staging segments on them, manually setting
> > > > used=true for all entries in druid_segments, waiting a long long time
> > > > for data to re-download), but figuring out what was going on was
> > > > subtle (I was very lucky I had randomly decided to read a lot of the
> > > > code about how the `used` column works and how versioned timelines are
> > > > calculated just a few days before!).
> > > >
> > > > (We were also lucky that we had turned off coordinator automatic
> > > > killing literally that morning!)
> > > >
> > > > I feel like Druid should have been able to protect me from this to
> > > > some degree. (Yes, we are going to address the root cause by making it
> > > > impossible for prod and staging to reuse each others' disks.) Some
> > > > thoughts on changes that could have helped:
> > > >
> > > > - Is it the Druid standard to prepend the "cluster" name to the data
> > > > source name, so that conflicts like this are never possible?  We are
> > > > certainly tempted to do this now, but nobody ever told us to. If
> > > > that's the standard, should it be documented?
> > > >
> > > > - Should clusters have an optional name/namespace, and DataSegments
> > > > have that namespace recorded in them, and clusters refuse to handle
> > > > segments they find that are from a different namespace? This would be
> > > > like the common database setup where a single server/cluster has a set
> > > > of databases which each have a set of tables.
> > > >
> > > > - Should historicals refuse to announce segments that don't exist in
> > > > the druid_segments table, or should coordinators/brokers/etc refuse to
> > > > pay attention to segments announced *by historicals* that don't exist
> > > > in the druid_segments table?  I'm going to guess this is difficult to
> > > > do in the historical because the historical probably doesn't actually
> > > > talk to the SQL DB at all? But maybe it could be done by the
> > > > coordinator and broker?
> > > >
> > > > --dave
> > > >
> > >
> >
>
