On Mon, Jun 16, 2025 at 12:21:46AM +0000, Aaron Friel wrote:
> > > > > > > I'm looking to see if I can submit a patch to fix this, but it 
> > > > > > > seems
> > > > > > > like the durability bit field for devices may be only 2 bits, is 
> > > > > > > that
> > > > > > > right?
> > > > > > 
> > > > > > That gets you values of 0-3. Why is that not enough?
> > > > >
> > > > > In bch2_mi_to_cpu, it looks like durability is encoded with a "bias" 
> > > > > (default value) that maps {0,1,2,3} => {1,0,1,2}.
> > > > >
> > > > >             .durability = BCH_MEMBER_DURABILITY(mi)
> > > > >                   ? BCH_MEMBER_DURABILITY(mi) - 1
> > > > >                   : 1,
> > > > >
> > > > > This is pretty unfortunate, because it looks like if I want to use 
> > > > > RAID6 (replicas=3), I can't represent a device as having inherent 
> > > > > durability of RAID6 (durability=3).
> > > > >
> > > > > It doesn't look like too much work to add a feature flag 
> > > > > `BCH_FEATURE_durability_bias_v2` which when set, modifies the bias to 
> > > > > unconditionally add one to the 2-bit field, mapping {0,1,2,3} to 
> > > > > {1,2,3,4}. That would support even very large erasure encoded arrays 
> > > > > as well, where you might use something like RS (56,4) for a common 60 
> > > > > drive JBOD. Practically speaking though I don't think anyone uses 
> > > > > stripes that wide in a single array. At least not for spinning rust, 
> > > > > but it's been a long time since I've worked with enterprise storage 
> > > > > and I understand the rules have changed with flash now.
> > > > >
> > > > > I can submit patches for implementing the feature if you want me to 
> > > > > submit them as a PR. Not sure about your stance on LLM-authored code 
> > > > > though.
> > > >
> > > > Actually there's an easier way, which I've done a few different times
> > > before. We can extend BCH_MEMBER_DURABILITY to 4 bits (should be
> > > sufficient, no?), with the high bits going whenever we've got room in
> > > bch_member.
> > > >
> > > > Rename BCH_MEMBER_DURABILITY -> BCH_MEMBER_DURABILTIY_LO
> > > >
> > > > BCH_MEMBER_DURABILITY_HI for the new two bits
> > > >
> > > > Then write new get/set functions for BCH_MEMBER_DURABILITY that
> > > reads/stores from the lo and hi fields.
> > > >
> > > > But we'd still want a new on disk format version for this, and then use
> > > bch2_request_incompat_feature() whenever attempting to set a durability
> > > htat doesn't fit in the old 2 bit field.
> > >
> > > Do we want the new field to be additive after saturating 
> > > BCH_MEMBER_DURABILITY_LO at 2, rather than treat it as a 4 bit field 
> > > which could result in an older kernel seeing 0b01 and interpreting it as 
> > > 0? So:
> > >
> > > LO    HI    VALUE
> > > # existing range:
> > > 00    00    1
> > > 01    00    0
> > > 10    00    1
> > > 11    00    2
> > > # expanded range:
> > > 11    01    3
> > > 11    10    4
> > > 11    11    5
> > >
> > > Then an older kernel will read any device with durability >2 as having 
> > > durability=2, which is not ideal but I worry that durability=0 might 
> > > result in undefined (or unspecified?) behavior.
> > 
> > No, just make it an incompat feature, it's way simpler - older kernels
> > that don't understand durability > 2 will never see them
> 
> Oh I see, an incompatible feature will prevent older kernels from
> mounting the drive. That makes sense. Do you still want to map with
> the 1 bit bias (mapping 1 to 0b0000, and durability=1 to 0b0001)? It
> seems like that would introduce the fewest changes elsewhere and
> ensure that a zero initialized struct behaves in a rational way. 

There's really no reason to change any of that

Reply via email to