On Fri, 27 Jul 2018, 20:30 Sijie Guo <guosi...@gmail.com> wrote:

> Ivan,
>
> Thank you for putting this together. It is also good to put this as a BP,
> since it is about the metadata layout.
>
> On Fri, Jul 27, 2018 at 10:54 AM Ivan Kelly <iv...@apache.org> wrote:
>
> > Hi folks,
> >
> > I think this was discussed yesterday in the meeting, and a bit on
> > slack, but I haven't seen anything much written down, so I'm starting
> > a thread here.
> >
> > The crux of the problem is that the protobuf text format currently
> > used for metadata cannot have new fields added without breaking client
> > compatibility, as the text parser can't be configured to ignore
> > unrecognised values (and google aren't going to fix this).
> >
> > Protobuf binary format does support new fields though. So if a field
> > is added, a client that knows nothing of it can read it back without
> > issue.
> >
> > I propose we approach this the following way:
> > - We already have a version in /ledgers/LAYOUT. In a current cluster,
> > this contains
> > ```
> > 2
> > org.apache.bookkeeper.meta.HierarchicalLedgerManagerFactory:1
> > ```
> > - We define a new LedgerMetadata protobuf. This is a chance to clean
> > up mistakes we've made previously.
> > - When writing metadata, check what is in /ledgers/LAYOUT. If it is
> > as above, write using the current text protobuf. If it is bumped, use the
> > new binary format.
>
Isn't it too costly? That adds a zk read for each write. We could add a watch, but it has a significant cost.

What about having a client-side config, writeMetadataVersion? We start a new metadata version, and the new one will be encoded as binary. By default 4.8 clients will use the previous version, as we already do for journal and fileinfo on bookies.
Each ledger is independent from the others, so there is no need for a global flag written on zk.

> > - When reading metadata, first try to parse binary, and fall back to
> > text if that fails. (we could also add a layout check to short
> > circuit)

This sounds good. No config needed.

> > When upgrading a cluster, the layout will be as above, it will
> > continue to only use text format until there is some admin
> > intervention. When the admin is satisfied that all clients are on a
> > new enough version, they call a script which bumps the version. From
> > this point clients will write the binary version.
> >
> > New clusters go straight to binary. We will also need a script to dump
> > the metadata from a znode.

Don't we already have some tool? (Maybe I have only seen such a tool in my company's applications.)

> > One wrinkle, which is another reason to
> > create a new LedgerMetadata protobuf, is that when you are writing in
> > text format, there's no way to filter the fields. So even if we write
> > in text format, if we add new fields we are breaking old clients.
> > Another approach would be to ensure the protobuf only contains the
> > fields that are available now, but this seems messy to me.
> >
> > Anyhow, this is to be a jumping off point for discussion. Comments
> > welcome,
> >
> > Cheers,
> > Ivan

--
Enrico Olivelli
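
For reference, the layout-driven choice of write format and the binary-then-text read fallback discussed in this thread could be sketched roughly as below. This is only an illustration, not BookKeeper code: `MetadataFormatSketch`, `Format`, and all method names are hypothetical, the rule that any layout version above 2 means binary is an assumption, and the parsers are passed in as plain functions that throw an unchecked exception on failure (protobuf's Java parsers throw checked exceptions instead).

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Hypothetical helper for illustration; not part of the BookKeeper API. */
final class MetadataFormatSketch {
    enum Format { TEXT, BINARY }

    // Layout version found in current clusters, per the thread.
    // Treating anything higher as "bumped" is an assumption of this sketch.
    static final int TEXT_LAYOUT_VERSION = 2;

    /** Reads the layout version from the first line of the LAYOUT znode content. */
    static int layoutVersion(byte[] layoutData) {
        String content = new String(layoutData, StandardCharsets.UTF_8);
        String firstLine = content.split("\n", 2)[0].trim();
        return Integer.parseInt(firstLine);
    }

    /** Write side: text while the layout is unchanged, binary once it is bumped. */
    static Format writeFormatFor(byte[] layoutData) {
        return layoutVersion(layoutData) > TEXT_LAYOUT_VERSION ? Format.BINARY : Format.TEXT;
    }

    /** Read side: try the binary parser first, fall back to the text parser if it fails. */
    static <T> T readWithFallback(byte[] data,
                                  Function<byte[], T> binaryParser,
                                  Function<byte[], T> textParser) {
        try {
            return binaryParser.apply(data);
        } catch (RuntimeException binaryFailure) {
            // If the text parser also fails, its exception propagates to the caller.
            return textParser.apply(data);
        }
    }
}
```

With the client-side writeMetadataVersion suggestion, `writeFormatFor` would take the configured version instead of the znode content, which avoids the extra zk read; the read-side fallback stays the same in both variants.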