Denis,

Several clarifying questions:
1. Do you have an idea why metadata registration takes so long? Are the disks that slow? Is there that much data to write? Is there contention with disk writes by other subsystems?
2. Do we need persistent metadata for in-memory caches? Or is it like that by accident?

Generally, I think it is possible to move the metadata saving operations out of the discovery thread without losing the required consistency/integrity.

As Alex mentioned, using the metastore looks like a better solution. Do we really need a fast fix here? (Are we talking about a fast fix?)

Wed, 14 Aug 2019 at 11:45, Zhenya Stanilovsky <arzamas...@mail.ru.invalid>:
>
> Alexey, but in this case the customer needs to be informed that a crash (power off) of the whole cluster (for example, a 1-node cluster) could lead to partial data unavailability,
> and maybe to further index corruption.
> 1. Why does your metadata take up a substantial size? Maybe a context is leaking?
> 2. Could the metadata be compressed?
>
>
> >Wednesday, 14 August 2019, 11:22 +03:00 from Alexei Scherbakov <alexey.scherbak...@gmail.com>:
> >
> >Denis Mekhanikov,
> >
> >Currently metadata is fsync'ed on write. This might be the cause of slowdowns in case of metadata burst writes.
> >I think removing fsync could help to mitigate performance issues with the current implementation until a proper solution is implemented: moving metadata to the metastore.
> >
> >
> >Tue, 13 Aug 2019 at 17:09, Denis Mekhanikov < dmekhani...@gmail.com >:
> >
> >> I would also like to mention that marshaller mappings are written to disk even if persistence is disabled.
> >> So, this issue affects purely in-memory clusters as well.
> >>
> >> Denis
> >>
> >> > On 13 Aug 2019, at 17:06, Denis Mekhanikov < dmekhani...@gmail.com > wrote:
> >> >
> >> > Hi!
> >> >
> >> > When persistence is enabled, binary metadata is written to disk upon registration. Currently it happens in the discovery thread, which makes processing of related messages very slow.
> >> > There are cases when a lot of nodes and slow disks can make every binary type take several minutes to register. Plus, it blocks processing of other messages.
> >> >
> >> > I propose starting a separate thread that will be responsible for writing binary metadata to disk.
> >> > So, binary type registration will be considered finished before information about it is written to disk on all nodes.
> >> >
> >> > The main concern here is data consistency in cases when a node acknowledges type registration and then fails before writing the metadata to disk.
> >> > I see two parts of this issue:
> >> > 1. Nodes will have different metadata after restarting.
> >> > 2. If we write some data into a persisted cache and shut down nodes faster than a new binary type is written to disk, then after a restart we won’t have a binary type to work with.
> >> >
> >> > The first case is similar to a situation when one node fails, and after that a new type is registered in the cluster. This issue is resolved by the discovery data exchange. All nodes receive information about all binary types in the initial discovery messages sent by other nodes. So, once you restart a node, it will receive from other nodes the information that it failed to finish writing to disk.
> >> > If all nodes shut down before finishing writing the metadata to disk, then after a restart the type will be considered unregistered, so another registration will be required.
> >> >
> >> > The second case is a bit more complicated. But it can be resolved by making the discovery threads on every node create a future that will be completed when writing to disk is finished. So, every node will have such a future, which will reflect the current state of persisting the metadata to disk.
> >> > After that, if some operation needs this binary type, it will need to wait on that future until flushing to disk is finished.
> >> > This way discovery threads won’t be blocked, but other threads that actually need this type will be.
> >> >
> >> > Please let me know what you think about that.
> >> >
> >> > Denis
> >>
> >
> >--
> >
> >Best regards,
> >Alexei Scherbakov
>
> --
> Zhenya Stanilovsky

--
Best regards,
Ivan Pavlukhin
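
For illustration, the proposal quoted above (a dedicated writer thread, plus a per-type future that other threads wait on) might look roughly like the sketch below. This is only a minimal sketch under stated assumptions, not Ignite code: the class `AsyncMetadataWriter`, its methods `register` and `awaitPersisted`, and the `<typeId>.bin` file layout are all hypothetical names invented for this example. It also assumes the discovery thread calls `register` before the type becomes visible to other threads.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch of asynchronous metadata persistence; not Ignite's actual API. */
class AsyncMetadataWriter implements AutoCloseable {
    private final Path dir;

    /** Single dedicated writer thread, so disk I/O never runs in the caller's thread. */
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    /** One future per type ID, completed once the metadata is on disk. */
    private final Map<Integer, CompletableFuture<Void>> pending = new ConcurrentHashMap<>();

    AsyncMetadataWriter(Path dir) {
        this.dir = dir;
    }

    /** Meant to be called from the discovery thread: schedules the write and returns at once. */
    CompletableFuture<Void> register(int typeId, byte[] metadata) {
        CompletableFuture<Void> fut = CompletableFuture.runAsync(() -> {
            Path file = dir.resolve(typeId + ".bin"); // hypothetical file layout
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
                ByteBuffer buf = ByteBuffer.wrap(metadata);
                while (buf.hasRemaining())
                    ch.write(buf);
                ch.force(true); // fsync, now paid by the writer thread instead of discovery
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, writer);
        pending.put(typeId, fut);
        return fut;
    }

    /** Meant to be called by threads that need the type: blocks until it is persisted. */
    void awaitPersisted(int typeId) {
        CompletableFuture<Void> fut = pending.get(typeId);
        if (fut != null)
            fut.join(); // rethrows a write failure as CompletionException
    }

    @Override
    public void close() {
        writer.shutdown();
    }
}
```

With this shape, the discovery thread would only call `register(typeId, bytes)`, while a cache operation that needs the type would call `awaitPersisted(typeId)` first. Completed futures are never removed from the map here; a real implementation would have to clean them up and decide how to handle a failed write.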