Thanks for the suggestion, Matthias. I will look into this.

Hello Christine. One of the concerns is the split nature, but also that if the file does not exist on disk when the replica reloads, the core will not load. Keeping the models in sync on each node can be quite complicated: for example, the collection has to be reloaded only after the main model is present on all nodes; if you do it before that, the replicas will be unusable. For now we would like to load models up to 100MB, which is why I explored this option. I made some modifications in the code but haven't tested them yet. After I run the tests I will follow up with a PR. Can I open an issue for this?
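To make the ordering constraint concrete, here is a minimal sketch assuming SolrJ; the node URLs, the collection name, and the modelPresentOnAllNodes check are hypothetical placeholders for however model presence would actually be verified on each node:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.Http2SolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

import java.util.List;

public class SafeModelReload {

    // Hypothetical check: confirm the model file is on disk on one node.
    static boolean modelPresentOn(String nodeUrl) {
        // Placeholder: replace with a real per-node check.
        return true;
    }

    static boolean modelPresentOnAllNodes(List<String> nodeUrls) {
        return nodeUrls.stream().allMatch(SafeModelReload::modelPresentOn);
    }

    public static void main(String[] args) throws Exception {
        List<String> nodes = List.of("http://node1:8983/solr",
                                     "http://node2:8983/solr");
        // Reloading before the model exists on every node leaves those
        // replicas unusable, so wait until all nodes have it.
        while (!modelPresentOnAllNodes(nodes)) {
            Thread.sleep(1_000L);
        }
        try (SolrClient client = new Http2SolrClient.Builder(nodes.get(0)).build()) {
            CollectionAdminRequest.reloadCollection("myCollection").process(client);
        }
    }
}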
If the model were wrapped internally, wouldn't that be the same as saving it as compacted JSON? It would be approximately the same size, and we would still need to load the decoded object into memory. To save space we could shorten the feature names to abbreviations, but that would complicate score debugging. I haven't yet looked into storing models in another format; Walter may have a point about Avro.

Thanks for the suggestion, Eric. I am not familiar with the /api/cluster/files endpoint; I will look into it.

On Wed, Oct 18, 2023 at 01:47, Dmitri Maziuk <dmitri.maz...@gmail.com> wrote:
>
> On 10/17/23 13:20, Walter Underwood wrote:
> >
> > Gzipping the JSON can be a big win, especially if there are lots of
> > repeated keys, like in state.json. Gzip has the advantage that some
> > editors can natively unpack it.
>
> It may save you some transfer time, provided the transport subsystem
> doesn't compress on the fly, but with JSON being an all-or-nothing
> format, your problem is going to be RAM for the string representation
> plus RAM for the decoded object representation, of the entire store.
>
> If you want it scalable, you want an "incremental" format like ASN.1,
> protocol buffers, or Avro.
>
> Dima
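For reference, here is a minimal sketch of the incremental reading Dima describes, using Jackson's streaming parser (already on Solr's classpath) over a gzipped file. The file name and the field being matched are only illustrative assumptions. Gzip shrinks what is stored and transferred, while the token-at-a-time parser avoids materializing the full JSON string in RAM; the decoded model object itself still has to fit in memory, as Dima notes.

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

public class GzippedModelScan {
    public static void main(String[] args) throws IOException {
        JsonFactory factory = new JsonFactory();
        // Inflate and parse in one pass: no full String of the JSON is built.
        try (InputStream in = new GZIPInputStream(
                     Files.newInputStream(Paths.get("myModel.json.gz")));
             JsonParser p = factory.createParser(in)) {
            while (p.nextToken() != null) {
                // React to tokens as they stream by, building the in-memory
                // model incrementally instead of decoding the whole document.
                if (p.getCurrentToken() == JsonToken.FIELD_NAME
                        && "features".equals(p.getCurrentName())) {
                    // e.g. handle the features array element by element here
                }
            }
        }
    }
}

A schema-based binary format like Avro or protocol buffers would go one step further and drop the repeated keys from the stored bytes entirely.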