Thanks for the suggestion, Matthias. I will look into this.

Hello Christine. One of the concerns is the split nature, but also that if the file does not exist on disk when the replica reloads, the core will not load. Keeping the models in sync on each node can be quite complicated: for example, the collection has to be reloaded only after the main model is present on all nodes; if you do it before that, the replicas will be unusable. For now we would like to load models up to 100MB, which is why I explored this option. I made some modifications in the code but haven't tested them yet. After I run the tests I will follow up with a PR. Can I open an issue for this?
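To make the ordering constraint concrete, here is a minimal sketch assuming SolrJ; the node URLs, the collection name, and the modelPresentOnAllNodes check are hypothetical placeholders for however model presence would actually be verified on each node:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.Http2SolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

import java.util.List;

public class SafeModelReload {

    // Hypothetical check: confirm the model file is on disk on one node.
    static boolean modelPresentOn(String nodeUrl) {
        // Placeholder: replace with a real per-node check.
        return true;
    }

    static boolean modelPresentOnAllNodes(List<String> nodeUrls) {
        return nodeUrls.stream().allMatch(SafeModelReload::modelPresentOn);
    }

    public static void main(String[] args) throws Exception {
        List<String> nodes = List.of("http://node1:8983/solr",
                                     "http://node2:8983/solr");
        // Reloading before the model exists on every node leaves those
        // replicas unusable, so wait until all nodes have it.
        while (!modelPresentOnAllNodes(nodes)) {
            Thread.sleep(1_000L);
        }
        try (SolrClient client = new Http2SolrClient.Builder(nodes.get(0)).build()) {
            CollectionAdminRequest.reloadCollection("myCollection").process(client);
        }
    }
}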
If the model were wrapped internally, wouldn't that be the same as saving it as compacted JSON? It would be approximately the same size, and we would still need to load the decoded object into memory. To save space we could shorten the feature names to abbreviations, but that would complicate score debugging. I haven't yet looked into storing models in another format; Walter may have a point about Avro.

Thanks for the suggestion, Eric. I am not familiar with the /api/cluster/files endpoint; I will look into it.

On Wed, Oct 18, 2023 at 01:47, Dmitri Maziuk <dmitri.maz...@gmail.com> wrote:
>
> On 10/17/23 13:20, Walter Underwood wrote:
> >
> > Gzipping the JSON can be a big win, especially if there are lots of
> > repeated keys, like in state.json. Gzip has the advantage that some
> > editors can natively unpack it.
>
> It may save you some transfer time, provided the transport subsystem
> doesn't compress on the fly, but with JSON being an all-or-nothing
> format, your problem is going to be RAM for the string representation
> plus RAM for the decoded object representation, of the entire store.
>
> If you want it scalable, you want an "incremental" format like ASN.1,
> protocol buffers, or Avro.
>
> Dima
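For reference, here is a minimal sketch of the incremental reading Dima describes, using Jackson's streaming parser (already on Solr's classpath) over a gzipped file. The file name and the field being matched are only illustrative assumptions. Gzip shrinks what is stored and transferred, while the token-at-a-time parser avoids materializing the full JSON string in RAM; the decoded model object itself still has to fit in memory, as Dima notes.

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

public class GzippedModelScan {
    public static void main(String[] args) throws IOException {
        JsonFactory factory = new JsonFactory();
        // Inflate and parse in one pass: no full String of the JSON is built.
        try (InputStream in = new GZIPInputStream(
                     Files.newInputStream(Paths.get("myModel.json.gz")));
             JsonParser p = factory.createParser(in)) {
            while (p.nextToken() != null) {
                // React to tokens as they stream by, building the in-memory
                // model incrementally instead of decoding the whole document.
                if (p.getCurrentToken() == JsonToken.FIELD_NAME
                        && "features".equals(p.getCurrentName())) {
                    // e.g. handle the features array element by element here
                }
            }
        }
    }
}

A schema-based binary format like Avro or protocol buffers would go one step further and drop the repeated keys from the stored bytes entirely.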