On Fri, Aug 2, 2019 at 5:45 PM Nils Bruin <nbr...@sfu.ca> wrote:
>
> On Friday, August 2, 2019 at 2:47:04 AM UTC-7, E. Madison Bray wrote:
>>
>> As I have written before, pickle is not an appropriate format for
>> long-term stable serialization, and never has been, as it is
>> inherently tied to the code which produced it (and is highly
>> Python-specific, at that). If this is an endemic problem for someone,
>> they should use a different serialization format.
>>
> If one reads the documentation of "pickle" in python then one does get the
> idea that it is designed to provide serialization that should also work over
> longer time stretches. It would take a lot of discipline to do the versioning
> correctly and one shouldn't start supporting pickling on new data structures
> too soon (it could lead to horribly expensive legacy support when one changes
> the way data is stored). It's certainly the kind of serialization format one
> comes up with for storing complicated data structures such as those in
> computer algebra.
> In principle, the discipline can be helped a lot by having a pickle jar that
> provides good coverage of (legacy) pickles.
> I agree that it's very ambitious to try and support pickling across sage,
> across time, and with the loose feature management and high diversity in
> developer interests it may well be unachievable/unmaintainable. But I think
> this is more a problem with the task, not with the pickle format.
>
> I agree that for data storage that really needs to be able to stand the test
> of time, one needs to go with something human readable/copy-pastable. It's
> still open for misinterpretation, but at least one stands a chance of
> decoding it when the original tools have disappeared. In reality, the
> important thing is to properly document how the data was generated in the
> first place.

This is partly why we invented ASDF [1]. ASDF is also quite complex, and
can be used to store arbitrarily complicated data structures. But it's
mostly human-readable. I say "mostly" because it does support blocks of
binary data, though most of the time binary data is stored through a
binary array data structure which uses plain text to describe the array
format. So as long as the format of the binary part is itself reasonably
simple, it's easy to reconstruct using the metadata in the plain-text
portions.

Although it was designed primarily with astronomy applications in mind,
the core format is domain-agnostic. It would be really neat to see some
ASDF "schemas" (descriptions of how specific types of data are
serialized in ASDF) for pure mathematics.

[1] https://en.wikipedia.org/wiki/Advanced_Scientific_Data_Format
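
To make the point about plain-text metadata describing a binary block a bit
more concrete, here is a minimal sketch in Python using only the standard
library. It is NOT the ASDF format and does not use the asdf package: the
one-line JSON header, its field names ("datatype", "byteorder", "shape") and
the helpers write_block/read_block are all hypothetical, chosen only to
illustrate the principle that a reader who has nothing but the text header
can still decode the bytes that follow it.

```python
import io
import json
import struct


def write_block(stream, values):
    """Write a plain-text header line, then raw little-endian float64 data.

    Hypothetical layout for illustration only; this is not ASDF.
    """
    header = {"datatype": "float64", "byteorder": "little", "shape": [len(values)]}
    stream.write(json.dumps(header).encode("ascii") + b"\n")
    stream.write(struct.pack(f"<{len(values)}d", *values))


def read_block(stream):
    """Rebuild the values using only what the text header says."""
    header = json.loads(stream.readline().decode("ascii"))
    if header["datatype"] != "float64" or len(header["shape"]) != 1:
        raise ValueError("this sketch only handles 1-D float64 arrays")
    prefix = {"little": "<", "big": ">"}[header["byteorder"]]
    count = header["shape"][0]
    payload = stream.read(8 * count)  # 8 bytes per float64
    return list(struct.unpack(f"{prefix}{count}d", payload))


# Round trip through an in-memory buffer.
buffer = io.BytesIO()
write_block(buffer, [1.0, 2.5, -3.25])
buffer.seek(0)
restored = read_block(buffer)
```

Unlike a pickle, nothing here depends on the class or module that produced
the data; the header alone says how to interpret the bytes, which is what
makes this kind of file recoverable after the original tools are gone.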