So there is no general 'good for performance' way of doing IO.

However, I think most people who need storage only need to write small bits of data, and there is a good way to do that.

Gregory Szorc wrote:
I'd like to start a discussion about the state of storage in Gecko.

Currently when you are writing a feature that needs to store data, you
have roughly 3 choices:

1) Preferences
2) SQLite
3) Manual file I/O

* How to robustly write/update small datasets?

#3 above is it for small datasets. The correct way to do this is to write blobs of JSON to disk. End of discussion.

Writes of data <= ~64K should just be implemented as atomic whole-file read/write operations. Those are almost always single blocks on disk.

Writing a whole file at once eliminates the risk of data corruption. Incremental updates are what make SQLite do the WAL/fsync/etc dance that causes much of the slowness.

We invested a year's worth of engineering effort into a pure-JS IO library to facilitate efficient application-level IO. See the OS.File docs, e.g. https://developer.mozilla.org/en-US/docs/JavaScript_OS.File/OS.File_for_the_main_thread
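For the common case, the whole-file JSON pattern looks roughly like this (a minimal sketch; the file name and payload are made up for illustration, but writeAtomic/read are the real OS.File entry points):

  Components.utils.import("resource://gre/modules/osfile.jsm");

  // Illustrative payload and file name.
  let data = { lastSync: Date.now(), items: ["a", "b"] };
  let path = OS.Path.join(OS.Constants.Path.profileDir, "mydata.json");

  // Serialize, encode and write the whole file atomically: writeAtomic
  // writes to tmpPath and renames, so readers never see a torn file.
  OS.File.writeAtomic(path,
                      new TextEncoder().encode(JSON.stringify(data)),
                      { tmpPath: path + ".tmp" });

  // Read the whole file back, decode, parse.
  OS.File.read(path).then(function (bytes) {
    return JSON.parse(new TextDecoder().decode(bytes));
  });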

As you can see from the above examples, manual IO is not scary.

If one is into convenience APIs, one can create arbitrary JSON-storage abstractions in ~10 lines of code, for example as sketched below.
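A hypothetical helper along these lines (the names are illustrative, not an existing Gecko API):

  // Tiny JSON-storage abstraction over OS.File; purely a sketch.
  function JSONStore(name) {
    this.path = OS.Path.join(OS.Constants.Path.profileDir, name + ".json");
  }
  JSONStore.prototype.save = function (obj) {
    let bytes = new TextEncoder().encode(JSON.stringify(obj));
    return OS.File.writeAtomic(this.path, bytes, { tmpPath: this.path + ".tmp" });
  };
  JSONStore.prototype.load = function () {
    return OS.File.read(this.path).then(function (bytes) {
      return JSON.parse(new TextDecoder().decode(bytes));
    });
  };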

* What about writes > 64K?
Compression gives you a 5-10x size reduction on JSON. https://bugzilla.mozilla.org/show_bug.cgi?id=846410
Compression also means that your read throughput is up to 5x better too.
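If/when bug 846410 lands, usage could look roughly like the following; the compression option name and shape are an assumption about the eventual API, not something you can rely on today:

  // Sketch only: assumes an lz4 compression option on writeAtomic/read.
  let bigObject = { /* large JSON-serializable payload */ };
  let bytes = new TextEncoder().encode(JSON.stringify(bigObject));
  OS.File.writeAtomic(path, bytes,
                      { tmpPath: path + ".tmp", compression: "lz4" });

  // Reads must request the same compression to get the original bytes back.
  OS.File.read(path, { compression: "lz4" }).then(function (buffer) {
    return JSON.parse(new TextDecoder().decode(buffer));
  });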


* What about fsync-less writes?
Many log-type, performance-sensitive data-storage operations are fine with lossy appends. By lossy I mean "data will be lost if there is a power outage within a few seconds/minutes of the write"; consistency is still important. For this, one should create a directory and write out log entries as checksummed individual files (a rough sketch follows below)... but one should really use compression (and get checksums for free). https://bugzilla.mozilla.org/show_bug.cgi?id=846410 is about facilitating such an API.

Use-cases here: telemetry saved-sessions, FHR session-statistics.
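A rough sketch of the one-file-per-entry approach (the directory name, entry naming and helper are made up for illustration; note there is no { flush: true }, so nothing here forces an fsync):

  let logDir = OS.Path.join(OS.Constants.Path.profileDir, "my-feature-log");

  // Each entry becomes its own small file; a crash may lose the latest
  // entries but cannot corrupt the ones already on disk.
  function appendLogEntry(entry) {
    let name = Date.now() + "-" + Math.floor(Math.random() * 1e6) + ".json";
    let bytes = new TextEncoder().encode(JSON.stringify(entry));
    return OS.File.makeDir(logDir, { ignoreExisting: true }).then(function () {
      return OS.File.writeAtomic(OS.Path.join(logDir, name), bytes,
                                 { tmpPath: OS.Path.join(logDir, name) + ".tmp" });
    });
  }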

* What about large datasets?
These should be decided on a case-by-case basis. Universal solutions will always perform poorly in some dimension.

* What about indexeddb?
IDB is overkill for simple storage needs. It is a restrictive wrapper over an SQLite schema. Perhaps some large dataset (eg an address book) is a good fit for it. IDB supports filehandles for doing raw IO, but that still requires SQLite to bootstrap, doesn't support compression, etc. IDB also makes sense as a transitional API for the web, given the need to move away from DOM Local Storage...

* Why isn't there a convenience API for all of the above recommendations?
Because speculatively landing APIs that anticipate future consumers is risky and results in over-engineering and unpleasant surprises... So give us use-cases and we (i.e. Yoric) will make them efficient.

Taras
