So there is no general 'good for performance' way of doing IO.
However, I think most people who need this only need to write small bits
of data, and there is a good way to do that.
Gregory Szorc wrote:
I'd like to start a discussion about the state of storage in Gecko.
Currently when you are writing a feature that needs to store data, you
have roughly 3 choices:
1) Preferences
2) SQLite
3) Manual file I/O
* How to robustly write/update small datasets?
#3 above is it for small datasets. The correct way to do this is to
write blobs of JSON to disk. End of discussion.
Writes of data <= ~64K should just be implemented as atomic whole-file
read/write operations. Those are almost always single blocks on disk.
Writing a whole file at once eliminates the risk of data corruption from
partial updates. Incremental updates are what make SQLite do the
WAL/fsync/etc. dance that causes much of the slowness.
We invested a year's worth of engineering effort into a pure-JS IO
library to facilitate efficient application-level IO. See the OS.File
docs, e.g.
https://developer.mozilla.org/en-US/docs/JavaScript_OS.File/OS.File_for_the_main_thread
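For example, writing and reading back a small JSON blob looks roughly
like this (the file name and data are made up for illustration; see the
docs above for the exact API):

  Components.utils.import("resource://gre/modules/osfile.jsm");

  // Serialize, encode and write the whole file atomically via a temp file.
  let path = OS.Path.join(OS.Constants.Path.profileDir, "my-feature.json");
  let bytes = new TextEncoder().encode(JSON.stringify({ version: 1, items: [] }));

  OS.File.writeAtomic(path, bytes, { tmpPath: path + ".tmp" }).then(function () {
    // Read it back: whole file in, decode, parse.
    return OS.File.read(path);
  }).then(function (readBytes) {
    let data = JSON.parse(new TextDecoder().decode(readBytes));
    // ... use data ...
  });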
As these examples show, manual IO is not scary.
If one is into convenience APIs, one can create arbitrary JSON-storage
abstractions in ~10 lines of code, e.g. something like the sketch below.
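A minimal sketch of such an abstraction, assuming a profile-relative
file; the JSONStorage name is made up for illustration, not an existing
Gecko API:

  Components.utils.import("resource://gre/modules/osfile.jsm");

  function JSONStorage(leafName) {
    // Keep the blob under the profile directory.
    this.path = OS.Path.join(OS.Constants.Path.profileDir, leafName);
  }
  JSONStorage.prototype.save = function (obj) {
    let bytes = new TextEncoder().encode(JSON.stringify(obj));
    return OS.File.writeAtomic(this.path, bytes, { tmpPath: this.path + ".tmp" });
  };
  JSONStorage.prototype.load = function () {
    return OS.File.read(this.path).then(function (bytes) {
      return JSON.parse(new TextDecoder().decode(bytes));
    });
  };

Usage is then just new JSONStorage("my-feature.json").save(data) and
.load(), with both operations hitting the file as a single whole-file
read or write.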
* What about writes > 64K?
Compression gives you a 5-10x size reduction on JSON.
https://bugzilla.mozilla.org/show_bug.cgi?id=846410
Compression also means that your read throughput is up to 5x better.
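If bug 846410 lands as an OS.File option, usage could look something
like the following; the compression option name below is speculative
until that bug is resolved, and the file name is made up:

  // Speculative: a compression option on writeAtomic/read, per bug 846410.
  // The option name/value below may differ from what actually lands.
  let path = OS.Path.join(OS.Constants.Path.profileDir, "big-blob.json.lz4");
  let bigObject = { /* large dataset, e.g. a saved telemetry session */ };
  let bytes = new TextEncoder().encode(JSON.stringify(bigObject));

  OS.File.writeAtomic(path, bytes, { tmpPath: path + ".tmp", compression: "lz4" })
    .then(function () {
      return OS.File.read(path, { compression: "lz4" });
    }).then(function (readBytes) {
      let data = JSON.parse(new TextDecoder().decode(readBytes));
    });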
* What about fsync-less writes?
Many log-type, performance-sensitive data-storage operations are OK with
lossy appends. By lossy I mean "data will be lost if there is a power
outage within a few seconds/minutes of the write"; consistency is still
important. For this one should create a directory and write out log
entries as checksummed individual files... but one should really use
compression (and get checksums for free). See the sketch below.
https://bugzilla.mozilla.org/show_bug.cgi?id=846410 is about
facilitating such an API.
Use-cases here: telemetry saved-sessions, FHR session-statistics.
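A rough sketch of the one-file-per-entry pattern (directory and file
names are hypothetical, and the compression/checksumming part is exactly
what bug 846410 is meant to provide, so it is omitted here):

  Components.utils.import("resource://gre/modules/osfile.jsm");

  // Hypothetical log directory under the profile.
  let logDir = OS.Path.join(OS.Constants.Path.profileDir, "my-feature-log");

  function appendLogEntry(entry) {
    // One small file per entry: a crash or power outage can only lose the
    // most recent entries, never corrupt the older ones.
    let leaf = "entry-" + Date.now() + ".json";  // naive naming, for illustration only
    let bytes = new TextEncoder().encode(JSON.stringify(entry));
    return OS.File.makeDir(logDir, { ignoreExisting: true }).then(function () {
      // No explicit flush/fsync is requested, so the OS writes the data
      // back lazily -- the "lossy append" trade-off described above.
      let path = OS.Path.join(logDir, leaf);
      return OS.File.writeAtomic(path, bytes, { tmpPath: path + ".tmp" });
    });
  }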
* What about large datasets?
These should be decided on a case-by-case basis. Universal solutions
will always perform poorly in some dimension.
* What about indexeddb?
IDB is overkill for simple storage needs. It is a restrictive wrapper
over an SQLite schema. Perhaps some large dataset (e.g. an address book)
is a good fit for it. IDB supports FileHandles for raw IO, but that still
requires SQLite to bootstrap, doesn't support compression, etc.
IDB also makes sense as a transitional API for the web, given the need to
move away from DOM Local Storage...
* Why isn't there a convenience API for all of the above recommendations?
Because speculatively landing APIs that anticipate future consumers is
risky and results in over-engineering and unpleasant surprises... So give
us use-cases and we (i.e. Yoric) will make them efficient.
Taras