Great post, Taras!
Per IRC conversations, we'd like to move subsequent discussion of
actions into a meeting so we can more quickly arrive at a resolution.
Please meet in Gregory Szorc's Vidyo Room at 1400 PDT Tuesday, April 30.
That's 2100 UTC. Apologies to the European and east coast crowds. If
you'll miss it because it's too late, let me know and I'll consider
moving it.
https://v.mozilla.com/flex.html?roomdirect.html&key=yJWrGKmbSi6S
On 4/29/13 10:51 AM, Taras Glek wrote:
* How to robustly write/update small datasets?
#3 above is it for small datasets. The correct way to do this is to
write blobs of JSON to disk. End of discussion.
Writes of data <= ~64K should just be implemented as atomic whole-file
read/write operations. Files that small almost always occupy contiguous
blocks on disk. Writing the whole file at once eliminates the risk of
data corruption. Incremental updates are what force SQLite into the
WAL/fsync/etc. dance that causes much of its slowness.
We invested a year worth of engineering effort into a pure-js IO library
to facilitate efficient application-level IO. See OS.File docs, eg
https://developer.mozilla.org/en-US/docs/JavaScript_OS.File/OS.File_for_the_main_thread
As you can see from the examples above, manual IO is not scary.
If one is into convenience APIs, one can create arbitrary JSON-storage
abstractions in ~10 lines of code.
* What about writes > 64K?
Compression gives you a 5-10x size reduction on JSON.
https://bugzilla.mozilla.org/show_bug.cgi?id=846410
Compression also means that your read-throughput is up to 5x better too.
* What about fsync-less writes?
Many log-type, performance-sensitive data-storage operations are OK with
lossy appends. By lossy I mean "data will be lost if there is a power
outage within a few seconds/minutes of the write"; consistency is still
important. For this one should create a directory and write out log
entries as checksummed individual files...but one should really use
compression (and get checksums for free).
https://bugzilla.mozilla.org/show_bug.cgi?id=846410 is about
facilitating such an API.
Use-cases here: telemetry saved-sessions, FHR session-statistics.
* What about large datasets?
These should be decided on a case-by-case basis. Universal solutions
will always perform poorly in some dimension.
* What about indexeddb?
IDB is overkill for simple storage needs. It is a restrictive wrapper
over an SQLite schema. Perhaps some large dataset (e.g. an address book)
is a good fit for it. IDB supports FileHandles for raw IO, but that
still requires SQLite to bootstrap, doesn't support compression, etc.
IDB also makes sense as a transitional API for the web, given the need
to move away from DOM Local Storage...
* Why isn't there a convenience API for all of the above recommendations?
Because speculatively landing APIs that anticipate future consumers is
risky; it results in over-engineering and unpleasant surprises... So
give us use-cases and we (i.e. Yoric) will make them efficient.
Taras
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform