On Thursday, 15 December 2016 17:06:39 UTC, Steve D'Aprano wrote:
> I have some key:value data where the keys often are found in contiguous
> ranges with identical values. For example:
>
> {1: "foo",
>  2: "foo",
>  3: "foo",
>  # same for keys 4 through 99
>  100: "foo",
>  101: "bar",
>  102: "bar",
>  103: "foobar",
>  104: "bar",
>  105: "foo",
> }
> ...
> --
> Steve
All the answers seem to rely on in-memory solutions. But isn't this a classic
data-design problem (cf. Codd) with two tables?

CREATE TABLE keys (
    id INTEGER NOT NULL PRIMARY KEY,
    kkey INTEGER,
    UNIQUE (kkey)
);  -- e.g. id = 999, kkey = 101

CREATE TABLE "values" (  -- quoted: VALUES is an SQL keyword
    id INTEGER NOT NULL PRIMARY KEY,
    k_id INTEGER,
    value VARCHAR,
    UNIQUE (k_id, value),
    FOREIGN KEY (k_id) REFERENCES keys(id)
);  -- e.g. k_id = 999, value = 'bar'

For example, Python/SQLite can parse the list of key:value pairs. Each key is
looked up in keys, and a keys row is added if the key is new; the keys id is
saved. The (k_id, value) pair is then looked up in "values", and a row is
added if the pair is new. Some of this can be simplified by relying on SQL to
absorb the UNIQUE-constraint errors.

This approach may be slower than in-memory processing, but it has almost no
database size limits.

--
https://mail.python.org/mailman/listinfo/python-list
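The lookup-or-insert procedure described above can be sketched with Python's
sqlite3 module. This is a minimal illustration, not a tuned implementation:
the sample data is made up, and INSERT OR IGNORE is used to let SQL absorb
the non-UNIQUE errors, as suggested.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for an on-disk database
cur = conn.cursor()

# The two-table schema from above; "values" must be quoted because
# VALUES is an SQL keyword.
cur.execute("CREATE TABLE keys (id INTEGER NOT NULL PRIMARY KEY, "
            "kkey INTEGER, UNIQUE (kkey))")
cur.execute('CREATE TABLE "values" (id INTEGER NOT NULL PRIMARY KEY, '
            "k_id INTEGER, value VARCHAR, UNIQUE (k_id, value), "
            "FOREIGN KEY (k_id) REFERENCES keys(id))")

# A few sample key:value pairs standing in for the parsed input.
pairs = {1: "foo", 2: "foo", 101: "bar", 103: "foobar"}

for kkey, value in pairs.items():
    # Add the key if it is new; INSERT OR IGNORE swallows the
    # UNIQUE-constraint error for keys already present.
    cur.execute("INSERT OR IGNORE INTO keys (kkey) VALUES (?)", (kkey,))
    # Save the keys id for this key.
    (k_id,) = cur.execute("SELECT id FROM keys WHERE kkey = ?",
                          (kkey,)).fetchone()
    # Add the (k_id, value) pair if it is new.
    cur.execute('INSERT OR IGNORE INTO "values" (k_id, value) VALUES (?, ?)',
                (k_id, value))
conn.commit()

# Look a value up by key with a join.
row = cur.execute('SELECT v.value FROM "values" AS v '
                  "JOIN keys AS k ON v.k_id = k.id WHERE k.kkey = ?",
                  (101,)).fetchone()
print(row[0])  # bar
```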