On Sep 27, 2014, at 7:48 PM, snacktime wrote:

> The schema is that a key is a string, and the value is a string or binary.  I 
> am actually storing protocol buffer messages, but the library gives me the 
> ability to serialize to native protobuf or to json.  Json is useful at times 
> especially for debugging.

I don't know if this will apply to you, but i received significant speed 
improvements on Postgres key searches by using substring indexes.

If your keys are just random hashes, this would probably work well for you.  
if your keys are person-readable, it's probably not going to work as well as 
the distribution of prefix characters will probably be too uniform.

But the general idea is twofold:

        1. create an additional partial index on the key field -- CREATE INDEX 
_entities_id__subst_7 ON entities(substr(id,1,7));
        2. update your SELECTS to search for both the full string AND the 
substring

                - WHERE id = :id
                + WHERE (id = :id) AND (substr(id,1,7) = substr(:id, 1, 7))

By adding in the substring query, the planner will (usually) optimize the 
select by doing a first pass on the substring index.  then it searches that 
limited set for the rest of matching criteria.

on a table with 4MM+ records , introducing a substring index/query improved my 
searches by a few orders of magnitude.    

before trying this indexing strategy, we were actively looking to migrate this 
particular query service off of postgres -- it was such a bottleneck and was 
not scalable.  
now there is no reason to leave in the foreseeable future.


On Sep 27, 2014, at 8:33 PM, Gavin Flower wrote:

>>  This works well because keys are left prefixed with a scope, a delimiter, 
>> and then the actual key for the data.  
> Then I noticed that your id is actually a compound key, and probably would be 
> better modelled as:

if you're able to standardize the scope out, an index of "(scope, 
substring(key,1,7))" might work well.

i only used 1,7 as my key arguments, because that was an optimal speed/space 
mix on my dataset.  depending on yours, a shorter or longer index might be more 
appropriate



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to