Is there a reason not to use a relational model instead of json(b) here? I think that is in fact considered best practice.
On Fri, 8 Mar 2019 at 15:40, Alexandru Lazarev <alexandru.laza...@gmail.com> wrote: > I am working on product managing and monitoring Network (NMS-like > products). > > Product manages configuration of network devices, for now each device has > stored its configuration in simple table - this was the original design. > > CREATE TABLE public.configuration( > id integer NOT NULL, > config json NOT NULL, > CONSTRAINT configuration_pkey PRIMARY KEY (id),) > > A config looks like: > > { > "_id": 20818132, > "type": "Modem", > "data": [{ > "oid": "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.40", > "instance": "24", > "value": "null" > }, > { > "oid": "1.3.6.1.4.1.9999.3.5.10.1.86", > "instance": "0", > "value": "562" > }, > { > "oid": "1.3.6.1.4.1.9999.3.5.10.3.92.4.1", > "instance": "0", > "value": "0" > }, > { > "oid": "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43", > "instance": "24", > "value": "vlan24" > }, > { > "oid": "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43", > "instance": "25", > "value": "vlan25" > } > ]} > > And there are many plv8 (java script procedural language extension for > PostgreSQL) stored procedures working on bulks of such config, reading some > OIDs, changing them conditionally, removing some of them and adding others, > especially in use cases like: There are some upper-level META-configuration > of different level, which during change have to update all their updated > parameters to all affected leaves configs. An simple test-example (but > without touching 'data' node) > > CREATE OR REPLACE FUNCTION public.process_jsonb_plv8() > RETURNS void AS$BODY$ > var CFG_TABLE_NAME = "configurations"; > var selPlan = plv8.prepare( "select c.config from " + CFG_TABLE_NAME + " c > where c.id = $1", ['int'] ); > var updPlan = plv8.prepare( 'update ' + CFG_TABLE_NAME + ' set config = $1 > where id = $2', ['jsonb','int'] ); > > try { > > var ids = plv8.execute('select id from devices'); > > for (var i = 0; i < ids.length; i++) { > var db_cfg = selPlan.execute(ids[i].id); //Get current json config > from DB > var cfg = db_cfg[0].config; > cfg["key0"] = 'plv8_json'; //-add some dummy key > updPlan.execute(cfg, ids[i].id); //put uopdated JSON config in DB > plv8.elog(NOTICE, "UPDATED = " + ids[i].id); > > > }} finally { > selPlan.free(); > updPlan.free();} > return;$BODY$ > LANGUAGE plv8 VOLATILE > COST 100; > > For real use-cases plv8 SPs are more complicated, doing FOR-LOOP through > ALL OIDs object of 'data' array, checking if it is looking for and update > value an/or remove it and/or add newer if necessary. > > Since number of devices in DB increased from several hundreds to 40K or > even 70K, and number of OID+Instance combinations also increased from > several hundred to ~1K and sometimes up to 10K within a config, we start > facing slowness in bulk (especially global -> update to ALL Devices) > updates/searches. > > In order to get rid off FOR LOOP step for each configuration I've > converted data-node from array to object (key-value model), something like > : > > { > "_id": 20818132, > "type": "Modem", > "data": { > "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.40": { > "24": "null" > }, > "1.3.6.1.4.1.9999.3.5.10.1.86": { > "0": "562" > }, > "1.3.6.1.4.1.9999.3.5.10.3.92.4.1": { > "0": "0" > }, > "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43": { > "24": "vlan24", > "25": "vlan25" > } > }} > > Now in order to get a concrete OID (e.g. > "1.3.6.1.4.1.9999.3.10.2.2.25.4.1.43") and/or its instance I do 1-2 *O(1)* > operations instead *O(n)*. And it become a bit faster. After I've changed > column type from json to jsonb - I've got a lot of memory issues with > plv8 stored procedures, so now ideas is: > > *What are the best practices to store such data and use cases in DB?* > taking in considerations following: - Bulk and global updates are often > enough (user-done operation) - several times per week and it takes long > time - several minutes, annoying user experience. - Consulting some OIDs > only from concrete config is medium frequency use case - Consulting ALL > devices have some specific OID (SNMP Parameter) settled to a specific value > - medium frequency cases. - Consult (read) a configuration for a specific > device as a whole document - often use case (it is send to device as json > or as converted CSV, it is send in modified json format to other utilities, > etc) > > One of suggestion from other oppinions is to move ALL configurations to > simple plain relational table > > CREATE TABLE public.configuration_plain( > device_id integer, > oid text, > instance text, > value text) > > Looking like > > *id* > > *oid* > > *instance* > > *value* > > 20818132 > > 1.3.6.1.4.1.9999.2.13 > > 0 > > VSAT > > 20818132 > > 1.3.6.1.4.1.9999.3.10.2.2.10.15 > > 0 > > 0 > > 20818132 > > 1.3.6.1.4.1.9999.3.10.2.2.10.17 > > 0 > > 0 > > 20818132 > > 1.3.6.1.4.1.9999.3.10.2.2.10.18 > > 0 > > 1 > > 20818132 > > 1.3.6.1.4.1.9999.3.10.2.2.10.19 > > 0 > > 2 > > 20818132 > > 1.3.6.1.4.1.9999.3.10.2.2.10.8.1.1 > > 24 > > 24 > > 20818132 > > 1.3.6.1.4.1.9999.3.10.2.2.10.8.1.1 > > 25 > > 25 > > 20818132 > > 1.3.6.1.4.1.9999.3.10.2.2.10.8.1.2 > > 24 > > vlan24 > > 20818132 > > 1.3.6.1.4.1.9999.3.10.2.2.10.8.1.2 > > 25 > > VLAN_25 > > And now I end with a table of ~33 M rows for 40K devices * (700-900 > OID+Instance combinations). Some simple selects and updates (especially if > I add simple indexes on id, oid columns) works faster than JSON (less than > 1 sec updating one OID for ALL devices), but on some stored procedures > where I need to do some checks and business logic before manipulating > concrete parameters in configuration - performance decrease again from 10 > to 25 seconds in below example with each nee added operation: > > CREATE OR REPLACE FUNCTION public.test_update_bulk_configuration_plain_plpg( > sql_condition text, -- something like 'select id from devices' > new_elements text, --collection of OIDs to be Added or Update, could be > JSON Array or comma separated list, containing 1 or more (100) OIDs > oids_to_delete text --collection of OIDs to Delete > ) > RETURNS void AS$BODY$DECLARE > r integer; > cnt integer; > ids int[]; > lid int;BEGIN > RAISE NOTICE 'start'; > EXECUTE 'SELECT ARRAY(' || sql_condition || ')' into ids; > FOREACH lid IN ARRAY ids > LOOP > -- DELETE > -- Some business logic > -- FOR .. IF .. BEGIN > delete from configuration_plain c where c.oid = > '1.3.6.1.4.1.9999.3.5.10.3.201.1.1' and instance = '10' and c.device_id = lid; > delete from configuration_plain c where c.oid = 'Other OID' and > instance = 'Index' and c.device_id = lid; > -- other eventual deletes > --END > > -- UPDATE > -- Some business logic > -- FOR .. IF .. BEGIN > update configuration_plain c set value = '2' where c.oid = > '1.3.6.1.4.1.9999.3.5.10.3.87' and c.device_id = lid; > update configuration_plain c set value = '2' where c.oid = > '1.3.6.1.4.1.9999.3.5.10.3.201.1.1' and instance = '1' and c.device_id = lid; > -- other eventual updates > -- END > > --INSERT > insert into configuration_plain (id, oid, instance, value) values > (lid,'1.3.6.1.4.1.9999.3.5.10.3.201.1.1', '11', '11'); > -- OTHER eventually.... > insert into configuration_plain (id, oid, instance, value) values > (lid,'OTHER_OID', 'Idx', 'Value of OID'); > END LOOP; > RAISE NOTICE 'end'; > RETURN;END$BODY$ > LANGUAGE plpgsql VOLATILE > COST 100; > > So any best practices and advice on such data and use cases modeling in DB? > > Regards, > > AlexL > -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.