Thanks Sebastian, I was also curious whether anyone had run perf tests for this. Good to know! I can see how it would speed up bulk importing if the importer (1) assumes the ids of existing nodes will not change during execution, (2) keeps a synced mapping of datatype and predicate ids in memory, and (3) inserts nodes/triples that reference the ids of nodes it inserted in the previous query, since this eliminates the overhead of the db returning the auto-incremented ids after each insertion.
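For what it's worth, steps (1)-(3) above could be sketched roughly like this (all names here are hypothetical, not the actual importer's API; the id generator is a stand-in for snowflake):

```javascript
// Sketch of the strategy above: ids are assigned client-side and cached
// in memory, so no round trip is needed to learn the id of a fresh node.

let nextId = 1n; // stand-in for a snowflake-style generator
const newId = () => nextId++;

// (2) in-memory mapping from a node's "essence" to its id
const nodeIds = new Map();

// (1)+(3) resolve-or-create ids purely in memory, collecting only the
// rows that still need to be INSERTed in the next batch
function prepareBatch(nodes) {
  const rows = [];
  for (const n of nodes) {
    const key = `${n.ntype}|${n.svalue}|${n.ltype ?? ''}|${n.lang ?? ''}`;
    if (!nodeIds.has(key)) {
      const id = newId();
      nodeIds.set(key, id);
      rows.push([id, n.ntype, n.svalue]); // values for one multi-row INSERT
    }
  }
  return rows; // e.g. feed into: INSERT INTO nodes (id, ntype, svalue) VALUES ...
}

const batch = prepareBatch([
  { ntype: 'uri', svalue: 'http://example.org/a' },
  { ntype: 'uri', svalue: 'http://example.org/a' }, // duplicate: no new row
  { ntype: 'uri', svalue: 'http://example.org/b' },
]);
```

Triples inserted later can then reference `nodeIds.get(key)` without ever asking the db for an id.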
I haven't read deep enough into the importer code to tell whether this is a strategy the importer already uses. Unfortunately (on 584 at least, where geometry is another ntype) the importer really struggles beyond a few million nodes; the bottleneck seems to be the check for whether a node already exists. I haven't run perf tests yet, but I've added this unique index to my (postgres) nodes table for now so that I can 'DO NOTHING' on insert conflict while I try out a multi-threaded importer from node.js:

CREATE UNIQUE INDEX idx_node_essence ON nodes(ntype, svalue, ltype, lang);

- Blake

On Tue, Feb 13, 2018 at 10:56 PM, Sebastian Schaffert <
sebastian.schaff...@gmail.com> wrote:

> Hi Blake,
>
> I did performance tests back then; it actually makes a significant
> difference on most databases, especially for batch imports. Even more so
> if the database is not running on localhost. I'm not sure about the actual
> numbers, though. You can always switch to the database sequence generator
> for IDs if you want to try it out yourself; I think it's still available
> and it's a simple configuration option.
>
> Sebastian
>
>
> Blake Regalia <blake.rega...@gmail.com> schrieb am Mi., 14. Feb. 2018,
> 01:00:
>
>> I can see how this makes sense for future compatibility with distributed
>> systems across a variety of RDBMS, although I'm not convinced it's more
>> efficient for single nodes (e.g., auto-incrementing fields do not require
>> round trips). Thanks for the reply! Just wanted to know while porting a
>> bulk importer for 584.
>>
>> - Blake
>>
>> On Tue, Feb 13, 2018 at 12:15 PM, Sebastian Schaffert <
>> sebastian.schaff...@gmail.com> wrote:
>>
>>> Hi Blake,
>>>
>>> Auto-increment requires querying the database for the next sequence
>>> number (or the last given ID, depending on the database you use), and
>>> that adds another database roundtrip. Snowflake is purely in code, very
>>> fast to compute, and safe even in distributed setups.
>>>
>>> Is it causing problems?
>>>
>>> Sebastian
>>>
>>> Blake Regalia <blake.rega...@gmail.com> schrieb am Di., 13. Feb. 2018,
>>> 21:11:
>>>
>>>> What was the justification for using the 'snowflake' bigint type for
>>>> the id fields on nodes, triples and namespaces?
>>>>
>>>> - Blake
>>>
>>
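P.S. For anyone else following along: the "purely in code" scheme Sebastian describes can be sketched like this. The bit layout below (41-bit millisecond timestamp, 10-bit worker id, 12-bit sequence) is the common Twitter-style layout and an illustrative assumption, not necessarily the exact scheme used here:

```javascript
// Minimal snowflake-style id generator (sketch; sequence overflow
// handling omitted). No database round trip is needed, and ids from
// different workers cannot collide because the worker id is packed
// into every id.
function makeSnowflake(workerId) {
  let lastMs = -1;
  let seq = 0n;
  return function nextId() {
    const ms = Date.now();
    if (ms === lastMs) {
      seq += 1n; // same millisecond: bump the 12-bit sequence
    } else {
      lastMs = ms;
      seq = 0n;
    }
    // 41 bits of time | 10 bits of worker | 12 bits of sequence
    return (BigInt(ms) << 22n) | (BigInt(workerId) << 12n) | seq;
  };
}

const nextId = makeSnowflake(1);
const a = nextId();
const b = nextId();
// ids are strictly increasing within a single worker
```

This is why it maps naturally onto a bigint column: the whole id fits in 64 bits and sorts roughly by creation time.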