Re: [GENERAL] Debug Contrib/cube code
On Fri, May 13, 2011 at 09:39:58PM +0530, Nick Raj wrote: > Hi, > I am able to debug postgres by --enable-debug but breakpoint in cube is not > working. Can you more elaborate on your second point? > Even i also tried to change makefile of contrib/cube by -g(debug) but it > gives error. > Any idea is welcome. > > ps: sorry for late reply I don't have any good answers beyond what was already presented here: http://archives.postgresql.org/pgsql-hackers/2011-05/msg00784.php -- Joshua Tolley / eggyknap End Point Corporation http://www.endpoint.com signature.asc Description: Digital signature
[GENERAL] Duplicated Fields
I am designing a database that will include at lot of duplicated data. Does Postgres perform many optimisations to eliminate duplications of fields or should I create additional tables to reference myself? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] UNIQUE constraint on character sequences
Dear list, I would be pleased if you could share some thoughts with me on the following: say I wish to maintain a table with all distinct character sequences (variable length) showing series with strong similarities. Example: "abbbabacccdef" "abbbabaccdcdf" "abbbabaccdcgf" ... "qwtrhdffdd" ... "qwtrhdffdds" ... "qwtrhdffddsspp" "qwtrhdffddsspf" "qwtrhdffddssph" "qwtrhdffddsspL" etc. Think of them as ordered values (array-like), as a set having many values in common, in the same "elements" (that is: positions; in my application, each position has some particular meaning -- the sequence represents a set of particular settings) CREATE TABLE textseqs(txtseq TEXT) What would be an efficient approach in enforcing a UNIQUE constraint? I was thinking of using hashbuckets in a b-tree: CREATE UNIQUE INDEX ON textseqs USING BTREE( hashtext(txtseq), txtseq ) This index would "cache" hashes for each row. Upon inserting of a new row, traversing the index involves the comparison of two single integers for each node in the b-tree, until the actual hash value (if it exists) was reached. Then, only within that bucket (the hashes won't be unique), the more expensive string comparing is required; involving a sequencial comparison of (potentially mny) characters. Yet, within that bucket, the character series may be expected to show stronger differences than a plain sorted list of all values in the table would have, indexed by a b-tree index. Wouldn't traversing such a plain (non-composite, single column, on: "txtseq") b-tree index involve a sequencial comparison of (potentially many) characters *at EACH NODE* of the tree ? Or am I mistaken that each node is filled with actual values from the txtseq column? Thank you for your input! Cheers, Rob
Re: [GENERAL] Duplicated Fields
On Sun, 15 May 2011 00:05:13 +0930, Daron Ryan wrote about [GENERAL] Duplicated Fields: > I am designing a database that will include at lot of duplicated > data. >Does Postgres perform many optimisations to eliminate duplications of >fields or should I create additional tables to reference myself? http://en.wikipedia.org/wiki/Database_normalization You should do it yourself. It is called "data normalization", and all good database designers have been doing it for decades. -- Regards, Dave [RLU #314465] *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* dwn...@ntlworld.com (David W Noon) *-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-* signature.asc Description: PGP signature
Re: [GENERAL] UNIQUE constraint on character sequences
Could you index the reverse of the string so the unique part appears first? On May 14, 2011, at 11:20, InterRob wrote: > Dear list, > > I would be pleased if you could share some thoughts with me on the following: > say I wish to maintain a table with all distinct character sequences > (variable length) showing series with strong similarities. Example: > "abbbabacccdef" > "abbbabaccdcdf" > "abbbabaccdcgf" > ... > "qwtrhdffdd" > ... > "qwtrhdffdds" > ... > "qwtrhdffddsspp" > "qwtrhdffddsspf" > "qwtrhdffddssph" > "qwtrhdffddsspL" > etc. > > Think of them as ordered values (array-like), as a set having many values in > common, in the same "elements" (that is: positions; in my application, each > position has some particular meaning -- the sequence represents a set of > particular settings) > > CREATE TABLE textseqs(txtseq TEXT) > > What would be an efficient approach in enforcing a UNIQUE constraint? > > I was thinking of using hashbuckets in a b-tree: > CREATE UNIQUE INDEX ON textseqs USING BTREE( hashtext(txtseq), txtseq ) > > This index would "cache" hashes for each row. Upon inserting of a new row, > traversing the index involves the comparison of two single integers for each > node in the b-tree, until the actual hash value (if it exists) was reached. > Then, only within that bucket (the hashes won't be unique), the more > expensive string comparing is required; involving a sequencial comparison of > (potentially mny) characters. Yet, within that bucket, the character > series may be expected to show stronger differences than a plain sorted list > of all values in the table would have, indexed by a b-tree index. Wouldn't > traversing such a plain (non-composite, single column, on: "txtseq") b-tree > index involve a sequencial comparison of (potentially many) characters *at > EACH NODE* of the tree ? Or am I mistaken that each node is filled with > actual values from the txtseq column? > > Thank you for your input! > > Cheers, > Rob -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] UNIQUE constraint on KEY VALUE PAIRS
I am afraid I over-simplyfied my use-case. Sorry. In fact, I am not using a character sequence, but a custom COMPOSITE TYPE; part of which is a hstore: CREATE TYPE SSTORE AS (scope TEXT, kvp hstore); I created some functions and operators around it -- sort-of costly operations for comparing SSTOREs. Well: more costly than it would be to compare INTEGERs, I guess. The sets of Key/Values held by SSTORE will show extensive similarities, that is: common pairs of key/value pairs. So, essentially it looks like this: CREATE TABLE textseqs(the_key SSTORE) CREATE UNIQUE INDEX ON textseqs USING BTREE( hash_sstore(the_key), the_key ) Does this make sense? I am seeking some sensible trade-off between performance and flexibility... Regards, Rob 2011/5/14 David Johnston > Could you index the reverse of the string so the unique part appears first? > > On May 14, 2011, at 11:20, InterRob wrote: > > > Dear list, > > > > I would be pleased if you could share some thoughts with me on the > following: say I wish to maintain a table with all distinct character > sequences (variable length) showing series with strong similarities. > Example: > > "abbbabacccdef" > > "abbbabaccdcdf" > > "abbbabaccdcgf" > > ... > > "qwtrhdffdd" > > ... > > "qwtrhdffdds" > > ... > > "qwtrhdffddsspp" > > "qwtrhdffddsspf" > > "qwtrhdffddssph" > > "qwtrhdffddsspL" > > etc. > > > > Think of them as ordered values (array-like), as a set having many values > in common, in the same "elements" (that is: positions; in my application, > each position has some particular meaning -- the sequence represents a set > of particular settings) > > > > CREATE TABLE textseqs(txtseq TEXT) > > > > What would be an efficient approach in enforcing a UNIQUE constraint? > > > > I was thinking of using hashbuckets in a b-tree: > > CREATE UNIQUE INDEX ON textseqs USING BTREE( hashtext(txtseq), txtseq ) > > > > This index would "cache" hashes for each row. Upon inserting of a new > row, traversing the index involves the comparison of two single integers for > each node in the b-tree, until the actual hash value (if it exists) was > reached. Then, only within that bucket (the hashes won't be unique), the > more expensive string comparing is required; involving a sequencial > comparison of (potentially mny) characters. Yet, within that bucket, the > character series may be expected to show stronger differences than a plain > sorted list of all values in the table would have, indexed by a b-tree > index. Wouldn't traversing such a plain (non-composite, single column, on: > "txtseq") b-tree index involve a sequencial comparison of (potentially many) > characters *at EACH NODE* of the tree ? Or am I mistaken that each node is > filled with actual values from the txtseq column? > > > > Thank you for your input! > > > > Cheers, > > Rob > >