Re: [GENERAL] Debug Contrib/cube code

2011-05-14 Thread Joshua Tolley
On Fri, May 13, 2011 at 09:39:58PM +0530, Nick Raj wrote:
> Hi,
> I am able to debug Postgres built with --enable-debug, but breakpoints in
> cube are not working. Can you elaborate more on your second point?
> I also tried adding -g (debug) to the contrib/cube Makefile, but it
> gives an error.
> Any ideas are welcome.
> 
> PS: sorry for the late reply

I don't have any good answers beyond what was already presented here:

http://archives.postgresql.org/pgsql-hackers/2011-05/msg00784.php

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com




[GENERAL] Duplicated Fields

2011-05-14 Thread Daron Ryan
I am designing a database that will include a lot of duplicated data.
Does Postgres perform many optimisations to eliminate duplicated field
values, or should I create additional tables and reference them myself?


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] UNIQUE constraint on character sequences

2011-05-14 Thread InterRob
Dear list,

I would be pleased if you could share some thoughts with me on the
following: say I wish to maintain a table with all distinct character
sequences (variable length) showing series with strong similarities.
Example:
"abbbabacccdef"
"abbbabaccdcdf"
"abbbabaccdcgf"
...
"qwtrhdffdd"
...
"qwtrhdffdds"
...
"qwtrhdffddsspp"
"qwtrhdffddsspf"
"qwtrhdffddssph"
"qwtrhdffddsspL"
etc.

Think of them as ordered values (array-like), as a set with many values in
common in the same "elements" (that is, positions: in my application, each
position has a particular meaning, and the sequence represents a set of
particular settings).

CREATE TABLE textseqs(txtseq TEXT);

What would be an efficient approach to enforcing a UNIQUE constraint?

I was thinking of using hash buckets in a B-tree:
CREATE UNIQUE INDEX ON textseqs USING BTREE (hashtext(txtseq), txtseq);

This index would "cache" a hash for each row. When a new row is inserted,
traversing the index involves a single integer comparison at each node of
the B-tree until the matching hash value (if it exists) is reached. Only
within that bucket (the hashes won't be unique) is the more expensive string
comparison required, involving a sequential comparison of (potentially many)
characters. Yet within that bucket, the character sequences may be expected
to differ more strongly than neighbouring entries in a plain sorted list of
all values in the table, as indexed by an ordinary B-tree. Wouldn't
traversing such a plain (non-composite, single-column, on "txtseq") B-tree
index involve a sequential comparison of (potentially many) characters *at
EACH NODE* of the tree? Or am I mistaken that each node is filled with
actual values from the txtseq column?
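Rob's cost argument can be checked with a toy model. This is a Python sketch, not PostgreSQL internals: it assumes, as the message does, that each step of an index descent compares the probe against a full stored key, and it counts characters compared left to right in code-point order (PostgreSQL actually compares text using the column's collation, and real B-tree pages hold many keys each). zlib.crc32 is an arbitrary deterministic stand-in for hashtext(), and the 1000 strings sharing a 30-character prefix are made up.

```python
import zlib


def chars_compared(a: str, b: str) -> int:
    """Characters examined by a left-to-right comparison of a and b (toy cost)."""
    n = 0
    for x, y in zip(a, b):
        n += 1
        if x != y:
            break
    return n


def hashed_cost(a: tuple, b: tuple) -> int:
    # One integer comparison; fall through to the string only when hashes tie.
    return 1 if a[0] != b[0] else 1 + chars_compared(a[1], b[1])


def search_cost(sorted_keys, probe, cost) -> int:
    """Total comparison cost of a binary search, standing in for one descent."""
    lo, hi, total = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        total += cost(sorted_keys[mid], probe)
        if sorted_keys[mid] < probe:
            lo = mid + 1
        else:
            hi = mid
    return total


def fake_hashtext(s: str) -> int:
    # Assumption: crc32 as a deterministic stand-in for PostgreSQL's hashtext().
    return zlib.crc32(s.encode("utf-8"))


PREFIX = "abbbabaccc" * 3                 # hypothetical shared 30-char prefix
seqs = [PREFIX + f"{i:05d}" for i in range(1000)]
probe = PREFIX + "00500x"                 # a new value, not yet in the table

plain = search_cost(sorted(seqs), probe, chars_compared)
hashed_keys = sorted((fake_hashtext(s), s) for s in seqs)
hashed = search_cost(hashed_keys, (fake_hashtext(probe), probe), hashed_cost)
print(plain, hashed)
```

With long shared prefixes, the plain ordering pays for the prefix at nearly every step, while the hash-first ordering usually settles each step with one integer comparison. The price is that an index ordered by hash would not be useful for range or prefix searches on txtseq.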

Thank you for your input!

Cheers,
Rob


Re: [GENERAL] Duplicated Fields

2011-05-14 Thread David W Noon
On Sun, 15 May 2011 00:05:13 +0930, Daron Ryan wrote about [GENERAL]
Duplicated Fields:

> I am designing a database that will include a lot of duplicated
> data. Does Postgres perform many optimisations to eliminate duplications of
> fields or should I create additional tables to reference myself?

http://en.wikipedia.org/wiki/Database_normalization

You should do it yourself.  It is called "data normalization", and all
good database designers have been doing it for decades.
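A minimal illustration of what normalization means here, in Python rather than SQL: the repeated value is stored once in a lookup table, each row keeps only an integer reference, and joining back recovers the original rows. The column names, sample rows and the normalize/denormalize helpers are hypothetical; in PostgreSQL the lookup table would be a separate table referenced by a foreign key.

```python
def normalize(rows, column):
    """Move a repeated column into a lookup table; rows keep an integer reference."""
    lookup = {}  # value -> surrogate id, playing the role of the referenced table
    slim_rows = []
    for row in rows:
        value = row[column]
        if value not in lookup:
            lookup[value] = len(lookup) + 1  # ids 1, 2, 3, ... like a serial column
        slim = {k: v for k, v in row.items() if k != column}
        slim[column + "_id"] = lookup[value]
        slim_rows.append(slim)
    return lookup, slim_rows


def denormalize(lookup, slim_rows, column):
    """Rebuild the original rows: the equivalent of joining back to the lookup table."""
    by_id = {ref: value for value, ref in lookup.items()}
    rows = []
    for slim in slim_rows:
        row = {k: v for k, v in slim.items() if k != column + "_id"}
        row[column] = by_id[slim[column + "_id"]]
        rows.append(row)
    return rows


# Hypothetical sample data: the "supplier" string repeats on every order.
orders = [
    {"item": "bolt", "supplier": "Acme Supplies"},
    {"item": "nut", "supplier": "Acme Supplies"},
    {"item": "gear", "supplier": "Widget Works"},
]
suppliers, slim_orders = normalize(orders, "supplier")
print(suppliers)  # {'Acme Supplies': 1, 'Widget Works': 2}
```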
-- 
Regards,

Dave  [RLU #314465]
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
dwn...@ntlworld.com (David W Noon)
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*




Re: [GENERAL] UNIQUE constraint on character sequences

2011-05-14 Thread David Johnston
Could you index the reverse of the string so the unique part appears first?
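A toy check of this suggestion, under the simplifying assumption that keys are compared character by character in code-point order (PostgreSQL actually compares text using the column's collation): measure how many leading characters adjacent entries share in sorted order, with and without reversing each string. The helper names are hypothetical. For Rob's sample strings, which differ mainly near their ends, reversal shrinks that shared prefix sharply; for data that differs near the start it would not help.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def mean_adjacent_prefix(values) -> float:
    """Average shared prefix between adjacent entries in sorted (code-point) order."""
    ordered = sorted(values)
    pairs = list(zip(ordered, ordered[1:]))
    return sum(common_prefix_len(a, b) for a, b in pairs) / len(pairs)


# The example sequences from Rob's message.
seqs = [
    "abbbabacccdef", "abbbabaccdcdf", "abbbabaccdcgf",
    "qwtrhdffdd", "qwtrhdffdds",
    "qwtrhdffddsspp", "qwtrhdffddsspf", "qwtrhdffddssph", "qwtrhdffddsspL",
]
forward = mean_adjacent_prefix(seqs)
backward = mean_adjacent_prefix(s[::-1] for s in seqs)
print(forward, backward)  # 10.0 0.375
```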



Re: [GENERAL] UNIQUE constraint on KEY VALUE PAIRS

2011-05-14 Thread InterRob
I am afraid I oversimplified my use case. Sorry.

In fact, I am not using a character sequence but a custom COMPOSITE TYPE,
part of which is an hstore:
CREATE TYPE SSTORE AS (scope TEXT, kvp hstore);

I created some functions and operators around it; they are somewhat costly
operations for comparing SSTOREs (more costly than comparing INTEGERs
would be, I guess).

The sets of key/value pairs held by SSTOREs will show extensive
similarities, that is, they will have many key/value pairs in common.

So, essentially it looks like this:
CREATE TABLE textseqs(the_key SSTORE);

CREATE UNIQUE INDEX ON textseqs USING BTREE (hash_sstore(the_key), the_key);


Does this make sense? I am seeking some sensible trade-off between
performance and flexibility...
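hash_sstore() is Rob's own function and its definition isn't shown. A minimal Python sketch, assuming an SSTORE can be modeled as a scope string plus a dict of key/value pairs, of the property that matters for a (hash, key) unique index: values the equality operator treats as equal must always get the same hash, otherwise they land in different positions and both are accepted. Hash collisions, by contrast, are harmless because the full key is still compared. zlib.crc32, the serialization format and the class below are hypothetical stand-ins, and the B-tree descent on the hash is modeled as a dict lookup for brevity.

```python
import zlib


def hash_sstore(scope: str, kvp: dict) -> int:
    """Hypothetical stand-in for hash_sstore(): equal SSTOREs must hash equally."""
    # Serialize pairs in sorted key order so insertion order cannot affect the hash.
    # An ambiguous serialization could only cause collisions, which the full
    # comparison in insert() resolves.
    parts = [scope] + [f"{k}={v}" for k, v in sorted(kvp.items())]
    return zlib.crc32("\x00".join(parts).encode("utf-8"))


class HashPrefixUniqueIndex:
    """Toy model of UNIQUE (hash_sstore(the_key), the_key).

    Full SSTORE comparisons run only among entries that share a hash.
    """

    def __init__(self):
        self.buckets = {}        # hash -> list of (scope, kvp) already stored
        self.full_compares = 0   # how many expensive SSTORE comparisons ran

    def insert(self, scope: str, kvp: dict) -> None:
        bucket = self.buckets.setdefault(hash_sstore(scope, kvp), [])
        for other_scope, other_kvp in bucket:
            self.full_compares += 1
            if other_scope == scope and other_kvp == kvp:
                raise ValueError("duplicate key value violates unique constraint")
        bucket.append((scope, dict(kvp)))


idx = HashPrefixUniqueIndex()
idx.insert("display", {"color": "red", "size": "10"})
try:
    # Same pairs in a different order: must be caught as a duplicate.
    idx.insert("display", {"size": "10", "color": "red"})
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
idx.insert("print", {"color": "red", "size": "10"})  # different scope: accepted
print(duplicate_rejected, idx.full_compares)
```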

Regards,
Rob


2011/5/14 David Johnston 

> Could you index the reverse of the string so the unique part appears first?