[GENERAL] B-tree fan-out

2007-06-22 Thread cluster
What is the fan-out (number of child nodes) on each B-tree node in postgresql? Is it dependent of the size of the keys being indexed? If so: How? In B-trees all non-leaf nodes have a bunch of pointers to its child nodes. What is the size of such a pointer? Thanks --

[GENERAL] B-tree fan-out

2007-06-22 Thread cluster
What is the fan-out (number of child nodes) on each B-tree node in postgresql? Is it dependent of the size of the keys being indexed? If so: How? In B-trees all non-leaf nodes have a bunch of pointers to its child nodes. What is the size of such a pointer? Thanks ---(en

Re: [GENERAL] B-tree fan-out

2007-06-23 Thread cluster
I doubt this is in the documentation, but you can always read the source. I'd take a look around: > src/backend/access/nbtree/ Maybe its just me that is blind, but I couldn't find anything on this particular issue there. :-( Any other suggestions? ---(end of broadcas

Re: [GENERAL] B-tree fan-out

2007-06-23 Thread cluster
In any case, I think the answer to your original question is that the fan-out can be up to several hundred per level, but it's not fixed. OK, its beginning to make sense. So the fan-out is given by the key size and each child node is stored in its own page. Is that correct? Thanks in advance!

Re: [GENERAL] B-tree fan-out

2007-06-23 Thread cluster
In postgres, everything is done in pages, so how ever many keys fit in a page. Bigs keys mean less. For integers you can fit an awful lot of keys. OK, interesting. Does that mean, that when a node containing only small values (e.g. integers) is split, then it gets an awful lot of child node

[GENERAL] Bitmap index?

2007-06-30 Thread cluster
In postgresql 8.2 I need a bitmap index for a certain column, but how do I create it? (No, a btree is not sufficient. :-) ) ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster

[GENERAL] Functions refering directly to table columns?

2007-07-24 Thread cluster
When using a function I need to parse each value as argument, e.g. myFunction(arg1, arg2, ..., arg999) However, if I can guarantee that the function will only be used with a certain table, can I then omit the arguments and letting the function refer to the table columns directly? In that way

Re: [GENERAL] Functions refering directly to table columns?

2007-07-24 Thread cluster
How can I obtain this non-argument-requiring function feature? I solved it myself: Define the function using composite types (http://www.postgresql.org/docs/8.2/interactive/xfunc-sql.html) ---(end of broadcast)--- TIP 4: Have you searched our li

[GENERAL] Safe usage of tsearch2: to_tsquery('')

2007-08-04 Thread cluster
In a web application I would like to use tsearch2 to search for by-user entered key words. That is, the user provides the keywords in a space separated list in some input text field. For that I use to_tsquery('') but I would like to do this in a safe way so that the user cannot misuse to_tsquer

[GENERAL] Modelling tags

2007-08-05 Thread cluster
I am thinking about how to model tags for a web site in the database. (Each site entry is assigned a number of tags and a match can then later be performed between these tags and tags entered by the visitor). Is tsearch2 the way to do it? Each site entry could then store its tags in a tsvector

Re: [GENERAL] Safe usage of tsearch2: to_tsquery('')

2007-08-05 Thread cluster
Isn't plainto_tsquery() what you're looking for? Yes if plainto_tsquery() is safe enough for inputting user search keywords it looks like it is. I didn't knew it existed. Thanks ---(end of broadcast)--- TIP 5: don't forget to increase your free

[GENERAL] tsearch2: plainto_tsquery() with OR?

2007-08-07 Thread cluster
I have some questions related to tsearch2: 1) Is ...WHERE rank(myTsVector, myTsQuery) > 0 ... just as fast as ...WHERE myTsVector @@ myTsQuery... ? 2)) I will use plainto_tsquery() to parse search keys entered by a website user to a tsquery. However, if only some of the entered keywords

Re: [GENERAL] tsearch2: plainto_tsquery() with OR?

2007-08-08 Thread cluster
Does anyone know where I can request an OR-version of plainto_tsquery()? I don't understand why it doesn't exist already: In most cases, when using user entered keywords to search for, there should be returned some rows even though not ALL keywords are matched. ---(en

Re: [GENERAL] tsearch2: plainto_tsquery() with OR?

2007-08-09 Thread cluster
Thanks for your response! Let me try to elaborate what I meant with my original post. If R is the set of words in the tsvector for a given table row and S is the set of keywords to search for (entered by e.g. a website user) I would like to receive all rows for which the intersection between R

[GENERAL] Reliable and fast money transaction design

2007-08-28 Thread cluster
I need a way to perform a series of money transactions (row inserts) together with some row updates in such a way that integrity is ensured and performance is high. I have two tables: ACCOUNTS ( account_id int, balance int ); TRANSACTIONS ( transaction_id int, source_

Re: [GENERAL] Reliable and fast money transaction design

2007-08-29 Thread cluster
OK, thanks. But what with the second question in which the UPDATE is based on a SELECT max(...) statement on another table? How can I ensure that no other process inserts a row between my SELECT max() and UPDATE - making my SELECT max() invalid? A table lock could be an option but I am only in

[GENERAL] Selecting K random rows - efficiently!

2007-10-24 Thread cluster
It has been suggested [1] that a good way to select K random rows efficiently from a table is to 1) add a new column, random_number, to the table and initialize it with random() 2) perform the following query: SELECT * FROM mydata WHERE random_number >= (SELECT RANDO

Re: [GENERAL] Selecting K random rows - efficiently!

2007-10-24 Thread cluster
Another way to look at the problem is: How do I sample a subset of size K efficiently? A query like SAMPLE 1000 OF (SELECT * FROM mydata WHERE ) should return 1000 random rows from the select statement so that two consecutive evaluations of the query would only with very little probabil

Re: [GENERAL] Selecting K random rows - efficiently!

2007-10-24 Thread cluster
How important is true randomness? The goal is an even distribution but currently I have not seen any way to produce any kind of random sampling efficiently. Notice the word "efficiently". The naive way of taking a random sample of size K: (SELECT * FROM mydata ORDER BY random() LIMIT ) is

Re: [GENERAL] Selecting K random rows - efficiently!

2007-10-26 Thread cluster
All you're doing is picking random =subsequences= from the same permutation of the original data. You have some good points in your reply. I am very much aware of this non-random behavior you point out for the "static random-value column" approach but at least it is fast, which is a requiremen

[GENERAL] Suggestions for schema design?

2008-02-20 Thread cluster
I really need some input: In a system that handles money transfers I have a table to store each money transfer. A number of different events can result in a money transfer but the events are so different that each event type is stored in its own table. So we have a schema of the form: TRA