[SQL] How to find space occupied by postgres on harddisk

2008-07-07 Thread dipesh

Hello,
I am Dipesh Mistry from Ahmedabad, India.
I want to know: if I dump the 5GB SQL file, how much space does
PostgreSQL occupy on the hard disk?

Is there any calculation available, or a PostgreSQL command that can
give this kind of information?
Thank you.

--
With Warm Regards,
Dipesh Mistry
Information Technology Dept.
GaneshaSpeaks.com




Re: [SQL] How to find space occupied by postgres on harddisk

2008-07-07 Thread Craig Ringer

dipesh wrote:

> Hello,
> I am Dipesh Mistry from Ahmedabad, India.
> I want to know: if I dump the 5GB SQL file, how much space does
> PostgreSQL occupy on the hard disk?


Do you mean a 5GB database? If that's what you meant, then the size of 
the resulting dump depends on the dump format, the FILLFACTOR of your 
tables and indices, the number of indices you have, etc.


If you mean that you have a 5GB SQL dump and you want to know how big it 
will be when loaded into PostgreSQL, well, the same applies but in 
reverse. It depends on the table and index fillfactors, how many indexes 
you have, etc.


My database is a bit less than 1GB on disk as stored by PostgreSQL, 
including xlogs, indexes, etc. When I dump it in PostgreSQL's custom 
compressed dump format (pg_dump -Fc) it uses 25MB of storage. It's 
VACUUMed and REINDEXed regularly and has fillfactors of around 60% for 
most tables/indices.


If I use the ordinary uncompressed SQL dump format it uses 140MB.

All this depends on your data. Some data types "expand" more than others
when converted from their SQL dump file representation to their
representation in PostgreSQL's storage. Some are stored smaller in Pg
than in an SQL dump. Additionally, indexes use space too, potentially
LOTS of space. Finally, your tables will "waste" some space with deleted
rows, padding for non-100% fillfactors, etc.
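(For illustration, fillfactor is a storage parameter you set per table or
per index; a minimal sketch with hypothetical names:

-- Leave 40% of each heap page free, e.g. for future updates:
ALTER TABLE my_table SET (fillfactor = 60);
-- The same idea for an index, set at creation time:
CREATE INDEX my_table_col_idx ON my_table (col) WITH (fillfactor = 60);

A lower fillfactor trades disk space for cheaper updates.)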


The best thing to do is load it into PostgreSQL and see (or dump it, if 
that's what you meant). That'll tell you for sure. It's not like a 5GB 
dump will take all that long to load.
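Once it's loaded you can also ask PostgreSQL directly; a sketch, assuming
your database is named "mydb":

-- Total on-disk size of one database:
SELECT pg_size_pretty(pg_database_size('mydb'));

-- The ten largest tables, counting their indexes and TOAST data:
SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;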


--
Craig Ringer



[SQL] how to control the execution plan ?

2008-07-07 Thread Sabin Coanda
Hi there,

I'm trying to execute the following statement:

SELECT *
FROM (
    SELECT "MY_FUNCTION_A"(bp."COL_A", NULL::boolean) AS ALIAS_A
    FROM "TABLE_A" bp
    JOIN "TABLE_B" pn ON bp."COL_B" = pn."PK_ID"
    JOIN "TABLE_C" vbo ON bp."COL_C" = vbo."PK_ID"
    WHERE pn."Editor"::text ~~ 'Some%'::text
      AND bp."COL_A" IS NOT NULL
      AND bp."COL_A"::text <> ''::text
) x
WHERE (x.ALIAS_A::text) IS NULL;

The problem is that the execution plan first does a Seq Scan on "TABLE_A",
with Filter: (("COL_A" IS NOT NULL) AND (("COL_A")::text <> ''::text) AND
(("MY_FUNCTION_A"("COL_A", NULL::boolean))::text IS NULL)). This way,
"MY_FUNCTION_A" crashes on some unsupported data coming from "COL_A".

I'd like to get an execution plan that first filters the desired rows, and
only after that computes the column value "MY_FUNCTION_A"(bp."COL_A",
NULL::boolean).

I tried different combinations, including a subquery like:

SELECT *
FROM (
    SELECT "MY_FUNCTION_A"(y."COL_A", NULL::boolean) AS ALIAS_A
    FROM (
        SELECT bp."COL_A"
        FROM "TABLE_A" bp
        JOIN "TABLE_B" pn ON bp."COL_B" = pn."PK_ID"
        JOIN "TABLE_C" vbo ON bp."COL_C" = vbo."PK_ID"
        WHERE pn."COL_E"::text ~~ 'Some%'::text
          AND bp."COL_A" IS NOT NULL
          AND bp."COL_A"::text <> ''::text
    ) y
) x
WHERE (x.ALIAS_A::text) IS NULL;

but the Postgres planner is too 'smart' and optimizes it as in the previous
case, with the same Seq Scan on "TABLE_A" and the same filter.

I thought of changing "MY_FUNCTION_A" to tolerate any argument data, but
even then another performance problem would arise: the function would be
computed for every row in the join, even those that could be removed by the
other filters.

Do you have a solution, please?

TIA
Sabin 





Re: [SQL] how to control the execution plan ?

2008-07-07 Thread Scott Marlowe
On Mon, Jul 7, 2008 at 3:14 AM, Sabin Coanda <[EMAIL PROTECTED]> wrote:
> Hi there,
>
> I'm trying to execute the following statement:
>
> SELECT *
> FROM (
>     SELECT "MY_FUNCTION_A"(bp."COL_A", NULL::boolean) AS ALIAS_A
>     FROM "TABLE_A" bp
>     JOIN "TABLE_B" pn ON bp."COL_B" = pn."PK_ID"
>     JOIN "TABLE_C" vbo ON bp."COL_C" = vbo."PK_ID"
>     WHERE pn."Editor"::text ~~ 'Some%'::text
>       AND bp."COL_A" IS NOT NULL
>       AND bp."COL_A"::text <> ''::text
> ) x
> WHERE (x.ALIAS_A::text) IS NULL;
>
> The problem is that the execution plan first does a Seq Scan on "TABLE_A",
> with Filter: (("COL_A" IS NOT NULL) AND (("COL_A")::text <> ''::text) AND
> (("MY_FUNCTION_A"("COL_A", NULL::boolean))::text IS NULL)). This way,
> "MY_FUNCTION_A" crashes on some unsupported data coming from "COL_A".
>
> I'd like to get an execution plan that first filters the desired rows, and
> only after that computes the column value "MY_FUNCTION_A"(bp."COL_A",
> NULL::boolean).

Just wondering what the query plans look like here, both plain EXPLAIN
and, if you can wait for it to execute, EXPLAIN ANALYZE.
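
For example, with the query from your post (plain EXPLAIN only shows the
plan; EXPLAIN ANALYZE actually executes it, which here would also execute
"MY_FUNCTION_A"):

EXPLAIN
SELECT *
FROM (
    SELECT "MY_FUNCTION_A"(bp."COL_A", NULL::boolean) AS ALIAS_A
    FROM "TABLE_A" bp
    JOIN "TABLE_B" pn ON bp."COL_B" = pn."PK_ID"
    JOIN "TABLE_C" vbo ON bp."COL_C" = vbo."PK_ID"
    WHERE pn."Editor"::text ~~ 'Some%'::text
      AND bp."COL_A" IS NOT NULL
      AND bp."COL_A"::text <> ''::text
) x
WHERE (x.ALIAS_A::text) IS NULL;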

I'm guessing that the function is not indexed / indexable.  Is it
marked IMMUTABLE (and is it actually immutable), or STABLE (and is it
actually stable)?

If it's immutable then you can create an index on it and that should
speed things up.
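
A sketch of that (the argument types of "MY_FUNCTION_A" are a guess here;
adjust them to match your actual definition):

-- Only safe if the function really does return the same result for the
-- same arguments, with no side effects:
ALTER FUNCTION "MY_FUNCTION_A"(text, boolean) IMMUTABLE;

-- Expression index matching the IS NULL test in the query:
CREATE INDEX "TABLE_A_func_idx"
    ON "TABLE_A" ((("MY_FUNCTION_A"("COL_A", NULL::boolean))::text));

(Note that building the index evaluates the function once per existing
row, so it will hit the same bad values the Seq Scan does.)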

>
> I tried different combinations, including a subquery like:
>
> SELECT *
> FROM (
>     SELECT "MY_FUNCTION_A"(y."COL_A", NULL::boolean) AS ALIAS_A
>     FROM (
>         SELECT bp."COL_A"
>         FROM "TABLE_A" bp
>         JOIN "TABLE_B" pn ON bp."COL_B" = pn."PK_ID"
>         JOIN "TABLE_C" vbo ON bp."COL_C" = vbo."PK_ID"
>         WHERE pn."COL_E"::text ~~ 'Some%'::text
>           AND bp."COL_A" IS NOT NULL
>           AND bp."COL_A"::text <> ''::text
>     ) y
> ) x
> WHERE (x.ALIAS_A::text) IS NULL;
>
> but the Postgres planner is too 'smart' and optimizes it as in the previous
> case, with the same Seq Scan on "TABLE_A" and the same filter.
>
> I thought of changing "MY_FUNCTION_A" to tolerate any argument data, but
> even then another performance problem would arise: the function would be
> computed for every row in the join, even those that could be removed by the
> other filters.
>
> Do you have a solution, please?

If it's still too smart, you can run two queries: one to pull the set of
rows you want to feed to the custom function into a temp table, then
ANALYZE it, then run the function query against that.
Not an optimal solution, but it might be the fastest option if you can't
index your function.
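
Roughly like this (a sketch; "filtered_rows" is a made-up name, the other
identifiers are from your post):

-- Step 1: materialize only the rows that pass the cheap filters.
CREATE TEMP TABLE filtered_rows AS
SELECT bp."COL_A"
FROM "TABLE_A" bp
JOIN "TABLE_B" pn ON bp."COL_B" = pn."PK_ID"
JOIN "TABLE_C" vbo ON bp."COL_C" = vbo."PK_ID"
WHERE pn."Editor"::text ~~ 'Some%'::text
  AND bp."COL_A" IS NOT NULL
  AND bp."COL_A"::text <> ''::text;

ANALYZE filtered_rows;

-- Step 2: the function now only ever sees pre-filtered values.
SELECT *
FROM (
    SELECT "MY_FUNCTION_A"("COL_A", NULL::boolean) AS ALIAS_A
    FROM filtered_rows
) x
WHERE (x.ALIAS_A::text) IS NULL;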
