Re: [PERFORM] Analysis Function

2010-06-11 Thread David Jarvis
Hi,

To avoid string concatenation using dates, I figured I could write a C
function:

#include "postgres.h"
#include "fmgr.h"
#include "utils/date.h"
#include "utils/nabstime.h"

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

Datum dateserial (PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1 (dateserial);

Datum dateserial (PG_FUNCTION_ARGS) {
  int32 p_year = PG_GETARG_INT32(0);
  int32 p_month = PG_GETARG_INT32(1);
  int32 p_day = PG_GETARG_INT32(2);

  DateADT d = date2j (p_year, p_month, p_day) - POSTGRES_EPOCH_JDATE;
  PG_RETURN_DATEADT(d);
}

Compiles without errors or warnings. The function is integrated as follows:

CREATE OR REPLACE FUNCTION dateserial(integer, integer, integer)
  RETURNS text AS
'ymd.so', 'dateserial'
  LANGUAGE 'c' IMMUTABLE STRICT
  COST 1;

However, when I try to use it, the database segfaults:

select dateserial( 2007, 1, 3 )

Any ideas why?

Thank you!

Dave

P.S.
I have successfully written a function that creates a YYYYmmDD formatted
string (using sprintf) when given three integers. It returns as expected;
I ran it as follows:

dateserial( extract(YEAR FROM m.taken)::int, 1, 1 )::date

This had a best-of-three time of 3.7s compared with 4.3s using string
concatenation. If I can eliminate all the typecasts, and pass in m.taken
directly (rather than calling extract), I think the speed will be closer
to 2.5s.

Any hints would be greatly appreciated.


Re: [PERFORM] Analysis Function

2010-06-11 Thread Tim Landscheidt
David Jarvis  wrote:

> [...]
> Yes. Here are the variations I have benchmarked (times are best of three):

> Variation #0
> -no date field-
> Explain: http://explain.depesz.com/s/Y9R
> Time: 2.2s

> Variation #1
> date('1960-1-1')
> Explain: http://explain.depesz.com/s/DW2
> Time: 2.6s

> Variation #2
> date('1960'||'-1-1')
> Explain: http://explain.depesz.com/s/YuX
> Time: 3.1s

> Variation #3
> date(extract(YEAR FROM m.taken)||'-1-1')
> Explain: http://explain.depesz.com/s/1I
> Time: 4.3s

> Variation #4
> to_date( date_part('YEAR', m.taken)::text, 'YYYY' ) + interval '0 months' +
> interval '0 days'
> Explain: http://explain.depesz.com/s/fIT
> Time: 4.4s

> What I would like is along Variation #5:

> PGTYPESdate_mdyjul(taken_year, p_month1, p_day1)
> Time: 2.3s

> I find it interesting that variation #2 is half a second slower than
> variation #1.
> [...]

Have you tested DATE_TRUNC()?

Tim


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Analysis Function

2010-06-11 Thread David Jarvis
Hi, Tim.

Have you tested DATE_TRUNC()?
>

Not really; it returns a full timestamp and I would still have to
concatenate strings. My goal is to speed up the following code (where
the p_ parameters are user inputs):

date(extract(YEAR FROM m.taken)||''-'||p_month1||'-'||p_day1||''') d1,
date(extract(YEAR FROM m.taken)||''-'||p_month2||'-'||p_day2||''') d2

Using DATE_TRUNC() won't help here, as far as I can tell. Removing the
concatenation will halve the query's time. For example:

dateserial( m.taken, p_month1, p_day1 ) d1,
dateserial( m.taken, p_month2, p_day2 ) d2

My testing so far has shown a modest improvement by using a C function (to
avoid concatenation).

Dave


Re: [PERFORM] Analysis Function

2010-06-11 Thread Heikki Linnakangas

On 11/06/10 11:25, David Jarvis wrote:

Datum dateserial (PG_FUNCTION_ARGS) {
   int32 p_year = PG_GETARG_INT32(0);
   int32 p_month = PG_GETARG_INT32(1);
   int32 p_day = PG_GETARG_INT32(2);

   DateADT d = date2j (p_year, p_month, p_day) - POSTGRES_EPOCH_JDATE;
   PG_RETURN_DATEADT(d);
}

Compiles without errors or warnings. The function is integrated as follows:

CREATE OR REPLACE FUNCTION dateserial(integer, integer, integer)
   RETURNS text AS
'ymd.so', 'dateserial'
   LANGUAGE 'c' IMMUTABLE STRICT
   COST 1;

However, when I try to use it, the database segfaults:

select dateserial( 2007, 1, 3 )

Any ideas why?


The C function returns a DateADT, which is a typedef for int32, but the 
CREATE FUNCTION statement claims that it returns 'text'.
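
A declaration matching the C return type would look roughly like this (a
sketch; only the return type changes from the original):

CREATE OR REPLACE FUNCTION dateserial(integer, integer, integer)
  RETURNS date AS
'ymd.so', 'dateserial'
  LANGUAGE 'c' IMMUTABLE STRICT
  COST 1;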


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Query about index usage

2010-06-11 Thread Jayadevan M
Hello all,

One query about PostgreSQL's index usage. If I select just one column on 
which there is an index (or select only columns on which there is an 
index), and the index is used by PostgreSQL, does PostgreSQL avoid table 
access if possible?  I am trying to understand the differences between 
Oracle's data access patterns and PostgreSQL's. 
Here is how it works in Oracle.

Case 1 - SELECT column which is not there in the index 

SQL> select name from myt where id = 13890;

NAME
---



Execution Plan
--
Plan hash value: 2609414407

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    65 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| MYT   |     1 |    65 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | MYIDX |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---

   2 - access("ID"=13890)

Note
-
   - dynamic sampling used for this statement


Statistics
--
  0  recursive calls
  0  db block gets
  4  consistent gets
  0  physical reads
  0  redo size
409  bytes sent via SQL*Net to client
384  bytes received via SQL*Net from client
  2  SQL*Net roundtrips to/from client
  0  sorts (memory)
  0  sorts (disk)
  1  rows processed

 
 
Case 2 - SELECT column which is there in the index 

SQL> select id from myt where id = 13890;

ID
--
 13890


Execution Plan
--
Plan hash value: 2555454399

--
| Id  | Operation| Name  | Rows  | Bytes | Cost (%CPU)| Time |
--
|   0 | SELECT STATEMENT |   | 1 |13 | 1   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| MYIDX | 1 |13 | 1   (0)| 00:00:01 |
--

Predicate Information (identified by operation id):
---

   1 - access("ID"=13890)

Note
-
   - dynamic sampling used for this statement


Statistics
--
  0  recursive calls
  0  db block gets
  3  consistent gets
  0  physical reads
  0  redo size
407  bytes sent via SQL*Net to client
384  bytes received via SQL*Net from client
  2  SQL*Net roundtrips to/from client
  0  sorts (memory)
  0  sorts (disk)
  1  rows processed

In the second query, where id was selected, the table was not used at all. 
In PostgreSQL, explain gives me similar output in both cases.
Table structure - 

postgres=# \d myt
 Table "public.myt"
 Column | Type  | Modifiers
+---+---
 id | integer   |
 name   | character varying(20) |
Indexes:
"myidx" btree (id)


Regards,
Jayadevan






-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Analysis Function

2010-06-11 Thread Tim Landscheidt
David Jarvis  wrote:

>> Have you tested DATE_TRUNC()?

> Not really; it returns a full timestamp and I would still have to
> concatenate strings. My goal is to speed up the following code (where
> the p_ parameters are user inputs):

> date(extract(YEAR FROM m.taken)||''-'||p_month1||'-'||p_day1||''') d1,
> date(extract(YEAR FROM m.taken)||''-'||p_month2||'-'||p_day2||''') d2

> Using DATE_TRUNC() won't help here, as far as I can tell. Removing the
> concatenation will halve the query's time. For example:

> dateserial( m.taken, p_month1, p_day1 ) d1,
> dateserial( m.taken, p_month2, p_day2 ) d2

> My testing so far has shown a modest improvement by using a C function (to
> avoid concatenation).

You could use:

| (DATE_TRUNC('year', m.taken) + (p_month1 - 1) * '1 month'::INTERVAL
|  + (p_day1 - 1) * '1 day'::INTERVAL)::DATE

but whether that is faster or slower I don't know. But I
don't see why this query needs to be fast in the first
place. It seems to be interactive, and therefore I wouldn't
invest too much time just to cut the user's wait from 4.4
to 2.2 seconds. You could also do the concatenation in the
application if that is faster than PostgreSQL's date
arithmetic.

Tim


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] slow query performance

2010-06-11 Thread Kenneth Marshall
Hi Anj,

That is an indication that your system was not modeled
correctly with random_page_cost=2, which means that the system
will assume that random I/O is cheaper than it really is and will
choose plans based on that model. If this is not the case,
the plan chosen will almost certainly be slower for any
non-trivial query. You can put a 200mph speedometer in a
VW bug but it will never go 200mph.

Regards,
Ken

On Thu, Jun 10, 2010 at 07:54:01PM -0700, Anj Adu wrote:
> I changed random_page_cost=4 (earlier 2) and the performance issue is gone
> 
> I am not clear why a page_cost of 2 on really fast disks would perform badly.
> 
> Thank you for all your help and time.
> 
> On Thu, Jun 10, 2010 at 8:32 AM, Anj Adu  wrote:
> > Attached
> >
> > Thank you
> >
> >
> > On Thu, Jun 10, 2010 at 6:28 AM, Robert Haas  wrote:
> >> On Wed, Jun 9, 2010 at 11:17 PM, Anj Adu  wrote:
> >>> The plan is unaltered . There is a separate index on theDate as well
> >>> as one on node_id
> >>>
> >>> I have not specifically disabled sequential scans.
> >>
> >> Please do "SHOW ALL" and attach the results as a text file.
> >>
> >>> This query performs much better on 8.1.9 on a similar sized
> >>> table.(althought the random_page_cost=4 on 8.1.9 and 2 on 8.4.0 )
> >>
> >> Well that could certainly matter...
> >>
> >> --
> >> Robert Haas
> >> EnterpriseDB: http://www.enterprisedb.com
> >> The Enterprise Postgres Company
> >>
> >
> 
> -- 
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
> 

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] slow query performance

2010-06-11 Thread Anj Adu
Is there a way to determine a reasonable value for random_page_cost
via some testing with OS commands? We have several Postgres databases,
and determining this value on a case-by-case basis may not be viable
(we may have to go with the defaults).

On Fri, Jun 11, 2010 at 5:44 AM, Kenneth Marshall  wrote:
> Hi Anj,
>
> That is an indication that your system was less correctly
> modeled with a random_page_cost=2 which means that the system
> will assume that random I/O is cheaper than it is and will
> choose plans based on that model. If this is not the case,
> the plan chosen will almost certainly be slower for any
> non-trivial query. You can put a 200mph speedometer in a
> VW bug but it will never go 200mph.
>
> Regards,
> Ken
>
> On Thu, Jun 10, 2010 at 07:54:01PM -0700, Anj Adu wrote:
>> I changed random_page_cost=4 (earlier 2) and the performance issue is gone
>>
>> I am not clear why a page_cost of 2 on really fast disks would perform badly.
>>
>> Thank you for all your help and time.
>>
>> On Thu, Jun 10, 2010 at 8:32 AM, Anj Adu  wrote:
>> > Attached
>> >
>> > Thank you
>> >
>> >
>> > On Thu, Jun 10, 2010 at 6:28 AM, Robert Haas  wrote:
>> >> On Wed, Jun 9, 2010 at 11:17 PM, Anj Adu  wrote:
>> >>> The plan is unaltered . There is a separate index on theDate as well
>> >>> as one on node_id
>> >>>
>> >>> I have not specifically disabled sequential scans.
>> >>
>> >> Please do "SHOW ALL" and attach the results as a text file.
>> >>
>> >>> This query performs much better on 8.1.9 on a similar sized
>> >>> table.(althought the random_page_cost=4 on 8.1.9 and 2 on 8.4.0 )
>> >>
>> >> Well that could certainly matter...
>> >>
>> >> --
>> >> Robert Haas
>> >> EnterpriseDB: http://www.enterprisedb.com
>> >> The Enterprise Postgres Company
>> >>
>> >
>>
>> --
>> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-performance
>>
>

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] slow query performance

2010-06-11 Thread Kenneth Marshall
If you check the archives, you will see that this is not easy
to do because of the effects of caching. The default values
were actually chosen to be a good compromise between fully
cached in RAM and totally un-cached. The actual best value
depends on the size of your database, the size of its working
set, your I/O system and your memory. The best recommendation
is usually to use the default values unless you know something
about your system that moves it out of that arena.
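
If you do want to experiment, the setting can be changed per session or
per database rather than globally, for example (illustrative values;
"mydb" is a placeholder database name):

SET random_page_cost = 4;                       -- current session only
ALTER DATABASE mydb SET random_page_cost = 4;   -- default for one database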

Regards,
Ken

On Fri, Jun 11, 2010 at 06:23:31AM -0700, Anj Adu wrote:
> Is there a way to determine a reasonable value for random_page_cost
> via some testing with OS commands. We have several postgres databases
> and determining this value on a case by case basis may not be viable
> (we may have to go with the defaults)
> 
> On Fri, Jun 11, 2010 at 5:44 AM, Kenneth Marshall  wrote:
> > Hi Anj,
> >
> > That is an indication that your system was less correctly
> > modeled with a random_page_cost=2 which means that the system
> > will assume that random I/O is cheaper than it is and will
> > choose plans based on that model. If this is not the case,
> > the plan chosen will almost certainly be slower for any
> > non-trivial query. You can put a 200mph speedometer in a
> > VW bug but it will never go 200mph.
> >
> > Regards,
> > Ken
> >
> > On Thu, Jun 10, 2010 at 07:54:01PM -0700, Anj Adu wrote:
> >> I changed random_page_cost=4 (earlier 2) and the performance issue is gone
> >>
> >> I am not clear why a page_cost of 2 on really fast disks would perform 
> >> badly.
> >>
> >> Thank you for all your help and time.
> >>
> >> On Thu, Jun 10, 2010 at 8:32 AM, Anj Adu  wrote:
> >> > Attached
> >> >
> >> > Thank you
> >> >
> >> >
> >> > On Thu, Jun 10, 2010 at 6:28 AM, Robert Haas  
> >> > wrote:
> >> >> On Wed, Jun 9, 2010 at 11:17 PM, Anj Adu  wrote:
> >> >>> The plan is unaltered . There is a separate index on theDate as well
> >> >>> as one on node_id
> >> >>>
> >> >>> I have not specifically disabled sequential scans.
> >> >>
> >> >> Please do "SHOW ALL" and attach the results as a text file.
> >> >>
> >> >>> This query performs much better on 8.1.9 on a similar sized
> >> >>> table.(althought the random_page_cost=4 on 8.1.9 and 2 on 8.4.0 )
> >> >>
> >> >> Well that could certainly matter...
> >> >>
> >> >> --
> >> >> Robert Haas
> >> >> EnterpriseDB: http://www.enterprisedb.com
> >> >> The Enterprise Postgres Company
> >> >>
> >> >
> >>
> >> --
> >> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> >> To make changes to your subscription:
> >> http://www.postgresql.org/mailpref/pgsql-performance
> >>
> >
> 

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] slow query performance

2010-06-11 Thread Matthew Wakeling

On Fri, 11 Jun 2010, Kenneth Marshall wrote:

If you check the archives, you will see that this is not easy
to do because of the effects of caching.


Indeed. If you were to take the value completely at face value, a modern 
hard drive is capable of transferring sequential pages somewhere between 
40 and 100 times faster than random pages, depending on the drive.


However, caches tend to favour index scans much more than sequential 
scans, so using a value between 40 and 100 would discourage Postgres from 
using indexes when they are really the most appropriate option.


Matthew

--
A. Top Posters
> Q. What's the most annoying thing in the world?

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query about index usage

2010-06-11 Thread Kevin Grittner
Jayadevan M  wrote:
 
> One query about PostgreSQL's index usage. If I select just one
> column on which there is an index (or select only columns on which
> there is an index), and the index is used by PostgreSQL, does
> PostgreSQL avoid table access if possible?
 
PostgreSQL can't currently avoid reading the table, because that's
where the tuple visibility information is stored.  We've been making
progress toward having some way to avoid reading the table for all
except very recently written tuples, but we're not there yet (in any
production version or in the 9.0 version to be released this
summer).
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query about index usage

2010-06-11 Thread Greg Smith

Jayadevan M wrote:
One query about PostgreSQL's index usage. If I select just one column on 
which there is an index (or select only columns on which there is an 
index), and the index is used by PostgreSQL, does PostgreSQL avoid table 
access if possible?


PostgreSQL keeps information about what rows are visible or not in with 
the row data.  It's therefore impossible at this time for it to answer 
queries just based on what's in an index.  Once candidate rows are found 
using one, the database must then also retrieve the row(s) and do a 
second check as to whether they are visible to the running transaction 
before returning them to the client.


Improving this situation is high up on the list of things to improve in 
PostgreSQL, and its value is recognized; it just hasn't been built yet.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] O/T: performance tuning cars

2010-06-11 Thread Dave Crooke
Never say never with computer geeks 
http://www.youtube.com/watch?v=mJyAA0oPAwE

On Fri, Jun 11, 2010 at 7:44 AM, Kenneth Marshall  wrote:

> Hi Anj,
>
> That is an indication that your system was less correctly
> modeled with a random_page_cost=2 which means that the system
> will assume that random I/O is cheaper than it is and will
> choose plans based on that model. If this is not the case,
> the plan chosen will almost certainly be slower for any
> non-trivial query. You can put a 200mph speedometer in a
> VW bug but it will never go 200mph.
>
> Regards,
> Ken
>
> On Thu, Jun 10, 2010 at 07:54:01PM -0700, Anj Adu wrote:
> > I changed random_page_cost=4 (earlier 2) and the performance issue is
> gone
> >
> > I am not clear why a page_cost of 2 on really fast disks would perform
> badly.
> >
> > Thank you for all your help and time.
> >
> > On Thu, Jun 10, 2010 at 8:32 AM, Anj Adu  wrote:
> > > Attached
> > >
> > > Thank you
> > >
> > >
> > > On Thu, Jun 10, 2010 at 6:28 AM, Robert Haas 
> wrote:
> > >> On Wed, Jun 9, 2010 at 11:17 PM, Anj Adu 
> wrote:
> > >>> The plan is unaltered . There is a separate index on theDate as well
> > >>> as one on node_id
> > >>>
> > >>> I have not specifically disabled sequential scans.
> > >>
> > >> Please do "SHOW ALL" and attach the results as a text file.
> > >>
> > >>> This query performs much better on 8.1.9 on a similar sized
> > >>> table.(althought the random_page_cost=4 on 8.1.9 and 2 on 8.4.0 )
> > >>
> > >> Well that could certainly matter...
> > >>
> > >> --
> > >> Robert Haas
> > >> EnterpriseDB: http://www.enterprisedb.com
> > >> The Enterprise Postgres Company
> > >>
> > >
> >
> > --
> > Sent via pgsql-performance mailing list (
> pgsql-performance@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-performance
> >
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: [PERFORM] Large (almost 50%!) performance drop after upgrading to 8.4.4?

2010-06-11 Thread John Reeve
Greg Smith  writes:

> 
> Max Williams wrote:
> > Can I just turn this off on 8.4.4 or is it a compile time option
> 
> You can update your postgresql.conf to include:
> 
> debug_assertions = false
> 
> And restart the server.  This will buy you back *some* of the 
> performance loss but not all of it.  Will have to wait for corrected 
> packages to make the issue completely go away.
> 


Ah! I am so thankful I found this thread. We've been having the same issues 
described here. And when I do a SHOW debug_assertions I get:


postgres=# show debug_assertions;
 debug_assertions
--
 on
(1 row)


Can you let us know when the corrected packages have become available?

Regards,
John



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Analysis Function

2010-06-11 Thread David Jarvis
Hi,

The C function returns a DateADT, which is a typedef for int32, but the
> CREATE FUNCTION statement claims that it returns 'text'.
>

That'll do it. Thank you!

but whether that is faster or slower I don't know. But I
> don't see why this query needs to be fast in the first
> place. It seems to be interactive, and therefore I wouldn't
>

When users click the button, I want the result returned in under 4
seconds. Right now it is closer to 10. Consequently, they click twice.
Shaving 2 seconds here and there will make a huge difference. It will also
allow the computer to handle a higher volume of requests.


> invest too much time just to cut the user's wait from 4.4
> to 2.2 seconds. You could also do the concatenation in the
> application if that is faster than PostgreSQL's date
> arithmetic.
>

No, I cannot. The concatenation uses the year that the measurement was
made--from the database--and the month/day combination from the user. See
also:

http://stackoverflow.com/questions/2947105/calculate-year-for-end-date-postgresql

Dave


Re: [PERFORM] Analysis Function

2010-06-11 Thread David Jarvis
Hi,

Here is code to construct dates from integers without string concatenation:

Edit dateserial.c:

#include "postgres.h"
#include "utils/date.h"
#include "utils/nabstime.h"

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

Datum dateserial(PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1 (dateserial);

Datum
dateserial(PG_FUNCTION_ARGS) {
  int32 p_year = (int32)PG_GETARG_FLOAT8(0);
  int32 p_month = PG_GETARG_INT32(1);
  int32 p_day = PG_GETARG_INT32(2);

  PG_RETURN_DATEADT( date2j( p_year, p_month, p_day ) - POSTGRES_EPOCH_JDATE );
}

Edit Makefile:

MODULES = dateserial
PGXS := $(shell pg_config --pgxs)
include $(PGXS)

Edit inst.sh (optional):

#!/bin/bash

make clean && make && strip *.so && make install &&
/etc/init.d/postgresql-8.4 restart

Run bash inst.sh.

Create a SQL function dateserial:

CREATE OR REPLACE FUNCTION dateserial(double precision, integer, integer)
  RETURNS date AS
'$libdir/dateserial', 'dateserial'
  LANGUAGE 'c' IMMUTABLE STRICT
  COST 1;
ALTER FUNCTION dateserial(double precision, integer, integer) OWNER TO postgres;

Test the function:

SELECT dateserial( 2007, 5, 5 )

Using this function, the query time drops from 4.4s to 2.8s.
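
Because the first argument is declared double precision, extract(YEAR FROM
m.taken) can be passed to it directly, without a cast. In the query the
calls then look roughly like this (a sketch, reusing the aliases from
earlier in the thread):

dateserial( extract(YEAR FROM m.taken), p_month1, p_day1 ) d1,
dateserial( extract(YEAR FROM m.taken), p_month2, p_day2 ) d2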

Dave


Re: [PERFORM] Query about index usage

2010-06-11 Thread Bob Lunney
Jayadevan,

PostgreSQL must go to the table to determine if the row you are requesting is 
visible to your transaction.  This is an artifact of the MVCC implementation.  
Oracle can fetch the data from the index, since it doesn't keep multiple 
representations of the rows, but it may need to check the undo logs to 
determine the state that applies to your transaction.  It's just two different 
ways to accomplish the same thing.

Bob Lunney

--- On Fri, 6/11/10, Jayadevan M  wrote:

> From: Jayadevan M 
> Subject: [PERFORM] Query about index usage
> To: pgsql-performance@postgresql.org
> Date: Friday, June 11, 2010, 5:56 AM
> Hello all,
> 
> One query about PostgreSQL's index usage. If I select just
> one column on 
> which there is an index (or select only columns on which
> there is an 
> index), and the index is used by PostgreSQL, does
> PostgreSQL avoid table 
> access if possible?  I am trying to understand the
> differences between 
> Oracle's data access patterns and PostgreSQL's. 
> Here is how it works in Oracle.
> 
> [...]


Re: [PERFORM] Analysis Function

2010-06-11 Thread Tim Landscheidt
David Jarvis  wrote:

> [...]
>> invest too much time just to cut the user's wait from 4.4
>> to 2.2 seconds. You could also do the concatenation in the
>> application if that is faster than PostgreSQL's date
>> arithmetic.

> No, I cannot. The concatenation uses the year that the measurement was
> made--from the database--and the month/day combination from the user. See
> also:

> http://stackoverflow.com/questions/2947105/calculate-year-for-end-date-postgresql

That page doesn't deal with "select year from database and
month/day from user and present the results", but with *much*
different problems.

Tim


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Analysis Function

2010-06-11 Thread Tom Lane
David Jarvis  writes:
> dateserial(PG_FUNCTION_ARGS) {
>   int32 p_year = (int32)PG_GETARG_FLOAT8(0);
>   int32 p_month = PG_GETARG_INT32(1);
>   int32 p_day = PG_GETARG_INT32(2);

Er ... why float?  Integer is plenty for the range of years supported by
the PG datetime infrastructure.  The above coding is pretty lousy in terms
of its roundoff and overflow behavior, too.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Analysis Function

2010-06-11 Thread David Jarvis
Hi, Tom.

extract(YEAR FROM m.taken)

I thought that returned a double precision?
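
(A quick check on 8.4, as a side note:

SELECT pg_typeof( extract(YEAR FROM now()) );

reports "double precision", which is why the C function takes a float8
argument.)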

Dave


Re: [PERFORM] Analysis Function

2010-06-11 Thread David Jarvis
Hi,

I added an explicit cast in the SQL:

dateserial(extract(YEAR FROM m.taken)::int,'||p_month1||','||p_day1||') d1,
dateserial(extract(YEAR FROM m.taken)::int,'||p_month2||','||p_day2||') d2

The function now takes three integer parameters; there was no performance
loss.
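
The matching declaration then becomes (a sketch, mirroring the earlier one
but with an integer first parameter):

CREATE OR REPLACE FUNCTION dateserial(integer, integer, integer)
  RETURNS date AS
'$libdir/dateserial', 'dateserial'
  LANGUAGE 'c' IMMUTABLE STRICT
  COST 1;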

Thank you.

Dave