[PERFORM] Partitioning: INSERT 0 0 but want INSERT 0 1

2008-05-12 Thread Nikolas Everett
I figure this subject belongs on the performance mailing list because it is
about partitioning, which is a performance issue.

I'm working on partitioning some of the tables used by an application that
uses OpenJPA.  It turns out that OpenJPA is sensitive to the numbers
returned when you do an insert.  So I put together a test and attached it.
My postgres version is 8.3.1 compiled from source.

My problem is that this:
test=> INSERT INTO ttt (a, b) VALUES ('5-5-08', 'test11212');
INSERT 0 0
Time: 21.646 ms
needs to show:
INSERT 0 1

or OpenJPA will not accept it.  The insert works, but OpenJPA does not
believe it and aborts the current transaction.
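(A quick sanity check in psql, separate from the attached test: the system
column tableoid names the child table that actually received each row, so
you can see the insert really did land despite the zero in the tag.)

test=> SELECT tableoid::regclass AS partition, id, a, b FROM ttt WHERE b = 'test11212';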

Is it possible to have partitioning and have insert show the right number of
rows inserted?

Thanks,

--Nik
--
-- PostgreSQL database dump
--

SET client_encoding = 'SQL_ASCII';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;

--
-- Name: partitions; Type: SCHEMA; Schema: -; Owner: cdr
--

CREATE SCHEMA partitions;


ALTER SCHEMA partitions OWNER TO cdr;

--
-- Name: plpgsql; Type: PROCEDURAL LANGUAGE; Schema: -; Owner: cdr
--

CREATE PROCEDURAL LANGUAGE plpgsql;


ALTER PROCEDURAL LANGUAGE plpgsql OWNER TO cdr;

SET search_path = public, pg_catalog;

SET default_tablespace = '';

SET default_with_oids = false;

--
-- Name: ttt; Type: TABLE; Schema: public; Owner: cdr; Tablespace: 
--

CREATE TABLE ttt (
id integer NOT NULL,
a timestamp without time zone,
b character varying
);


ALTER TABLE public.ttt OWNER TO cdr;

--
-- Name: ttt_id_seq; Type: SEQUENCE; Schema: public; Owner: cdr
--

CREATE SEQUENCE ttt_id_seq
INCREMENT BY 1
NO MAXVALUE
NO MINVALUE
CACHE 1;


ALTER TABLE public.ttt_id_seq OWNER TO cdr;

--
-- Name: ttt_id_seq; Type: SEQUENCE OWNED BY; Schema: public; Owner: cdr
--

ALTER SEQUENCE ttt_id_seq OWNED BY ttt.id;


--
-- Name: ttt_id_seq; Type: SEQUENCE SET; Schema: public; Owner: cdr
--

SELECT pg_catalog.setval('ttt_id_seq', 17, true);


--
-- Name: id; Type: DEFAULT; Schema: public; Owner: cdr
--

ALTER TABLE ttt ALTER COLUMN id SET DEFAULT nextval('ttt_id_seq'::regclass);


SET search_path = partitions, pg_catalog;

--
-- Name: ttt_050508; Type: TABLE; Schema: partitions; Owner: cdr; Tablespace: 
--

CREATE TABLE ttt_050508 (
id integer DEFAULT nextval('public.ttt_id_seq'::regclass),
CONSTRAINT ttt_050508_a_check CHECK (((a >= '2008-05-05'::date) AND (a < '2008-05-06'::date)))
)
INHERITS (public.ttt);


ALTER TABLE partitions.ttt_050508 OWNER TO cdr;

--
-- Name: ttt_050608; Type: TABLE; Schema: partitions; Owner: cdr; Tablespace: 
--

CREATE TABLE ttt_050608 (
id integer DEFAULT nextval('public.ttt_id_seq'::regclass),
CONSTRAINT ttt_050608_a_check CHECK (((a >= '2008-05-06'::date) AND (a < '2008-05-07'::date)))
)
INHERITS (public.ttt);


ALTER TABLE partitions.ttt_050608 OWNER TO cdr;

--
-- Name: ttt_050708; Type: TABLE; Schema: partitions; Owner: cdr; Tablespace: 
--

CREATE TABLE ttt_050708 (
id integer DEFAULT nextval('public.ttt_id_seq'::regclass),
CONSTRAINT ttt_050708_a_check CHECK (((a >= '2008-05-07'::date) AND (a < '2008-05-08'::date)))
)
INHERITS (public.ttt);


ALTER TABLE partitions.ttt_050708 OWNER TO cdr;

--
-- Name: ttt_050808; Type: TABLE; Schema: partitions; Owner: cdr; Tablespace: 
--

CREATE TABLE ttt_050808 (
id integer DEFAULT nextval('public.ttt_id_seq'::regclass),
CONSTRAINT ttt_050808_a_check CHECK (((a >= '2008-05-08'::date) AND (a < '2008-05-09'::date)))
)
INHERITS (public.ttt);


ALTER TABLE partitions.ttt_050808 OWNER TO cdr;

--
-- Name: ttt_050908; Type: TABLE; Schema: partitions; Owner: cdr; Tablespace: 
--

CREATE TABLE ttt_050908 (
id integer DEFAULT nextval('public.ttt_id_seq'::regclass),
CONSTRAINT ttt_050908_a_check CHECK (((a >= '2008-05-09'::date) AND (a < '2008-05-10'::date)))
)
INHERITS (public.ttt);


ALTER TABLE partitions.ttt_050908 OWNER TO cdr;

--
-- Name: ttt_051008; Type: TABLE; Schema: partitions; Owner: cdr; Tablespace: 
--

CREATE TABLE ttt_051008 (
id integer DEFAULT nextval('public.ttt_id_seq'::regclass),
CONSTRAINT ttt_051008_a_check CHECK (((a >= '2008-05-10'::date) AND (a < '2008-05-11'::date)))
)
INHERITS (public.ttt);


ALTER TABLE partitions.ttt_051008 OWNER TO cdr;

--
-- Name: ttt_051108; Type: TABLE; Schema: partitions; Owner: cdr; Tablespace: 
--

CREATE TABLE ttt_051108 (
id integer DEFAULT nextval('public.ttt_id_seq'::regclass),
CONSTRAINT ttt_051108_a_check CHECK (((a >= '2008-05-11'::date) AND (a < '2008-05-12'::date)))
)
INHERITS (public.ttt);


ALTER TABLE partitions.ttt_051108 OWNER TO cdr;

--
-- Name: ttt_051208; Type: TABLE; Schema: partitions; Owner: cdr; Tablespace: 
--

CREATE TABLE ttt_051208 (
id integer DEFAULT nextval('public.ttt_id_seq'::regclass),
CONSTRAINT ttt_051208_a_check CHECK (((a >= '2008-05-12'::date) AND (a < '2008-05-13'::date)))
)
INHERITS (public.ttt);


ALTER TABLE partitions.ttt_051208 OWNER TO cdr;
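
The attachment is cut off here; the rule or trigger that actually routes
rows from public.ttt into the partitions is not shown. For context, here is
a minimal sketch of the trigger-based routing described in the 8.3
documentation (illustrative names, not Nik's actual attachment). Its RETURN
NULL is exactly what produces the INSERT 0 0 tag: the command tag only
counts rows inserted into the table the INSERT statement named, and the
trigger cancels that insert after redirecting the row to a child.

CREATE OR REPLACE FUNCTION ttt_insert_redirect() RETURNS trigger AS $$
BEGIN
    IF NEW.a >= '2008-05-12'::date AND NEW.a < '2008-05-13'::date THEN
        INSERT INTO partitions.ttt_051208 VALUES (NEW.*);
    ELSIF NEW.a >= '2008-05-11'::date AND NEW.a < '2008-05-12'::date THEN
        INSERT INTO partitions.ttt_051108 VALUES (NEW.*);
    -- ... one branch per daily partition ...
    ELSE
        RAISE EXCEPTION 'date out of range: %', NEW.a;
    END IF;
    -- Cancel the insert into the parent; this is why psql reports INSERT 0 0.
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ttt_insert_trigger
    BEFORE INSERT ON public.ttt
    FOR EACH ROW EXECUTE PROCEDURE ttt_insert_redirect();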

Re: [PERFORM] Partitioning: INSERT 0 0 but want INSERT 0 1

2008-05-12 Thread Neil Peter Braggio
I have the same problem in PG 8.2

To resolve this issue I had to create a new table with the same
structure as the partitioned table, with a trigger for insert and
update. All the operations the application has to do are directed to
this new table.

When a new record is inserted into the new table, the trigger inserts a
record with the same values into the partitioned table and then
deletes all records from the new table. For update operations the
trigger redirects the operation to the partitioned table as well.
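
A minimal sketch of that buffer-table arrangement (table, function, and
trigger names here are made up for illustration; this is not Neil's actual
code):

CREATE TABLE ttt_buffer (LIKE public.ttt INCLUDING DEFAULTS);

CREATE OR REPLACE FUNCTION ttt_buffer_redirect() RETURNS trigger AS $$
BEGIN
    -- Forward the incoming row to the partitioned parent, where its own
    -- rule/trigger routes it to the right child table.
    INSERT INTO public.ttt (id, a, b) VALUES (NEW.id, NEW.a, NEW.b);
    -- Clear out rows left behind by earlier inserts.
    DELETE FROM ttt_buffer;
    -- Let the row land in the buffer as well, so the client sees INSERT 0 1.
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ttt_buffer_redirect_trg
    BEFORE INSERT ON ttt_buffer
    FOR EACH ROW EXECUTE PROCEDURE ttt_buffer_redirect();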

With this _not elegant_ solution our Java application is able to do its job.

If you find a better solution please let me know.


Neil Peter Braggio
[EMAIL PROTECTED]


On Tue, May 13, 2008 at 11:48 AM, Nikolas Everett <[EMAIL PROTECTED]> wrote:
> I figure this subject belongs on the performance mailing list because it is
> about partitioning, which is a performance issue.
>
> [snip]



[PERFORM] RAID controllers for Postgresql on large setups

2008-05-12 Thread Francisco Reyes

I inherited a number of servers and am starting to look into the hardware.

So far, what I know from a few of the servers:
Red Hat servers.
15K RPM disks, 12GB to 32GB of RAM.
Adaptec 2120 SCSI controller (64MB of cache).

The servers mostly have 12 drives in RAID 10.
We are going to redo one machine to compare RAID 10 vs. RAID 50,
mostly to see if the performance is close; the space gain may be useful.


The usage pattern is mostly large sets of transactions, i.e. bulk loads of
millions of rows and queries involving tens of millions of rows. There are
usually only a handful of connections at once, but I have seen it go up to
10 in the few weeks I have been at the new job. The rows are not very wide,
mostly 30 to 90 bytes. The few that will be wider will be summary tables
that will be read straight up without joins, and indexed on the fields we
will be querying them by. Most of the connections will be doing bulk
reads/updates/writes.


Some of the larger tables have nearly 1 billion rows and most have tens of
millions. Most DBs are under 500GB, since the data was split so as to keep
each machine somewhat evenly balanced against the others.


I noticed the machine we are about to redo doesn't have a BBU.

A few questions.
Will it pay to go to a controller with more memory for the existing machines?
The one machine I am about to redo has PCI, which seems to somewhat limit our
options. So far I have found another Adaptec controller, the 2130SLP, which
has 128MB and is also just plain PCI. I need to decide whether to buy the BBU
for the 2120 or get a new controller with more memory and a BBU. For DBs with
bulk updates/inserts, is a 128MB write cache even enough to achieve
reasonable rates? (i.e. at least 5K inserts/sec)


A broader question:
For large setups (i.e. 500GB+ per server) does it make sense to try to get a
controller into the machine, or do SANs have better throughput, even if at a
much higher cost?


For future machines I plan to look into controllers with at least 512MB,
which will likely be PCI-X/PCI-e; I have not seen anything with large caches
for plain PCI. Also, the machines in question have SCSI drives, not SAS. I
believe the most recent machine has SAS, but the others may be 15K RPM SCSI.

Whether through a SAN or just an external enclosure, are 12 disks enough to
sustain 5K inserts/updates per second on rows in the 30-to-90-byte
territory? At 5K/second, inserting/updating 100 million records would take
5.5 hours. That is fairly reasonable if we can achieve it. Faster would be
better, but it depends on what it would cost.




Re: [PERFORM] RAID controllers for Postgresql on large setups

2008-05-12 Thread Joshua D. Drake
On Mon, 12 May 2008 22:04:03 -0400
Francisco Reyes <[EMAIL PROTECTED]> wrote:

> I inherited a number of servers and am starting to look into the
> hardware.
> 
> So far, what I know from a few of the servers:
> Red Hat servers.
> 15K RPM disks, 12GB to 32GB of RAM.
> Adaptec 2120 SCSI controller (64MB of cache).
> 
> The servers mostly have 12 drives in RAID 10.
> We are going to redo one machine to compare RAID 10 vs. RAID 50,
> mostly to see if the performance is close; the space gain may be
> useful.

Most likely you have onboard SCSI as well, I am guessing. You
shouldn't bother with the 2120; my tests show it is a horrible
controller for random writes.

Comparing software RAID on an LSI onboard controller in an IBM 345 versus a
2120S using hardware RAID 10, the software RAID completely blew the Adaptec
away.

Joshua D. Drake
-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate






Re: [PERFORM] RAID controllers for Postgresql on large setups

2008-05-12 Thread Francisco Reyes

Joshua D. Drake writes:


Most likely you have onboard SCSI as well, I am guessing.


Will check.



You shouldn't bother with the 2120; my tests show it is a horrible
controller for random writes.


Thanks for the feedback.


Comparing software RAID on an LSI onboard controller in an IBM 345 versus a
2120S using hardware RAID 10, the software RAID completely blew the Adaptec
away.


Any PCI controller you have had good experience with?
How about any other PCI-X/PCI-e controllers that you have had good results with?



Re: [PERFORM] Partitioning: INSERT 0 0 but want INSERT 0 1

2008-05-12 Thread Nikolas Everett
If I can't find an answer in the next day or so, I'll crack open OpenJPA and
disable that check. It's a very simple, if ugly, hack.

--Nik


On 5/12/08, Neil Peter Braggio <[EMAIL PROTECTED]> wrote:
>
> I have the same problem in PG 8.2
>
> [snip]


Re: [PERFORM] RAID controllers for Postgresql on large setups

2008-05-12 Thread Chris Ruprecht

Joshua,

did you try to run the 345 on an IBM ServeRAID 6i?
I have one in mine, but I never actually ran any speed test.
Do you have any benchmarks that I could run and compare?

best regards,
chris
--  
chris ruprecht

database grunt and bit pusher extraordinaire


On May 12, 2008, at 22:11, Joshua D. Drake wrote:


On Mon, 12 May 2008 22:04:03 -0400
Francisco Reyes <[EMAIL PROTECTED]> wrote:


I inherited a number of servers and am starting to look into the



[snip]

Comparing software RAID on an LSI onboard controller in an IBM 345 versus a
2120S using hardware RAID 10, the software RAID completely blew the Adaptec
away.


[more snip]


Re: [PERFORM] RAID controllers for Postgresql on large setups

2008-05-12 Thread Joshua D. Drake

Chris Ruprecht wrote:

Joshua,

did you try to run the 345 on an IBM ServeRAID 6i?


No, the only controllers I had at the time were the 2120 and the onboard
LSI, which is limited to RAID 1. I put the drives on the LSI in JBOD and
used Linux software RAID.


The key indicator for me: with a single writer over 6 drives (RAID 10)
on the 2120 I could get ~16MB/s. The moment I went to multiple writers
it dropped off dramatically.


However, with software RAID I was able to sustain ~16MB/s over multiple
threads. I stopped testing at 4 threads, when I was still getting 16MB/s
per thread :). I was happy at that point.



Joshua D. Drake





Re: [PERFORM] RAID controllers for Postgresql on large setups

2008-05-12 Thread Joshua D. Drake

Francisco Reyes wrote:

Joshua D. Drake writes:


Any PCI controller you have had good experience with?


I don't have any PCI test data.


How about any other PCI-X/PCI-e controllers that you have had good results with?


http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/

If you are digging for used hardware, see if you can pick up a 64xx series
from HP. A very nice card that can generally be had for reasonable dollars.


http://cgi.ebay.com/HP-Compaq-SMART-ARRAY-6402-CTRL-128MB-SCSI-273915-B21_W0QQitemZ120259020765QQihZ002QQcategoryZ11182QQssPageNameZWDVWQQrdZ1QQcmdZViewItem

If you want new, definitely go with the P800.

Sincerely,

Joshua D. Drake




Re: [PERFORM] RAID controllers for Postgresql on large setups

2008-05-12 Thread Greg Smith

On Mon, 12 May 2008, Francisco Reyes wrote:

We are going to redo one machine to compare RAID 10 vs. RAID 50, mostly to
see if the performance is close; the space gain may be useful.


Good luck with that, you'll need it.

Will it pay to go to a controller with higher memory for existing 
machines? The one machine I am about to redo has PCI which seems to 
somewhat limit our options. So far I have found another Adaptec 
controller, 2130SLP, that has 128MB and is also just plain PCI. I need 
to decide whether to buy the BBU for the 2120 or get a new controller 
with more memory and a BBU.


These options are both pretty miserable. I hear rumors that Adaptec makes
controllers that work OK under Linux, but I've never seen one. A quick
search suggests both the 2120 and 2130SLP are pretty bad. The suggestions
Joshua already gave look like much better ideas.


Considering your goals here, I personally wouldn't put a penny into a 
system that wasn't pretty modern.  I think you've got too aggressive a 
target for database size combined with commit rate to be playing with 
hardware unless it's new enough to support PCI-Express cards.


For DBs with bulk updates/inserts is 128MB write cache even enough to 
achieve reasonable rates? (ie at least 5K inserts/sec)


This really depends on how far the data is spread across disk.  You'll 
probably be OK on inserts.  Let's make a wild guess and say we fit 80 
100-byte records in each 8K database block.  If you have 5000/second, 
that's 63 8K blocks/second, which works out to 0.5MB/s of writes. Pretty
easy, unless there are a lot of indexes involved as well. But an update can
require reading in an 8K block, modifying it, then writing it back out
again.  In the worst case, if your data was sparse enough (which is 
frighteningly possible when I hear you mention a billion records) that 
every update was hitting a unique block, 5K/sec * 8K = 39MB/second of 
reads *and* writes.  That doesn't sound like horribly much, but that's 
pretty tough if there's a lot of seeking involved in there.
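
For what it's worth, the same back-of-envelope figures as a runnable check
(this is just the arithmetic above restated in SQL, nothing measured):

SELECT ceil(5000.0 / 80)                             AS blocks_per_sec,  -- 63 dirty blocks/s when rows pack together
       round(5000.0 / 80 * 8192 / (1024 * 1024), 2)  AS best_write_mb_s, -- ~0.49 MB/s, the easy end
       round(5000.0 * 8192 / (1024 * 1024), 1)       AS worst_io_mb_s;   -- ~39.1 MB/s of reads, plus the same in writes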


Now, in reality, many of your small records will be clumped into each
block on these updates, and a lot of writes are deferred until checkpoint
time, which gives more time to aggregate writes across shared blocks. You'll
actually be somewhere between 0.5 and 78MB/s, which is a pretty
wide range.  It's hard to estimate too closely here without a lot more 
information about the database, the application, what version of 
PostgreSQL you're using, all sorts of info.


You really should be thinking in terms of benchmarking the current 
hardware first to try and draw some estimates you can extrapolate from. 
Theoretical comments are a very weak substitute for real-world 
benchmarking on the application itself, even if that benchmarking is done 
on less capable hardware.  Run some tests, measure your update rate while 
also measuring real I/O rate with vmstat, compare that I/O rate to the 
disk's sequential/random performance as measured via bonnie++, and now 
there's a set of figures that mean something you can estimate based on.


For large setups (ie 500GB+ per server) does it make sense to try to get a 
controller in a machine or do SANs have better throughput even if at a much 
higher cost?


That's not a large setup nowadays, certainly not large enough that a SAN 
would be required to get reasonable performance.  You may need an array 
that's external to the server itself, but a SAN includes more than just 
that.


There are a lot of arguments on both sides for using SANs; see 
http://wiki.postgresql.org/wiki/Direct_Storage_vs._SAN for a summary and 
link to recent discussion where this was thrashed about heavily.  If 
you're still considering RAID5 and PCI controllers you're still a bit in 
denial about the needs of your situation here, but jumping right from 
there to assuming you need a SAN is likely overkill.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



[PERFORM] Installation Steps to migrate to Postgres 8.3.1

2008-05-12 Thread Gauri Kanekar
Hi,

We want to migrate from postgres 8.1.3 to postgres 8.3.1.
Can anybody list the installation steps to be followed for the migration?
Is there anything special we need to take care of?

Thanks in advance
~ Gauri


Re: [PERFORM] Installation Steps to migrate to Postgres 8.3.1

2008-05-12 Thread Claus Guttesen
> We want to migrate from postgres 8.1.3 to postgres 8.3.1.
> Can anybody list the installation steps to be followed for the migration?
> Is there anything special we need to take care of?

Perform a pg_dump, do a restore, and validate your SQL queries on a test server.

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare
