Re: [PERFORM] Load experimentation

2009-12-10 Thread Scott Carey
On 12/10/09 3:29 PM, "Scott Carey" wrote: > On 12/7/09 11:12 AM, "Ben Brehmer" wrote: > >> Thanks for the quick responses. I will respond to all questions in one email: >> >> COPY command: Unfortunately I'm stuck with INSERTs due to the way this >> data was generated (Hadoop/MapReduce).

Re: [PERFORM] Load experimentation

2009-12-10 Thread Scott Carey
On 12/7/09 11:12 AM, "Ben Brehmer" wrote: > Thanks for the quick responses. I will respond to all questions in one email: > > COPY command: Unfortunately I'm stuck with INSERTs due to the way this data > was generated (Hadoop/MapReduce). If you have control over the MapReduce output, you can
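Scott's suggestion above presumably points at having the MapReduce job emit COPY-friendly output directly instead of INSERT statements. A minimal sketch of the escaping that PostgreSQL's COPY text format requires (the function name `to_copy_line` is my own, not from the thread):

```python
# Sketch (assumption): if the MapReduce job can emit tab-separated text
# instead of INSERT statements, psql can bulk-load it with COPY.
# COPY's text format needs backslash, tab, and newline escaped, and uses
# \N as its NULL marker.
def to_copy_line(row):
    """Render one row in PostgreSQL COPY text format (tab-delimited)."""
    def esc(v):
        if v is None:
            return r"\N"  # COPY's NULL marker
        return (str(v).replace("\\", "\\\\")
                      .replace("\t", "\\t")
                      .replace("\n", "\\n"))
    return "\t".join(esc(v) for v in row)

# Example: lines like this could be piped to
# `psql -c "COPY mytable FROM STDIN"` (hypothetical table name).
print(to_copy_line([1, "hello\tworld", None]))
```

Emitting this format from the reducers would let the load skip SQL parsing entirely.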

Re: [PERFORM] Load experimentation

2009-12-10 Thread Ben Brehmer
Hi Andy, Load is chugging along. We've optimized our postgres conf as much as possible but are seeing the inevitable I/O bottleneck. I had the same thought as you (converting INSERTs into COPYs) a while back but unfortunately each file has many inserts into many different tables. Potentially
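Ben's obstacle above, files mixing INSERTs for many tables, can still be demultiplexed into one row stream per table before converting to COPY. A heavily hedged sketch, assuming simple one-line `INSERT INTO t VALUES (...);` statements with no multi-line values (real data would need a proper SQL parser):

```python
import re
from collections import defaultdict

# Hedged sketch: split a mixed SQL file into one row list per table,
# assuming simple one-line "INSERT INTO t VALUES (...);" statements.
# The regex is naive and illustrative only.
INSERT_RE = re.compile(r"INSERT\s+INTO\s+(\w+)\s+VALUES\s*\((.*)\);", re.I)

def split_by_table(lines):
    rows = defaultdict(list)
    for line in lines:
        m = INSERT_RE.search(line)
        if m:
            rows[m.group(1)].append(m.group(2))
    return rows

# Demo with made-up table names:
demo = ["INSERT INTO users VALUES (1, 'bob');",
        "INSERT INTO orders VALUES (10, 1);",
        "INSERT INTO users VALUES (2, 'ann');"]
print(split_by_table(demo))
```

Each per-table row list could then be fed to a separate COPY, one stream per table.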

Re: [PERFORM] Load experimentation

2009-12-09 Thread Andy Colson
On 12/07/2009 12:12 PM, Ben Brehmer wrote: Hello All, I'm in the process of loading a massive amount of data (500 GB). After some initial timings, I'm looking at 260 hours to load the entire 500GB. 10 days seems like an awfully long time so I'm searching for ways to speed this up. The load is happening in the Amazon cloud (EC2).

Re: [PERFORM] Load experimentation

2009-12-08 Thread Dimitri Fontaine
Scott Marlowe writes: > That's a lot of work to get to COPY. Well, yes. I thought about it this way only after having read that the OP is uneasy with producing another format from his source data, and considering it's a one-shot operation. Ah, tradeoffs, how to find the right one! -- dim

Re: [PERFORM] Load experimentation

2009-12-08 Thread Scott Marlowe
On Tue, Dec 8, 2009 at 2:08 AM, Dimitri Fontaine wrote: > Hi, > > Ben Brehmer writes: >> By "Loading data" I am implying: "psql -U postgres -d somedatabase -f >> sql_file.sql". The sql_file.sql contains table creates and insert >> statements. There are no >> indexes present nor created during the load.

Re: [PERFORM] Load experimentation

2009-12-08 Thread Dimitri Fontaine
Hi, Ben Brehmer writes: > By "Loading data" I am implying: "psql -U postgres -d somedatabase -f > sql_file.sql". The sql_file.sql contains table creates and insert > statements. There are no > indexes present nor created during the load. > > OS: x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44)

Re: [PERFORM] Load experimentation

2009-12-08 Thread Scott Marlowe
On Tue, Dec 8, 2009 at 12:58 AM, Scott Marlowe wrote: > On Tue, Dec 8, 2009 at 12:22 AM, Ben Brehmer wrote: >> Thanks for all the responses. I have one more thought; >> >> Since my input data is split into about 200 files (3GB each), I could >> potentially spawn one load command for each file. What would be the maximum >> number of input connections Postgres can handle without bogging down?

Re: [PERFORM] Load experimentation

2009-12-07 Thread Scott Marlowe
On Tue, Dec 8, 2009 at 12:22 AM, Ben Brehmer wrote: > Thanks for all the responses. I have one more thought; > > Since my input data is split into about 200 files (3GB each), I could > potentially spawn one load command for each file. What would be the maximum > number of input connections Postgres can handle without bogging down?

Re: [PERFORM] Load experimentation

2009-12-07 Thread Greg Smith
Ben Brehmer wrote: Since my input data is split into about 200 files (3GB each), I could potentially spawn one load command for each file. What would be the maximum number of input connections Postgres can handle without bogging down? You can expect to easily get one loader process per real CPU.
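Greg's rule of thumb above, roughly one loader per real CPU rather than 200 at once, can be sketched with a bounded worker pool. A minimal sketch (the `load_all` helper and its wiring are my own; in production the runner might shell out to `psql -f <file>`):

```python
from concurrent.futures import ThreadPoolExecutor
import os

# Sketch: load many input files with a bounded pool, roughly one worker
# per core as suggested above. The `run` callable is injectable; a real
# runner might invoke `psql -f <file>` (hypothetical wiring, not shown).
def load_all(files, run, workers=None):
    workers = workers or os.cpu_count() or 2
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order in its results
        return list(pool.map(run, files))

# Demo with a stand-in runner instead of a real psql call:
results = load_all(["a.sql", "b.sql"], lambda f: f + " loaded")
print(results)
```

The point is the bounded pool: 200 concurrent connections would mostly fight over the same disks, while a handful per CPU keeps them busy without thrashing.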

Re: [PERFORM] Load experimentation

2009-12-07 Thread Ben Brehmer
Thanks for all the responses. I have one more thought; Since my input data is split into about 200 files (3GB each), I could potentially spawn one load command for each file. What would be the maximum number of input connections Postgres can handle without bogging down? When I say 'input connections'

Re: [PERFORM] Load experimentation

2009-12-07 Thread Greg Smith
Ben Brehmer wrote: By "Loading data" I am implying: "psql -U postgres -d somedatabase -f sql_file.sql". The sql_file.sql contains table creates and insert statements. There are no indexes present nor created during the load. COPY command: Unfortunately I'm stuck with INSERTs due to the way this data was generated (Hadoop/MapReduce).
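If COPY really is off the table, a common fallback is batching many rows into each INSERT (and many INSERTs into each transaction) to cut per-statement overhead. One caveat worth noting: multi-row VALUES lists require PostgreSQL 8.2 or later, so on the OP's 8.1.3 this is another reason to upgrade. A sketch (table name and batch size are hypothetical):

```python
# Sketch: fold rows into multi-row INSERT statements to reduce
# per-statement overhead. Values here are pre-formatted SQL literals;
# real code should use proper quoting/parameterization.
# NOTE: multi-row VALUES needs PostgreSQL 8.2+.
def batch_insert(table, rows, batch=1000):
    stmts = []
    for i in range(0, len(rows), batch):
        values = ", ".join("(%s)" % ", ".join(map(str, r))
                           for r in rows[i:i + batch])
        stmts.append(f"INSERT INTO {table} VALUES {values};")
    return stmts

# Demo: three rows, two per statement.
print(batch_insert("t", [(1, 2), (3, 4), (5, 6)], batch=2))
```

Wrapping a whole file's worth of these in one BEGIN/COMMIT avoids a disk flush per row, which is usually the single biggest win after COPY itself.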

Re: [PERFORM] Load experimentation

2009-12-07 Thread Alan Hodgson
On Monday 07 December 2009, Ben Brehmer wrote: > Disk Setup: Using a single disk Amazon image for the destination > (database). Source is coming from an EBS volume. I didn't think there > were any disk options in Amazon? I don't think any Amazon cloud service is particularly well suited to a database

Re: [PERFORM] Load experimentation

2009-12-07 Thread Craig James
Ben Brehmer wrote: Thanks for the quick responses. I will respond to all questions in one email: By "Loading data" I am implying: "psql -U postgres -d somedatabase -f sql_file.sql". The sql_file.sql contains table creates and insert statements. There are no indexes present nor created during the load.

Re: [PERFORM] Load experimentation

2009-12-07 Thread Ben Brehmer
Thanks for the quick responses. I will respond to all questions in one email: By "Loading data" I am implying: "psql -U postgres -d somedatabase -f sql_file.sql". The sql_file.sql contains table creates and insert statements. There are no indexes present nor created during the load. OS: x86_64-unknown-linux-gnu

Re: [PERFORM] Load experimentation

2009-12-07 Thread Craig James
Ben Brehmer wrote: Hello All, I'm in the process of loading a massive amount of data (500 GB). After some initial timings, I'm looking at 260 hours to load the entire 500GB. You don't say how you are loading the data, so there's not much to go on. But generally, there are two primary ways to

Re: [PERFORM] Load experimentation

2009-12-07 Thread Ben Brehmer
Kevin, This is running on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44) Ben On 07/12/2009 10:33 AM, Kevin Grittner wrote: Ben Brehmer wrote: -7.5 GB memory -4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each) -64-bit platf

Re: [PERFORM] Load experimentation

2009-12-07 Thread Thom Brown
2009/12/7 Kevin Grittner > Ben Brehmer wrote: > > > -7.5 GB memory > > -4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units > >each) > > -64-bit platform > > What OS? > > > (PostgreSQL 8.1.3) > > Why use such an antiquated, buggy version? Newer versions are > faster. > > -Kevin >

Re: [PERFORM] Load experimentation

2009-12-07 Thread Scott Mead
On Mon, Dec 7, 2009 at 1:12 PM, Ben Brehmer wrote: > Hello All, > > I'm in the process of loading a massive amount of data (500 GB). After some > initial timings, I'm looking at 260 hours to load the entire 500GB. 10 days > seems like an awfully long time so I'm searching for ways to speed this up.

Re: [PERFORM] Load experimentation

2009-12-07 Thread Kevin Grittner
Ben Brehmer wrote: > -7.5 GB memory > -4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units >each) > -64-bit platform What OS? > (PostgreSQL 8.1.3) Why use such an antiquated, buggy version? Newer versions are faster. -Kevin

[PERFORM] Load experimentation

2009-12-07 Thread Ben Brehmer
Hello All, I'm in the process of loading a massive amount of data (500 GB). After some initial timings, I'm looking at 260 hours to load the entire 500GB. 10 days seems like an awfully long time so I'm searching for ways to speed this up. The load is happening in the Amazon cloud (EC2), on a
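The numbers in the original post imply a strikingly low ingest rate, which frames the whole thread. A quick back-of-the-envelope check:

```python
# 500 GB over 260 hours works out to well under 1 MB/s of sustained
# ingest, far below what even a single ordinary disk can write.
gb, hours = 500, 260
mb_per_sec = gb * 1024 / (hours * 3600)
print(f"{gb / hours:.2f} GB/hour, {mb_per_sec:.2f} MB/s")
```

That gap between observed throughput and raw disk speed is why the replies focus on per-statement overhead (COPY vs. INSERT) rather than hardware alone.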