-----Original Message-----
From: Kevin Bartz [mailto:[EMAIL PROTECTED]
Sent: Monday, August 09, 2004 10:37 AM
To: '[EMAIL PROTECTED]'
Subject: RE: [GENERAL] Out of swap space & memory
Thanks for your reply, Mike! Theoretically, I should need only six of the
columns, but as a means of verifying integrity, I would like to de-dup using
all the columns. For instance, if there are two rows identical everywhere but
some column outside the six, I would like to know about it so I can report
back to the data provider. Maybe there's some other way to do this kind of
check?

Kevin

-----Original Message-----
From: mike g [mailto:[EMAIL PROTECTED]
Sent: Friday, August 06, 2004 9:07 PM
To: Kevin Bartz
Subject: RE: [GENERAL] Out of swap space & memory

Hmmmm. Can you determine which rows are duplicates by examining one column,
or must you examine all the columns?

On Fri, 2004-08-06 at 22:56, Kevin Bartz wrote:
> Thanks for your reply, but how can I then solve the problem of duplicates?
> Using your example, if one copy of a duplicate lives between rows 1 and
> 1,000,000 and the other copy between rows 1,000,001 and 2,000,000,
> de-duping each chunk individually will still leave that duplicate showing
> up twice.
>
> Kevin
>
> -----Original Message-----
> From: mike g [mailto:[EMAIL PROTECTED]
> Sent: Friday, August 06, 2004 8:52 PM
> To: Kevin Bartz
> Subject: RE: [GENERAL] Out of swap space & memory
>
> Can octanenights_raw be altered? If so, using the sequence feature in
> Postgres you could add a column and autopopulate it with a number starting
> at 1 on up to 36 million. Then your select queries could be select X from
> octanenights_raw where sequence_column_name > 1 and sequence_column_name
> < 1000000, etc.
>
> Mike
>
> On Fri, 2004-08-06 at 22:40, Kevin Bartz wrote:
> > Well, all I'm doing right now is using psql from the command line. A bit
> > unintelligent, I know, but I want to make sure things will work
> > appropriately before I dive into the world of query editors.
> >
> > Thanks for your suggestions. My data won't need to be updated regularly.
> > In this case, as in all others, loading each set of data is a one-shot
> > process, so I don't think I'll need to worry about truncating.
> >
> > I can't think of any way to break the problem into steps, since the
> > duplicates may well be scattered throughout the table. If I split
> > octanenights into octanenights1 and octanenights2 and then de-dup each
> > individually, I would have to first be sure that octanenights1 does not
> > share a duplicate with octanenights2, or that duplicate would appear in
> > the UNION ALLed version. Maybe I'm missing something?
> >
> > Thanks for your kind response.
> >
> > Kevin
> >
> > -----Original Message-----
> > From: mike g [mailto:[EMAIL PROTECTED]
> > Sent: Friday, August 06, 2004 8:19 PM
> > To: Kevin Bartz
> > Cc: 'Manfred Koizar'; [EMAIL PROTECTED]
> > Subject: Re: [GENERAL] Out of swap space & memory
> >
> > Ok,
> >
> > This is a long shot, but how are you executing your code? In, say, a
> > pgAdmin III SQL window with the statements below entered line after line?
> >
> > If so, I believe it will be treated as one transaction. With the default
> > settings Postgres would have to keep track of everything done to be able
> > to roll back all the changes if it failed. I would believe that would
> > force it to keep track of all 56 million rows combined in memory
> > (probably just the oid column - I am sure the other more experienced
> > PostgreSQL wizards can verify), but still that can take a lot of
> > resources.
> >
> > If by chance you are doing it in one sweep, try executing it in separate
> > steps so the commit can be executed.
> >
> > Hopefully you won't run out of resources then.
> >
> > Are you doing a drop / create, say, every night to update your data? If
> > so, perhaps using TRUNCATE octanenights might be more efficient.
> >
> > If you must drop a full table, perhaps a vacuum should be done
> > afterwards?
> >
> > Mike
> >
> > On Fri, 2004-08-06 at 21:32, Kevin Bartz wrote:
> > > Mike, thanks so much for your reply. I'm sorry for not showing you my
> > > SQL. I didn't show it because I couldn't manage to boil it down to
> > > something reproducible that everyone could try. But here's what it was:
> > >
> > > drop table octanenights;
> > > CREATE TABLE octanenights (member_id varchar(100), campaign_id
> > > varchar(100), catalog_type varchar(100), pushed int, delivered int,
> > > clicks int, opened int, month varchar(100), type1 int, type2 int,
> > > type3 int, type4 int, type5 int);
> > >
> > > copy octanenights from
> > > '/home/kevin/octanenights/proc/uberfile/uberfile1.txt' null as '';
> > > copy octanenights from
> > > '/home/kevin/octanenights/proc/uberfile/uberfile2.txt' null as '';
> > > copy octanenights from
> > > '/home/kevin/octanenights/proc/uberfile/uberfile3.txt' null as '';
> > >
> > > select * from octanenights limit 10;
> > > alter table octanenights rename to octanenights_raw;
> > >
> > > -- de-dup the table
> > > select member_id, campaign_id, catalog_type, pushed, delivered, clicks,
> > > opened, month, type1, type2, type3, type4, type5
> > > into octanenights
> > > from octanenights_raw
> > > group by member_id, campaign_id, catalog_type, pushed, delivered,
> > > clicks, opened, month, type1, type2, type3, type4, type5;
> > >
> > > Let me tell you a little about octanenights. It's a file of about
> > > 36,000,000 rows, each describing an e-mail sent. Unfortunately, there
> > > are duplicate records scattered throughout the table, which I do not
> > > care about. One might suggest that I could've used uniq from the
> > > command line for this, but the data were not sorted originally and the
> > > duplicate records may be scattered anywhere in the table. The objective
> > > in the final line is to de-dup the table and place the result in
> > > octanenights, leaving the original in octanenights_raw in case I ever
> > > need to refer back to it.
> > >
> > > MS SQL Server, with as much RAM and less clock speed, de-dups the table
> > > in about six minutes. The de-duped version has about 26,000,000 rows.
> > > The final line is where Postgres gobbles up all my swap and RAM and
> > > then conks out completely.
> > >
> > > Am I doing something wrong? Maybe there was a better way to approach
> > > this problem? I'd be open to suggestions of any kind, since I'm still
> > > very, very new to the world of optimizing Postgres.
> > >
> > > Kevin
> > >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of Manfred Koizar
> > > Sent: Tuesday, August 03, 2004 3:04 AM
> > > To: Kevin Bartz
> > > Cc: [EMAIL PROTECTED]
> > > Subject: Re: [GENERAL] Out of swap space & memory
> > >
> > > On Mon, 2 Aug 2004 20:08:45 -0700, "Kevin Bartz"
> > > <[EMAIL PROTECTED]> wrote:
> > > >is there any way I can run this query?
> > >
> > > What query? You didn't show us your SQL.
> > >
> > > Servus
> > >  Manfred
> > >
> > > ---------------------------(end of broadcast)---------------------------
> > > TIP 3: if posting/reading through Usenet, please send an appropriate
> > >        subscribe-nomail command to [EMAIL PROTECTED] so that your
> > >        message can get through to the mailing list cleanly
> > >
> > > ---------------------------(end of broadcast)---------------------------
> > > TIP 9: the planner will ignore your desire to choose an index scan if
> > >        your joining column's datatypes do not match
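A rough sketch of one way to keep the de-dup query quoted above from
exhausting memory. None of this comes from the thread: the value is purely
illustrative, sort_mem is the PostgreSQL 7.4-era setting (renamed work_mem
in 8.0), and enable_hashagg only helps if the planner chose an in-memory
hash aggregate in the first place.

-- Cap the memory available to each sort/aggregate step (sort_mem is in KB).
SET sort_mem = 65536;
-- If an in-memory hash aggregate is what eats the RAM, disabling it makes
-- the planner fall back to a sort-based GROUP BY, which can spill to disk.
SET enable_hashagg = off;

SELECT member_id, campaign_id, catalog_type, pushed, delivered, clicks,
       opened, month, type1, type2, type3, type4, type5
INTO octanenights
FROM octanenights_raw
GROUP BY member_id, campaign_id, catalog_type, pushed, delivered, clicks,
         opened, month, type1, type2, type3, type4, type5;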
---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings
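As for the question at the top of the thread - finding rows that agree on
the six key columns but differ in some other column, so the data provider
can be notified - a rough sketch of one possible check follows. The thread
never says which six columns are the keys, so the split below (the first
six columns as keys, the rest as payload) is purely an assumption to
illustrate the shape of the query:

-- Groups of rows that match on the assumed six key columns but take more
-- than one value in at least one of the remaining columns.
SELECT member_id, campaign_id, catalog_type, pushed, delivered, clicks
FROM octanenights_raw
GROUP BY member_id, campaign_id, catalog_type, pushed, delivered, clicks
HAVING count(DISTINCT opened) > 1
    OR count(DISTINCT month)  > 1
    OR count(DISTINCT type1)  > 1
    OR count(DISTINCT type2)  > 1
    OR count(DISTINCT type3)  > 1
    OR count(DISTINCT type4)  > 1
    OR count(DISTINCT type5)  > 1;

Each group this returns is a set of rows that are identical on the assumed
key columns but not identical everywhere, which is exactly the anomaly worth
reporting back. Note that count(DISTINCT ...) ignores NULLs, so a
NULL-versus-value disagreement in a non-key column would slip past this
particular check.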