Can you send an EXPLAIN result for both use cases?

Pavel
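For reference, a minimal sketch of how those two plans could be captured, assuming the query shape posted below. Plain EXPLAIN avoids actually executing the array_agg version, which is the one that crashes the backend; the ::text cast and the ',' delimiter in the string_agg variant are assumptions, since that query was never posted in the thread.

    -- plan only; does not execute the query that exhausts memory
    EXPLAIN
    SELECT guid, array_agg(id) FROM orders GROUP BY guid;

    -- the string_agg variant that reportedly completes; id must be cast to
    -- text, and the ',' delimiter here is only an assumption
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT guid, string_agg(id::text, ',') FROM orders GROUP BY guid;
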
2013/8/19 Robert Sosinski <rsosin...@ticketevolution.com>

> At the moment, all guids are distinct; however, before I zapped the
> duplicates, there were 280 duplicates.
>
> Currently, there are over 2 million distinct guids.
>
> -Robert
>
>
> On Mon, Aug 19, 2013 at 11:12 AM, Pavel Stehule
> <pavel.steh...@gmail.com> wrote:
>
>> 2013/8/19 Robert Sosinski <rsosin...@ticketevolution.com>
>>
>>> Hi Pavel,
>>>
>>> What kind of example do you need? I can't give you the actual data I
>>> have in the table, but I can give you an example query and the schema
>>> attached below. From there, I would just put in 2 million rows, worth
>>> 1.2 GB of data. Average sizes of the extended columns (using the
>>> pg_column_size function) in bytes are:
>>>
>>> guid: 33
>>> name: 2.41
>>> currency: 4
>>> fields: 120.32
>>>
>>> example query:
>>>
>>> -- find duplicate records using a guid
>>> select guid, array_agg(id) from orders group by guid;
>>
>> How many distinct guids are there, and how many duplicates?
>>
>> Regards
>>
>> Pavel
>>
>>> example schema:
>>>                                              Table "public.things"
>>>    Column   |            Type             |                      Modifiers                       | Storage  | Stats target | Description
>>> ------------+-----------------------------+------------------------------------------------------+----------+--------------+-------------
>>>  id         | integer                     | not null default nextval('things_id_seq'::regclass)  | plain    |              |
>>>  version    | integer                     | not null                                              | plain    |              |
>>>  created_at | timestamp without time zone | not null                                              | plain    |              |
>>>  updated_at | timestamp without time zone | not null                                              | plain    |              |
>>>  foo_id     | integer                     | not null                                              | plain    |              |
>>>  bar_id     | integer                     | not null                                              | plain    |              |
>>>  baz_id     | integer                     | not null                                              | plain    |              |
>>>  guid       | character varying           | not null                                              | extended |              |
>>>  name       | character varying           | not null                                              | extended |              |
>>>  price      | numeric(12,2)               | not null                                              | main     |              |
>>>  currency   | character varying           | not null                                              | extended |              |
>>>  amount     | integer                     | not null                                              | plain    |              |
>>>  the_date   | date                        | not null                                              | plain    |              |
>>>  fields     | hstore                      |                                                       | extended |              |
>>> Indexes:
>>>     "things_pkey" PRIMARY KEY, btree (id)
>>>     "things_foo_id_idx" btree (foo_id)
>>>     "things_bar_id_idx" btree (bar_id)
>>>     "things_baz_id_idx" btree (baz_id)
>>>     "things_guid_uidx" UNIQUE, btree (guid)
>>>     "things_lpad_lower_name_eidx" btree (lpad(lower(name::text), 10, '0'::text))
>>>     "things_price_idx" btree (price)
>>> Foreign-key constraints:
>>>     "things_foo_id_fkey" FOREIGN KEY (foo_id) REFERENCES foos(id)
>>>     "things_bar_id_fkey" FOREIGN KEY (bar_id) REFERENCES bars(id)
>>>     "things_baz_id_fkey" FOREIGN KEY (baz_id) REFERENCES bazs(id)
>>> Triggers:
>>>     timestamps_trig BEFORE INSERT OR UPDATE ON things FOR EACH ROW
>>>     EXECUTE PROCEDURE timestamps_tfun()
>>>
>>> Let me know if you need anything else.
>>>
>>> Thanks,
>>>
>>>
>>> On Mon, Aug 19, 2013 at 3:29 AM, Pavel Stehule
>>> <pavel.steh...@gmail.com> wrote:
>>>
>>>> Hello
>>>>
>>>> please, can you send some example or test?
>>>>
>>>> Regards
>>>>
>>>> Pavel Stehule
>>>>
>>>>
>>>> 2013/8/19 Robert Sosinski <rsosin...@ticketevolution.com>
>>>>
>>>>> When using array_agg on a large table, memory usage seems to spike
>>>>> up until Postgres crashes with the following error:
>>>>>
>>>>> 2013-08-17 18:41:02 UTC [2716]: [2] WARNING: terminating connection
>>>>> because of crash of another server process
>>>>> 2013-08-17 18:41:02 UTC [2716]: [3] DETAIL: The postmaster has
>>>>> commanded this server process to roll back the current transaction
>>>>> and exit, because another server process exited abnormally and
>>>>> possibly corrupted shared memory.
>>>>> 2013-08-17 18:41:02 UTC [2716]: [4] HINT: In a moment you should be
>>>>> able to reconnect to the database and repeat your command.
>>>>>
>>>>> I've definitely isolated it down to using array_agg, as when I
>>>>> changed the query to use string_agg, it worked fine. I also tried
>>>>> using array_agg on a few different queries, all yielding the same
>>>>> issue. Swapping in string_agg fixed the issue once more.
>>>>>
>>>>> This particular table has over 2 million rows and is 1.2 GB, and
>>>>> when I ran the query while viewing htop, the virtual size of the
>>>>> Postgres process ballooned to 13.9 GB until crashing.
>>>>>
>>>>> The version of Postgres I am using is: PostgreSQL 9.2.4 on
>>>>> x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro
>>>>> 4.7.2-2ubuntu1) 4.7.2, 64-bit
>>>>>
>>>>> Any help would be much appreciated, thanks!
>>>>>
>>>>> -Robert
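
Not part of the original thread, but a possible workaround sketch given the numbers above (only about 280 duplicated guids out of 2 million): restrict array_agg to guids that actually repeat, so the aggregate builds a handful of arrays instead of one per distinct guid. Table and column names follow the example query; whether this sidesteps the memory blow-up on 9.2.4 would need to be verified on the real data.

    -- the inner pass finds only the duplicated guids (count(*) keeps no
    -- per-row state); the outer pass builds arrays just for those groups
    SELECT guid, array_agg(id) AS ids
    FROM orders
    WHERE guid IN (
        SELECT guid
        FROM orders
        GROUP BY guid
        HAVING count(*) > 1
    )
    GROUP BY guid;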