On 5 Jan 2010, at 8:30, Yan Cheng Cheok wrote:

>>> What is the actual problem you are trying to solve?
> 
> I am currently developing a database system for a high speed measurement 
> machine.
> 
> The time taken to perform a measurement is ~30 milliseconds per unit. We 
> need to record the measurement result for every single unit. Hence, the 
> time taken to record a result must be far less than a millisecond, so 
> that it has nearly zero impact on the machine speed (otherwise the 
> machine has to wait for the database to finish writing before it can 
> measure the next unit).
> 
> Previously, we were using a flat file. However, a flat file is quite a 
> mess when it comes to generating reports for customers.


That flat file can help you cluster INSERTs together and also means you'll 
have all your measurements ready for a single INSERT. The latter is 
useful if you're planning on using arrays to store your measurements, as 
updating array values requires the entire array to be rewritten to the 
database. I don't know how your measurements would arrive without the flat 
file, but I wouldn't be surprised if the measurements for a single unit would 
come out at different points in time, which would be a bit painful with arrays 
(not quite as much with a measurements table though).

A safe approach (with minimal risk of data loss) would be to split your flat 
file every n units (earlier in this thread a value of n=1000 was mentioned) 
and store that data using COPY in the format COPY expects. You will probably 
also want to keep a queue-table (which is just a normal table, but it's used 
like a queue) with the names of the flat files that need processing.

I haven't done this kind of thing before, but I envision it something like this:

CREATE TABLE unit (
        id              bigserial       NOT NULL,
        date            date            NOT NULL DEFAULT CURRENT_DATE,
        measured        text[],
        measurements    numeric(4,3)[]
);

CREATE TABLE queue (
        file            text            NOT NULL,
        definitive      boolean         DEFAULT False
);

---file-2010-01-05-00000001---
/* Update in its own transaction so that we know we tried to process this file
 * even if the transaction rolls back.
 */
UPDATE queue SET definitive = True
 WHERE file = 'file-' || to_char(CURRENT_DATE, 'YYYY-MM-DD') || '-00000001';

/* Start work */
BEGIN;
COPY unit FROM STDIN;
1       {Width,Height}  {0.001,0.021}
2       {Width,Height}  {0.002,0.019}
...
999     {Width,Height}  {0.000,0.018}
\.

/* This file was processed and can be removed from the queue */
DELETE FROM queue WHERE file='file-2010-01-05-00000001';
COMMIT;

/* This will probably be the name of the next flat file, but we don't know that
 * for sure yet. It needs to be outside the transaction, as otherwise
 * CURRENT_DATE will have the date of the start of the transaction, and we need
 * to know what the next batch will be regardless of whether this one succeeded.
 */
INSERT INTO queue (file, definitive)
VALUES ('file-' || to_char(CURRENT_DATE, 'YYYY-MM-DD') || '-00001000', False);
---end of file---
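On the machine side, the flat-file writer could format each unit's 
measurements in the tab-separated row format COPY expects and name the 
batch files as above. A minimal sketch in Python (the function names and 
the fixed-width %.3f formatting, matching numeric(4,3), are my 
assumptions, not anything from your setup):

```python
import datetime

BATCH_SIZE = 1000  # n=1000, as mentioned earlier in the thread

def copy_line(unit_id, names, values):
    """Format one unit as a tab-separated COPY row with array literals."""
    measured = '{' + ','.join(names) + '}'
    measurements = '{' + ','.join('%.3f' % v for v in values) + '}'
    return '%d\t%s\t%s' % (unit_id, measured, measurements)

def batch_filename(first_unit_id, day=None):
    """Name batch files like file-2010-01-05-00000001."""
    day = day or datetime.date.today()
    return 'file-%s-%08d' % (day.isoformat(), first_unit_id)
```

The writer would append copy_line() output to the current batch file and 
start a new file (plus the queue INSERT) every BATCH_SIZE units.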

You'd need a little program (a script will probably work) to read that queue 
table and send the commands in those files to the database. Don't forget that 
at the start the queue table will be empty ;) I recall some of this list's 
members wrote up a webpage about how to implement queue-tables reliably.
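That little program could look something like this (a Python sketch; I'm 
assuming psql is on the PATH and using a placeholder database name — in a 
real implementation the file list would come from SELECT file FROM queue):

```python
import subprocess

def psql_command(filename, dbname='measurements'):
    """Build the psql invocation for one queued batch file.
    'measurements' is a placeholder database name."""
    # ON_ERROR_STOP makes psql abort on the first error, so a failed
    # COPY rolls back and leaves the file queued for a retry.
    return ['psql', '--set', 'ON_ERROR_STOP=1', '-d', dbname, '-f', filename]

def process_queue(files):
    """Feed each queued batch file to the database in order."""
    for f in files:
        subprocess.check_call(psql_command(f))
```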

Alban Hertroys

--
Screwing up is the best way to attach something to the ceiling.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
