Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-04 Thread Nick Matheson
Maciek/Vitalii- Thanks for the pointers to the JDBC work. Luckily, we had already found the COPY support in the pg driver, but were wondering if anyone had already written the complementary unpacking code for the raw data returned from the copy. Again the spec is clear enough that we could

Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-04 Thread Maciek Sakrejda
> JDBC driver has some COPY support, but I don't remember details. You'd better ask in JDBC list. As long as we're here: yes, the JDBC driver has COPY support as of 8.4(?) via the CopyManager PostgreSQL-specific API. You can call ((PGConnection)conn).getCopyManager() and do either push- or pull-
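Pull-mode use of that API could look like the following sketch. The class and helper names are mine, and the driver-specific calls are shown in comments so the snippet compiles without the pgjdbc jar on the classpath; with the driver present, uncomment them.

```java
import java.io.ByteArrayOutputStream;
import java.sql.Connection;

public class CopyOutSketch {
    // Builds the COPY command; BINARY output avoids the text-format
    // encode/decode round trip discussed in this thread.
    static String copySql(String table) {
        return "COPY " + table + " TO STDOUT (FORMAT binary)";
    }

    static byte[] copyOutBinary(Connection conn, String table) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        // PostgreSQL-specific API, requires the pgjdbc driver:
        // org.postgresql.copy.CopyManager mgr =
        //     ((org.postgresql.PGConnection) conn).getCopyManager();
        // mgr.copyOut(copySql(table), buf);   // pull mode: server -> buf
        return buf.toByteArray();
    }
}
```

The returned bytes are in the COPY binary wire format, which the client still has to unpack (see the decoding discussion later in the thread).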

Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-04 Thread Nick Matheson
mark- Thanks for all the good questions/insights. People are probably going to want more detail on the list to give alternate ways of attacking the problem. That said I am going to try and fill in some of the gaps where I can... The copy suggestion is a good one if you are unloading to

Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-04 Thread Vitalii Tymchyshyn
On 04.11.10 16:31, Nick Matheson wrote: Heikki- Try COPY, ie. "COPY bulk_performance.counts TO STDOUT BINARY". Thanks for the suggestion. A preliminary test shows an improvement closer to our expected 35 MB/s. Are you familiar with any Java libraries for decoding the COPY format? The sp

Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-04 Thread Nick Matheson
Pierre- Reading from the tables is very fast, what bites you is that postgres has to convert the data to wire format, send it to the client, and the client has to decode it and convert it to a format usable by your application. Writing a custom aggregate in C should be a lot faster since it h

Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-04 Thread Nick Matheson
Andy- I have no idea if this would be helpful or not, never tried it, but when you fire off "select * from bigtable" pg will create the entire resultset in memory (and maybe swap?) and then send it all to the client in one big lump. You might try a cursor and fetch 100-1000 at a time from the
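Andy's cursor suggestion can be sketched in JDBC roughly as follows. The class name and the clamp helper are mine; note that pgjdbc only streams through a cursor when autocommit is off and a positive fetch size is set, otherwise it buffers the whole result set client-side as described above.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class CursorFetchSketch {
    // The advice above: fetch 100-1000 rows at a time, not the whole table.
    static int clampFetchSize(int requested) {
        return Math.max(100, Math.min(1000, requested));
    }

    static long countRows(Connection conn, String table) throws Exception {
        conn.setAutoCommit(false);  // cursor-based fetch needs a transaction
        long n = 0;
        try (Statement st = conn.createStatement()) {
            st.setFetchSize(clampFetchSize(500)); // stream in batches,
                                                  // not one big lump
            try (ResultSet rs = st.executeQuery("SELECT * FROM " + table)) {
                while (rs.next()) {
                    n++;  // process one row at a time
                }
            }
        }
        return n;
    }
}
```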

Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-04 Thread Nick Matheson
Marti- Just some ideas that went through my mind when reading your post: PostgreSQL 8.3 and later have 22 bytes of overhead per row, plus page-level overhead and internal fragmentation. You can't do anything about row overheads, but you can recompile the server with larger pages to reduce page ove
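As a back-of-the-envelope illustration of the row-overhead point (the helper name is mine, and the exact per-row figure varies with version and alignment; the post above quotes 22 bytes), the useful-payload fraction of a scan shrinks quickly for narrow rows:

```java
public class RowOverhead {
    // Fraction of per-row bytes that is actual payload, ignoring
    // page-level overhead and internal fragmentation.
    static double payloadFraction(int payloadBytes, int overheadPerRow) {
        return (double) payloadBytes / (payloadBytes + overheadPerRow);
    }
}
```

For example, a single int4 column (4 payload bytes) against 22 bytes of row overhead yields roughly 15% payload, which helps explain why raw scan throughput can sit far below disk bandwidth.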

Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-04 Thread Nick Matheson
Heikki- Try COPY, ie. "COPY bulk_performance.counts TO STDOUT BINARY". Thanks for the suggestion. A preliminary test shows an improvement closer to our expected 35 MB/s. Are you familiar with any Java libraries for decoding the COPY format? The spec is clear and we could clearly write our o
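For readers who do end up hand-decoding the COPY BINARY output, a minimal sketch of the layout as documented for PostgreSQL's COPY command (the class name is mine): an 11-byte signature, a 32-bit flags word, a 32-bit header-extension length, then per tuple a 16-bit field count followed by length-prefixed field values, with a field count of -1 as the trailer. All integers are big-endian.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class CopyBinaryReader {
    static final byte[] SIGNATURE = {
        'P', 'G', 'C', 'O', 'P', 'Y', '\n', (byte) 0xFF, '\r', '\n', 0
    };

    // Reads the file header: signature, flags, header-extension area.
    static void readHeader(DataInputStream in) throws IOException {
        byte[] sig = new byte[SIGNATURE.length];
        in.readFully(sig);
        if (!java.util.Arrays.equals(sig, SIGNATURE))
            throw new IOException("not PGCOPY binary data");
        in.readInt();               // flags field (bit 16 = OIDs included)
        int extLen = in.readInt();  // header-extension length, normally 0
        in.skipBytes(extLen);
    }

    // Reads one tuple; returns null at the trailer (field count == -1).
    // A field length of -1 means SQL NULL; otherwise the raw bytes are in
    // the column type's binary send format.
    static List<byte[]> readTuple(DataInputStream in) throws IOException {
        short nFields = in.readShort();
        if (nFields == -1) return null;  // end-of-data trailer
        List<byte[]> fields = new ArrayList<>(nFields);
        for (int i = 0; i < nFields; i++) {
            int len = in.readInt();
            if (len == -1) { fields.add(null); continue; }  // SQL NULL
            byte[] data = new byte[len];
            in.readFully(data);
            fields.add(data);
        }
        return fields;
    }
}
```

Interpreting the per-field bytes (int4, float8, timestamp, etc.) is the type-specific part the thread is asking about; the container format itself is only the small state machine above.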

Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-04 Thread Pierre C
Is there any way using stored procedures (maybe C code that calls SPI directly) or some other approach to get close to the expected 35 MB/s doing these bulk reads? Or is this the price we have to pay for using SQL instead of some NoSQL solution? (We actually tried Tokyo Cabinet and found it to