Well, the job is done. The Talend component is working (https://github.com/parisni/talend/tree/master/tPostgresqlOutputBulkAPHP). It creates a file (binary or CSV) locally and then uses COPY ... FROM STDIN, so the file never has to be pushed onto a remote database server.
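In case it helps anyone reading the thread later, here is a minimal sketch of that "local file + COPY FROM STDIN" step, using the CopyManager API of the PostgreSQL JDBC driver. This is not the component's actual code; the table name, file path and connection settings are made up for the example.

import java.io.FileInputStream;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyFromStdinSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection settings, only for the example.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://dbhost:5432/mydb", "myuser", "mypassword")) {

            CopyManager copyManager = conn.unwrap(PGConnection.class).getCopyAPI();

            // The file exists only on the client machine; its content is
            // streamed through the connection, so nothing is pushed onto
            // the database server's filesystem.
            try (InputStream in = new FileInputStream("/tmp/mytable.pgbin")) {
                long rows = copyManager.copyIn(
                        "COPY mytable FROM STDIN WITH (FORMAT binary)", in);
                System.out.println("Loaded " + rows + " rows");
            }
        }
    }
}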
I have made a little comparison test:

column1: character varying
column2: integer
column3: boolean
10 000 000 tuples

Type   | Create file time | Bulk load time | Total time | File size
Binary | 11137 ms         | 21661 ms       | 32798 ms   | 250 MB
CSV    | 23226 ms         | 22192 ms       | 45418 ms   | 179 MB

The binary format is definitely faster and safer:
- faster, because writing binary is quicker than writing a text file. I guess the bulk-load bottleneck is the network, so that step is about the same for both formats. Loading a binary file is twice as fast when the file is already on the database server.
- safer, thanks to the format itself: each value is preceded by its length, which is more robust than CSV and its separators (which can also appear inside the text). A minimal sketch of that row format is at the end of this message.

The code is based on:
- https://github.com/uwescience/myria/blob/master/src/edu/washington/escience/myria/PostgresBinaryTupleWriter.java
- https://github.com/bytefish/PgBulkInsert/tree/master/PgBulkInsert/src/main/de/bytefish/pgbulkinsert/pgsql/handlers

Thanks,

2016-05-10 15:08 GMT+02:00 Cat <c...@zip.com.au>:

> On Tue, May 10, 2016 at 03:00:55PM +0200, Nicolas Paris wrote:
> > > The way I want is :
> > > csv -> binary -> postgresql
> > >
> > > Is this just to be quicker or are you going to add some business logic
> > > while converting CSV data?
> > > As you mentioned ETL, I assume the second, as I don't think that
> > > converting CSV to binary and then loading it to PostgreSQL will be more
> > > convenient than loading directly from CSV... as quick as it can be, you
> > > have anyway to load data from CSV.
> > >
> >
> > Right, ETL process means huge business logic.
> > get the data (csv or other) -> transform it -> produce a binary -> copy
> > from binary from stdin
> >
> > Producing 100 GB CSVs is a waste of time.
>
> Ah. You need to fiddle with the data. Then you need to weigh the pros of
> something agnostic to Postgres's internals to something that needs to be
> aware of them.
>
> You will need to delve into the source code for data types more complex
> than INTEGER, TEXT and BYTEA (which was the majority of my data when I
> was just looking into it).
>
> --
> "A search of his car uncovered pornography, a homemade sex aid, women's
> stockings and a Jack Russell terrier."
> - http://www.dailytelegraph.com.au/news/wacky/indeed/story-e6frev20-1111118083480
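PS: for anyone wondering what "each value is preceded by its length" looks like on disk, here is a minimal, stripped-down sketch of writing a single (varchar, integer, boolean) row in COPY's binary format. It is not the component's code (the two repositories above do this properly); the file path and class name are invented for the example.

import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;

public class BinaryCopyFileSketch {
    public static void main(String[] args) throws Exception {
        try (DataOutputStream out = new DataOutputStream(
                new FileOutputStream("/tmp/mytable.pgbin"))) {

            // File header: 11-byte signature, 32-bit flags, 32-bit extension length.
            out.write(new byte[] {'P', 'G', 'C', 'O', 'P', 'Y', '\n', (byte) 0xFF, '\r', '\n', 0});
            out.writeInt(0);   // flags
            out.writeInt(0);   // no header extension

            // One tuple: 16-bit field count, then (length, bytes) per field.
            out.writeShort(3);

            byte[] text = "hello world".getBytes(StandardCharsets.UTF_8);
            out.writeInt(text.length);   // varchar: length prefix ...
            out.write(text);             // ... then the raw UTF-8 bytes

            out.writeInt(4);             // integer: always 4 bytes ...
            out.writeInt(42);            // ... big-endian value

            out.writeInt(1);             // boolean: 1 byte
            out.writeByte(1);            // true

            // File trailer: a field count of -1 marks the end of the data.
            out.writeShort(-1);
        }
    }
}

A NULL would be written as a length of -1 with no data bytes; those length prefixes are exactly what makes the format immune to the separator problems CSV has.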