Andrew Dunstan <[EMAIL PROTECTED]> writes:

> The advantage of having it in COPY is that it can be done serverside
> direct from the file system. For massive bulk loads that might be a
> plus, although I don't know what the protocol+socket overhead is.
Actually, even if you use client-side COPY it's *still* more efficient than
any more general client-side alternative. As Tom pointed out to me a while
back, neither the protocol nor libpq allows for having multiple queries in
flight simultaneously. That makes it impossible to stream large quantities
of data to the server efficiently: each record costs a round trip plus
context-switch overhead.

In an ideal world the client would be able to queue up enough records to
fill the socket buffers, letting the kernel switch into a more
batch-oriented context-switching regime in which the server processes
large numbers of records before switching back to the client. COPY already
behaves this way (a minimal libpq sketch is in the postscript below).

Ideally this would apply to any kind of query execution. But as a sort of
shortcut for bulk loading, I'm curious whether it would be possible to
adopt a batch execution mode where a statement is prepared, and then
parameters to the statement are streamed to the server in a kind of COPY
mode (see the second sketch below). It would have to be some format that
allows embedded newlines, though; there's just no point in an interface
that can't handle arbitrary data.

Personally I find the current CSV support inadequate. It seems pointless
to support CSV if it can't load data exported from Excel, which seems like
the main use case -- Excel puts literal newlines inside quoted fields
whenever a cell contains a line break. But I always thought bulk loading
should be driven by some external application anyway. The problem is that
there isn't any interface suitable for an external application to use.

--
greg
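
P.S. To make the first point concrete, here's a minimal sketch of
streaming a bulk load through client-side COPY with libpq's
PQputCopyData and friends (available in libpq 7.4 and later). The
connection string and the table -- CREATE TABLE mytab (a int, b text)
-- are made up for the example, and error handling is abbreviated:

    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn   *conn = PQconnectdb("dbname=test");
        PGresult *res;
        int       i;

        if (PQstatus(conn) != CONNECTION_OK)
            exit(1);

        /* Put the connection into COPY IN mode. */
        res = PQexec(conn, "COPY mytab FROM STDIN");
        if (PQresultStatus(res) != PGRES_COPY_IN)
            exit(1);
        PQclear(res);

        /* Push rows as fast as we can format them.  libpq just keeps
         * writing into the socket, so the kernel sees big batches
         * rather than one round trip per record. */
        for (i = 0; i < 1000000; i++)
        {
            char buf[64];
            int  len = snprintf(buf, sizeof(buf), "%d\tsome text\n", i);

            if (PQputCopyData(conn, buf, len) != 1)
                exit(1);
        }

        if (PQputCopyEnd(conn, NULL) != 1)
            exit(1);

        res = PQgetResult(conn);    /* command status for the COPY */
        PQclear(res);
        PQfinish(conn);
        return 0;
    }

Notice the client never waits on the server inside the loop; that's
exactly the property no other client-side interface gives you today.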
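
P.P.S. And here, in pseudo-C, is the shape of the batch mode I'm
imagining. To be clear: PQprepare is real (in recent libpq), but
PQbeginBatch, PQputBatchRow, and PQendBatch are pure invention for the
sake of illustration -- nothing like them exists today:

    /* HYPOTHETICAL -- the PQ*Batch functions below do not exist in
     * libpq; they only illustrate preparing a statement once and then
     * streaming parameter tuples in a COPY-like mode.  Length-prefixed
     * fields, as in COPY's binary format, would sidestep the
     * embedded-newline problem entirely. */
    PQprepare(conn, "ins",
              "INSERT INTO mytab VALUES ($1, $2)", 2, NULL);

    PQbeginBatch(conn, "ins");                    /* hypothetical */
    while (have_more_records())                   /* hypothetical */
    {
        const char *values[2]  = { keybuf, textbuf };
        int         lengths[2] = { keylen, textlen };

        PQputBatchRow(conn, 2, values, lengths);  /* hypothetical */
    }
    PQendBatch(conn);                             /* hypothetical */

The point is that after the single prepare, rows flow in one direction
only, just as COPY data does, instead of the client blocking for a
per-record response.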