Ivan Shmakov <oneing...@gmail.com> writes:
> Alexander Kuznetsov <a...@cpan.org> writes:

[…]

> (Some wording fixes and suggestions.)

Thanks a lot!  For some reason the message fell off the thread; I found
it accidentally while searching for another one.  Also, lists.debian.org
cannot find the original post, while GMANE shows it perfectly fine.  Is
it supposed to be like that?

[...]

>> ignored during the loading. For example, you can skip integrity checks for
>> performance when you copy data from another database to PostgreSQL. On the
>> other hand, you can enable constraint checks when loading unclean data.

> Are “constraint checks” different to “integrity checks” in the
> above?  Unless they are, it should rather be, e. g.:

Integrity checks do include constraint checks, but in this case they are
kept separate.  The authors emphasize that pg_bulkload can perform
constraint checks on unclean data while the [expensive] server-side
integrity checks are turned off.  (A rough sketch of such a load is in
the P.S. below.)

>> PostgreSQL, but version 3.0 or later has some ETL features like input data
>> validation and data transformation with filter functions.

> … but as of version 3.0 some ETL features… were added.

> And what's ETL, BTW?

Extract-Transform-Load -- a software development pattern which has since
grown into an industry of its own.  It used to be a nice girl at a
keyboard; nowadays it is implemented with network clusters.

>> In version 3.1, pg_bulkload can convert the load data into the binary file
>> which can be used as an input file of pg_bulkload. If you check whether

> Perhaps:

> As of version 3.1, pg_bulkload can dump the preprocessed data into a
> binary file, allowing for…

That would not be entirely true.  While pg_bulkload does allow
converting the data into a binary file, it requires the assistance of
the server-side components of the package, which one may consider not to
be the pg_bulkload utility itself; and it is certainly not a simple dump
of preprocessed data.

> (Here, the purpose should be mentioned.  Is this for improving
> the performance of later multiple “bulkloads”, for instance?)

I would say the reverse.  Multiple `bulkload' instances perform the
conversion on multiple [satellite] servers, which may populate a
[network] storage.  Later, a "main" server can pick up the preprocessed
data chunks and load them quickly.

To make use of the pg_bulkload 3.1+ ability to convert the data into
binary form, a rather specific setup is currently required.  I would
withhold any promises of better performance, as people would expect
"dump binary locally, then upload to the server" functionality, which
may hardly be feasible if, say, the server and the client have different
CPU types.  A single-server/single-storage case is the worst one for the
binary conversion: the process will be constrained by the RAM/storage
bandwidth and slowed down by almost a factor of two.

>> the load time itself. Also in version 3.1, parallel loading works
>> more effectively than before.

> s/effectively/efficiently/.  But the whole sentence makes little
> sense, as the earlier versions weren't packaged for Debian.

Good point, thanks!

-- 
Sincerely yours,
Alexander Kuznetsov
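P.S.  For what it's worth, here is a rough sketch of the kind of load I
have in mind.  It is purely illustrative and written from my
recollection of the upstream 3.1 documentation, so the option names (and
certainly the table, file, and filter-function names, which I made up)
should be checked against the manual before use:

    # sample.ctl -- pg_bulkload control file (illustrative only)
    OUTPUT = public.measurements         # target table (hypothetical)
    INPUT  = /srv/load/measurements.csv  # source data (hypothetical path)
    TYPE   = CSV
    DELIMITER = ","
    CHECK_CONSTRAINTS = YES   # have the loader verify CHECK constraints
    PARSE_ERRORS = 100        # tolerate up to 100 malformed input rows
    FILTER = fix_measurement  # server-side filter function (the ETL bit)

    $ pg_bulkload -d mydb sample.ctl

The point being: the constraint and parse checking above is done by the
loader itself, which is what lets one keep it enabled for unclean data
even when the usual server-side integrity checks are skipped for speed.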
Thanks a lot! For some reasons the message got off the thread, I accidently found it while searching for another. Also lists.debian.org cannot find the original post, while GMANE shows it perfectly fine. Is it supposed to be like that? [...] >> ignored during the loading. For example, you can skip integrity checks for >> performance when you copy data from another database to PostgreSQL. On the >> other hand, you can enable constraint checks when loading unclean data. > > Are “constraint checks” different to “integrity checks” in the > above? Unless they are, it should rather be, e. g.: Integrity check does include constraint check but in this case they are kept separate. The authors emphasize the fact that you can perform constraint check with pg_bulkload for unclean data while having [expensive] database server integrity check turned off. >> PostgreSQL, but version 3.0 or later has some ETL features like input data >> validation and data transformation with filter functions. > > … but as of version 3.0 some ETL features… were added. > > And what's ETL, BTW? Enter-Transform-Load - a software development pattern which currently evolved into an industry. Used to be a nice girl by a keyboard, nowadays implemented with network clusters. >> In version 3.1, pg_bulkload can convert the load data into the binary file >> which can be used as an input file of pg_bulkload. If you check whether > > Perhaps: > > As of version 3.1, pg_bulkload can dump the preprocessed data into a > binary file, allowing for… This would not be entirely true. While pg_bulkload does allow to convert the data into binary file, it requires assistance of server-side components of the package. Which one may consider not pg_bulkload utility itself and this is certainly not simple dumping preprocessed data. > (Here, the purpose should be mentioned. Is this for improving > the performance of later multiple “bulkloads”, for instance?) I would say the reverse. Multiple `bulkload' instances perform conversion using multiple [satellite] servers, which may populate [network] storage. Later a "main" server could pick up preprocessed data chunks and quickly load them. To make use of pg_bulkload 3.1+ ability to convert the data into binary form it is currently required to create a rather specific setup. I would withhold the promises of better performance as people would expect "dump binary locally, then upload to the server" functionality. It may hardly be feasible if, say, the server and the client have different CPU types. A single server/single storage case is the worst for the binary conversion. The process will be constrained by the RAM/storage bandwidth and slowed down almost twice. >> the load time itself. Also in version 3.1, parallel loading works >> more effectively than before. > s/effectively/efficiently/. But the whole sentence makes little > sense, as the earlier versions weren't packaged for Debian. Good point, thanks! -- Sincerely yours, Alexander Kuznetsov -- To UNSUBSCRIBE, email to debian-wnpp-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org Archive: http://lists.debian.org/CA+3pxd6x8wTtaV4=ubyeb3p-tomfjkfn+wytcz+xh5w_wpw...@mail.gmail.com