On Wed, Apr 15, 2020 at 7:15 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> As I understand this, it needs to parse the lines twice (the second
> time in phase-3), and until the first two phases are over, we can't
> start the tuple-processing work which is done in phase-3. So even if
> the tokenization is done a bit faster, we will lose some time on
> processing the tuples, which might not be an overall win and in fact
> can be worse as compared to the single-reader approach being
> discussed. Now, if the work done in tokenization were a major (or
> significant) portion of the copy, thinking of such a technique might
> be useful, but that is not the case as seen in the data shared
> earlier in this thread (the tokenize time is very small as compared
> to the data processing time).
It seems to me that a good first step here might be to forget about
parallelism for a minute and just write a patch to make the line
splitting as fast as possible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company