On Sat, Oct 31, 2020 at 2:07 AM Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
>
> Hi,
>
> I've done a bit more testing today, and I think the parsing is busted in
> some way. Consider this:
>
> test=# create extension random;
> CREATE EXTENSION
>
> test=# create table t (a text);
> CREATE TABLE
>
> test=# insert into t select random_string(random_int(10, 256*1024)) from generate_series(1,10000);
> INSERT 0 10000
>
> test=# copy t to '/mnt/data/t.csv';
> COPY 10000
>
> test=# truncate t;
> TRUNCATE TABLE
>
> test=# copy t from '/mnt/data/t.csv';
> COPY 10000
>
> test=# truncate t;
> TRUNCATE TABLE
>
> test=# copy t from '/mnt/data/t.csv' with (parallel 2);
> ERROR: invalid byte sequence for encoding "UTF8": 0x00
> CONTEXT: COPY t, line 485: "m&\nh%_a"%r]>qtCl:Q5ltvF~;2oS6@HB >F>og,bD$Lw'nZY\tYl#BH\t{(j~ryoZ08"SGU~.}8CcTRk1\ts$@U3szCC+U1U3i@P..."
> parallel worker
>
>
> The functions come from an extension I use to generate random data, I've
> pushed it to github [1]. The random_string() generates a random string
> with ASCII characters, symbols and a couple special characters (\r\n\t).
> The intent was to try loading data where a fields may span multiple 64kB
> blocks and may contain newlines etc.
>
> The non-parallel copy works fine, the parallel one fails. I haven't
> investigated the details, but I guess it gets confused about where a
> string starts/end, or something like that.
>

Thanks for identifying this issue. It is fixed in the v10 patch posted at [1].

[1] https://www.postgresql.org/message-id/CALDaNm05FnA-ePvYV_t2%2BWE_tXJymbfPwnm%2Bkc9y1iMkR%2BNbUg%40mail.gmail.com

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com