> On Mar 23, 2020, at 7:11 PM, David G. Johnston <david.g.johns...@gmail.com> 
> wrote:
> 
> On Mon, Mar 23, 2020 at 3:24 AM pinker <pin...@onet.eu> wrote:
> time for i in datafiles/*; do
>   psql -c "\copy json_parts(json_data) FROM '$i'" &
> done
> 
> Don't know whether this is faster but it does avoid spinning up a connection 
> multiple times.
> 
> #bash, linux
>     # Appends, for each split file, a "\set" that slurps the file into a psql
>     # variable plus one VALUES row referencing it, onto $PSQLSCRIPT.
>     function append_each_split_file_to_etl_load_script() {
>         for filetoload in ./*; do
>             ronumber="$(basename "$filetoload")"
>             # only process files since subdirs can be present
>             if [[ -f "$filetoload" ]]; then
>                 echo ""
>                 # emits: \set invoice `cat '<filetoload>'`
>                 echo "\set invoice"' `cat '"'""$filetoload""'"'`'
>                 # emits: , ('<ronumber>',:'invoice')
>                 echo ", ('$ronumber',:'invoice')"
>             fi >> "$PSQLSCRIPT"
>         done
> 
>         echo ""  >> "$PSQLSCRIPT"
>         echo ";" >> "$PSQLSCRIPT"
>         echo ""  >> "$PSQLSCRIPT"
>     }
> 
> There is a bit of other related code that is needed (for my specific usage) but
> this is the core of it.  Use psql variables to capture the contents of each
> file and then just perform a normal insert (specifically, a VALUES (...), (...)
> variant).  Since you can intermix psql meta-commands and SQL, you basically
> output a bloody long script - one that has memory issues at scale, but you can
> divide and conquer - and then run "psql --file
> bloody_long_script_part_1_of_100000.psql".
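> 
> For illustration, a generated chunk of $PSQLSCRIPT ends up looking roughly
> like the sketch below.  The leading INSERT and the dummy first row are one way
> the other related code might frame it, and the table, column, and file names
> here are made up, not taken from the function above:
> 
>     INSERT INTO ro_invoices (ro_number, invoice_json) VALUES
>     ('dummy', '{}')
> 
>     \set invoice `cat './12345'`
>     , ('12345',:'invoice')
> 
>     \set invoice `cat './12346'`
>     , ('12346',:'invoice')
> 
>     ;
> 
> psql executes each \set as it reads the script, interpolates :'invoice' into
> the statement it is accumulating as a quoted literal, and sends the whole
> multi-row INSERT once it reaches the ";".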
> 
> David J.
> 
Can one put 550M files in a single directory?  I thought it topped out at 16M 
or so.


