On Thu, Apr 9, 2020 at 2:55 PM Andres Freund <and...@anarazel.de> wrote:
> I'm fairly certain that we do *not* want to distribute input data between 
> processes on a single tuple basis. Probably not even below a few hundred kb. 
> If there's any sort of natural clustering in the loaded data - extremely 
> common, think timestamps - splitting on a granular basis will make indexing 
> much more expensive. And have a lot more contention.

That's a fair point. I think the solution ought to be that once any
process starts finding line endings, it continues until it's grabbed
at least a certain amount of data for itself. Then it stops and lets
some other process grab a chunk of data.
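
To make that concrete, here is a rough sketch of the hand-off, not a
proposal for the actual implementation. It assumes the input is an
in-memory buffer, that every newline ends a row (so no CSV quoting, which
would force the scanner to track quote state rather than skip ahead), and
that a plain lock is enough to serialize the scan. ChunkDispenser,
claim() and MIN_CHUNK are hypothetical names, and the threshold value is
arbitrary.

```python
import threading

# Hypothetical floor on how much data one process takes per turn.
MIN_CHUNK = 256 * 1024


class ChunkDispenser:
    """Hands out line-aligned chunks of a buffer, one claimant at a time."""

    def __init__(self, data: bytes, min_chunk: int = MIN_CHUNK):
        self.data = data
        self.min_chunk = min_chunk
        self.pos = 0                      # first byte not yet claimed
        self.lock = threading.Lock()      # only the holder may find line endings

    def claim(self):
        """Return (start, end) of the next chunk, or None when input is exhausted."""
        with self.lock:
            start = self.pos
            if start >= len(self.data):
                return None
            # Take at least min_chunk bytes, then extend to the end of that line.
            target = min(start + self.min_chunk, len(self.data))
            nl = self.data.find(b"\n", target - 1)
            end = len(self.data) if nl == -1 else nl + 1
            self.pos = end
            return start, end
```

Each worker would loop on claim() and process data[start:end] outside
the lock, so neighboring rows stay with one process and the lock is taken
once per chunk rather than once per row.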

Or are you arguing that there should be only one process that's
allowed to find line endings for the entire duration of the load?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

