> On 6 Jul 2022, at 1:35 PM, Florents Tselai <florents.tse...@gmail.com> wrote:
> 
> 
> 
>> On 6 Jul 2022, at 1:11 PM, Francisco Olarte <fola...@peoplecall.com> wrote:
>> 
>> On Wed, 6 Jul 2022 at 11:55, Florents Tselai <florents.tse...@gmail.com> 
>> wrote:
>>> Also, fwiw looking at top the CPU% and MEM% activity, looks like it does 
>>> data crunching work.
>> ...
>>>>> On 06.07.22 10:42, Florents Tselai wrote:
>>>>>> I have a beefy server (40+ worker processes , 40GB+ shared buffers) and 
>>>>>> a table holding (key text, text text,) of around 50M rows.
>>>>>> These are text fields extracted from 4-5 page pdfs each.
>> 
>> How big is your table? From your query it seems you expect more than
>> 1M-1 ( left... ), but if you have very big text columns it may be
>> spending a lot of time fully decompressing / reading them ( I'm not
>> sure if left(..) on toasted values is optimized to stop after
>> reading enough ). Also, it has to rewrite a lot of data to insert the
>> new column. If it takes some milliseconds per row, which I would not
>> rule out, then 50M rows * 1 ms / row = 50,000 s ~= 13.9 hours per
>> millisecond of per-row work; at 2 ms ( which may be right for reading
>> a big row, calculating the vector and writing an even bigger row ) it
>> would take more than a day to finish, which I would not rule out given
>> you are asking for a heavy thing.
> 
> 50M+ rows and iirc pg_relation_size was north of 80GB or so.
> 
>> 
>> If you have stopped it I would try doing a 1000 row sample in a copied
> 
> Haven’t stopped it, as I’m not convinced there’s an alternative to just 
> waiting for it to complete :/ 
> 
>> table to get a speed idea. Otherwise, with this query, I would
>> normally monitor disk usage of disk files as an indication of

Actually, I monitored my disk usage and it was **definitely** working, as 
it had already eaten up an additional 30% of my disk capacity.

Thus, I’ll have to fall back on my initial solution and use a GIN index 
to get tsvectors on the fly.
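For the record, the on-the-fly approach is just an expression GIN index; a minimal sketch, assuming hypothetical names `docs(key, text)` for the table and columns described in the original post:

```sql
-- Fallback: index the tsvector expression instead of a stored column.
-- Table and column names (docs, "text") are assumed, not from the thread.
CREATE INDEX docs_text_fts_idx ON docs
    USING GIN (to_tsvector('english', "text"));

-- A query must repeat the exact same expression to use the index:
SELECT key
FROM docs
WHERE to_tsvector('english', "text") @@ to_tsquery('english', 'postgres');
```

Note that the text search configuration ('english' here) has to be spelled out explicitly: the one-argument form of to_tsvector() depends on a session setting, is not immutable, and therefore cannot be used in an index expression.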

>> progress, I'm not sure there is another thing you could look at
>> without disturbing it.
>> 
>> FWIW, I would consider high mem usage normal in this kind of query;
>> high CPU would depend on what you call it, but it wouldn't surprise me
>> if it has at least one CPU running at full tilt detoasting and building
>> vectors. I do not know if ALTER TABLE can go parallel...
>> 
> 
> You’re probably right, a lot of the CPU usage could be detoasting.
> 
>> Francisco Olarte.


Thanks everyone for your comments.
You can consider this solved. 
