Re: "two time periods with only an endpoint in common do not overlap" ???

2021-10-17 Thread Francisco Olarte
Ron:

On Fri, 15 Oct 2021 at 20:16, Ron  wrote:
> > so no overlap.
> I was afraid you were going to say that.  It's completely bizarre, but seems
> to be a "thing" in computer science.

Right half-open intervals are normally chosen because they can fully
cover the real line without overlap. Fully open and fully closed
intervals cannot.

When you have timestamps, or floats/doubles, you are trying to
represent a point on a (time) line, even though their finite computer
representation is countable.

When you use dates / integers you can use any kind of interval, as
they are countable, but it is easier if you use the same
representation.

When I do billing I select monthly CDRs in one system with a condition
on a timestamp column, { setup >= '2021-05-01' and setup < '2021-06-01' }.
Another system has a split date/time, and I select on that table using
the same limits, { fecha >= '2021-05-01' and fecha < '2021-06-01' }. I
could use a fully closed interval, but half-open is easier.
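
Written out as a full query, the first selection is roughly the
following (the cdr table name is made up; setup and the bounds are the
ones above):

SELECT *
FROM   cdr                      -- hypothetical CDR table
WHERE  setup >= '2021-05-01'
AND    setup <  '2021-06-01';   -- half-open: all of May, none of June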

Also, if you cover the countable set of dates with half-open intervals
you get the nice property that start_range(i) = end_range(i-1),
which is much easier to program.

And half-open intervals are easier to generate. If you want to generate
12-element ranges starting at 1 you can do something like

for i = 1 to 100 step 12
print i, i+12

which nicely lets you see they are dozens; if you use closed intervals
you need to "print i, i+11" or "print i, i+12-1".

In general it is a thing because it is easier.

FOS




Force re-compression with lz4

2021-10-17 Thread Florents Tselai
Hello,

I have a table storing mostly text data (40M+ rows) that has 
pg_total_relation_size ~670GB.
I’ve just upgraded to postgres 14 and I’m now eager to try the new LZ4 
compression.

I’ve altered the column to use the new lz4 compression, but that only applies 
to new rows.
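
For reference, that ALTER was something along these lines (t and its
text column are the placeholder names I also use below):

ALTER TABLE t ALTER COLUMN text SET COMPRESSION lz4;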

What’s the recommended way of triggering the re-evaluation for pre-existing 
rows? 

I tried wrapping a function like the following, but apparently each old record 
retains the compression applied.
text_corpus=(SELECT t.text from ...);

delete from t where id=;

insert into t(id, text) values (id, text_corpus);

For the time being, I resorted to preparing an external shell script to
execute against the db, but that’s too slow as it moves data in and out
of the db.

Is there a smarter way to do this ?

Thanks,
Flo

Re: Force re-compression with lz4

2021-10-17 Thread Ron

On 10/17/21 10:12 AM, Florents Tselai wrote:

Hello,

I have a table storing mostly text data (40M+ rows) that has 
pg_total_relation_size ~670GB.
I’ve just upgraded to postgres 14 and I’m now eager to try the new LZ4 
compression.


I’ve altered the column to use the new lz4 compression, but that only 
applies to new rows.


What’s the recommended way of triggering the re-evaluation for 
pre-existing rows?


I tried wrapping a function like the following, but apparently each old 
record retains the compression applied.
text_corpus=(SELECT t.text from...); delete from t where id=; insert into 
t(id, text) values (id, text_corpus);


Because it's all in one transaction?

Fttb, I resorted to preparing an external shell script to execute against 
the db but that’s too slow as it moves data in&out the db.


Is there a smarter way to do this ?


Even with in-place compression, you've got to read the uncompressed data.

Does your shell script process one record at a time?  Maybe do ranges:
COPY (SELECT * FROM t WHERE id BETWEEN x AND y) TO '/some/file.csv';
DELETE FROM t WHERE id BETWEEN x AND y;
COPY t FROM '/some/file.csv';

--
Angular momentum makes the world go 'round.


Re: Force re-compression with lz4

2021-10-17 Thread Ron

On 10/17/21 11:36 AM, Ron wrote:

On 10/17/21 10:12 AM, Florents Tselai wrote:

Hello,

I have a table storing mostly text data (40M+ rows) that has 
pg_total_relation_size ~670GB.
I’ve just upgraded to postgres 14 and I’m now eager to try the new LZ4 
compression.


I’ve altered the column to use the new lz4 compression, but that only 
applies to new rows.


What’s the recommended way of triggering the re-evaluation for 
pre-existing rows?


I tried wrapping a function like the following, but apparently each old 
record retains the compression applied.
text_corpus=(SELECT t.text from...); delete from t where id=; insert into 
t(id, text) values (id, text_corpus);


Because it's all in one transaction?

Fttb, I resorted to preparing an external shell script to execute against 
the db but that’s too slow as it moves data in&out the db.


Is there a smarter way to do this ?


Even with in-place compression, you've got to read the uncompressed data.

Does your shell script process one record at a time?  Maybe do ranges:
COPY (SELECT * FROM t WHERE id BETWEEN x AND y) TO '/some/file.csv';
DELETE FROM t WHERE id BETWEEN x AND y;

I forgot to mention:
VACUUM t;

COPY t FROM '/some/file.csv';


--
Angular momentum makes the world go 'round.


Re: Force re-compression with lz4

2021-10-17 Thread Magnus Hagander
On Sun, Oct 17, 2021 at 5:12 PM Florents Tselai 
wrote:

> Hello,
>
> I have a table storing mostly text data (40M+ rows) that has
> pg_total_relation_size ~670GB.
> I’ve just upgraded to postgres 14 and I’m now eager to try the new LZ4
> compression.
>
> I’ve altered the column to use the new lz4 compression, but that only
> applies to new rows.
>
> What’s the recommended way of triggering the re-evaluation for
> pre-existing rows?
>
> I tried wrapping a function like the following, but apparently each old
> record retains the compression applied.
>
> text_corpus=(SELECT t.text from ...);
>
> delete from t where id=;
>
> insert into t(id, text) values (id, text_corpus);
>
>
> Fttb, I resorted to preparing an external shell script to execute against
> the db but that’s too slow as it moves data in&out the db.
>
> Is there a smarter way to do this ?
>
>
It should be enough to VACUUM FULL the table (but it has to be VACUUM
FULL, not a regular vacuum). Or CLUSTER.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Force re-compression with lz4

2021-10-17 Thread Adrian Klaver

On 10/17/21 10:17, Magnus Hagander wrote:
On Sun, Oct 17, 2021 at 5:12 PM Florents Tselai
<florents.tse...@gmail.com> wrote:




Is there a smarter way to do this ?


It should be enough to VACUUM FULL the table. (but it has to be VACUUM 
FULL, not a regular vacuum). Or CLUSTER.


With the proviso that this will require double the existing space, 
~670GB, until the operation is completed.




--
  Magnus Hagander
  Me: https://www.hagander.net/ 
  Work: https://www.redpill-linpro.com/ 



--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Force re-compression with lz4

2021-10-17 Thread Florents Tselai
Yes, that COPY-delete-COPY sequence is what I ended up doing.
Unfortunately I can’t use ranges as the PK is a text string.

> On 17 Oct 2021, at 7:36 PM, Ron  wrote:
> 
> On 10/17/21 10:12 AM, Florents Tselai wrote:
>> Hello,
>> 
>> I have a table storing mostly text data (40M+ rows) that has 
>> pg_total_relation_size ~670GB.
>> I’ve just upgraded to postgres 14 and I’m now eager to try the new LZ4 
>> compression.
>> 
>> I’ve altered the column to use the new lz4 compression, but that only 
>> applies to new rows.
>> 
>> What’s the recommended way of triggering the re-evaluation for pre-existing 
>> rows? 
>> 
>> I tried wrapping a function like the following, but apparently each old 
>> record retains the compression applied.
>> text_corpus=(SELECT t.text from ...);
>> 
>> delete from t where id=;
>> 
>> insert into t(id, text) values (id, text_corpus);
> 
> Because it's all in one transaction?
> 
>> Fttb, I resorted to preparing an external shell script to execute against 
>> the db but that’s too slow as it moves data in&out the db.
>> 
>> Is there a smarter way to do this ?
> 
> Even with in-place compression, you've got to read the uncompressed data.
> 
> Does your shell script process one record at a time?  Maybe do ranges:
> COPY (SELECT * FROM t WHERE id BETWEEN x AND y) TO '/some/file.csv';
> DELETE FROM t WHERE id BETWEEN x AND y;
> COPY t FROM '/some/file.csv';
> 
> -- 
> Angular momentum makes the world go 'round.



Re: Force re-compression with lz4

2021-10-17 Thread Florents Tselai
I did look into VACUUM (FULL) for its PROCESS_TOAST option, which makes
sense, but the thing is I already had a cron-ed VACUUM (FULL) which I ended
up disabling a while back, exactly because of the double-space requirement.
The DB is already ~1TB in size and occupying another ~600GB would require
some hassle. Thus, the external script approach makes more sense.


> On 17 Oct 2021, at 8:28 PM, Adrian Klaver  wrote:
> 
> On 10/17/21 10:17, Magnus Hagander wrote:
>> On Sun, Oct 17, 2021 at 5:12 PM Florents Tselai wrote:
> 
>>Is there a smarter way to do this ?
>> It should be enough to VACUUM FULL the table. (but it has to be VACUUM FULL, 
>> not a regular vacuum). Or CLUSTER.
> 
> With the proviso that this will require double the existing space, ~670GB, 
> until the operation is completed.
> 
>> -- 
>>  Magnus Hagander
>>  Me: https://www.hagander.net/ 
>>  Work: https://www.redpill-linpro.com/ 
> 
> 
> -- 
> Adrian Klaver
> adrian.kla...@aklaver.com





Re: Force re-compression with lz4

2021-10-17 Thread Daniel Verite
Florents Tselai wrote:

> I have a table storing mostly text data (40M+ rows) that has
> pg_total_relation_size ~670GB.
> I’ve just upgraded to postgres 14 and I’m now eager to try the new LZ4
> compression.

You could start experimenting with data samples rather than the
full contents.
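
For instance, a rough sketch (table and column names are illustrative):
load the same sample into two scratch tables that differ only in column
compression and compare sizes:

CREATE TABLE sample_pglz (body text COMPRESSION pglz);
CREATE TABLE sample_lz4  (body text COMPRESSION lz4);
INSERT INTO sample_pglz SELECT t.text FROM t ORDER BY t.id LIMIT 10000;
INSERT INTO sample_lz4  SELECT t.text FROM t ORDER BY t.id LIMIT 10000;
SELECT pg_size_pretty(pg_total_relation_size('sample_pglz')) AS pglz_size,
       pg_size_pretty(pg_total_relation_size('sample_lz4'))  AS lz4_size;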

FWIW, in my case I've found that the compression ratio of lz4 was only
marginally better than pglz (like 2% on text).
As for decompression time, it doesn't seem to differ significantly
from pglz, so overall, recompressing existing data did not seem
worth the trouble.

However lz4 appears to be much faster to compress than pglz, so its
benefit is clear in terms of CPU usage for future insertions.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite




Re: Force re-compression with lz4

2021-10-17 Thread Michael Paquier
On Sun, Oct 17, 2021 at 10:33:52PM +0200, Daniel Verite wrote:
> However lz4 appears to be much faster to compress than pglz, so its
> benefit is clear in terms of CPU usage for future insertions.

CPU-wise, LZ4 is *much* faster than pglz when it comes to
compression or decompression with its default options.  The
compression ratio is comparable between the two, though on average LZ4
compresses less than pglz.
--
Michael




Re: Force re-compression with lz4

2021-10-17 Thread Michael Paquier
On Sun, Oct 17, 2021 at 10:13:48PM +0300, Florents Tselai wrote:
> I did look into VACUUM(full) for it’s PROCESS_TOAST option which
> makes sense, but the thing is I already had a cron-ed VACUUM (full)
> which I ended up disabling a while back; exactly because of the
> double-space requirement.

Please note that VACUUM FULL does not enforce a recompression on
existing values.  See commit dbab0c0, which disabled this behavior as it
introduced a noticeable performance penalty in some cases when
looking at the compression type of the vacuumed table's attributes:
=# CREATE TABLE cmdata(f1 text COMPRESSION pglz);
CREATE TABLE
=# INSERT INTO cmdata VALUES(repeat('1234567890', 1000));
INSERT 0 1
=# SELECT pg_column_compression(f1) FROM cmdata;
 pg_column_compression
-----------------------
 pglz
(1 row)
=# ALTER TABLE cmdata ALTER COLUMN f1 SET COMPRESSION lz4;
ALTER TABLE
=# VACUUM FULL cmdata;
VACUUM
=# SELECT pg_column_compression(f1) FROM cmdata;
 pg_column_compression
-----------------------
 pglz
(1 row)
--
Michael




Re: Force re-compression with lz4

2021-10-17 Thread Francisco Olarte
On Sun, 17 Oct 2021 at 21:04, Florents Tselai  wrote:
> Yes, That COPY-delete-COPY sequence is what I ended up doing.
> Unfortunately can’t use ranges as the PK its a text string.

Unless you have a really weird PK and have trouble calculating bounds,
text strings are sortable and fine to use as ranges. If you use
half-open intervals you do not even need to query them all beforehand:
do a batch for key < 'aa', then one for key >= 'aa' and < 'ab', repeat,
and terminate with key >= 'zz' (substitute letter pairs for whatever you
think will partition your keys into adequately sized chunks). You can
find adequate bounds by scanning the PK index and skipping ahead; just
remember to use half-open intervals and cover the whole key domain.
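
E.g., one such batch, reusing Ron's COPY sequence (the file path and the
t / id names are just placeholders):

COPY (SELECT * FROM t WHERE id >= 'aa' AND id < 'ab') TO '/some/file.csv';
DELETE FROM t WHERE id >= 'aa' AND id < 'ab';
VACUUM t;   -- reclaim space before re-inserting, as Ron noted
COPY t FROM '/some/file.csv';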

Francisco Olarte.




Re: Force re-compression with lz4

2021-10-17 Thread Florents Tselai
Oh, that’s good to know then. So besides ALTER ... SET COMPRESSION for
future inserts, there’s not much one can do for pre-existing values.

I think it makes sense to update / add more info to the docs on this as
well, since other people in the thread expected this to work that way too.
Maybe at some point, even allow an explicit option to be defined during
VACUUM?


> On 18 Oct 2021, at 8:18 AM, Michael Paquier  wrote:
> 
> On Sun, Oct 17, 2021 at 10:13:48PM +0300, Florents Tselai wrote:
>> I did look into VACUUM(full) for it’s PROCESS_TOAST option which
>> makes sense, but the thing is I already had a cron-ed VACUUM (full)
>> which I ended up disabling a while back; exactly because of the
>> double-space requirement.
> 
> Please note that VACUUM FULL does not enforce a recompression on
> existing values.  See commit dbab0c0, which disabled this behavior as it
> introduced a noticeable performance penalty in some cases when
> looking at the compression type of the vacuumed table's attributes:
> =# CREATE TABLE cmdata(f1 text COMPRESSION pglz);
> CREATE TABLE
> =# INSERT INTO cmdata VALUES(repeat('1234567890', 1000));
> INSERT 0 1
> =# SELECT pg_column_compression(f1) FROM cmdata;
> pg_column_compression
> -----------------------
> pglz
> (1 row)
> =# ALTER TABLE cmdata ALTER COLUMN f1 SET COMPRESSION lz4;
> ALTER TABLE
> =# VACUUM FULL cmdata;
> VACUUM
> =# SELECT pg_column_compression(f1) FROM cmdata;
> pg_column_compression
> -----------------------
> pglz
> (1 row)
> --
> Michael