Why is tuple_percent so low?
I am trying to refactor a table on disk so it consumes less space.

Original:

    create table post_timings (
      topic_id    int not null,
      post_number int not null,
      user_id     int not null,
      msecs       int not null
    )

Target:

    create table post_timings (
      post_id int      not null,
      user_id int      not null,
      dsecs   smallint not null
    )

Before, select * from pgstattuple('post_timings2') gives:

     table_len  | tuple_count | tuple_len  | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
    ------------+-------------+------------+---------------+------------------+----------------+--------------------+------------+--------------
     5146427392 |   116221695 | 4648867800 |         90.33 |                0 |              0 |                  0 |   15082484 |         0.29

After:

     table_len  | tuple_count | tuple_len  | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
    ------------+-------------+------------+---------------+------------------+----------------+--------------------+------------+--------------
     5142036480 |   116122544 | 3948166496 |         76.78 |                0 |              0 |                  0 |   15069224 |         0.29

What I find striking is that the table size on disk remains almost unchanged despite each tuple taking 6 fewer bytes. All the "missing space" is in overhead that pgstattuple does not report; in particular tuple_percent moves from 90.33 to 76.78.

I was wondering:

1. Where is all my missing space? Is it in page alignment and per-page overhead?

2. Are there any other schemes I can look at for storing this data, to get a more efficient yet easily queryable/updatable table?

Keep in mind these tables get huge; in many of our cases they span 10-20 GB just to store this information.

Sam
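Back-of-the-envelope arithmetic for question 1, assuming the numbers in the storage-layout docs for a typical 64-bit build: a 23-byte heap tuple header padded to a MAXALIGN (8-byte) boundary, a 4-byte line pointer per tuple, and the whole tuple rounded up to MAXALIGN. Under those assumptions the two row shapes cost exactly the same on disk:

```python
def maxalign(n, align=8):
    """Round n up to the next multiple of MAXALIGN (8 on most 64-bit builds)."""
    return (n + align - 1) // align * align

HEAP_TUPLE_HEADER = 23   # HeapTupleHeaderData; padded to 24 before user data
ITEM_POINTER = 4         # per-tuple line pointer in the page's item array

def on_disk_per_tuple(data_bytes):
    # header is padded to a MAXALIGN boundary, then the whole tuple is too
    tuple_len = maxalign(HEAP_TUPLE_HEADER) + data_bytes
    return maxalign(tuple_len) + ITEM_POINTER

old = on_disk_per_tuple(4 + 4 + 4 + 4)  # four int4 columns -> 16 data bytes
new = on_disk_per_tuple(4 + 4 + 2)      # int4, int4, int2  -> 10 data bytes
print(old, new)  # -> 44 44
```

The 34-byte tuple gets padded back up to 40 by MAXALIGN, so the 6 saved bytes vanish into alignment, which would explain the unchanged table_len.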
Faster way of estimating database size
Hi there,

At the moment we are using:

    SELECT pg_database.datname,
           pg_database_size(pg_database.datname) AS size
    FROM pg_database

to gather the size of each database for monitoring purposes in Prometheus.

Our pg_stat logging now shows this is our number one query, because for some crazy reason we seem to be running it 19 times a minute. Clearly this is too much and we should (and will) cache the results for a bit. Nonetheless, I notice it takes 400ms to run on our cluster (with a couple of hundred DBs) and was wondering if there is a quicker way of estimating this number?

Sam
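For the "cache results for a bit" part, a minimal sketch, assuming a Python-based exporter; `fetch` is a stand-in for whatever runs the real pg_database_size query, and the 60-second TTL is an arbitrary choice:

```python
import time

# Cache the expensive size query so the exporter can be scraped many times a
# minute while hitting Postgres at most once per TTL window.
_CACHE = {"ts": 0.0, "sizes": None}
TTL_SECONDS = 60

def database_sizes(fetch, now=time.monotonic):
    """Return cached per-database sizes; call fetch() only when stale."""
    if _CACHE["sizes"] is None or now() - _CACHE["ts"] > TTL_SECONDS:
        _CACHE["sizes"] = fetch()
        _CACHE["ts"] = now()
    return _CACHE["sizes"]
```

This serves possibly stale numbers for up to a minute, which is usually fine for a capacity dashboard.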
Why can I not get lexemes for Hebrew but can get them for Armenian?
(This is a cross post from Stack Exchange; it's not getting much traction there.)

On my Mac install of PG:

```
=# select to_tsvector('english', 'abcd สวัสดี');
 to_tsvector
-------------
 'abcd':1
(1 row)

=# select * from ts_debug('hello สวัสดี');
   alias   |   description   | token  |  dictionaries  |  dictionary  | lexemes
-----------+-----------------+--------+----------------+--------------+---------
 asciiword | Word, all ASCII | hello  | {english_stem} | english_stem | {hello}
 blank     | Space symbols   | สวัสดี | {}             |              |
(2 rows)
```

On my Linux install of PG:

```
=# select to_tsvector('english', 'abcd สวัสดี');
    to_tsvector
--------------------
 'abcd':1 'สวัสดี':2
(1 row)

=# select * from ts_debug('hello สวัสดี');
   alias   |    description    | token  |  dictionaries  |  dictionary  | lexemes
-----------+-------------------+--------+----------------+--------------+----------
 asciiword | Word, all ASCII   | hello  | {english_stem} | english_stem | {hello}
 blank     | Space symbols     |        | {}             |              |
 word      | Word, all letters | สวัสดี | {english_stem} | english_stem | {สวัสดี}
(3 rows)
```

So something is clearly different about the way the tokenisation is defined in PG. My question is: how do I figure out what is different, and how do I make my Mac install of PG work like the Linux one?

On both installs:

```
# SHOW default_text_search_config;
 default_text_search_config
----------------------------
 pg_catalog.english
(1 row)

# show lc_ctype;
  lc_ctype
-------------
 en_US.UTF-8
(1 row)
```

So somehow this Mac install thinks that Thai letters are spaces. How do I debug this and fix the "Space symbols" classification here?

Interestingly, this install works with Armenian but falls over when we reach Hebrew:

```
=# select * from ts_debug('ԵԵԵ');
 alias |    description    | token |  dictionaries  |  dictionary  | lexemes
-------+-------------------+-------+----------------+--------------+---------
 word  | Word, all letters | ԵԵԵ   | {english_stem} | english_stem | {եեե}
(1 row)

=# select * from ts_debug('אאא');
 alias |  description  | token | dictionaries | dictionary | lexemes
-------+---------------+-------+--------------+------------+---------
 blank | Space symbols | אאא   | {}           |            |
(1 row)
```
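To rule out the characters themselves, a quick check outside Postgres (using Python's stdlib unicodedata) shows all three scripts are Unicode letters, so a parser classifying from the Unicode tables would treat them all as word characters. That points the finger at the character classification the default text-search parser gets from libc (iswalpha() under the configured locale), which evidently differs between the macOS and Linux builds:

```python
import unicodedata

# Thai, Armenian, and Hebrew sample characters from the queries above.
# All are Unicode letters (general category Lo or Lu), not space symbols.
for ch in ("ส", "Ե", "א"):
    print(ch, unicodedata.category(ch))
```

So the data is fine; the difference is in which ctype tables each install consults for en_US.UTF-8.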