Re: [HACKERS] Hash Indexes
On Mon, Sep 19, 2016 at 05:50:13PM +1200, Mark Kirkwood wrote:
> >I'm rather unenthused about having a hash index implementation that's
> >mildly better in some corner cases, but otherwise doesn't have much
> >benefit. That'll mean we'll have to step up our user education a lot,
> >and we'll have to maintain something for little benefit.
>
> While I see the point of what you are saying here, I recall all previous
> discussions about hash indexes tended to go a bit like this:
>
> - until WAL logging of hash indexes is written it is not worthwhile trying
>   to make improvements to them
> - WAL logging will be a lot of work, patches 1st please
>
> Now someone has done that work, and we seem to be objecting that because
> they are not improved then the patches are (maybe) not worthwhile. I think
> that is - essentially - somewhat unfair.

My understanding of hash indexes is that they'd be good for indexing
random(esque) data (such as UUIDs or, well, hashes like shaX). If so then
I've got a DB that'll be rather big that is the very embodiment of such a
use case. It indexes such data for equality comparisons and runs SELECTs,
INSERTs and, eventually, DELETEs. Lack of WAL and that big warning in the
docs is why I haven't used it.

Given the above, many lamentations from me that it won't be available for
9.6. :( When 10.0 comes I'd probably go to the bother of re-indexing with
hash indexes.

Andrew
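(For illustration, a minimal sketch of the use case described above; hash
indexes only support equality lookups, and the table and column names here
are hypothetical, not from the thread.)

  -- Hypothetical table holding fixed-size digests (SHA-like values).
  CREATE TABLE object_digest (
      digest BYTEA NOT NULL
  );

  -- Hash index: useful only for = comparisons, which is all this needs.
  CREATE INDEX object_digest_hash_idx ON object_digest USING hash (digest);

  -- Equality probe that can use the hash index:
  SELECT count(*) FROM object_digest WHERE digest = '\x00112233'::bytea;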
Re: [HACKERS] Hash Indexes
On Wed, Sep 21, 2016 at 08:44:15PM +0100, Geoff Winkless wrote:
> On 21 September 2016 at 13:29, Robert Haas wrote:
> > I'd be curious what benefits people expect to get.
>
> An edge case I came across the other day was a unique index on a large
> string: postgresql popped up and told me that I couldn't insert a
> value into the field because the BTREE-index-based constraint wouldn't
> support the size of string, and that I should use a HASH index
> instead. Which, of course, I can't, because it's fairly clearly
> deprecated in the documentation...

Thanks for that. Forgot about that bit of nastiness.

I came across the above migrating a MySQL app to PostgreSQL. MySQL, I
believe, handles this by silently truncating the string on index;
PostgreSQL by telling you it can't index. :(

So, as a result, AFAIK, I had a choice between a trigger that did a left()
on the string and inserted it into a new column on the table that I could
then index, or an index on left() directly. Either way you wind up
re-writing a whole bunch of queries. If I wanted to avoid the re-writes I
had the option of making the DB susceptible to poor recovery from crashes,
et al. No matter which option I chose, the end result was going to be ugly.

It would be good not to have to go ugly in such situations. Sometimes one
size does not fit all.

For me this would be a second major case where I'd use usable hash indexes
the moment they showed up.

Andrew
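(A sketch of the left()-based workaround being described, with hypothetical
table and column names; an expression index like this only helps queries
that are rewritten to compare on the same expression.)

  -- Hypothetical table whose string values are too large for btree entries.
  CREATE TABLE document (
      body TEXT NOT NULL
  );

  -- Index only a prefix of the string (here, the first 1000 characters).
  CREATE INDEX document_body_prefix_idx ON document ((left(body, 1000)));

  -- Queries must filter on the indexed expression first, then recheck the
  -- full value:
  SELECT *
  FROM   document
  WHERE  left(body, 1000) = left('some very long value', 1000)
    AND  body = 'some very long value';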
[HACKERS] pgsql 10: hash indexes testing
Hi,

As I am actively working on a big project I figured I'd give PGSQL 10 a go,
primarily because of hash indexes. PostgreSQL 10 version in use is:
10~beta2~20170620.2224-1~491.gitd412f79.pgdg+1

Things are mostly well with hash indexes (plain, non-unique) giving me a
rather lovely saving in index size: they are roughly 1/4 to 1/2 the size of
btree. This is of use as the btree indexes can be between 25GB and 350GB in
size.

There is one index that caused an issue. Towards the end of an import I got
the following error:

  out of overflow pages in hash index

The data being indexed is BYTEA, (quasi)random and 64 bytes in size.
The table has over 2 billion entries. The data is not unique. There's an
average of 10 duplicates for every unique value.

Is this a valid error message or have I spotted a bug?

I tried to duplicate the error with a reindex of the data set that I
uploaded using btree for that index but I couldn't. The difference, as near
as I can remember, is different (quasi)random values being indexed (still
same level of duplication). I'll see if I can get the table reloaded afresh
to see if that's what it takes to trigger the error.

The upload is done in a single transaction with COPY BINARY of 3-4 tables,
one after the other in a loop until data is exhausted (ie: COPY into table
A, B, C and D and then back to A and repeat - there is data processing
happening and this helps keep memory usage in check). A single transaction
can COPY millions of rows (average is about 3.7 million rows with the table
in question getting 3 million). There are 33 transactions in play at any
one time and they all upload to the same tables. Last things done in a
transaction are a couple of small, single-row INSERTs into a couple of
tables and then COMMIT. There is one transaction per connection.

The whole process can take 12-15 hours and involves 1000 transactions.

Hopefully it's not a specific set of random values that generates the error
cos duplicating THAT will be outpaced by the death of the universe. :)

AP
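(A rough sketch of the load pattern described above, assuming the table
matches the "link (datum_id BYTEA)" definition given later in the thread;
the file path, index name and per-connection loop are illustrative only.)

  -- Table and (plain, non-unique) hash index that hit the error.
  CREATE TABLE link (datum_id BYTEA);
  CREATE INDEX link_datum_id_idx ON link USING hash (datum_id);

  -- Each of the ~33 concurrent connections loops over its tables with
  -- binary COPY inside one transaction, then commits:
  BEGIN;
  COPY link FROM '/path/to/chunk_a.copy' WITH (FORMAT binary);
  -- ... COPY into tables B, C, D, then back to A until the feed is
  -- exhausted, plus a couple of small single-row INSERTs ...
  COMMIT;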
Re: [HACKERS] pgsql 10: hash indexes testing
On Wed, Jul 05, 2017 at 08:10:10AM +0530, Amit Kapila wrote:
> On Tue, Jul 4, 2017 at 4:27 PM, AP wrote:
> > There is one index that caused an issue. Towards the end of an import
> > I got the following error:
> >
> > out of overflow pages in hash index
> >
> > The data being indexed is BYTEA, (quasi)random and 64 bytes in size.
> > The table has over 2 billion entries. The data is not unique. There's
> > an average of 10 duplicates for every unique value.
> >
> > Is this a valid error message or have I spotted a bug?
>
> It is difficult to say at this stage, but I think we can figure out.
> We can get such a message if we consume the maximum overflow pages
> that hash index can support which is limited by a number of
> bitmappages. Can you try to use pgstattuple extension and get us the
> results of Select * from pgstathashindex('index_name');? If the
> number of bitmappages is 128 and total overflow pages are 128 * 4096,
> then that would mean that all the pages are used. Then maybe we can

Hmm. Unless I misunderstood that'd mean that overflow_pages/4096 should
result in a number <= 128 at the moment, right? If so then something is
amiss:

# select * from pgstathashindex('link_datum_id_hash_idx');
 version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
---------+--------------+----------------+--------------+--------------+------------+------------+------------------
       3 |     10485760 |        2131192 |           66 |            0 | 2975444240 |          0 | 1065.19942179026
(1 row)

oldmdstash=# select 2131192/4096;
 ?column?
----------
      520
(1 row)

And I do appear to have an odd percentage of free space. :)

This index was created yesterday so it has been around for maybe 18 hours.
Autovac is likely to have hit it by now.

> > I'll see if I can get the table reloaded afresh to see if that's what
> > it takes to trigger the error.
>
> Thanks. I suggest when an error occurs, don't throw away that index
> because we can get some useful information from it to diagnose the
> reason of error.

I'll try and set this up now.

AP
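(For reference, pgstathashindex() comes from the pgstattuple contrib
module; a minimal setup sketch, with the index name as used above.)

  -- pgstathashindex() ships with the pgstattuple contrib extension.
  CREATE EXTENSION IF NOT EXISTS pgstattuple;

  -- Per-page breakdown of a hash index:
  SELECT * FROM pgstathashindex('link_datum_id_hash_idx');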
Re: [HACKERS] pgsql 10: hash indexes testing
On Wed, Jul 05, 2017 at 10:29:09AM +0530, Amit Kapila wrote:
> >> bitmappages. Can you try to use pgstattuple extension and get us the
> >> results of Select * from pgstathashindex('index_name');? If the
> >> number of bitmappages is 128 and total overflow pages are 128 * 4096,
> >> then that would mean that all the pages are used. Then maybe we can
> >
> > Hmm. Unless I misunderstood that'd mean that overflow_pages/4096 should
> > result in a number <= 128 at the moment, right?
>
> No, sorry, I think my calculation above has something missing. It
> should be 128 * 4096 * 8. How we can compute this number is
> no_bitmap_pages * no_bits_used_to_represent_overflow_pages.

AHA! Ok. Then that appears to match. I get 65.041.

> > If so then something is amiss:
> >
> > # select * from pgstathashindex('link_datum_id_hash_idx');
> >  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
> > ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
> >        3 |     10485760 |        2131192 |           66 |            0 | 2975444240 |          0 | 1065.19942179026
> > (1 row)
> >
> > oldmdstash=# select 2131192/4096;
> >  ?column?
> > ----------
> >       520
> > (1 row)
>
> You need to divide 520 by 8 to get the bitmap page. Is this the index
> in which you get the error or is this the one on which you have done
> REINDEX?

Post REINDEX.

> > And I do appear to have an odd percentage of free space. :)
>
> It looks like Vacuum hasn't been triggered.

:(

> > This index was created yesterday so it has been around for maybe 18 hours.
> > Autovac is likely to have hit it by now.
>
> Do you have any deletes? How have you verified whether autovacuum has

No DELETEs. Just the initial COPY, then SELECTs, then a DB rename to get it
out of the way of other testing, then the REINDEX.

> been triggered or not?

I just checked pg_stat_user_tables (which I hope is the right place for
this info :)

   relid   | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins  | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | last_vacuum | last_autovacuum | last_analyze |       last_autoanalyze        | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
-----------+------------+---------+----------+--------------+----------+---------------+------------+-----------+-----------+---------------+------------+------------+---------------------+-------------+-----------------+--------------+-------------------------------+--------------+------------------+---------------+-------------------
 129311803 | public     | link    |       70 |  15085880072 |     5779 |        465623 | 2975444240 |         0 |         0 |             0 |  928658178 |          0 |                   0 |             |                 |              | 2017-06-28 10:43:51.273241+10 |            0 |                0 |             0 |                 2
(1 row)

So it appears not.

# show autovacuum;
 autovacuum
------------
 on
(1 row)

All autovacuum parameters are as per default. The autovacuum launcher
process exists. :(

AP
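(A sketch of that capacity check as one query, using the arithmetic from
this exchange: each bitmap page tracks 4096 * 8 = 32768 overflow pages, and
at most 128 bitmap pages are allowed, the HASH_MAX_BITMAPS limit discussed
later in the thread. The index name is the one used above.)

  -- How close is the overflow chain to the bitmap-page ceiling?
  SELECT overflow_pages,
         bitmap_pages,
         overflow_pages / (4096.0 * 8) AS bitmap_pages_needed,
         128 * 4096 * 8                AS max_overflow_pages
  FROM   pgstathashindex('link_datum_id_hash_idx');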
Re: [HACKERS] pgsql 10: hash indexes testing
On Wed, Jul 05, 2017 at 03:33:45PM +1000, AP wrote:
> > Do you have any deletes? How have you verified whether autovacuum has
>
> No DELETEs. Just the initial COPY, then SELECTs, then a DB rename to get it
> out of the way of other testing, then the REINDEX.
>
> > been triggered or not?
>
> I just checked pg_stat_user_tables (which I hope is the right place for
> this info :)
>
>    relid   | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins  | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | last_vacuum | last_autovacuum | last_analyze |       last_autoanalyze        | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
> -----------+------------+---------+----------+--------------+----------+---------------+------------+-----------+-----------+---------------+------------+------------+---------------------+-------------+-----------------+--------------+-------------------------------+--------------+------------------+---------------+-------------------
>  129311803 | public     | link    |       70 |  15085880072 |     5779 |        465623 | 2975444240 |         0 |         0 |             0 |  928658178 |          0 |                   0 |             |                 |              | 2017-06-28 10:43:51.273241+10 |            0 |                0 |             0 |                 2
>
> So it appears not.

Actually, after a bit more late-arvo thought, I take it this would be the
case as the table has not changed since creation. Thus no need to autovac.

I've newer timestamps on the live db (whose data was uploaded more
recently) for its tables so I think autovac is functioning.

AP
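(A sketch of that kind of check against pg_stat_user_tables, using the
columns shown above; the relation name is the one from this thread.)

  -- Has (auto)vacuum or (auto)analyze ever touched the table?
  SELECT relname,
         n_mod_since_analyze,
         last_vacuum,
         last_autovacuum,
         last_autoanalyze
  FROM   pg_stat_user_tables
  WHERE  relname = 'link';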
Re: [HACKERS] pgsql 10: hash indexes testing
On Tue, Jul 04, 2017 at 08:23:20PM -0700, Jeff Janes wrote:
> On Tue, Jul 4, 2017 at 3:57 AM, AP wrote:
> > The data being indexed is BYTEA, (quasi)random and 64 bytes in size.
> > The table has over 2 billion entries. The data is not unique. There's
> > an average of 10 duplicates for every unique value.
>
> What is the number of duplicates for the most common value?

Damn. Was going to collect this info as I was doing a fresh upload but it
fell through the cracks of my mind. It'll probably take at least half a day
to collect (a simple count(*) on the table takes 1.5-1.75 hours
parallelised across 11 processes) so I'll probably have this in around 24
hours if all goes well. (and I don't stuff up the SQL :) )

AP.
Re: [HACKERS] pgsql 10: hash indexes testing
On Wed, Jul 05, 2017 at 05:52:32PM +0530, Amit Kapila wrote:
> >> >  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
> >> > ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
> >> >        3 |     10485760 |        2131192 |           66 |            0 | 2975444240 |          0 | 1065.19942179026
> >> > (1 row)
...
> >> > And I do appear to have an odd percentage of free space. :)
>
> Are you worried about "unused_pages"? If so, then this is not a major
> reason to worry, because these are probably freed overflow pages which
> can be used in future. In the hash index, when we free the overflow
> pages, they are not returned back to OS, rather they are tracked in
> the index as unused pages which will get used when required in future.

Nope. "free_percent". Just a bit weird that I have it at over 1000% free. :)
Shouldn't that number be < 100?

> >> It looks like Vacuum hasn't been triggered.
>
> Vacuum won't be triggered on insert load. I think that is one of the
> reasons why in your initial copy, you might have got the error. We
> had some discussion in the past to trigger Vacuum on insert heavy
> workloads [1], but the patch still didn't get committed. I think if
> that patch or some other form of that patch gets committed, it will
> help the workload you are trying here.

Well, if this is the cause of my little issue, it might be nice. ATM
my import script bombs out on errors (that I've duplicated! :) It took
11 hours but it bombed) and it sounds like I'll need to do a manual
VACUUM before it can be run again.

The stats you were looking for before are:

# select * from pgstathashindex('link_datum_id_idx');
 version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
---------+--------------+----------------+--------------+--------------+------------+------------+------------------
       3 |      8559258 |        4194176 |          128 |      1926502 | 3591812743 |          0 | 942.873199357466
(1 row)

# select 4194176.0/128/8;
 ?column?
-----------
 4095.8750
(1 row)

If you need more info or whatnot, shout. I've a problematic index to play
with now.

> [1] - https://www.postgresql.org/message-id/b970f20f-f096-2d3a-6c6d-ee887bd30cfb%402ndquadrant.fr

AP
Re: [HACKERS] pgsql 10: hash indexes testing
On Thu, Jul 06, 2017 at 08:52:03AM +0530, Amit Kapila wrote:
> On Thu, Jul 6, 2017 at 2:40 AM, AP wrote:
> > On Wed, Jul 05, 2017 at 05:52:32PM +0530, Amit Kapila wrote:
> >> >> >  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
> >> >> > ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
> >> >> >        3 |     10485760 |        2131192 |           66 |            0 | 2975444240 |          0 | 1065.19942179026
> >> >> > (1 row)
> > ...
> >> >> > And I do appear to have an odd percentage of free space. :)
> >>
> >> Are you worried about "unused_pages"? If so, then this is not a major
> >
> > Nope. "free_percent". Just a bit weird that I have it at over 1000% free. :)
> > Shouldn't that number be < 100?
>
> Yes, there seems to be some gotcha in free percent calculation. Is it
> possible for you to debug or in some way share the test?

I can try to debug but I need to know what to look for and how. If it
requires data reloads then that's around 12-15 hours per hit.

As for sharing the test, that'd mean sharing the data. If it helps I can
provide the content of that column but you're looking at an sql dump that
is roughly (2*64+1)*2.3 billion (give or take a (few) billion) in size. :)

> > Well, if this is the cause of my little issue, it might be nice. ATM
> > my import script bombs out on errors (that I've duplicated! :) It took
> > 11 hours but it bombed) and it sounds like I'll need to do a manual
> > VACUUM before it can be run again.
>
> Yeah, I think after manual vacuum you should be able to proceed.

I don't think that'll help. I did a small check which I hope is helpful in
seeing if it will. Working off a similar db that completed (as it was
smaller and I did not want to mess with my one copy of the broken db) I got
the following results:

Pre-VACUUM:
-----------

# \di+
                             List of relations
 Schema |       Name        | Type  |   Owner   | Table | Size  | Description
--------+-------------------+-------+-----------+-------+-------+-------------
...
 public | link_datum_id_idx | index | mdkingpin | link  | 90 GB |
...

# select * from pgstathashindex('link_datum_id_idx');
 version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |  free_percent
---------+--------------+----------------+--------------+--------------+------------+------------+-----------------
       3 |      7611595 |        3451261 |          106 |       777013 | 3076131325 |          0 | 1512.8635780908

# vacuum VERBOSE ANALYZE link;
INFO:  vacuuming "public.link"
INFO:  "link": found 0 removable, 2272156152 nonremovable row versions in 120507771 out of 123729466 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 8594
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 457.16 s, system: 755.84 s, elapsed: 4196.75 s.
INFO:  vacuuming "pg_toast.pg_toast_183727891"
INFO:  index "pg_toast_183727891_index" now contains 1441820 row versions in 3956 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.02 s, system: 0.00 s, elapsed: 0.56 s.
INFO:  "pg_toast_183727891": found 0 removable, 1441820 nonremovable row versions in 332271 out of 332271 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 8594
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 1.16 s, system: 2.26 s, elapsed: 22.80 s.
INFO:  analyzing "public.link"
INFO:  "link": scanned 300 of 123729466 pages, containing 56661337 live rows and 0 dead rows; 300 rows in sample, 2330296882 estimated total rows
VACUUM
Time: 7057484.079 ms (01:57:37.484)

Post-VACUUM:
------------

# \di+
 Schema |       Name        | Type  |   Owner   | Table | Size  | Description
--------+-------------------+-------+-----------+-------+-------+-------------
 public | link_datum_id_idx | index | mdkingpin | link  | 90 GB |

# select * from pgstathashindex('link_datum_id_idx');
 version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items | free_percent
---------+--------------+----------------+--------------+--------------+------------+------------+--------------
       3 |      7611595 |        3451261 |          106 |       777013 | 3076
Re: [HACKERS] pgsql 10: hash indexes testing
On Thu, Jul 06, 2017 at 12:38:38PM +0530, Amit Kapila wrote:
> On Thu, Jul 6, 2017 at 9:32 AM, AP wrote:
> > On Thu, Jul 06, 2017 at 08:52:03AM +0530, Amit Kapila wrote:
> >> On Thu, Jul 6, 2017 at 2:40 AM, AP wrote:
> >> > On Wed, Jul 05, 2017 at 05:52:32PM +0530, Amit Kapila wrote:
> >> >> >> >  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
> >> >> >> > ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
> >> >> >> >        3 |     10485760 |        2131192 |           66 |            0 | 2975444240 |          0 | 1065.19942179026
> >> >> >> > (1 row)
> >> > ...
> >> >> >> > And I do appear to have an odd percentage of free space. :)
> >> >>
> >> >> Are you worried about "unused_pages"? If so, then this is not a major
> >> >
> >> > Nope. "free_percent". Just a bit weird that I have it at over 1000% free. :)
> >> > Shouldn't that number be < 100?
> >>
> >> Yes, there seems to be some gotcha in free percent calculation. Is it
> >> possible for you to debug or in some way share the test?
> >
> > I can try to debug but I need to know what to look for and how.
>
> Okay, you need to debug function pgstathashindex and have your
> breakpoint at free_percent calculation, then try to get the values of
> nblocks, all the values in stats struct and total_space. I think
> after getting this info we can further decide what to look for.

Ok. I'll try and get to this tomorrow amidst fun with NFS. Hopefully
there'll be time.

So... I'll need postgresql-10-dbg - debug symbols for postgresql-10.

Then, given
https://doxygen.postgresql.org/pgstatindex_8c.html#af86e3b4c40779d4f30a73b0bfe06316f
set a breakpoint at pgstatindex.c:710 via gdb and then have fun with print?

> > As for sharing the test, that'd mean sharing the data. If it helps I can
> > provide the content of that column but you're looking at an sql dump that
> > is roughly (2*64+1)*2.3 billion (give or take a (few) billion) in size. :)
>
> This is tricky, will I be able to import that column values by creating
> table, if so, then probably it is worth.

Should do. Thinking about it a little more, I can shrink the file down by
roughly half if I don't do a pg_dump or similar. Doing

  COPY link (datum_id) TO '/tmp/moocow.copy' WITH (FORMAT BINARY)

should allow you to use COPY FROM to restore the file and produce something
a lot smaller than a dump, right?

The table is simple:

  CREATE TABLE link (datum_id BYTEA);

I can't give you the rest of the table (one other column) as the stuff
hidden in there is private. The only thing that won't give you is the
manner in which the column is filled (ie: the transactions, their size,
how long they run, their concurrency etc). Don't know if that's important.

> >> So at this stage, there are two possibilities for you (a) run manual
> >> Vacuum in-between (b) create the index after bulk load. In general,
> >> whatever I have mentioned in (b) is a better way for bulk loading.
> >> Note here again the free_percent seems to be wrong.
> >
> > If you didn't mean VACUUM FULL then (a) does not appear to work and (b)
> > would kill usability of the db during import, which would happen daily
> > (though with a vastly reduced data size).
>
> If the data size in subsequent import is very less, then you only need
> to Create the index after first import and then let it continue like
> that.

Managing this has suddenly gotten a lot more complex. :)

> > It also messes with the
> > permission model that has been set up for the least-trusted section of
> > the project (at the moment that section can only INSERT).
>
> As per your permission model Vacuum Full is allowed, but not Create index?

Well, at the moment it's all in development (though time for that to end is
coming up). As such I can do things with enhanced permissions manually.
When it hits production, that rather stops.

> As mentioned above REINDEX might be a better option. I think for such
> situation we should have some provision to allow squeeze functionality
> of hash exposed to the user, this will be less costly than REINDEX and
> might serve the purpose for the user. Hey, can you try some hack in

Assuming it does help, would this be something one would need to guess at?
"I did a whole bunch of concurrent INSERT heavy transactions so I guess I
should do a squeeze now"?

Or could it be figured out programmatically?
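(A sketch of the restore side of that suggestion, assuming the
/tmp/moocow.copy file produced by the COPY ... TO above; the index
definition at the end is illustrative.)

  -- Recreate just the shared column and load the binary dump back in:
  CREATE TABLE link (datum_id BYTEA);

  COPY link (datum_id) FROM '/tmp/moocow.copy' WITH (FORMAT BINARY);

  -- Rebuild the hash index on the imported column (name is a placeholder):
  CREATE INDEX link_datum_id_idx ON link USING hash (datum_id);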
Re: [HACKERS] pgsql 10: hash indexes testing
On Wed, Jul 05, 2017 at 07:31:39PM +1000, AP wrote:
> On Tue, Jul 04, 2017 at 08:23:20PM -0700, Jeff Janes wrote:
> > On Tue, Jul 4, 2017 at 3:57 AM, AP wrote:
> > > The data being indexed is BYTEA, (quasi)random and 64 bytes in size.
> > > The table has over 2 billion entries. The data is not unique. There's
> > > an average of 10 duplicates for every unique value.
> >
> > What is the number of duplicates for the most common value?
>
> Damn. Was going to collect this info as I was doing a fresh upload but
> it fell through the cracks of my mind. It'll probably take at least
> half a day to collect (a simple count(*) on the table takes 1.5-1.75
> hours parallelised across 11 processes) so I'll probably have this in
> around 24 hours if all goes well. (and I don't stuff up the SQL :) )

Well...

 num_ids |  count
---------+----------
       1 | 91456442
       2 | 56224976
       4 | 14403515
      16 | 13665967
       3 | 12929363
      17 | 12093367
      15 | 10347006

So the most common is a unique value, then a dupe.

AP.
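(The thread doesn't show the SQL used; a hedged sketch of a query that
would produce a histogram of this shape, where num_ids is the number of
rows sharing one datum_id value.)

  -- For each duplication count (num_ids), how many distinct values have it:
  SELECT num_ids, count(*) AS count
  FROM (
      SELECT count(*) AS num_ids
      FROM   link
      GROUP  BY datum_id
  ) AS per_value
  GROUP  BY num_ids
  ORDER  BY count DESC;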
Re: [HACKERS] pgsql 10: hash indexes testing
On Thu, Jul 06, 2017 at 05:19:59PM +0530, Amit Kapila wrote:
> I think if you are under development, it is always advisable to create
> indexes after initial bulk load. That way it will be faster and will
> take lesser space atleast in case of hash index.

This is a bit of a pickle, actually:
* if I do have a hash index I'll wind up with a bloated one at some stage
  that refuses to allow more inserts until the index is re-created
* if I don't have an index then I'll wind up with a table where I cannot
  create a hash index because it has too many rows for it to handle

I'm at a bit of a loss as to how to deal with this.

The DB design does come with a kind of partitioning where a bundle of
tables get put off to the side and searched separately as needed, but too
many of those and the impact on performance can be noticed, so I need to
minimise them.

> >> As mentioned above REINDEX might be a better option. I think for such
> >> situation we should have some provision to allow squeeze functionality
> >> of hash exposed to the user, this will be less costly than REINDEX and
> >> might serve the purpose for the user. Hey, can you try some hack in
> >
> > Assuming it does help, would this be something one would need to guess
> > at? "I did a whole bunch of concurrent INSERT heavy transactions so I
> > guess I should do a squeeze now"?
> >
> > Or could it be figured out programmatically?
>
> I think one can refer free_percent and number of overflow pages to
> perform such a command. It won't be 100% correct, but we can make a
> guess. We can even check free space in overflow pages with page
> inspect to make it more accurate.

Does this take much time? Main reason I am asking is that this looks like
something that the db ought to handle underneath (say as part of an autovac
run), and so if there are stats that the index code can maintain that can
then be used by the autovac (or something) code to trigger a cleanup then
that, I think, would be of benefit. Unless I am being /so/ unusual that
it's not worth it. :)

I'll reply to the rest in a separate stream as I'm still poking other
work-related things atm so can't do the debug testing as yet.

AP
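(A sketch of the kind of heuristic being described, built only on
pgstathashindex() output; the 90% threshold and index name are made up for
illustration.)

  -- Flag a hash index as a squeeze/REINDEX candidate when its overflow
  -- chain nears the bitmap-page ceiling or free_percent looks abnormal.
  SELECT overflow_pages,
         bitmap_pages,
         free_percent,
         overflow_pages >= 0.9 * (128 * 4096 * 8) AS near_overflow_limit
  FROM   pgstathashindex('link_datum_id_idx');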
Re: [HACKERS] pgsql 10: hash indexes testing
On Fri, Jul 07, 2017 at 05:58:25PM +0530, Amit Kapila wrote:
> On Fri, Jul 7, 2017 at 8:22 AM, AP wrote:
> > On Thu, Jul 06, 2017 at 05:19:59PM +0530, Amit Kapila wrote:
> >> I think if you are under development, it is always advisable to create
> >> indexes after initial bulk load. That way it will be faster and will
> >> take lesser space atleast in case of hash index.
> >
> > This is a bit of a pickle, actually:
> > * if I do have a hash index I'll wind up with a bloated one at some stage
> >   that refuses to allow more inserts until the index is re-created
> > * if I don't have an index then I'll wind up with a table where I cannot
> >   create a hash index because it has too many rows for it to handle
> >
> > I'm at a bit of a loss as to how to deal with this.
>
> I can understand your concerns. To address first concern we need to
> work on one or more of following work items: (a) work on vacuums that
> can be triggered on insert only workload (it should perform index
> vacuum as well) (b) separate utility statement/function to squeeze
> hash index (c) db internally does squeezing like after each split, so
> that chances of such a problem can be reduced, but that will be at the
> cost of performance reduction in other workloads, so not sure if it is
> advisable. Among these (b) is simplest to do but may not be
> convenient for the user.

(a) seems like a good compromise on (c) if it can be done without
disruption and in time.

(b) seems analogous to the path autovacuum took. Unless I misremember,
before autovacuum we had a cronjob to do similar work. It's probably a sane
path to take as a first step on the way to (a).

(c) may not be worth the effort if it compromises general use, though
perhaps it could be used to indicate to (a) that now is a good time to
handle this bit?

> To address your second concern, we need to speed up the creation of
> hash index which is a relatively big project. Having said that, I
> think in your case, this is one-time operation so spending once more
> time might be okay.

Yup. Primarily I just wanted the idea out there that this isn't that easy
to cope with manually and to get it onto a todo list (unless it was an easy
thing to do given a bit of thought, but it appears not).

Out of curiosity, and apologies if you explained it already and I missed
the significance of the words, how does this bloat happen? These tables
only get COPY. There is no UPDATE or DELETE; all transactions get COMMITted
so there's no ROLLBACK undoing the COPY, and yet the bloat occurs.

AP
Re: [HACKERS] pgsql 10: hash indexes testing
On Wed, Aug 02, 2017 at 11:34:13AM -0400, Robert Haas wrote:
> On Wed, Jul 12, 2017 at 1:10 AM, Amit Kapila wrote:
> > It seems so. Basically, in the case of a large number of duplicates,
> > we hit the maximum number of overflow pages. There is a theoretical
> > possibility of hitting it, but it could also happen that we are not
> > freeing the existing unused overflow pages, due to which it keeps on
> > growing and hits the limit. I have requested up thread to verify if
> > that is happening in this case and I am still waiting for same. The
> > squeeze operation does free such unused overflow pages after cleaning
> > them. As this is a costly operation and needs a cleanup lock, so we
> > currently perform it only during Vacuum and next split from the bucket
> > which can have redundant overflow pages.
>
> Oops. It was rather short-sighted of us not to increase
> HASH_MAX_BITMAPS when we bumped HASH_VERSION. Actually removing that
> limit is hard, but we could have easily bumped it from 128 to say 1024
> without (I think) causing any problem, which would have given us quite
> a bit of headroom here. I suppose we could still try to jam that
> change in before beta3 (bumping HASH_VERSION again) but that might be
> asking for trouble.

I, for one, would be grateful for such a bump (or better). Currently having
more fun than I wish trying to figure out where my inserts will begin to
fail so that I don't make a mess of things. Right now I can't match data
going in with sane partitioning points in-storage. If I can go "3 months
worth of data then partition" or the like and have enough room for
differences in the data then I can be pretty happy. ATM I'm not getting
anywhere near that and am tempted to chuck it all in, eat the 3-4x disk
space cost and go back to btree, which'd cost me terabytes.

AP
Re: [HACKERS] pgsql 10: hash indexes testing
On Fri, Aug 04, 2017 at 08:21:01AM +0530, Amit Kapila wrote:
> Note - AP has off list shared the data dump and we (Ashutosh Sharma
> and me) are able to reproduce the problem and we could see that if we
> force vacuum via the debugger, then it is able to free overflow pages.
> The exact numbers are not available at this stage as the test is not
> complete.

I've another if you would like it. I COPYed with a fillfactor of 10 and it
eventually failed, but I could not recreate the index (via CREATE INDEX
CONCURRENTLY) on the data that did make it in when using a fillfactor of
100. If I created the index again (again with the same data) with
fillfactor 10 then it completed.

I may be completely misunderstanding fillfactor but I always thought it was
a performance optimisation rather than something that may allow you to
store more (or less) index entries.

The stats for the various indexes are:

After COPYs started failing:

fmmdstash=# select overflow_pages/bitmap_pages/8,* from pgstathashindex('link_datum_id_idx');
 ?column? | version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
----------+---------+--------------+----------------+--------------+--------------+------------+------------+------------------
     4095 |       3 |    103782169 |        4194176 |          128 |     13658343 | 5085570007 |          0 | 21014.6558371539
(1 row)

Time: 6146310.494 ms (01:42:26.310)

After the CREATE INDEX CONCURRENTLY with FILLFACTOR 100 failed:

fmmdstash=# select overflow_pages/bitmap_pages/8,* from pgstathashindex('link_datum_id_idx1');
 ?column? | version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |  free_percent
----------+---------+--------------+----------------+--------------+--------------+------------+------------+-----------------
     4095 |       3 |      6205234 |        4194176 |          128 |        86222 | 3080760746 |          0 | 615.91682922039
(1 row)

Time: 19128.527 ms (00:19.129)

After the CREATE INDEX CONCURRENTLY with FILLFACTOR 10 succeeded:

fmmdstash=# select overflow_pages/bitmap_pages/8,* from pgstathashindex('link_datum_id_idx2');
 ?column? | version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
----------+---------+--------------+----------------+--------------+--------------+------------+------------+------------------
     3062 |       3 |     79677471 |        2572565 |          105 |      5074888 | 3187098806 |          0 | 19027.2399324415
(1 row)

Time: 1557509.940 ms (25:57.510)

The DB in question is now gone but I took a copy of the column as per
before, so if you'd like it I can make it available via the same means.

AP
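(For reference, a sketch of the statements being compared here; hash
indexes accept a fillfactor storage parameter, and the table and index
names are reused from the thread for illustration.)

  -- The two index builds being contrasted above:
  CREATE INDEX CONCURRENTLY link_datum_id_idx1
      ON link USING hash (datum_id) WITH (fillfactor = 100);

  CREATE INDEX CONCURRENTLY link_datum_id_idx2
      ON link USING hash (datum_id) WITH (fillfactor = 10);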
Re: [HACKERS] pgsql 10: hash indexes testing
On Sat, Aug 05, 2017 at 04:41:24PM +0530, Amit Kapila wrote:
> > (On another note, I committed these patches.)
>
> Thanks.

Seconded. :)

Now uploading data with fillfactor of 90. I'll know in 2-3 days if the new
patches are successful (earlier if they did not help).

I compiled (as apt.postgresql.org does not provide the latest beta3 version
for stretch; only sid, which has a different perl version)
10~beta3~20170805.2225-1~593.git0d1f98b.

AP
Re: [HACKERS] pgsql 10: hash indexes testing
On Sun, Aug 06, 2017 at 04:32:29PM +1000, AP wrote:
> On Sat, Aug 05, 2017 at 04:41:24PM +0530, Amit Kapila wrote:
> > > (On another note, I committed these patches.)
> >
> > Thanks.
>
> Seconded. :)
>
> Now uploading data with fillfactor of 90. I'll know in 2-3 days
> if the new patches are successful (earlier if they did not help).
>
> I compiled (as apt.postgresql.org does not provide the latest
> beta3 version for stretch; only sid, which has a different perl
> version) 10~beta3~20170805.2225-1~593.git0d1f98b.

Have gotten success with a dataset that failed before with ff 10.
End result:

mdstash=# select * from pgstathashindex('link_datum_id_idx');
 version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
---------+--------------+----------------+--------------+--------------+------------+------------+------------------
       4 |     12391325 |        5148912 |          158 |       191643 | 4560007478 |          0 | 1894.29056075982
(1 row)

mdstash=# select 5148912.0/158/8;
     ?column?
-------------------
 4073.506329113924
(1 row)

The index is 135GB rather than 900GB (from memory/give or take). I'm happy
so far. I'll be seeing how much more I can dump into it this weekend. :)

Thanks muchly to the both of you. :)

AP
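(A small sketch of checking that on-disk size from SQL rather than \di+;
the index name is the one used above.)

  -- Human-readable size of the hash index:
  SELECT pg_size_pretty(pg_relation_size('link_datum_id_idx'));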
Re: [HACKERS] pgsql 10: hash indexes testing
On Thu, Aug 10, 2017 at 01:12:25PM -0400, Robert Haas wrote:
> On Thu, Aug 10, 2017 at 6:41 AM, AP wrote:
> > The index is 135GB rather than 900GB (from memory/give or take).
>
> Whoa. Big improvement.

Not a good direct comparison in general but it fits my workload. The 900GB
was fillfactor 10 and the 135GB was ff 90, BUT ff 90 on v3 fails early into
the import. Even ff 10 on v3 didn't succeed (came just short). So for my
usage I was facing having indexes with fillfactor 10 just to be able to put
a more reasonable amount of data in them. Almost. Now I don't have to, as
v4 copes with the load and more, and in less disk space, so for me the
above is just lovely. :)

This is even more so given that the hash index v4 upload actually finished,
unlike the v3 one. :)

As I said in my last email, this weekend I'll be adding more to that table
so I'll see how far that takes me, but the last two patches have given me a
great deal of confidence that the end result will be good news. :)

As an aside, btree for the above is around 2.5x bigger than hash v4, so
chances are much better that a hash index will fit into RAM, which has its
own benefits. :)

AP
Re: [HACKERS] pgsql 10: hash indexes testing
On Fri, Aug 11, 2017 at 07:33:51AM +0530, Amit Kapila wrote:
> On Thu, Aug 10, 2017 at 4:11 PM, AP wrote:
> > mdstash=# select * from pgstathashindex('link_datum_id_idx');
> >  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
> > ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
> >        4 |     12391325 |        5148912 |          158 |       191643 | 4560007478 |          0 | 1894.29056075982
> > (1 row)
>
> The free_percent calculation seems to be wrong. Can you please once
> try after recent commit 0b7ba3d6474b8f58e74dba548886df3250805cdf? I
> feel this should be fixed by that commit.

Sorry I couldn't get to help you debug this myself. Work got annoying. :/

That said, I think that this is the first time that I've seen the value be
under 100:

mdstash=# select * from pgstathashindex('link_datum_id_idx');
 version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
---------+--------------+----------------+--------------+--------------+------------+------------+------------------
       4 |     22957200 |        9272698 |          283 |      2208624 | 8448300552 |          0 | 39.8146658879555
(1 row)

Time: 2882974.635 ms (48:02.975)

The index is still functioning, too, with far more data than I've ever had
in the table in the past and well beyond the point where it would
previously die. Happy days. :)

AP