On Tue, Jan 24, 2017 at 9:59 PM, Mithun Cy <mithun...@enterprisedb.com> wrote: > On Thu, Jan 19, 2017 at 5:08 PM, Ashutosh Sharma <ashu.coe...@gmail.com> > wrote: > > Thanks, Ashutosh and Jesper. I have tested the patch I do not have any > more comments so making it ready for committer.
I took a look at this patch. I think hash_page_items() is buggy. It's a little confused about whether it is building an array of datums (which can be assembled into a tuple using heap_form_tuple) or an array of cstrings (which can be assembled into a tuple using BuildTupleFromCStrings). It's mostly the former: the array is of type Datum, and two of the three values put into it are datums. But the code that puts the TID in there is stolen from bt_page_items(), which is constructing an array of cstrings, so it's wrong here. The practical consequence of this is, I believe, that the TIDs output by this function will be totally garbled and wrong, which is possibly why the example in the documentation looks so wacky: + 1 | (3407872,12584) | 1053474816 + 2 | (3407872,12584) | 1053474816 The heap tuple is on page 3407872 at line pointer offset 12584? That's an awfully but not implausibly big page number (about 26GB into the relation) and an awfully and implausibly large line pointer offset (how do we fit 12584 index tuples into an 8192-byte page?). Also, the fact that this example has two index entries with the same hash code pointing at the same heap TID seems wrong; wouldn't that result in index scans returning duplicates? I think what's actually happening here is that (3407872,12584) is actually the attempt to interpret some chunk of memory containing the text representation of a TID as an actual TID. When I print those numbers as hex, I get 0x343128, and those are the digits "4" and "1" and an opening parenthesis ")" as ASCII, so that might fit this theory. The code that tries to extract the hashcode from the item also looks suspicious. I'm not 100% sure whether it's wrong or just strangely-written, but it doesn't make much sense to extract the item contents, then using sprintf() to turn that into a hex string of bytes, then parse the hex string using strtol() to get an integer back. I think what it ought to be doing is getting a pointer to the index tuple and then calling _hash_get_indextuple_hashkey. Another minor complaint about hash_page_items is that it doesn't pfree(uargs->page). I'm not sure it's necessary to pfree anything here, but if we're going to do it I think we might as well be consistent with the btree case. Anyway it can hardly make sense to free the 8-byte structure and not the 8192-byte page to which it's pointing. In hash_bimap_info(), we go to the trouble of creating a raw page but never do anything with it. I guess the idea here is just to error out if the supplied page number is not an overflow page, but there are no comments explaining that. Anyway, I'm not sure it's a very good idea, because it means that if you try to write a query to interrogate the status of all the bitmap pages, it's going to read all of the overflow pages to which they point, which makes the operation a LOT more expensive. I think it would be better to leave this part out; if the user wants to know which pages are actually overflow pages, they can use hash_page_type() to figure it out. Let's make it the job of this function just to check the available/free status of the page in the bitmap, without reading the bitmap itself. In doing that, I think it should either return (bitmapblkno, bitmapbit, bit) or just bit. Returning bitmapblkno but not bitmapbit seems strange. Also, I think it should return bit as a bool, not int4. Another general note is that, in general, if you use the BlahGetDatum() function to construct a datum, the SQL type should be match the macro you picked - e.g. if the SQL type is int4, the macro used should be Int32GetDatum(), not UInt32GetDatum() or anything else. If the SQL type is bool, you should be using BoolGetDatum(). Apart from the above, I did a little work to improve the reporting when a page of the wrong type is verified. Updated patch with those changes attached. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pageinspect-hashindex-rmh.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers