On 17 April 2015 at 14:54, Petr Jelinek <p...@2ndquadrant.com> wrote:
> I agree that DDL patch is not that important to get in (and I made it last
> patch in the series now), which does not mean somebody can't write the
> extension with new tablesample method.
>
>
> In any case attached another version.
>
> Changes:
> - I addressed the comments from Michael
>
> - I moved the interface between nodeSampleScan and the actual sampling
> method to its own .c file and added TableSampleDesc struct for it. This
> makes the interface cleaner and will make it more straightforward to extend
> for subqueries in the future (nothing really changes just some functions
> were renamed and moved). Amit suggested this at some point and I thought
> it's not needed at that time but with the possible future extension to
> subquery support I changed my mind.
>
> - renamed heap_beginscan_ss to heap_beginscan_sampling to avoid confusion
> with sync scan
>
> - reworded some things and more typo fixes
>
> - Added two sample contrib modules demonstrating row limited and time
> limited sampling. I am using linear probing for both of those as the
> builtin block sampling is not well suited for row limited or time limited
> sampling. For row limited I originally thought of using the Vitter's
> reservoir sampling but that does not fit well with the executor as it needs
> to keep the reservoir of all the output tuples in memory which would have
> horrible memory requirements if the limit was high. The linear probing
> seems to work quite well for the use case of "give me 500 random rows from
> table".
>

For me, the DDL changes are something we can leave out for now, as a way to
minimize the change surface.

I'm now moving to final review of patches 1-5. Michael requested patch 1 to
be split out. If I commit, I will keep that split, but I am considering all
of this as a single patchset. I've already spent a few days reviewing, so I
don't expect this will take much longer.
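
For anyone following the row-limited sampling point quoted above, here is a
rough sketch of the linear probing idea, written in Python rather than C for
brevity. It is an illustration only, not code from the patch: the function
name and parameters are hypothetical, and it assumes "linear probing" means
stepping through positions from a random start with a stride coprime to the
table size, so that no position repeats. Unlike reservoir sampling, it keeps
only a few integers of state and can hand rows to the caller one at a time.

```python
import math
import random


def linear_probe_positions(n_items, limit, rng=None):
    """Yield up to `limit` distinct positions in [0, n_items).

    Hypothetical helper for illustration; not the contrib module's code.
    """
    rng = rng or random.Random()
    if n_items <= 0 or limit <= 0:
        return

    # Random starting position.
    pos = rng.randrange(n_items)

    # Pick a stride coprime to n_items, so the sequence
    # pos, pos + step, pos + 2*step, ... (mod n_items)
    # visits every position exactly once before repeating.
    step = 1
    if n_items > 1:
        step = rng.randrange(1, n_items)
        while math.gcd(step, n_items) != 1:
            step = rng.randrange(1, n_items)

    # Only pos, step and a counter are kept in memory; no reservoir.
    for _ in range(min(limit, n_items)):
        yield pos
        pos = (pos + step) % n_items
```

Asking for 500 positions from a table of 10,000 would be
list(linear_probe_positions(10000, 500)). The result is a set of distinct
positions, though not a uniformly random subset in the statistical sense.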
-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services