On Fri, Aug 19, 2016 at 2:25 PM Victor Blomqvist <v...@viblo.se> wrote:
> On Fri, Aug 19, 2016 at 1:31 PM, Sameer Kumar <sameer.ku...@ashnik.com> wrote:
>
>> On Fri, 19 Aug 2016, 1:07 p.m. Victor Blomqvist, <v...@viblo.se> wrote:
>>
>>> Hi,
>>>
>>> Is it possible to break/limit a query so that it returns whatever
>>> results it has found after having checked X amount of rows in an index scan?
>>>
>>> For example:
>>>
>>>     create table a(id int primary key);
>>>     insert into a select * from generate_series(1,100000);
>>>
>>>     select * from a
>>>     where id%2 = 0
>>>     order by id limit 10;
>>>
>>> In this case the query will "visit" 20 rows and filter out 10 of them.
>>> We can see that in the query plan:
>>> "Rows Removed by Filter: 10"
>>> "Heap Fetches: 20"
>>>
>>> Is it somehow possible to limit this query so that it only fetches X
>>> rows? In my example, if we limited it to 10 heap fetches, the query
>>> could return the first 5 rows.
>>>
>>> My use case is a table with 35 million rows and a geo index, on which I
>>> want to do a KNN search but also restrict the query on some other
>>> parameters. In some cases the other parameters restrict the query so much
>>> that Heap Fetches becomes several 100k or more, and in those cases I would
>>> like to have a limit on my query.
>>
>> Have you checked the TABLESAMPLE clause in v9.5?
>>
>> https://wiki.postgresql.org/wiki/TABLESAMPLE_Implementation
>
> Unless I misunderstand what you mean or how it works, I can't really see
> how it would help.

I stand corrected. TABLESAMPLE will not help you.

> I want my query to still return the "best" results, and I want it to use
> the index for that. Just randomly sampling from the whole table will
> either have to sample too small a subset of the rows, or be too slow.
>
> So, given my query above, in the normal ("slow") case I would find the
> first 10 even rows:
> 2,4,6,8,10,12,14,16,18,20
> If I could restrict the heap fetches to 10, I would find
> 2,4,6,8,10
> However, with TABLESAMPLE I might end up with, for example, these rows:
> 15024,71914,51682,7110,61802,63390,98278,8022,34256,49220

How about using LIMIT?

    SELECT column_1, column_2, ...
    FROM my_table
    WHERE <<expression>>
    ORDER BY my_column
    LIMIT 10;

> In my use case I want the best rows (according to the ORDER BY), so just
> a random sample is not good enough.
>
> /Victor
>
>>> Thanks!
>>> /Victor

--
Best Regards
Sameer Kumar | DB Solution Architect
*ASHNIK PTE. LTD.*

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533

T: +65 6438 3504 | M: +65 8110 0350

Skype: sameer.ashnik | www.ashnik.com
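
A rough sketch of one way to approximate what Victor is asking for (this is not
something proposed in the thread, and it may return fewer than 10 rows): put
the cap on an inner, index-ordered subquery and apply the selective filter
outside it. For the toy table above:

    -- Fetch at most 10 rows in index order, then filter; the result is
    -- whatever even ids happen to fall inside that capped scan (2,4,6,8,10).
    select *
    from (
        select id
        from a
        order by id
        limit 10            -- cap on rows actually fetched from the index
    ) candidates
    where id % 2 = 0
    order by id
    limit 10;

The same shape could be applied to the KNN case, with the inner query ordered
by the distance operator and capped at some candidate count. The table and
column names below (places, location, category), the GiST-indexed point
column, and the 1000-row cap are assumptions for illustration only:

    select *
    from (
        select *
        from places
        order by location <-> point '(103.85, 1.29)'
        limit 1000          -- examine at most the 1000 nearest candidates
    ) nearest
    where category = 'cafe'
    order by location <-> point '(103.85, 1.29)'
    limit 10;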