Re: [HACKERS] A thought on Index Organized Tables

2010-03-02 Thread Gokulakannan Somasundaram
> > > I think you have to take up a simpler project as a first project. This > is a major overhaul of transaction information and it depends on > understanding how a lot of different areas work -- all of which are > very complex tricky areas to understand. > > > Greg, I just feel the fast

Re: [HACKERS] A thought on Index Organized Tables

2010-03-01 Thread Gokulakannan Somasundaram
> > > a) We are already going from table to index to do unique checks. This is > the > > same thing, which we will do to go and update the snapshot in the > indexes. > > No, it is not the same thing. Updating index snapshots requires being > able to *re-find* a previously made index entry for the

Re: [HACKERS] A thought on Index Organized Tables

2010-02-28 Thread Gokulakannan Somasundaram
> > The transaction information on tuples take 18 bytes plus several info > bits. It's possible just storing a subset of that would be useful but > it's unclear. And I think it would complicate the code if it had to > sometimes fetch the heap tuple to get the rest and sometimes doesn't. > Visibili

Re: [HACKERS] A thought on Index Organized Tables

2010-02-28 Thread Gokulakannan Somasundaram
> No, it is not the same thing. Updating index snapshots requires being > able to *re-find* a previously made index entry for the current row. > And it has to be done 100% reliably. The worst that happens if an index > entry is not found when it should be during a uniqueness check is that > the u

Re: [HACKERS] A thought on Index Organized Tables

2010-02-28 Thread Tom Lane
Gokulakannan Somasundaram writes: > a) We are already going from table to index to do unique checks. This is the > same thing, which we will do to go and update the snapshot in the indexes. No, it is not the same thing. Updating index snapshots requires being able to *re-find* a previously made

Re: [HACKERS] A thought on Index Organized Tables

2010-02-28 Thread Greg Stark
On Sun, Feb 28, 2010 at 6:02 AM, Gokulakannan Somasundaram wrote: > So just with a addition of 8 bytes per tuple, we can have the snapshot > stored with the index. Can someone please comment on this? The transaction information on tuples take 18 bytes plus several info bits. It's possible just st

Re: [HACKERS] A thought on Index Organized Tables

2010-02-27 Thread Gokulakannan Somasundaram
If i have got over excited in the previous update, please ignore that. a) We are already going from table to index to do unique checks. This is the same thing, which we will do to go and update the snapshot in the indexes. b) The way, it should work would be to have a check on whether the operator

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Gokulakannan Somasundaram
> > > Actually Tom, i am not able to understand that completely. But what you are > saying that in the current scenario, when there is a broken data type based > index, then it will return no results, but never will return wrong results. > So never the update will corrupt the heap data. But i take

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Gokulakannan Somasundaram
> Wait a minute. Bingo So for unique checks we are already going to > index from Heap. So it is the same thing i am doing with Thick index. So if > we can trust our current unique checks, then we should trust the Thick > index. > > Thanks Tom!!! for having this good conversation > > I thi

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Gokulakannan Somasundaram
> No, what generally happens is it fails to find a matching index entry at > all, because the search algorithm concludes there can be no match based > on the limited set of comparisons it's done. Transitivity failures lead > to searching the wrong subset of the index. > Actually Tom, i am not abl

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Tom Lane
Gokulakannan Somasundaram writes: >> It does. The point is that the system is set up to limit the bad >> consequences. You might (will) get wrong query answers, but the >> heap data won't get corrupted. >> > Again Tom, if there is an update based on index scan, then it takes the > tupleid and u

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Gokulakannan Somasundaram
> It does. The point is that the system is set up to limit the bad >> consequences. You might (will) get wrong query answers, but the >> heap data won't get corrupted. >> >> > Tom, if this is our goal - *"can return wrong query answers, but should not corrupt the heap data.*" and if we

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Greg Stark
On Fri, Feb 26, 2010 at 6:30 PM, Gokulakannan Somasundaram wrote: > http://archives.postgresql.org/pgsql-hackers/2008-03/msg00682.php > I think, the buy-in became difficult because of the code quality. > Er, yeah. That's something we need to work on a bit. You should probably expect your first fe

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Gokulakannan Somasundaram
> It does. The point is that the system is set up to limit the bad > consequences. You might (will) get wrong query answers, but the > heap data won't get corrupted. > > Again Tom, if there is an update based on index scan, then it takes the tupleid and updates the wrong heap data right? The only

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Tom Lane
Gokulakannan Somasundaram writes: > But Tom, can you please explain me why that broken ordering example doesn't > affect the current index scans. It does. The point is that the system is set up to limit the bad consequences. You might (will) get wrong query answers, but the heap data won't get

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Gokulakannan Somasundaram
> IIRC, what was being talked about was shoehorning some hint bits into > the line pointers by assuming that size and offset are multiples of 4. > I'm not thrilled with having mutable status bits there for reliability > reasons, but it could be done without breaking a lot of existing code. > What I

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Gokulakannan Somasundaram
Missed the group.. On Sat, Feb 27, 2010 at 12:00 AM, Gokulakannan Somasundaram < gokul...@gmail.com> wrote: > > I definitely think thick indexes were too ambitious of a target for a >> first time patch. Sequential index scans is very ambitious itself >> despite being significantly simpler (if you

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Tom Lane
Greg Stark writes: > On Fri, Feb 26, 2010 at 4:47 AM, Tom Lane wrote: >>> I feel the other one is easy. To store the hint bits inside the ItemId, in >>> the place of size. >> >> No, we're not going there. > Well we were already talking about moving the hint bits to someplace > else to enable CR

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Greg Stark
On Fri, Feb 26, 2010 at 2:38 PM, Gokulakannan Somasundaram wrote: >   I have never written much code in C, and even if write it, i am > sure i will receive the comment that it is a unmaintainable code.(eg: Thick > index code and trailing nulls code) I definitely think thick indexes were t

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Gokulakannan Somasundaram
> > > Much better to take on a "simple" project like enabling full > sequential index scans which you claimed you had a solution for and > which is in any case an important sub-problem for IOT. > > > Greg, Well i don't think i am ready to take up a project of this size. But at the same ti

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Gokulakannan Somasundaram
> > It definitely affects current indexes. We can't completely avoid bad user > functions. That is why it is important to put limits on how much damage they > can do. That's the motivation for the idea I mentioned before, of > double-checking visibility data in an IndexTuple before letting it survi

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Greg Stark
On Fri, Feb 26, 2010 at 4:47 AM, Tom Lane wrote: >> I feel the other one is easy. To store the hint bits inside the ItemId, in >> the place of size. > > No, we're not going there.  That breaks the fundamental page content > manipulation algorithms, and falls down for tuples not yet stored in a > p

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Karl Schnaitter
On Fri, Feb 26, 2010 at 12:36 AM, Gokulakannan Somasundaram < gokul...@gmail.com> wrote: > To be a bit more concrete: the typical sort of failure that you could >> get from broken btree operators is failure of transitivity, that is >> the comparators report A < B and B < C for some A, B, C, but do

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Gokulakannan Somasundaram
> > To be a bit more concrete: the typical sort of failure that you could > get from broken btree operators is failure of transitivity, that is > the comparators report A < B and B < C for some A, B, C, but do not say > that A < C when those two values are compared directly. I don't see any > conv

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
> > > Proving that a set of comparison operators are consistent just by > examining their runtime behavior is probably equivalent to solving the > halting problem. I can't see us doing it, or wanting to accept the > overhead of checking it even if it could be done. > The overhead of checking is v

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
> No, we're not going there. That breaks the fundamental page content > manipulation algorithms, and falls down for tuples not yet stored in a > page (or being examined without a pointer to the page readily at hand), > and has no redeeming social value anyway compared to doing it in the > proven f

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Tom Lane
Gokulakannan Somasundaram writes: > In the Function based indexes on those functions, which we are > suspecting to be a volatile one Or in the datatypes, which we suspect to be > broken, can we have additional checks to ensure that to ensure that this > does not happen? I mean, do you think,

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Tom Lane
Gokulakannan Somasundaram writes: >> Actually, if you need to squeeze a few more bits into that word, the >> thing to do would be to get rid of storing the tuple length there. >> This would involve adding the same type of indirection header that >> we use for HeapTuples, so that the length would b

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
On Fri, Feb 26, 2010 at 9:54 AM, Gokulakannan Somasundaram < gokul...@gmail.com> wrote: > > No, it's far from harmless. As soon as that heap TID gets filled with >> an unrelated tuple, you run the risk of indexscans alighting on and >> perhaps modifying the wrong tuple. >> >> > Tom, i think

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
> No, it's far from harmless. As soon as that heap TID gets filled with > an unrelated tuple, you run the risk of indexscans alighting on and > perhaps modifying the wrong tuple. > > Tom, In the Function based indexes on those functions, which we are suspecting to be a volatile one Or in the

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
> >> > First of all, volatility is not the only issue. The ordering ops could also > be incorrect, e.g., violate the transitivity property. there is no reliable > way to determine if a function is volatile and/or incorrectly specified. > No it is the only issue. If you create a datatype with volat

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
Tom, Actually, if you need to squeeze a few more bits into that word, the > thing to do would be to get rid of storing the tuple length there. > This would involve adding the same type of indirection header that > we use for HeapTuples, so that the length would be available at need > without going

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Tom Lane
Karl Schnaitter writes: > Of course, PG can't "support" indexing with incorrect functions. However, > it's worthwhile to guard against too much damage being done if the user's > function has a bug. Maybe I'm wrong? Maybe an index tuple with a dangling > pointer is actually harmless? No, it's far

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Tom Lane
Karl Schnaitter writes: > If it's of any interest, I can say something about the hint bits in the > index tuple header. In my implementation, my decision was to use only one > hint bit. It went into the unused 13th bit of the IndexTuple header. When > the hint bit is set, it means that > (xmin is

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Karl Schnaitter
On Thu, Feb 25, 2010 at 3:59 PM, Gokulakannan Somasundaram < gokul...@gmail.com> wrote: > I disagree with that, Gokul -- if the ordering operators are volatile or >> just incorrect, during DELETE, you could set xmax in the wrong IndexTuple. >> Then there will be another IndexTuple that says it's v

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
> > I disagree with that, Gokul -- if the ordering operators are volatile or > just incorrect, during DELETE, you could set xmax in the wrong IndexTuple. > Then there will be another IndexTuple that says it's visible, but it points > to a non-visible heap tuple. I think you should follow the pointe

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Karl Schnaitter
If it's of any interest, I can say something about the hint bits in the index tuple header. In my implementation, my decision was to use only one hint bit. It went into the unused 13th bit of the IndexTuple header. When the hint bit is set, it means that (xmin is committed OR xmin = InvalidTransac

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Tom Lane
Greg Stark writes: > 2) Now that we don't have vacuum full the command-id is kind of a > waste. Not really. > We could replace it with some kind of local memory data > structure which is capable of spilling to disk. The performance costs of that would probably outweigh any space savings. > I t

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Greg Stark
On Thu, Feb 25, 2010 at 9:25 PM, Tom Lane wrote: >  We've sweated > blood to get that struct down to where it is; there's no way to make it > smaller without giving up some really fundamental things, for example > the ability to do UPDATE :-( Oh, this is a tangent but I think there are some more

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Karl Schnaitter
> The other >> problem is the extra write load created by needing to update the index's >> copies of the hint bits; not to mention extra writes to freeze the xids >> when they get old enough. >> > But Tom, i remember that the vacuum was faster when index had visibility > info, since we need not tou

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
> Wait a second, which idea are we currently talking about? No heap at > all, or just the ability to check visibility without visiting the heap? > I was talking about the indexes with snapshot > > If it's a genuine IOT (ie no separate heap), then you are not going to > be able to get away witho

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Tom Lane
Greg Stark writes: > In indexes we currently get away with a reduced header which has few > of the 6 bytes of info bits. However the only reason we can do is > because we impose arbitrary limitations that work for indexes but > wouldn't be reasonable for tables. Such as a lower maximum number of >

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Greg Stark
On Thu, Feb 25, 2010 at 8:09 PM, Gokulakannan Somasundaram wrote: >   I think, somewhere things have been misunderstood. we only need 8 > bytes more per index entry. I thought Postgres has a 8 byte transaction id, > but it is only 4 bytes, so we only need to save the insertion and deletion

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Tom Lane
Gokulakannan Somasundaram writes: > I think, somewhere things have been misunderstood. we only need 8 > bytes more per index entry. I thought Postgres has a 8 byte transaction id, > but it is only 4 bytes, so we only need to save the insertion and deletion > xids. So 8 bytes more per tup

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
> 1) transaction information in index > >This seems like a lot of bloat in indexes. It also means breaking > a lot of other optimizations such as being able to read the tuples > directly from the heap page without locking. I'm not sure how much > those are worth though. But adding 24 bytes to e

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Heikki Linnakangas
Gokulakannan Somasundaram wrote: > Say a checkpoint has occured in between and flushed the dirty pages into > disk, while the updater waits to update the visibility map. Now there will > be no replay for the insert/update/delete right? Yeah, good catch, that could happen. -- Heikki Linnakangas

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Heikki Linnakangas
Gokulakannan Somasundaram wrote: >> The replay of the heap insert/update/delete record updates the >> visibility map. >> > So are you planning to make that section, which writes the xlog and updates > the visibility map inside a PANIC section right? The xlog record is already written in a critical

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Heikki Linnakangas
Gokulakannan Somasundaram wrote: > OK. Say a session doing the update, which is the fist update on the page, > resets the PD_ALL_VISIBLE and just before updating the visibility map > crashes. The subsequent inserts/updates/deletes, will see the PD_ALL_VISIBLE > flag cleared and never care to update

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
> The replay of the heap insert/update/delete record updates the > visibility map. > > Say a checkpoint has occured in between and flushed the dirty pages into disk, while the updater waits to update the visibility map. Now there will be no replay for the insert/update/delete right?

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
> The replay of the heap insert/update/delete record updates the > visibility map. > > So are you planning to make that section, which writes the xlog and updates the visibility map inside a PANIC section right?

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
> Yes. When a bit is cleared, that's OK, because a cleared bit just means > "you need to check visibility in the heap tuple". When a bit is set, > however, it's important that it doesn't hit the disk before the > corresponding heap page update. That's why visibilitymap_set() sets the > LSN on the p

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Heikki Linnakangas
Gokulakannan Somasundaram wrote: > a) The current WAL architecture makes sure that the WAL Log is written > before the actual page flush( i believe ). But you are changing that > architecture for Visibility maps. Visibility map might get flushed out > before the corresponding WAL gets written. Ye

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
On Thu, Feb 25, 2010 at 3:16 AM, Gokulakannan Somasundaram < gokul...@gmail.com> wrote: > > I haven't thought about whether this is sufficient but if it is then >> an initial useful thing to add would be to use it for queries where we >> have a qual that can be checked using the index key even tho

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
> The WAL record of the heap insert/update/delete contains a flag > indicating that the visibility map needs to be updated too. Thus no need > for a separate WAL record. > > Heikki, Have you considered these cases? a) The current WAL architecture makes sure that the WAL Log is written befor

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
> I haven't thought about whether this is sufficient but if it is then > an initial useful thing to add would be to use it for queries where we > have a qual that can be checked using the index key even though we > can't do a range scan. For example if you have a btree index on > and you have a WH

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
> You also need to avoid scanning the same tuple twice. That's not a > problem for VACUUM, but it is for full index scans. > > Heikki, Actually the logic, which i have explained earlier is to avoid scanning tuples twice. Gokul.

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Heikki Linnakangas
Gokulakannan Somasundaram wrote: > OK I think, i can think of a solution to achieve fast full index scan like > oracle. > a) Issue ids to every block that gets created inside the index. we are > already doing that > b) Now before the fast full index scan starts, note down the max id that got > issu

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Greg Stark
On Wed, Feb 24, 2010 at 8:04 PM, Gokulakannan Somasundaram wrote: > OK I think, i can think of a solution to achieve fast full index scan like > oracle. > a) Issue ids to every block that gets created inside the index. we are > already doing that > b) Now before the fast full index scan starts, no

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
> >> That doesn't work because when you split an index page any sequential >>> scan in progress will either see the same tuples twice or will miss >>> some tuples depending on where the new page is allocated. Vacuum has a >>> clever trick for solving this but it doesn't work for arbitrarily many >>

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
> It might be slightly easier given the assumption that you only want to > scan leaf tuples. > > But there's an additional problem I didn't think of before. Currently > we optimize index scans by copying all relevant tuples to local memory > so we don't need to hold an index lock for an extended ti

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Heikki Linnakangas
Gokulakannan Somasundaram wrote: > Hmmm So whenever the update transaction changes a page, it will update > the visibility map, but won't enter the WAL record. > So after the crash we have a visibility map, which has false positives. The WAL record of the heap insert/update/delete contains

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
Missed the group... On Thu, Feb 25, 2010 at 12:34 AM, Gokulakannan Somasundaram < gokul...@gmail.com> wrote: > > > On Thu, Feb 25, 2010 at 12:28 AM, Gokulakannan Somasundaram < > gokul...@gmail.com> wrote: > >> >> That doesn't work because when you split an index page any sequential >>> scan in p

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
> Yes. The visibility map doesn't need any new WAL records to be written. > > We probably will need to add some WAL logging to close the holes with > crash recovery, required for relying on it for index-only-scans, but > AFAICS only for VACUUM and probably only one WAL record for a whole > bunch of

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
On Wed, Feb 24, 2010 at 10:09 PM, Tom Lane wrote: > "Kevin Grittner" writes: > > So you are essentially proposing that rather than moving the heap > > data into the leaf tuples of the index in the index file, you will > > move the leaf index data into the heap tuples? The pages in such a > > IO

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Greg Stark
On Wed, Feb 24, 2010 at 5:46 PM, Tom Lane wrote: > "Kevin Grittner" writes: >> Greg Stark wrote: >>> That doesn't work because when you split an index page any >>> sequential scan in progress will either see the same tuples twice >>> or will miss some tuples depending on where the new page is >>

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Tom Lane
"Kevin Grittner" writes: > Greg Stark wrote: >> That doesn't work because when you split an index page any >> sequential scan in progress will either see the same tuples twice >> or will miss some tuples depending on where the new page is >> allocated. Vacuum has a clever trick for solving this b

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Simon Riggs
On Wed, 2010-02-24 at 15:52 +, Greg Stark wrote: > We can do > better with stuff like Heikki's "grouped index tuple" and the > visibility map which don't interfere with other features as much. Yes, much better plan. More practical, nearly there. -- Simon Riggs www.2ndQuadrant.com

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Kevin Grittner
Greg Stark wrote: > Gokulakannan Somasundaram wrote: >> scan through the leaf pages. > > That doesn't work because when you split an index page any > sequential scan in progress will either see the same tuples twice > or will miss some tuples depending on where the new page is > allocated. Vac

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Greg Stark
On Wed, Feb 24, 2010 at 4:12 PM, Gokulakannan Somasundaram wrote: > Sequential scans can be done on IOTs, just scan through the leaf pages. That doesn't work because when you split an index page any sequential scan in progress will either see the same tuples twice or will miss some tuples dependi

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Heikki Linnakangas
Kevin Grittner wrote: > With the "simplifying" technique of keeping the leaf level in a > separate file, it becomes hard to distinguish from Heikki's Grouped > Index Tuples approach when you include the "maintain cluster order" > patch. That really looks like where anyone should work from for any

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Kevin Grittner
Tom Lane wrote: > Isn't that just a variant on Heikki's "grouped index tuples" idea? With apologies to Heikki for having forgotten that effort, yes. With the "simplifying" technique of keeping the leaf level in a separate file, it becomes hard to distinguish from Heikki's Grouped Index Tuple

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Heikki Linnakangas
Gokulakannan Somasundaram wrote: >> If you have a scenario where the visibility map incurs a measurable >> overhead, let's hear it. I didn't see any in the tests I performed, but >> it's certainly possible that if the circumstances are just right it >> makes a difference. >> >> Heikki, >

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Tom Lane
"Kevin Grittner" writes: > So you are essentially proposing that rather than moving the heap > data into the leaf tuples of the index in the index file, you will > move the leaf index data into the heap tuples? The pages in such a > IOT heap file would still need to look a lot like index pages, y

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Kevin Grittner
Gokulakannan Somasundaram wrote: >> With an IOT I don't understand how you get out of index >> corruption without data loss. That's a showstopper for practical >> use, I think. > > For simplicity, say we are storing all the non-leaf pages of the > index in a seperate file, then the leaf pages

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Robert Haas
On Wed, Feb 24, 2010 at 11:05 AM, Gokulakannan Somasundaram wrote: >> I think you're a barking up the wrong tree.  AFAIUI, the need for the >> visibility map has not very much to do with whether the table has >> indices, and everything to do with avoiding unnecessary VACUUMs.  In >> any event, you

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
> > > With an IOT I don't understand how you get out of index corruption > without data loss. That's a showstopper for practical use, I think. > For simplicity, say we are storing all the non-leaf pages of the index in a seperate file, then the leaf pages are nothing but the table. So if we can r

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
> > But adding 24 bytes to every index entry seems > pretty unlikely to be a win anyways. > We actually wanted to make it optional. Not every index will be like that. More than that we can make it into 16 bytes. Only commands within the same transaction will not be able to do a index only scan.

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
> > > I think you're a barking up the wrong tree. AFAIUI, the need for the > visibility map has not very much to do with whether the table has > indices, and everything to do with avoiding unnecessary VACUUMs. In > any event, you've not shown that the visibility map HAS any overhead, > so talking

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Greg Stark
On Wed, Feb 24, 2010 at 3:12 PM, Tom Lane wrote: > With an IOT I don't understand how you get out of index corruption > without data loss.  That's a showstopper for practical use, I think. That doesn't seem insurmountable to me. You could always allow a REINDEX to scan the index sequentially, ign

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Kevin Grittner
Tom Lane wrote: > The fundamental point IMHO is that indexes are more complex and > much more fragile than heaps. This is obviously true > theoretically and we have years of experience that proves it to be > true in the field as well. Broken comparison functions are just > one of the possible

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Tom Lane
Karl Schnaitter writes: > On Wed, Feb 24, 2010 at 12:53 AM, Gokulakannan Somasundaram < > gokul...@gmail.com> wrote: >> Again not to deviate from my initial question, can we make a decision >> regarding unstable/mutable functions / broken data types ? >> > I second this question. A year or two ag

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Robert Haas
On Wed, Feb 24, 2010 at 9:41 AM, Gokulakannan Somasundaram wrote: >> >> If you have a scenario where the visibility map incurs a measurable >> overhead, let's hear it. I didn't see any in the tests I performed, but >> it's certainly possible that if the circumstances are just right it >> makes a d

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
> > > If you have a scenario where the visibility map incurs a measurable > overhead, let's hear it. I didn't see any in the tests I performed, but > it's certainly possible that if the circumstances are just right it > makes a difference. > > Heikki, The obvious one, i could observe is t

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
Forgot to include the group.. On Wed, Feb 24, 2010 at 5:38 PM, Gokulakannan Somasundaram < gokul...@gmail.com> wrote: > I am not familiar with this term "broken data types", and I just looked for >> it in the source code and couldn't find it. >> >> What exactly are you referring to? >> >> cheers

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Heikki Linnakangas
Gokulakannan Somasundaram wrote: > While we accept that visibility map is good for read only application, why > can't we make it optional? Atleast if there is a way for a person to drop > the visibility map for a table(if it gets created by default), the > application need not incur the overhead fo

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Karl Schnaitter
On Wed, Feb 24, 2010 at 12:53 AM, Gokulakannan Somasundaram < gokul...@gmail.com> wrote: > > Again not to deviate from my initial question, can we make a decision > regarding unstable/mutable functions / broken data types ? > > I second this question. A year or two ago, Gokul and I both proposed a

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Simon Riggs
On Wed, 2010-02-24 at 14:23 +0530, Gokulakannan Somasundaram wrote: > can we make a decision regarding unstable/mutable functions / broken > data types ? You need to take about 5 steps back. Diving straight into a particular technical detail is not the right approach. Nobody will confirm a decisi

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
> > > The fragility there is not an issue in a mostly read-only application, > but it definitely would be a concern in other cases. > While we accept that visibility map is good for read only application, why can't we make it optional? Atleast if there is a way for a person to drop the visibility

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Simon Riggs
On Wed, 2010-02-24 at 10:40 +0200, Heikki Linnakangas wrote: > Simon Riggs wrote: > > I would add that both Heikki and Greg Stark have argued at length that > > the visibility map cannot be relied upon in production systems. > > It cannot be relied on *in its current form*. There's a hole in crash

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Heikki Linnakangas
Simon Riggs wrote: > I would add that both Heikki and Greg Stark have argued at length that > the visibility map cannot be relied upon in production systems. It cannot be relied on *in its current form*. There's a hole in crash recovery where it can be left in an inconsistent state. That obviously

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Simon Riggs
On Wed, 2010-02-24 at 13:50 +0530, Gokulakannan Somasundaram wrote: > Please consider my following statements from a database tuner > perspective. I don't want to discourage the visibility map feature, > but it has the disadvantages, which we already discussed. While i do a > explain analyze and i

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
> > > I think Gokul was asking because he wanted to work on it, but wanted to > check community approval first. > Yes the problem is that we need to come to a consensus on broken data types. As Heikki pointed out, those data types, which is based on a unstable function like time, date, random etc.

Re: [HACKERS] A thought on Index Organized Tables

2010-02-23 Thread Simon Riggs
On Tue, 2010-02-23 at 17:08 +0200, Heikki Linnakangas wrote: > Simon Riggs wrote: > > On Mon, 2010-02-22 at 08:51 +0200, Heikki Linnakangas wrote: > >> Gokulakannan Somasundaram wrote: > > > >>> May i get a little clarification on this issue? Will we be supporting > >>> the IOT feature in post

Re: [HACKERS] A thought on Index Organized Tables

2010-02-23 Thread Kevin Grittner
Simon Riggs wrote: > On Mon, 2010-02-22 at 08:51 +0200, Heikki Linnakangas wrote: >> Gokulakannan Somasundaram wrote: > >> > May i get a little clarification on this issue? Will we be >> > supporting the IOT feature in postgres in future? >> >> What seems like the best path to achieve the kind o

Re: [HACKERS] A thought on Index Organized Tables

2010-02-23 Thread Heikki Linnakangas
Andrew Dunstan wrote: > Gokulakannan Somasundaram wrote: >> I looked at the postgres nbtree code. From my analysis(which >> might be wrong!), we can implement IOTs, provided we make a decision >> on broken data types issue. > > I am not familiar with this term "broken data types", and I j

Re: [HACKERS] A thought on Index Organized Tables

2010-02-23 Thread Heikki Linnakangas
Simon Riggs wrote: > On Mon, 2010-02-22 at 08:51 +0200, Heikki Linnakangas wrote: >> Gokulakannan Somasundaram wrote: > >>> May i get a little clarification on this issue? Will we be supporting >>> the IOT feature in postgres in future? >> What seems like the best path to achieve the kind of p

Re: [HACKERS] A thought on Index Organized Tables

2010-02-23 Thread Andrew Dunstan
Gokulakannan Somasundaram wrote: I looked at the postgres nbtree code. From my analysis(which might be wrong!), we can implement IOTs, provided we make a decision on broken data types issue. I am not familiar with this term "broken data types", and I just looked for it in the

Re: [HACKERS] A thought on Index Organized Tables

2010-02-23 Thread Simon Riggs
On Mon, 2010-02-22 at 08:51 +0200, Heikki Linnakangas wrote: > Gokulakannan Somasundaram wrote: > > May i get a little clarification on this issue? Will we be supporting > > the IOT feature in postgres in future? > > What seems like the best path to achieve the kind of performance > benefits

Re: [HACKERS] A thought on Index Organized Tables

2010-02-23 Thread Csaba Nagy
Hi all, On Mon, 2010-02-22 at 10:29 +, Greg Stark wrote: > On Mon, Feb 22, 2010 at 8:18 AM, Gokulakannan Somasundaram > wrote: > > a) IOT has both table and index in one structure. So no duplication of data > > b) With visibility maps, we have three structures a) Table b) Index c) > > Visibil

Re: [HACKERS] A thought on Index Organized Tables

2010-02-23 Thread Gokulakannan Somasundaram
On Tue, Feb 23, 2010 at 10:42 AM, Tom Lane wrote: > Takahiro Itagaki writes: > > Instead, how about excluding columns in primary keys from table data? > > How will you implement "select * from mytable" ? Or even > "select * from mytable where non_primary_key = something" ? > If you can't do eit

  1   2   >