Hi all, Earlier I posted this in the wrong thread. Please excuse the double posting.
Tan Tran Begin forwarded message: > From: Tan Tran <tankimt...@gmail.com> > Subject: Re: [HACKERS] GSoC 2014 - mentors, students and admins > Date: March 2, 2014 at 5:03:14 AM PST > To: Greg Stark <st...@mit.edu> > Cc: pgsql-advocacy <pgsql-advoc...@postgresql.org>, PostgreSQL-development > <pgsql-hackers@postgresql.org> > > Hi Greg, pgsql-advocacy, and pgsql-hackers, > > I'm interested in doing my GSoC project on this idea. I'm new to indexing and > WAL, which I haven't encountered in my classes, but it sounds interesting and > valuable to Postgresql. So here's my draft proposal. Do you mind giving your > opinion and corrections? With your help I'll add some technical detail to my > plans. > > Thanks, > Tan Tran > > Introduction > In write-ahead logging (WAL), all modifications to a database are > written to a write-ahead log before being flushed to disk at periodic > checkpoints. This method saves I/O operations, enables a continuous backup, > and, in the case of database failure, guarantees data integrity up until the > last saved checkpoint. In Postgresql’s implementation, transactions are > written to XLog, which is divided into 16MB files (“segments”) that together > comprise a complete history of transactions. Transactions are continually > appended to the latest segment, while checkpointing continually archives > segments up until the last checkpoint. Internally, a suite of XLog structures > and functions interfaces with the various resource managers so they can log a > sufficient amount of data to restore data (“redo”) in case of failure. > Another Postgresql feature is the creation of indexes on a invariant > custom field; for example, on the LastName of a Person even though the > primary key is ID. These custom indexes speed up row lookup. Postgres > currently supports four index types: B-tree, GiST, and GIN, and hash. Indexes > on the former three are WAL-recoverable, but hashing is not. > > 2. Proposal > As a GSoC student, I will implement WAL recovery of hash indexes using > the other index types’ WAL code as a guide. Roughly, I will: > - Devise a way to store and retrieve hashing data within the XLog data > structures. > - In the existing skeleton for hash_redo(XLogRecPtr lsn, XLogRecord *record) > in hash.c, branch to code for the various redo operations: creating an index, > inserting into an index, deleting an index, and page operations (split, > delete, update?). > - Code each branch by drawing on examples from btree_redo, gin_redo, and > gist_redo, the existing XLog code of the other index types. > > Benefits > Hash index searching is O(1), which is asymptotically faster than the O(n lg > n) searching of a B-tree, and does not require custom indexing functions like > GIN and GIST inherently do. Therefore it is desirable for rows that will only > be retrieved on an equality or inequality relation. However, two things > currently stand in the way of its popular use. From the Postgresql > documentation, > “Hash index operations are not presently WAL-logged, so hash indexes > might need to be rebuilt with REINDEX after a database crash if there were > unwritten changes. Also, changes to hash indexes are not replicated over > streaming or file-based replication after the initial base backup, so they > give wrong answers to queries that subsequently use them. For these reasons, > hash index use is presently discouraged.” > My project would solve the first problem, after which I would like to stay on > and fix the second. > > To be written: Quantifiable Results, Schedule, Completeness Criteria, Bio > > > On Feb 28, 2014, at 6:21 AM, Greg Stark <st...@mit.edu> wrote: > >> On Tue, Jan 28, 2014 at 5:34 PM, Thom Brown <t...@linux.com> wrote: >>> Who would be up for mentoring this year? And are there any project >>> ideas folk would like to suggest? >> >> I mentored in the past and felt I didn't do a very good job because I >> didn't really understand the project the student was working on. >> >> There's precisely one project that I feel I would be competent to >> mentor at this point. Making hash indexes WAL recoverable. This is >> something that's easy to define the scope of and easy to determine if >> the student is on track and easy to measure when finished. It's >> something where as far as I can tell all the mentor work will be >> purely technical advice. >> >> Also it's something the project really really needs and is perfectly >> sized for a GSOC project IMHO. Also it's a great project for a student >> who might be interested in working on Postgres in the future since it >> requires learning all our idiosyncratic build and source conventions >> but doesn't require huge or controversial architectural changes. >> >> I fear a number of items in the Wiki seem unrealistically large >> projects for GSOC IMNSHO. >> >> -- >> greg >> >> >> -- >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-hackers >