Hi all,

Earlier I posted this in the wrong thread. Please excuse the double posting.

Tan Tran

Begin forwarded message:

> From: Tan Tran <tankimt...@gmail.com>
> Subject: Re: [HACKERS] GSoC 2014 - mentors, students and admins
> Date: March 2, 2014 at 5:03:14 AM PST
> To: Greg Stark <st...@mit.edu>
> Cc: pgsql-advocacy <pgsql-advoc...@postgresql.org>, PostgreSQL-development 
> <pgsql-hackers@postgresql.org>
> 
> Hi Greg, pgsql-advocacy, and pgsql-hackers,
> 
> I'm interested in doing my GSoC project on this idea. I'm new to indexing and 
> WAL, which I haven't encountered in my classes, but it sounds interesting and 
> valuable to Postgresql. So here's my draft proposal. Do you mind giving your 
> opinion and corrections? With your help I'll add some technical detail to my 
> plans.
> 
> Thanks,
> Tan Tran
> 
> Introduction
>       In write-ahead logging (WAL), all modifications to a database are 
> written to a write-ahead log before being flushed to disk at periodic 
> checkpoints. This method saves I/O operations, enables a continuous backup, 
> and, in the case of database failure, guarantees data integrity up until the 
> last saved checkpoint. In Postgresql’s implementation, transactions are 
> written to XLog, which is divided into 16MB files (“segments”) that together 
> comprise a complete history of transactions. Transactions are continually 
> appended to the latest segment, while checkpointing continually archives 
> segments up until the last checkpoint. Internally, a suite of XLog structures 
> and functions interfaces with the various resource managers so they can log a 
> sufficient amount of data to restore data (“redo”) in case of failure.
>       Another Postgresql feature is the creation of indexes on a invariant 
> custom field; for example, on the LastName of a Person even though the 
> primary key is ID. These custom indexes speed up row lookup. Postgres 
> currently supports four index types: B-tree, GiST, and GIN, and hash. Indexes 
> on the former three are WAL-recoverable, but hashing is not.
> 
> 2. Proposal
>       As a GSoC student, I will implement WAL recovery of hash indexes using 
> the other index types’ WAL code as a guide. Roughly, I will:
> - Devise a way to store and retrieve hashing data within the XLog data 
> structures. 
> - In the existing skeleton for hash_redo(XLogRecPtr lsn, XLogRecord *record) 
> in hash.c, branch to code for the various redo operations: creating an index, 
> inserting into an index, deleting an index, and page operations (split, 
> delete, update?).
> - Code each branch by drawing on examples from btree_redo, gin_redo, and 
> gist_redo, the existing XLog code of the other index types.
> 
> Benefits
> Hash index searching is O(1), which is asymptotically faster than the O(n lg 
> n) searching of a B-tree, and does not require custom indexing functions like 
> GIN and GIST inherently do. Therefore it is desirable for rows that will only 
> be retrieved on an equality or inequality relation. However, two things 
> currently stand in the way of its popular use. From the Postgresql 
> documentation,
>       “Hash index operations are not presently WAL-logged, so hash indexes 
> might need to be rebuilt with REINDEX after a database crash if there were 
> unwritten changes. Also, changes to hash indexes are not replicated over 
> streaming or file-based replication after the initial base backup, so they 
> give wrong answers to queries that subsequently use them. For these reasons, 
> hash index use is presently discouraged.”
> My project would solve the first problem, after which I would like to stay on 
> and fix the second.
> 
> To be written: Quantifiable Results, Schedule, Completeness Criteria, Bio
> 
> 
> On Feb 28, 2014, at 6:21 AM, Greg Stark <st...@mit.edu> wrote:
> 
>> On Tue, Jan 28, 2014 at 5:34 PM, Thom Brown <t...@linux.com> wrote:
>>> Who would be up for mentoring this year?  And are there any project
>>> ideas folk would like to suggest?
>> 
>> I mentored in the past and felt I didn't do a very good job because I
>> didn't really understand the project the student was working on.
>> 
>> There's precisely one project that I feel I would be competent to
>> mentor at this point. Making hash indexes WAL recoverable. This is
>> something that's easy to define the scope of and easy to determine if
>> the student is on track and easy to measure when finished. It's
>> something where as far as I can tell all the mentor work will be
>> purely technical advice.
>> 
>> Also it's something the project really really needs and is perfectly
>> sized for a GSOC project IMHO. Also it's a great project for a student
>> who might be interested in working on Postgres in the future since it
>> requires learning all our idiosyncratic build and source conventions
>> but doesn't require huge or controversial architectural changes.
>> 
>> I fear a number of items in the Wiki seem unrealistically large
>> projects for GSOC IMNSHO.
>> 
>> -- 
>> greg
>> 
>> 
>> -- 
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
> 

Reply via email to