Hello,

Yes, another person looking to contribute to and develop Hadoop. I'd like to start off small, fixing a few bugs before moving on to larger work.
First, a bit of background: years ago I had the idea of creating a semi-decentralized distributed file system. The idea came when I was working for a small-to-medium-sized company that was looking for a simple backup solution for its workstations. PCs back then came with 100+ GB hard drives but, since they were simple workstations, employees were using less than half that space. Why not have each workstation back up to a few other workstations, duplicating files across multiple machines for redundancy? RAID for the network.

I started coming up with design and architecture specs and protocol examples, and even started writing a bit of the system (in Java). I tried to find a few interested developers, but everyone seemed to think the task was much too large to be accomplished as a side project (and I didn't think, given the IT industry of the time, that anyone would fund it).

Later, I realized such a distributed system could be much more than a simple file backup solution. It looks like Hadoop and HDFS are creating a lot of what I had wanted to create; they have already surpassed what I had in mind in most ways.

So, where should I start? Just start fixing bugs listed in JIRA?

Geoff