Re: Sailfish

2012-05-11 Thread Robert Evans
super familiar with Sailfish, but from what I remember from a while ago, it is the modified version of KFS that actually does the sorting. The maps output data to "chunks" (aka blocks), and each chunk is sorted once it is full. When the sorting is finished for a chunk the re

Re: Sailfish

2012-05-10 Thread Todd Lipcon
Hey Sriram, We discussed this before, but for the benefit of the wider audience: :) It seems like the requirements imposed on KFS by Sailfish are in most ways much simpler than the requirements of a full distributed filesystem. The one thing we need is atomic record append -- but we don't

Re: Sailfish

2012-05-10 Thread Sriram Rao
Srivas, Sailfish builds upon record append (a feature not present in HDFS). The software that is currently released is based on Hadoop-0.20.2. You use the Sailfish version of Hadoop-0.20.2, KFS for the intermediate data, and then HDFS (or KFS) for storing the job/input. Since the changes

Re: Sailfish

2012-05-10 Thread M. C. Srivas
Sriram, Sailfish depends on append. I just noticed that HDFS disabled append. How does one use this with Hadoop? On Wed, May 9, 2012 at 9:00 AM, Otis Gospodnetic wrote: > Hi Sriram, > > >> The I-file concept could possibly be implemented here in a fairly self > contained w

Re: Sailfish

2012-05-09 Thread Otis Gospodnetic
d to seeing this! :) Otis -- Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm  > > From: Sriram Rao >To: common-dev@hadoop.apache.org >Sent: Tuesday, May 8, 2012 6:48 PM >Subject: Re: Sailfish > >Dear Andy, >

Re: sailfish

2012-05-08 Thread Sriram Rao
ts of things. Sriram On May 8, 2012, at 10:32 AM, Sriram Rao wrote: > Hi, > > I'd like to announce the release of a new open source project, Sailfish. > > http://code.google.com/p/sailfish/ > > Sailfish tries to improve Hadoop-performance, particularly for large-jobs > whi

Re: Sailfish

2012-05-08 Thread Sriram Rao
How do you propose I, practically, migrate to something like Sailfish > without a major capital expenditure and/or downtime and/or data loss? Well, we are not asking for KFS to replace HDFS. One path you could take is to experiment with Sailfish---use KFS just for the intermediate data and HDFS fo

Re: Project announcement: Sailfish (also, looking for colloborators)

2012-05-08 Thread Eric Baldeschwieler
May 8, 2012, at 10:32 AM, Sriram Rao wrote: > Hi, > > I'd like to announce the release of a new open source project, Sailfish. > > http://code.google.com/p/sailfish/ > > Sailfish tries to improve Hadoop-performance, particularly for large-jobs > which process TB

Re: Project announcement: Sailfish (also, looking for colloborators)

2012-05-08 Thread Andrew Purtell
MapReduce workflow. How do you propose I, practically, migrate to something like Sailfish without a major capital expenditure and/or downtime and/or data loss? However, can the Sailfish I-files implementation be plugged in as an alternate Shuffle implementation in MRv2 (see MAPREDUCE-3060 and MAPR

RE: Project announcement: Sailfish (also, looking for colloborators)

2012-05-08 Thread Saikat Kanjilal
Me 2, let me know as well. > Date: Tue, 8 May 2012 10:35:34 -0700 > From: mrra...@yahoo.com > Subject: Re: Project announcement: Sailfish (also, looking for colloborators) > To: common-dev@hadoop.apache.org > > Hi Sriram, > >I'm interested in getting

Re: Project announcement: Sailfish (also, looking for colloborators)

2012-05-08 Thread Anil Gurnani
> From: Sriram Rao > To: common-dev@hadoop.apache.org > Sent: Tuesday, May 8, 2012 10:32 AM > Subject: Project announcement: Sailfish (also, looking for colloborators) > > Hi, > > I'd like to announce the release of a new open source project, Sailfish. > > ht

Re: Project announcement: Sailfish (also, looking for colloborators)

2012-05-08 Thread Raghu Sakleshpur
Hi Sriram, I'm interested in getting involved. Let me know in what capacity I can get involved. Thanks, Raghu From: Sriram Rao To: common-dev@hadoop.apache.org Sent: Tuesday, May 8, 2012 10:32 AM Subject: Project announcement: Sailfish

Project announcement: Sailfish (also, looking for colloborators)

2012-05-08 Thread Sriram Rao
Hi, I'd like to announce the release of a new open source project, Sailfish. http://code.google.com/p/sailfish/ Sailfish tries to improve Hadoop-performance, particularly for large-jobs which process TB's of data and run for hours. In building Sailfish, we modify how map-output is h