[jira] [Created] (HDFS-3123) [Backup NameNode]BNN is getting Nullpointer execption and shuttingdown When NameNode got formatted

2012-03-21 Thread Brahma Reddy Battula (Created) (JIRA)
[Backup NameNode]BNN is getting Nullpointer execption and shuttingdown When NameNode got formatted --- Key: HDFS-3123 URL: https://issues.apache.org/jira/browse/HDFS-

[jira] [Created] (HDFS-3124) have find command in FsShell

2012-03-21 Thread Dheeraj Kapur (Created) (JIRA)
have find command in FsShell Key: HDFS-3124 URL: https://issues.apache.org/jira/browse/HDFS-3124 Project: Hadoop HDFS Issue Type: New Feature Components: name-node Environment: linux

[jira] [Resolved] (HDFS-3124) have find command in FsShell

2012-03-21 Thread Harsh J (Resolved) (JIRA)
[ https://issues.apache.org/jira/browse/HDFS-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved HDFS-3124. --- Resolution: Duplicate Let's carry the discussion ahead on https://issues.apache.org/jira/browse/HDFS-227

RE: [DISCUSS] Remove append?

2012-03-21 Thread Dave Shine
I am not a contributor to this project, so I don't know how much weight my opinion carries. But I have been hoping to see append become stable soon. We are constantly dealing with the "small file problem", and I have written M/R jobs to periodically roll up lots of small files into a few small

Jenkins build is unstable: Hadoop-Hdfs-trunk #991

2012-03-21 Thread Apache Jenkins Server
See

Hadoop-Hdfs-trunk - Build # 991 - Unstable

2012-03-21 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Hdfs-trunk/991/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 12604 lines...] [INFO] xmlOutput is false [INFO]

Re: [DISCUSS] Remove append?

2012-03-21 Thread Milind.Bhandarkar
As someone who has worked with hdfs-compatible distributed file systems that support append, I can vouch for its extensive usage. I have seen how simple it becomes to create tar archives, and later append files to them, without writing special inefficient code to do so. I have seen it used in arc

Re: [DISCUSS] Remove append?

2012-03-21 Thread Eli Collins
Thanks for the feedback Milind, questions inline. On Wed, Mar 21, 2012 at 10:17 AM, wrote: > As someone who has worked with hdfs-compatible distributed file systems > that support append, I can vouch for its extensive usage. > > I have seen how simple it becomes to create tar archives, and later

Re: [DISCUSS] Remove append?

2012-03-21 Thread Eli Collins
On Wed, Mar 21, 2012 at 10:32 AM, Eli Collins wrote: > Thanks for the feedback Milind, questions inline. > > On Wed, Mar 21, 2012 at 10:17 AM,   wrote: >> As someone who has worked with hdfs-compatible distributed file systems >> that support append, I can vouch for its extensive usage. >> >> I ha

Re: [DISCUSS] Remove append?

2012-03-21 Thread Milind.Bhandarkar
Answers inline. On 3/21/12 10:32 AM, "Eli Collins" wrote: > >Why not just write new files and use Har files, because Har files are a >pita? Yes, and har creation is an MR job, which is totally I/O bound, and yet takes up slots/containers, reducing cluster utilization. >Can you elaborate on the

[jira] [Created] (HDFS-3125) Add a service that enable JournalDaemon

2012-03-21 Thread Suresh Srinivas (Created) (JIRA)
Add a service that enable JournalDaemon --- Key: HDFS-3125 URL: https://issues.apache.org/jira/browse/HDFS-3125 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Suresh Srinivas In this sub

Re: [DISCUSS] Remove append?

2012-03-21 Thread Eli Collins
On Wed, Mar 21, 2012 at 10:47 AM, wrote: > Answers inline. > > On 3/21/12 10:32 AM, "Eli Collins" wrote: > >> >>Why not just write new files and use Har files, because Har files are a >>pita? > > Yes, and har creation is an MR job, which is totally I/O bound, and yet > takes up slots/containers,

RE: [DISCUSS] Remove append?

2012-03-21 Thread Tim Broberg
No specific advice on this particular issue, but in general, I learned the hard way to stop asking the question, "Feature X is hard to support, is anybody really going to use this?" *Every time* I have asked this question, I get the answer I want to hear. *Every time*, they come back and ask for

Re: [DISCUSS] Remove append?

2012-03-21 Thread Eli Collins
Good point. I thought I'd start with devs first. If you can't get it past devs there's no reason to go further. Also, users will tell you they want everything. I'd like to root cause this, eg if they want append to solve the small files problem I'd like to know if solving the latter means we don't

RE: [DISCUSS] Remove append?

2012-03-21 Thread Dave Shine
I never brought it up on the CDH list because I was told during my CDH training (Dec 2010) that it was already there. When I later learned it was usable only for HBase, I just assumed it would be coming, eventually. Dave -Original Message- From: Eli Collins [mailto:e...@cloudera.com]

Re: [DISCUSS] Remove append?

2012-03-21 Thread Milind.Bhandarkar
>1. If the daily files are smaller than 1 block (seems unlikely) Even at a large hdfs installation, the avg file size was < 1.5 blocks. Bucketing causes the file sizes to drop. >2. The small files problem (a typical NN can store 100-200M files, so >a problem for big users) Big users probably ha

Re: [DISCUSS] Remove append?

2012-03-21 Thread Milind.Bhandarkar
Eli, To clarify a little bit, I think HDFS-3120 is the right thing to do, to disable appends, while still enabling hsync in branch-1. But, going forward, (say 0.23+) having appends working correctly will definitely add value, and make HDFS more palatable for lots of other workloads. Of course, I

Re: [DISCUSS] Remove append?

2012-03-21 Thread Milind.Bhandarkar
I would also like to point to work being done on PLFS-HDFS: http://institute.lanl.gov/isti/irhpit/presentations/PLFS-HDFS.pdf This would be made much simpler by allowing appends. Checkpointing in MPI is a very common use-case, and after Hamster, PLFS-HDFS becomes an attractive way to do this. (S

Re: [DISCUSS] Remove append?

2012-03-21 Thread Tsz Wo Sze
Some of the information in the email is not correct. Let me clarify. > Where we are today.. append was added in the 0.17-19 releases > (HADOOP-1700) . . . We never had append/sync in 0.17. Sync was added to 0.18 but not append. Append was added to 0.19. By append/sync above, I mean

Re: [DISCUSS] Remove append?

2012-03-21 Thread Sanjay Radia
On Tue, Mar 20, 2012 at 5:37 PM, Eli Collins wrote: > > > Append introduces non-trivial design and code complexity, which is not > worth the cost if we don't have real users. The bulk of the complexity of HDFS-265 ("the new Append") was around Hflush, concurrent readers, the pipeline etc. The co

Re: [DISCUSS] Remove append?

2012-03-21 Thread Eli Collins
On Wed, Mar 21, 2012 at 1:31 PM, Tsz Wo Sze wrote: > > Some of the information in the email is not correct. Let me clarify. > >> Where we are today.. append was added in the 0.17-19 > releases >> (HADOOP-1700) . . . > > We never had append/sync in 0.17. Sync was added to 0.18 but not appen

Re: [DISCUSS] Remove append?

2012-03-21 Thread Eli Collins
On Wed, Mar 21, 2012 at 1:57 PM, Sanjay Radia wrote: > On Tue, Mar 20, 2012 at 5:37 PM, Eli Collins wrote: > >> >> >> Append introduces non-trivial design and code complexity, which is not >> worth the cost if we don't have real users. > > The bulk of the complexity of HDFS-265 ("the new Append")

Re: [DISCUSS] Remove append?

2012-03-21 Thread Eli Collins
On Wed, Mar 21, 2012 at 12:48 PM, wrote: > Eli, > > To clarify a little bit, I think HDFS-3120 is the right thing to do, to > disable appends, while still enabling hsync in branch-1. > > But, going forward, (say 0.23+) having appends working correctly will > definitely add value, and make HDFS mo

Re: [DISCUSS] Remove append?

2012-03-21 Thread Eli Collins
On Wed, Mar 21, 2012 at 12:30 PM, wrote: > >>1. If the daily files are smaller than 1 block (seems unlikely) > > Even at a large hdfs installation, the avg file size was < 1.5 blocks. > Bucketing causes the file sizes to drop. > >>2. The small files problem (a typical NN can store 100-200M files,

[jira] [Created] (HDFS-3126) Journal stream from the namenode to backup needs to have a timeout

2012-03-21 Thread Hari Mankude (Created) (JIRA)
Journal stream from the namenode to backup needs to have a timeout -- Key: HDFS-3126 URL: https://issues.apache.org/jira/browse/HDFS-3126 Project: Hadoop HDFS Issue Type: Sub-ta

Re: [DISCUSS] Remove append?

2012-03-21 Thread Milind.Bhandarkar
> >Absolutely, I'd like to learn more about what append/truncate buys us. Indeed. Let's postpone this discussion to Q2 then. Thanks, - milind --- Milind Bhandarkar Greenplum Labs, EMC (Disclaimer: Opinions expressed in this email are those of the author, and do not necessarily represent the vie

Re: [DISCUSS] Remove append?

2012-03-21 Thread Eli Collins
On Wed, Mar 21, 2012 at 3:06 PM, wrote: > >> >>Absolutely, I'd like to learn more about what append/truncate buys us. > > Indeed. Let's postpone this discussion to Q2 then. > I'd still like to hear what other people think if they haven't chimed in. Even if we decide to remove it, I don't think w

Re: [DISCUSS] Remove append?

2012-03-21 Thread Milind.Bhandarkar
Eli, If HDFS-3120 is committed to both 1.x and trunk/0.23.x, then one will be able to disable appends (while keeping hflush) using different config variables. By default (i.e. in hdfs-default.xml), we should set dfs.support.append to false, and dfs.support.hsync to true. That way, we get enough t
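
(Illustration of the proposal in the message above: a minimal sketch of how the suggested defaults might look as configuration properties. dfs.support.append is the property named in the message; dfs.support.hsync is the name proposed in this thread and is only assumed here, so it may not exist under that name in any release.)

```xml
<!-- Hypothetical fragment sketching the proposed defaults; not copied from any release. -->
<configuration>
  <property>
    <!-- Appends disabled by default, as suggested in the thread. -->
    <name>dfs.support.append</name>
    <value>false</value>
  </property>
  <property>
    <!-- Name as proposed in the thread (assumption); keeps hflush/hsync available. -->
    <name>dfs.support.hsync</name>
    <value>true</value>
  </property>
</configuration>
```

A site that wanted different behaviour would override either value in its own hdfs-site.xml rather than editing the shipped defaults.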

Re: [DISCUSS] Remove append?

2012-03-21 Thread Eli Collins
On Wed, Mar 21, 2012 at 3:48 PM, wrote: > Eli, > > If HDFS-3120 is committed to both 1.x and trunk/0.23.x, then one will be > able to disable appends (while keeping hflush) using different config > variables. By default (i.e. in hdfs-default.xml), we should set > dfs.support.append to false, and