Re: Developing cross-component patches post-split

2009-07-01 Thread Dhruba Borthakur
Hi Todd, Another option (one that is used by Hive) is to have an ant macro that can be overridden from the ant command line. This macro points to the location of the common.jar. By default, it is set to the same value as it is now. If a developer has a common jar that is built in his/her directory

Re: [VOTE] Back-port TFile to Hadoop 0.20

2009-07-07 Thread Dhruba Borthakur
I think we are trying to change an existing Apache-Hadoop process. The current process specifically says that a released branch cannot have new features checked into it. This vote seems to be proposing that "If a new feature does not change any existing code (other than build.xml), then it is ok t

Re: [VOTE] Release Hadoop 0.19.2 (candidate 0)

2009-07-10 Thread Dhruba Borthakur
I have been running 0.19.2 + a few patches and they are working well. +1, based on unit tests for 0.19.2. thanks, dhruba On Fri, Jul 10, 2009 at 3:06 PM, Scott Carey wrote: > +1 Looks good to me. Ran through all of our batch jobs successfully. We > have actually been using 0.19.2-dev a couple

Re: Current security implementation in Hadoop

2009-07-22 Thread Dhruba Borthakur
Also, you can look at work in progress: http://issues.apache.org/jira/browse/HADOOP-4487 thanks, dhruba On Wed, Jul 22, 2009 at 9:11 AM, Andrey Pankov wrote: > Maybe this would be interesting for you, > > > http://www.hadoop.iponweb.net/Home/hdfs-over-webdav/authentication-and-permissions > > O

Re: [VOTE] Push back code freeze for 0.21

2009-07-24 Thread Dhruba Borthakur
+1 On 7/24/09, Jim Kellerman (POWERSET) wrote: > +1 > >> -Original Message- >> From: Owen O'Malley [mailto:omal...@apache.org] >> Sent: Friday, July 24, 2009 1:11 PM >> To: common-dev@hadoop.apache.org >> Subject: [VOTE] Push back code freeze for 0.21 >> >> I'd like to push the date f

Re: Remote access to cluster with superuser privileges from untrusted IPs

2009-08-02 Thread Dhruba Borthakur
Hi Pallavi, You are always welcome to post your code as a patch to a JIRA. Even if it does not get committed to the Hadoop code base, you can always refer people to your patch in the JIRA and ask them to use it. thanks, dhruba On Sun, Aug 2, 2009 at 8:54 PM, Palleti, Pallavi < pallavi.pall...@cor

Re: Remote access to cluster with superuser privileges from untrusted IPs

2009-08-02 Thread Dhruba Borthakur
nk regarding any work happening in this regard. I would > be interested in participating/contributing in it. > > Thanks > Pallavi > > -Original Message- > From: Dhruba Borthakur [mailto:dhr...@gmail.com] > Sent: Monday, August 03, 2009 10:59 AM > To: common-dev@

Re: Contributing to HDFS - Distributed Computing

2009-09-01 Thread Dhruba Borthakur
Hi Brian, That is a good idea. Other block placement algorithms to try (using HDFS-385) would be to place blocks using a heat-map topology of a data center, or using a dynamic network topology (based on network performance instead of the static network topology that HDFS currently uses), simulate a n

Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

2009-09-25 Thread Dhruba Borthakur
It is really nice to have wire-compatibility between clients and servers running different versions of Hadoop. The reason we would like this is that it lets the same client (Hive, etc.) submit jobs to two different clusters running different versions of Hadoop. But I am not stuck up on the n

Re: Towards Hadoop 1.0: Stronger API Compatibility from 0.21 onwards

2009-09-28 Thread Dhruba Borthakur
I think we should not require Job Q compatibility for 1.0 release. thanks, dhruba On Mon, Sep 28, 2009 at 11:06 AM, Sanjay Radia wrote: > > On Sep 28, 2009, at 3:15 AM, Steve Loughran wrote: > > Dhruba Borthakur wrote: >> > It is really nice to have wire-compatibilit

Re: libhdfs with FileSystem cache issue can causes to memory leak ?

2009-10-13 Thread Dhruba Borthakur
There is a python interface to access HDFS files if that helps your case : http://wiki.apache.org/hadoop/HDFS-APIs thanks, dhruba On Tue, Oct 13, 2009 at 12:00 PM, Huy Phan wrote: > Hi Brian, > Thank you for posting your solution here, I will try this on my testing > server and do some load tes

Re: Jira mapred-707

2009-11-02 Thread Dhruba Borthakur
This patch would be helpful. Some general comments on how to contribute are here: http://wiki.apache.org/hadoop/HowToContribute. Your proposal sounds good, let's continue this conversation in the JIRA MAPRED-707. thanks, dhruba On Mon, Nov 2, 2009 at 9:30 AM, alan heirich wrote: > I'm a new con

Re: datanode ack behavior for block receive

2009-11-02 Thread Dhruba Borthakur
Hi Bin, I think that your observation is correct. The act of sending a SUCCESS status ack can be avoided by intelligently looking at the seqno. However, my opinion is that returning the extra bit of information is not impacting performance/correctness at all, do you agree? thanks, dhruba On Mo

Re: Private, LimitedPrivate and contrib modules

2009-11-05 Thread Dhruba Borthakur
Hi sanjay, Most of the contrib modules are in the same package as their containers. For example, the fair share scheduler is in contrib but its package name is org.apache.hadoop.mapred. Doesn't this mean that the fair-share scheduler code can use LimitedPrivate methods from org.apache.hadoop.mapre

Moving libhdfs from Mapreduce to Hdfs

2009-11-08 Thread Dhruba Borthakur
I am moving libhdfs from the mapreduce subproject to the hdfs subproject. HDFS-712. thanks, dhruba -- Connect to me at http://www.facebook.com/dhruba

Re: ETL using Hadoop ???

2009-11-12 Thread Dhruba Borthakur
Hi Rajendra, We use Hive for a large data warehouse, details here: http://wiki.apache.org/hadoop/Hive thanks, dhruba On Thu, Nov 12, 2009 at 9:55 AM, Palikala, Rajendra (CCL) < rpalik...@carnival.com> wrote: > Hi All, > > I am an experienced Informatica and Java Developer. I am very new to > H

Re: adding functionality to retrieve file name from a block

2009-12-03 Thread Dhruba Borthakur
I think that if we add the filename of a file to each block, it adds additional complexity to the implementation of the filesystem. what is the use-case that you have in mind? thanks, dhruba On Wed, Dec 2, 2009 at 11:30 PM, Jack Li wrote: > > I am trying to add functionality so that a block can

Re: [DISCUSSION] Release process

2010-04-01 Thread Dhruba Borthakur
We have been testing the HDFS append code for 0.20 (using HDFS-200, HDFS-142), but I believe it is not ready for production yet. I am guessing that there would be another two months of testing before I would classify 0.20.3 + HDFS-200 as production quality. HDFS-200 touches code paths that would ge

Re: Contributing a Parascale fs implementation

2010-04-13 Thread Dhruba Borthakur
Hi Neil, The best way is to create a JIRA and then start a discussion there. https://issues.apache.org/jira/browse/HADOOP thanks, dhruba On Tue, Apr 13, 2010 at 12:42 PM, Neil Bliss wrote: > Hi Folks, > > I'm currently working at Parascale, and I've

Re: Common build broken

2010-11-24 Thread Dhruba Borthakur
Hi Nigel, thanks for taking care of this. I had forgotten to "svn add" one of the new files. I apologize. The rest of the commit looks good. thanks once again, dhruba On Tue, Nov 23, 2010 at 10:21 PM, Nigel Daley wrote: > I committed the missing file and rebuilding now. So far so good. Dh

Re: Review Request: Add copyBytes method to Text and BytesWritable to improve usability

2010-12-16 Thread Dhruba Borthakur
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/182/#review74 --- Ship it! - Dhruba On 2010-12-16 05:35:58, Owen O'Malley wrote: > > --

Re: File access pattern on HDFS?

2011-03-07 Thread Dhruba Borthakur
Here is a JIRA that talks about a file-change-log (but no work has been done yet) http://issues.apache.org/jira/browse/HDFS-1179 thanks, dhruba On Mon, Mar 7, 2011 at 1:24 AM, Harsh J wrote: > There is no such information (history of atime changes, although atime > is held for every file in th

Re: File access pattern on HDFS?

2011-03-07 Thread Dhruba Borthakur
PM, Gautam Singaraju wrote: > HDFS-1179: is exactly what I was looking for. Would it be a good idea to > transmit info over TCP/UDP? > --- > Gautam > > > > On Mon, Mar 7, 2011 at 11:46 AM, Dhruba Borthakur wrote: > >> Here is a JIRA that talks about a file-change-

Re: VOTE: Committing HADOOP-6949 to 0.22 branch

2011-03-28 Thread Dhruba Borthakur
This is a very effective optimization, +1 on pulling it to 0.22. -dhruba On Mon, Mar 28, 2011 at 9:39 PM, Konstantin Shvachko wrote: > HADOOP-6949 introduced a very important optimization to the RPC layer. > Based > on the benchmarks presented in HDFS-1583 this provides an order of > magnitude

Re: [PROPOSAL] Two Jira infrastructure additions to support sustaining bug fixes

2011-09-15 Thread Dhruba Borthakur
+1. I faced the same problems while doing the 0.20-append branch. thanks dhruba On Thu, Sep 15, 2011 at 11:58 AM, Matt Foley wrote: > Hi all, > for better or worse, the Hadoop community works in multiple branches. We > have to do sustaining work on 0.20, even while we hope that 0.23 will > fin

Re: [ANNOUNCE] Intend to build a 0.20.205.1 candidate next Friday 11 Nov.

2011-11-10 Thread Dhruba Borthakur
Hi Eli, There is no new functionality added by HDFS-2246. It is a "performance" fix. But I agree that it is not a trivial fix. One proposal could be to commit this patch to trunk too and then continue to work on HDFS-347 to make a better fix to this problem. The good part is that there is no API c

[jira] Created: (HADOOP-6149) FileStatus can support a fileid per path

2009-07-14 Thread dhruba borthakur (JIRA)
FileStatus can support a fileid per path Key: HADOOP-6149 URL: https://issues.apache.org/jira/browse/HADOOP-6149 Project: Hadoop Common Issue Type: New Feature Reporter: dhruba borthakur

[jira] Resolved: (HADOOP-3197) Deadlock in DFCClient

2009-09-30 Thread dhruba borthakur (JIRA)
[ https://issues.apache.org/jira/browse/HADOOP-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dhruba borthakur resolved HADOOP-3197. -- Resolution: Cannot Reproduce I am closing this because this is on a very old release

[jira] Created: (HADOOP-6338) Utility to tail the contents of a directory

2009-10-27 Thread dhruba borthakur (JIRA)
borthakur Assignee: dhruba borthakur There is an existing utility "bin/hadoop fs -tail -f " that prints the last few records from the specified file. A map-reduce application uses a directory as a data-set and it creates multiple files in an HDFS directory. I am proposing that

[jira] Created: (HADOOP-6450) Enhance FSDataOutputStream to allow retrieving the current number of replicas of current block

2009-12-16 Thread dhruba borthakur (JIRA)
Project: Hadoop Common Issue Type: Improvement Components: fs Reporter: dhruba borthakur Assignee: dhruba borthakur The current HDFS implementation has the limitation that it does not replicate the last partial block of a file when it is

[jira] Created: (HADOOP-6713) The RPC server Listener thread is a scalability bottleneck

2010-04-18 Thread dhruba borthakur (JIRA)
Components: ipc Affects Versions: 0.20.2, 0.20.1, 0.20.0 Reporter: dhruba borthakur Assignee: Dmytro Molkov The Hadoop RPC Server implementation has a single Listener thread that reads data from the socket and puts them into a call queue. This means that this
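
The structure described in this report (one Listener thread that reads incoming calls and places them on a call queue) can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not the Hadoop RPC Server code: the class name SingleListenerSketch, the run method, and the use of plain strings in place of socket data are all hypothetical.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model: strings stand in for data read from client sockets.
public class SingleListenerSketch {

    /** Runs one listener thread and several handler threads; returns the number of calls handled. */
    static int run(List<String> incoming, int handlers) {
        BlockingQueue<String> callQueue = new LinkedBlockingQueue<>();
        AtomicInteger handled = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(incoming.size());

        // The only thread that "reads from the sockets": every call passes through it.
        Thread listener = new Thread(() -> {
            for (String call : incoming) {
                callQueue.add(call);
            }
        }, "listener");

        Thread[] pool = new Thread[handlers];
        for (int i = 0; i < handlers; i++) {
            pool[i] = new Thread(() -> {
                try {
                    while (true) {
                        callQueue.take();          // blocks until the listener enqueues a call
                        handled.incrementAndGet(); // count the call before signalling completion
                        done.countDown();
                    }
                } catch (InterruptedException e) {
                    // Interrupted at shutdown; let the handler exit.
                }
            }, "handler-" + i);
            pool[i].setDaemon(true);
            pool[i].start();
        }

        listener.start();
        try {
            done.await();    // wait until every call has been handled
            listener.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for calls", e);
        } finally {
            for (Thread t : pool) {
                t.interrupt();
            }
        }
        return handled.get();
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("call-1", "call-2", "call-3"), 2));
    }
}
```

In this model, adding handler threads does not change the fact that every call is enqueued by the single listener thread, which is the structural point the report raises.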

[jira] Created: (HADOOP-6952) Support sending priority RPC

2010-09-14 Thread dhruba borthakur (JIRA)
borthakur Assignee: dhruba borthakur There is a certain class of RPCs that needs priority delivery. This applies especially to heartbeat RPCs that distributed systems (like HDFS) use. The ability to deliver heartbeat RPCs earlier than other data-movement RPCs can improve the scalability of large
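
One way to picture "priority delivery" is a call queue ordered by call kind, so that heartbeats are dequeued ahead of data-movement calls that arrived earlier. The sketch below uses java.util.concurrent.PriorityBlockingQueue for this. It is an illustration only, not the design proposed in HADOOP-6952 (the description above is cut off before any design is stated); the names PriorityCallQueueSketch, Kind, Call, and drainOrder are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of a call queue that serves heartbeats before data calls.
public class PriorityCallQueueSketch {

    /** Hypothetical call kinds; a lower ordinal is served first. */
    enum Kind { HEARTBEAT, DATA }

    /** A queued call. The sequence number preserves arrival order within one kind. */
    static final class Call implements Comparable<Call> {
        private static final AtomicLong NEXT_SEQ = new AtomicLong();
        final Kind kind;
        final String name;
        final long seq = NEXT_SEQ.getAndIncrement();

        Call(Kind kind, String name) {
            this.kind = kind;
            this.name = name;
        }

        @Override
        public int compareTo(Call other) {
            int byKind = kind.compareTo(other.kind); // enum order: HEARTBEAT before DATA
            return byKind != 0 ? byKind : Long.compare(seq, other.seq);
        }
    }

    /** Enqueues a mixed workload and returns the order in which a handler would dequeue it. */
    static List<String> drainOrder() {
        PriorityBlockingQueue<Call> queue = new PriorityBlockingQueue<>();
        queue.add(new Call(Kind.DATA, "write-block-1"));
        queue.add(new Call(Kind.HEARTBEAT, "heartbeat-a"));
        queue.add(new Call(Kind.DATA, "read-block-2"));
        queue.add(new Call(Kind.HEARTBEAT, "heartbeat-b"));

        List<String> order = new ArrayList<>();
        Call next;
        while ((next = queue.poll()) != null) {
            order.add(next.name);
        }
        return order;
    }

    public static void main(String[] args) {
        // prints [heartbeat-a, heartbeat-b, write-block-1, read-block-2]
        System.out.println(drainOrder());
    }
}
```

The sequence number is there because PriorityBlockingQueue makes no ordering guarantee between elements that compare as equal; without it, two heartbeats could be delivered out of arrival order.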