Re: Compressor tweaks corresponding to HDFS-2834, 3051?

2012-03-07 Thread Brian Bockelman
Actually, this one caught my eye when I originally read it: 4*) decompression to a different native buffer (not really a copy - decompression necessarily rewrites) Actually, LZO can be done in-place (an awfully neat trick!). It's a micro-optimization, but possibly could save some buffer space

Re: making file system block size bigger to improve hdfs performance ?

2011-10-10 Thread Brian Bockelman
I can provide another data point here: xfs works very well in modern Linuxes (in the 2.6.9 era, it had many memory management headaches, especially around the switch to 4k stacks), and its advantage is significant when you run file systems over 95% occupied. Brian On Oct 10, 2011, at 8:51 AM,

Re: Platform MapReduce - Enterprise Features

2011-09-13 Thread Brian Bockelman
On Sep 13, 2011, at 7:20 AM, Steve Loughran wrote: > > I missed a talk at the local university by a Platform sales rep last month, > though I did get to offend one of the authors of the Condor team instead [1], by > pointing out that all grid schedulers contain a major assumption: that > storage

Happy World IPv6 Day!

2011-06-08 Thread Brian Bockelman
Hi, Well, I feel compelled to ask: what are folks' opinions about the IPv6-readiness of Hadoop? Doing some searching around, I think it's stated up-front that Hadoop is not IPv6 ready, and there's little interest in IPv6 support as Hadoop is meant to be within the data center (which I suppose s

Re: Hdfs over RDMA

2011-02-22 Thread Brian Bockelman
On Feb 18, 2011, at 12:32 PM, Rajat Sharma wrote: > Hi Guys > > I am trying to work on HDFS to improve its performance by adding RDMA > functionality to its code. But I cannot find any sort of documentation or > help regarding this topic except some information about Socket Direct > Protocol or

Re: Hadoop use direct I/O in Linux?

2011-01-05 Thread Brian Bockelman
On Jan 5, 2011, at 4:03 PM, Milind Bhandarkar wrote: > I agree with Jay B. Checksumming is usually the culprit for high CPU on > clients and datanodes. Plus, a checksum of 4 bytes for every 512, means for > 64MB block, the checksum will be 512KB, i.e. 128 ext3 blocks. Changing it to > generate

Re: Hadoop use direct I/O in Linux?

2011-01-03 Thread Brian Bockelman
On Jan 3, 2011, at 8:47 PM, Christopher Smith wrote: > On Mon, Jan 3, 2011 at 5:05 PM, Brian Bockelman wrote: > >> On Jan 3, 2011, at 5:17 PM, Christopher Smith wrote: >>> On Mon, Jan 3, 2011 at 11:40 AM, Brian Bockelman >> wrote: >>> >>>>

Re: Hadoop use direct I/O in Linux?

2011-01-03 Thread Brian Bockelman
On Jan 3, 2011, at 5:17 PM, Christopher Smith wrote: > On Mon, Jan 3, 2011 at 11:40 AM, Brian Bockelman wrote: > >> It's not immediately clear to me the size of the benefit versus the costs. >> Two cases where one normally thinks about direct I/O are: >> 1) The u

Re: Hadoop use direct I/O in Linux?

2011-01-03 Thread Brian Bockelman
Hi Da, It's not immediately clear to me the size of the benefit versus the costs. Two cases where one normally thinks about direct I/O are: 1) The usage scenario is a cache anti-pattern. This will be true for some Hadoop use cases (MapReduce), not true for some others. - http://www.jeffshafe

Re: Regarding Job tracker

2010-04-28 Thread Brian Bockelman
Interesting! Here's what the Condor folks have been doing with MapReduce: http://www.cs.wisc.edu/condor/CondorWeek2010/condor-presentations/thain-condor-hadoop.pdf Dunno why we don't see more of them (maybe it's just because I'm not subscribed to the MAPREDUCE mailing list? I have too many ema

Re: Regarding Job tracker

2010-04-28 Thread Brian Bockelman
On Apr 28, 2010, at 5:04 AM, Steve Loughran wrote: > prajyot bankade wrote: >> Hello Everyone, >> I have just started reading about hadoop job tracker. In one book I read >> that there is only one job tracker who is responsible to distribute task to >> worker system. Please make me right if i say

Re: [DISCUSSION] Release process

2010-03-24 Thread Brian Bockelman
Hey Allen, Your post provoked a few thoughts: 1) Hadoop is a large, but relatively immature project (as in, there are still a lot of major features coming down the pipe). If we wait to release on features, especially when there are critical bugs, we end up with a large number of patches between

Re: Namespace partitioning using Locality Sensitive Hashing

2010-03-01 Thread Brian Bockelman
Hey Eli, From past experience, static, manual namespace partitioning can really get you in trouble - you have to manually keep things balanced. The following things can go wrong: 1) One of your pesky users grows unexpectedly by a factor of 10. 2) Your entire system grew so much that there's no

Re: How to obtain dynamic bandwidth

2010-01-29 Thread Brian Bockelman
Hey Arya, Try running Ganglia on your cluster: http://ganglia.sourceforge.net/ One of the statistics it collects is the network I/O rate. Also - for the future, you probably want to use the common-u...@hadoop.apache.org mailing list for this sort of general question. Brian On Jan 28, 2010,

Re: libhdfs with FileSystem cache issue can causes to memory leak ?

2009-10-13 Thread Brian Bockelman
Hey Huy, Here's what we do: 1) include hdfsJniHelper.h 2) Do the following when you're done with the filesystem: if (NULL != fs) { //Get the JNIEnv* corresponding to current thread JNIEnv* env = getJNIEnv(); if (env == NULL) { ret = -EIO; } else { //

Re: Contributing to HDFS - Distributed Computing

2009-09-01 Thread Brian Bockelman
Hey all, One place which would be an exceptionally good research project is the new pluggable interface for replica placement. https://issues.apache.org/jira/browse/HDFS-385 It's something which taps into many lines of CS research (such as scheduling) and is meant to be experimental for a

[jira] Created: (HADOOP-6207) libhdfs leaks object references

2009-08-21 Thread Brian Bockelman (JIRA)
libhdfs leaks object references --- Key: HADOOP-6207 URL: https://issues.apache.org/jira/browse/HADOOP-6207 Project: Hadoop Common Issue Type: Bug Components: fs Reporter: Brian Bockelman