Actually, this one caught my eye when I originally read it:
4*) decompression to a different native buffer (not really a copy -
decompression necessarily rewrites)
Actually, LZO decompression can be done in place (an awfully neat trick!). It's
a micro-optimization, but it could possibly save some buffer space
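For anyone who hasn't seen the trick, here is roughly what it looks like against
liblzo2. This is only a sketch: the slack margin below is my assumption, and the
authoritative requirement is spelled out in LZO's own documentation (LZO.FAQ).

/* Sketch of LZO "overlap" (in-place) decompression: one buffer serves as
 * both source and destination.  The OVERLAP_SLACK value is an assumption;
 * consult LZO.FAQ for the exact margin before relying on this. */
#include <stdlib.h>
#include <string.h>
#include <lzo/lzo1x.h>

#define OVERLAP_SLACK(in_len) ((in_len) / 16 + 64 + 3)

int decompress_in_place(const unsigned char *compressed, lzo_uint in_len,
                        lzo_uint out_len, unsigned char **out_buf)
{
    if (lzo_init() != LZO_E_OK)
        return -1;

    /* Allocate the decompressed size plus a little slack. */
    lzo_uint slack = OVERLAP_SLACK(in_len);
    unsigned char *buf = malloc(out_len + slack);
    if (buf == NULL)
        return -1;

    /* Park the compressed bytes at the tail of the buffer... */
    memcpy(buf + out_len + slack - in_len, compressed, in_len);

    /* ...and decompress toward the head of the same buffer.  The write
     * pointer never catches the read pointer as long as the slack holds. */
    lzo_uint new_len = out_len;
    int rc = lzo1x_decompress_safe(buf + out_len + slack - in_len, in_len,
                                   buf, &new_len, NULL);
    if (rc != LZO_E_OK || new_len != out_len) {
        free(buf);
        return -1;
    }

    *out_buf = buf;
    return 0;
}

So instead of a separate input and output buffer, you pay only out_len plus a
few dozen bytes of slack.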
I can provide another data point here: XFS works very well on modern Linux
kernels (in the 2.6.9 era it had many memory management headaches, especially
around the switch to 4k stacks), and its advantage is significant when you run
file systems at over 95% occupancy.
Brian
On Oct 10, 2011, at 8:51 AM,
On Sep 13, 2011, at 7:20 AM, Steve Loughran wrote:
>
> I missed a talk at the local university by a Platform sales rep last month,
> though I did get to offend one of the authors of the Condor team instead [1],
> by pointing out that all grid schedulers contain a major assumption: that
> storage
Hi,
Well, I feel compelled to ask: what are folks' opinions about the IPv6-readiness
of Hadoop?
Doing some searching around, I think it's stated up-front that Hadoop is not
IPv6 ready, and there's little interest in IPv6 support as Hadoop is meant to
be within the data center (which I suppose s
On Feb 18, 2011, at 12:32 PM, Rajat Sharma wrote:
> Hi Guys
>
> I am trying to work on HDFS to improve its performance by adding RDMA
> functionality to its code. But I cannot find any sort of documentation or
> help regarding this topic except some information about the Sockets Direct
> Protocol or
On Jan 5, 2011, at 4:03 PM, Milind Bhandarkar wrote:
> I agree with Jay B. Checksumming is usually the culprit for high CPU on
> clients and datanodes. Plus, a checksum of 4 bytes for every 512 bytes means
> that for a 64 MB block, the checksum will be 512 KB, i.e. 128 ext3 blocks.
> Changing it to generate
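For reference, the arithmetic works out exactly as Milind says; a quick sketch
(the 4 KB ext3 block size is the only assumption):

/* Checksum overhead per HDFS block: 4-byte CRC per 512 bytes of data,
 * 64 MB block, 4 KB ext3 blocks assumed. */
#include <stdio.h>

int main(void)
{
    long block_size    = 64L * 1024 * 1024;              /* 64 MB HDFS block */
    long bytes_per_crc = 512;                             /* data per CRC     */
    long crc_size      = 4;                               /* bytes per CRC    */

    long checksum_bytes = block_size / bytes_per_crc * crc_size;  /* 512 KB  */
    long ext3_blocks    = checksum_bytes / 4096;                   /* 128     */

    printf("%ld checksum bytes (%ld ext3 blocks) per 64 MB block\n",
           checksum_bytes, ext3_blocks);
    return 0;
}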
On Jan 3, 2011, at 8:47 PM, Christopher Smith wrote:
> On Mon, Jan 3, 2011 at 5:05 PM, Brian Bockelman wrote:
>
>> On Jan 3, 2011, at 5:17 PM, Christopher Smith wrote:
>>> On Mon, Jan 3, 2011 at 11:40 AM, Brian Bockelman wrote:
>>>
>>>>
On Jan 3, 2011, at 5:17 PM, Christopher Smith wrote:
> On Mon, Jan 3, 2011 at 11:40 AM, Brian Bockelman wrote:
>
>> It's not immediately clear to me how big the benefit is relative to the costs.
>> Two cases where one normally thinks about direct I/O are:
>> 1) The u
Hi Da,
It's not immediately clear to me how big the benefit is relative to the costs.
Two cases where one normally thinks about direct I/O are:
1) The usage scenario is a cache anti-pattern. This will be true for some
Hadoop use cases (MapReduce), and not true for some others.
- http://www.jeffshafe
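To make the trade-off concrete, this is roughly what direct I/O looks like at
the syscall level on Linux. Only a sketch: the 512-byte alignment and the 1 MB
read size are assumptions, and the real alignment requirement depends on the
filesystem and kernel.

/* Read 1 MB from a file with O_DIRECT, i.e. bypassing the OS page cache.
 * Buffer, offset, and length must all be suitably aligned. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    /* 512-byte alignment assumed; many setups need the device's logical
     * block size or the page size instead. */
    void *buf = NULL;
    if (posix_memalign(&buf, 512, 1 << 20) != 0) { close(fd); return 1; }

    ssize_t n = read(fd, buf, 1 << 20);
    if (n < 0)
        perror("read");
    else
        printf("read %zd bytes without touching the page cache\n", n);

    free(buf);
    close(fd);
    return 0;
}

Every such read has to come from the device (or its cache) instead of the OS
page cache, which is exactly what you want for a cache anti-pattern workload
and exactly what you don't want otherwise.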
Interesting! Here's what the Condor folks have been doing with MapReduce:
http://www.cs.wisc.edu/condor/CondorWeek2010/condor-presentations/thain-condor-hadoop.pdf
Dunno why we don't see more of them (maybe it's just because I'm not subscribed
to the MAPREDUCE mailing list? I have too many ema
On Apr 28, 2010, at 5:04 AM, Steve Loughran wrote:
> prajyot bankade wrote:
>> Hello Everyone,
>> I have just started reading about the Hadoop JobTracker. In one book I read
>> that there is only one JobTracker, which is responsible for distributing
>> tasks to the worker systems. Please correct me if I say
Hey Allen,
Your post provoked a few thoughts:
1) Hadoop is a large, but relatively immature project (as in, there are still a
lot of major features coming down the pipe). If we hold releases for new
features, especially when there are critical bugs, we end up with a large
number of patches between
Hey Eli,
From past experience, static, manual namespace partitioning can really get you
in trouble - you have to manually keep things balanced.
The following things can go wrong:
1) One of your pesky users grows unexpectedly by a factor of 10.
2) Your entire system grows so much that there's no
Hey Arya,
Try running Ganglia on your cluster:
http://ganglia.sourceforge.net/
One of the statistics it collects is the network I/O rate.
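If you ever want to sanity-check the number Ganglia reports, the rate
ultimately comes from the kernel's per-interface byte counters. A rough sketch
(the interface name and the one-second interval are just assumptions for
illustration):

/* Sample the received-byte counter for one interface from /proc/net/dev
 * twice, one second apart, and print the difference as a rate. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static long long rx_bytes(const char *iface)
{
    FILE *f = fopen("/proc/net/dev", "r");
    if (f == NULL)
        return -1;

    char line[512];
    long long bytes = -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        char name[64];
        long long rx;
        /* Data lines look like: "  eth0: 123456789 1234 0 0 ..." */
        if (sscanf(line, " %63[^:]: %lld", name, &rx) == 2 &&
            strcmp(name, iface) == 0) {
            bytes = rx;
            break;
        }
    }
    fclose(f);
    return bytes;
}

int main(void)
{
    long long before = rx_bytes("eth0");
    sleep(1);
    long long after = rx_bytes("eth0");
    if (before < 0 || after < 0) {
        fprintf(stderr, "could not read /proc/net/dev\n");
        return 1;
    }
    printf("rx rate: %lld bytes/sec\n", after - before);
    return 0;
}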
Also - for the future, you probably want to use the
common-u...@hadoop.apache.org mailing list for this sort of general question.
Brian
On Jan 28, 2010,
Hey Huy,
Here's what we do:
1) Include hdfsJniHelper.h
2) Do the following when you're done with the filesystem:
if (NULL != fs) {
    // Get the JNIEnv* corresponding to the current thread
    JNIEnv* env = getJNIEnv();
    if (env == NULL) {
        ret = -EIO;
    } else {
        //
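        // (the archive cuts the message off at this point; the lines below
        // are only a guess at the usual cleanup, assuming fs is the JNI
        // global reference handed back by hdfsConnect())
        hdfsDisconnect(fs);
        (*env)->DeleteGlobalRef(env, (jobject) fs);
        fs = NULL;
    }
}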
Hey all,
One area that would make an exceptionally good research project is the
new pluggable interface for replica placement.
https://issues.apache.org/jira/browse/HDFS-385
It's something which taps into many lines of CS research (such as
scheduling) and is meant to be experimental for a
libhdfs leaks object references
---
Key: HADOOP-6207
URL: https://issues.apache.org/jira/browse/HADOOP-6207
Project: Hadoop Common
Issue Type: Bug
Components: fs
Reporter: Brian Bockelman