No, Steve, I meant exactly what I wrote. One day, when you are here, let's meet
in the same Chinese restaurant, and I will give you the numbers on performance
and cost, and let us do the division of these two numbers.

Let's talk about how the NN latency becomes a bottleneck for the rest of the
cluster's throughput, and why the networking world's advances cannot be swept
under the rug.

Let's talk about why your employers are cozying up with DSSD and Engenio while
you and others in open source are insisting on 1GbE and DAS SATA disks being
the most suitable for Hadoop.

And most of all, let's chat about why the business aspects of Hadoop are acting
against the open source positions of folks from the same orgs.

Tomorrow and the day after, we are conducting a big data benchmarking workshop
in San Jose, where your partners and other open-core Hadoop companies' partners
will demonstrate how advanced hardware (CPUs, networks, storage) is more cost
effective than what you are recommending.

I had recognized this phenomenon very early, and wrote a blog post comparing
open source Hadoop development to Charlie Chaplin, who missed the color and
talkie movie technology by sticking to silent black and white technology. I know
your employer has moved beyond cheap hardware, based on what I hear from
customers where we compete. I am wondering why you still keep on insisting that
new technologies are not worth it.

- milind


> On Oct 9, 2013, at 1:45, Steve Loughran <ste...@hortonworks.com> wrote:
> 
> On 9 October 2013 01:57, Milind Bhandarkar <mbhandar...@gopivotal.com> wrote:
> 
>> Yes, we have. It works very well, but it is considered too niche by folks
>> who insist on buying the least capable hardware for their test clusters,
>> and therefore, recommend such underpowered clusters to customers as well.
> 
> 
> Surely you meant to say "take advantage of the cost model of JBOD storage
> and Ethernet to allow data to be stored and accessed at significantly lower
> price points than for legacy storage architectures and pricing models, so
> enabling their customers to store and process data they would have
> previously had to discard" (0)
> 
> IB should be most interesting at the app level, for apps beyond classic MR.
> That's Giraph, streaming work, Tez. I'd like to see some numbers there. As
> the oracle
> 
> For storage, IB would make locality less of an issue (1,2), and instead
> make the tier of storage (SSD vs HDD) more significant in terms of
> performance (2). There is ongoing work there in a set of JIRAs about
> multi-tier storage.
> 
> I don't know the current state of Hadoop on IB, or even if allocateDirect()
> of NIO has been picked up. For IPC there should be some latency
> improvements, while for the Datanodes it's the bulk data you want to push
> around faster. If you want to work on either of those problems you'd be
> very welcome.
> 
> 
> 
> (0) I also have a VMware test cluster for some HA work and VM capacity from
> Rackspace for a broader pool of deployment options.
> (1) Hadoop 2.1 supports Unix Domain Sockets for a direct-yet-secure
> connection between a local app (HBase, ...) and the Datanode. This bypasses
> the network stack entirely.
> (2)
> http://nowlab.cse.ohio-state.edu/publications/conf-papers/2010/sur-masvdc10.pdf
> (3) http://www.cs.berkeley.edu/%7Eganesha/disk-irrelevant_hotos2011.pdf
> 
