Re: Spark replacing Hadoop

Mich Talebzadeh Thu, 14 Apr 2016 14:19:51 -0700

One can see from the responses that the Big Data landscape is getting very
crowded, with dozens of alternative tools on offer. However, as usual, the
laws of selection will favour solutions that are scalable, reliable and,
more importantly, cost-effective.

To this end, any commercial decision to acquire a technology stack has to
take into account the skill sets available in-house and the stability of
the products. I concur with those who say that a smart solution will always
require a good query engine, a mechanism to organise the storage, the
storage layer itself and a resource manager. The rest is icing on the cake.

To me, Spark with Hive, HDFS and YARN is a winning combination. Hadoop
encompasses HDFS, and it is almost impossible to sidestep it without
finding a viable alternative for persistent storage. I also take the point
that, with investments already made in Hadoop, the exit barriers mean that
replacing it won't make commercial sense. In other words, one needs
compelling arguments (beyond a purely technical outlook) to replace Hadoop
in this financial climate, where technology dollars are at a premium.

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 14 April 2016 at 21:35, Peyman Mohajerian <[email protected]> wrote:

> Cloud adds another dimension. In the cloud, compute and storage are
> decoupled (S3 with EMR, or Azure Blob storage with HDInsight), so Hadoop
> ends up being more of a compute engine, and many of its governance and
> security features are irrelevant or less important because the data at
> rest lives outside Hadoop.
> Currently the biggest reason to run Spark on Hadoop in the cloud is YARN,
> but if you decide to use Mesos or Standalone mode, you may not need
> Hadoop at all. Databricks adds yet another dimension in the cloud, which
> I won't comment on.
>
> But on-premises, I think you can argue that HDFS is here to stay in many
> forms, e.g. Isilon, object stores and other storage types, not just local
> disk. The HDFS API actually works over Azure Data Lake Store, completely
> independently of Hadoop!
>
> On Thu, Apr 14, 2016 at 1:29 PM, Cody Koeninger <[email protected]>
> wrote:
>
>> I've been using Spark for years and have (thankfully) been able to
>> avoid needing HDFS, aside from one contract where it was already in
>> use.
>>
>> At this point, many of the people I know would consider Kafka to be
>> more important than HDFS.
>>
>> On Thu, Apr 14, 2016 at 3:11 PM, Jörn Franke <[email protected]>
>> wrote:
>> > I do not think so. Hadoop provides an ecosystem in which you can deploy
>> > different engines, such as MapReduce, HBase, Tez, Spark, Flink, TitanDB,
>> > Hive, Solr... I also observe that commercial analytical tools use one or
>> > more of these engines to execute their code in a distributed fashion. You
>> > need this flexibility to have an ecosystem suited to your needs,
>> > especially in the area of security. HDFS is one key element for storage
>> > and data locality. Spark by itself cannot provide such a complete
>> > ecosystem, but it is part of such ecosystems.
>> >
>> > On 14 Apr 2016, at 21:13, Ashok Kumar <[email protected]> wrote:
>> >
>> > Hi,
>> >
>> > I hear some people saying that Hadoop is getting old and out of date and
>> > will be replaced by Spark!
>> >
>> > Does this make sense and, if so, how accurate is it?
>> >
>> > Best
>
