Re: Spark replacing Hadoop

Sean Owen Thu, 14 Apr 2016 13:04:22 -0700

It depends, indeed, on what you mean by "Hadoop". The core Hadoop
project is MapReduce, YARN and HDFS. MapReduce is still in use as a
workhorse but has been superseded by engines like Spark (or perhaps
Flink). (Tez maps loosely to Spark Core, and is not really a MapReduce
replacement.)

"Hadoop" can also be a catch-all term for projects typically used in
conjunction with core Hadoop. That can be Spark, Kafka, HBase,
ZooKeeper, Solr, Parquet, Impala, Hive, etc.

If you mean the former: mostly no. Spark needs a storage layer like
HDFS for persistent storage, and needs to integrate with a cluster
manager like YARN in order to share resources with other apps, but it
does replace MapReduce.

If you mean the latter: no. Spark is a big piece of the broader
picture and replaces several pieces (Mahout, maybe Crunch in some
ways, Giraph, and arguably some of Hive's workloads), but it doesn't
replace most of them.

Really, there's no reason to expect that one project will do
everything. Core Hadoop most certainly isn't enough to handle all of
today's "Hadoop" workloads. It's a false choice. You can use Spark
*and* Hadoop-related projects, and that's the best of all.

On Thu, Apr 14, 2016 at 8:40 PM, Mich Talebzadeh
<[email protected]> wrote:
> Hi,
>
> My two cents here.
>
> Hadoop, as I understand it, has two components, namely HDFS (Hadoop
> Distributed File System) and MapReduce.
>
> Whatever we use, I still think we need to store data on HDFS
> (excluding standalone stores like MongoDB etc.). As for MapReduce as
> the execution engine, it is being replaced by Tez (basically MapReduce
> with a DAG) or by Spark, which uses in-memory processing and a DAG.
> MapReduce is the one being moved sideways.
>
> To me, Spark, besides being versatile, is a powerful tool. Remember,
> tools are just tools, not solutions, so we could discuss this all day.
> In effect, I would argue that with Spark as the front-end tool, Hive
> and its organisation of metadata, and HDFS as the storage layer, you
> have all three components needed to build a powerful solution.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 14 April 2016 at 20:22, Andy Davidson <[email protected]>
> wrote:
>>
>> Hi Ashok
>>
>> In general, if I were starting a new project and had not invested
>> heavily in Hadoop (i.e. had a large staff trained on Hadoop, had a
>> lot of existing projects implemented on Hadoop, …), I would probably
>> start using Spark. It's faster and easier to use.
>>
>> Your mileage may vary
>>
>> Andy
>>
>> From: Ashok Kumar <[email protected]>
>> Reply-To: Ashok Kumar <[email protected]>
>> Date: Thursday, April 14, 2016 at 12:13 PM
>> To: "user @spark" <[email protected]>
>> Subject: Spark replacing Hadoop
>>
>> Hi,
>>
>> I hear some people saying that Hadoop is getting old and out of date
>> and will be replaced by Spark!
>>
>> Does this make sense, and if so, how accurate is it?
>>
>> Best
>
>

