Re: Spark and N-tier architecture

Ashok Kumar Tue, 29 Mar 2016 16:02:31 -0700

Thank you both.
So am I correct that Spark fits within the application tier in an N-tier
architecture?


On Tuesday, 29 March 2016, 23:50, Alexander Pivovarov <apivova...@gmail.com> wrote:
 

Spark is a distributed data processing engine plus a distributed in-memory/disk
data cache.
spark-jobserver provides a REST API for your Spark applications. It allows you
to submit jobs to Spark and get results in sync or async mode.
It can also create a long-running Spark context that caches RDDs in memory
under a name (named RDDs) and then uses them to serve requests from multiple
users. Because the RDD is already in memory, the response should be very fast
(seconds).
https://github.com/spark-jobserver/spark-jobserver
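To make the sync/async and shared-context workflow above concrete, here is a minimal sketch that only builds the REST URLs a client would call. It assumes the endpoint layout shown in the spark-jobserver README examples of this period (POST /contexts/<name>, POST /jobs?appName=...&classPath=...[&context=...][&sync=true], GET /jobs/<jobId>) and the default port 8090; check both against the version you actually run. The helper names, the context name "shared-ctx" and the app name "test" are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: jobserver running locally on its default port.
BASE = "http://localhost:8090"


def create_context_url(name, cores=4, memory="512m", base=BASE):
    """URL to POST (empty body) to start a long-running, named Spark context."""
    query = urlencode({"num-cpu-cores": cores, "memory-per-node": memory})
    return f"{base}/contexts/{name}?{query}"


def submit_job_url(app_name, class_path, context=None, sync=False, base=BASE):
    """URL to POST a job's config to.

    With sync=True the HTTP response is expected to carry the job result;
    otherwise it carries a job ID that the client polls later.
    """
    params = {"appName": app_name, "classPath": class_path}
    if context is not None:
        # Running in a named context lets this job reuse RDDs cached by earlier jobs.
        params["context"] = context
    if sync:
        params["sync"] = "true"
    return f"{base}/jobs?{urlencode(params)}"


def job_status_url(job_id, base=BASE):
    """URL to GET when polling an asynchronously submitted job."""
    return f"{base}/jobs/{job_id}"


if __name__ == "__main__":
    print(create_context_url("shared-ctx"))
    print(submit_job_url("test", "spark.jobserver.WordCountExample",
                         context="shared-ctx", sync=True))
```

The second print yields http://localhost:8090/jobs?appName=test&classPath=spark.jobserver.WordCountExample&context=shared-ctx&sync=true. Keeping the context long-running is what allows many users' requests to hit the same cached RDDs instead of recomputing them.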


On Tue, Mar 29, 2016 at 2:50 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

Interesting question.
The most widely used application of N-tier is the traditional three-tier
architecture that has been the backbone of client-server architecture, with a
presentation layer, an application layer and a data layer. This separation is
primarily for performance, scalability and maintainability. The most profound
change that the Big Data space has introduced to N-tier architecture is the
concept of horizontal scaling, as opposed to earlier tiers that relied on
vertical scaling. HDFS is an example of horizontal scaling at the data tier:
you grow it by adding more JBODs to storage. Similarly, adding more nodes to a
Spark cluster should result in better performance.
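As a rough illustration of horizontal scaling, the sketch below splits a dataset into partitions, processes each partition on its own worker, and combines the partial results. This mirrors how Spark distributes partitions across executors, but it is only an analogy: the function names are hypothetical, and local threads stand in for cluster nodes, so it shows how work is divided rather than any real speedup. Whether more nodes actually help depends on how much of a job can run in parallel and on data-movement costs.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, n):
    """Split data into n contiguous partitions whose sizes differ by at most one."""
    size, rem = divmod(len(data), n)
    parts, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < rem else 0)
        parts.append(data[start:end])
        start = end
    return parts


def process_partition(part):
    # Stand-in for the per-partition work a Spark executor would perform.
    return sum(x * x for x in part)


def scaled_sum_of_squares(data, workers):
    """Process each partition on its own worker, then combine the partial sums."""
    parts = partition(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_partition, parts))
    return sum(partials)


if __name__ == "__main__":
    data = list(range(1_000))
    # Adding "nodes" shrinks each partition; the combined result is unchanged.
    for workers in (1, 2, 4, 8):
        print(workers, [len(p) for p in partition(data, workers)][:3],
              scaled_sum_of_squares(data, workers))
```

The key property is that the answer does not depend on the number of workers; scaling out only changes how much data each worker handles.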
Bear in mind that these tiers are logical, which means that there may or may
not be as many physical layers. For example, multiple virtual servers can be
hosted on the same physical server.
With regard to Spark, it is effectively a powerful query tool that sits
between the presentation layer (say, Tableau) and HDFS or Hive, as you alluded
to. In that sense you can think of Spark as part of the application layer that
communicates with the backend via a number of protocols, including standard
JDBC. The line is rather blurred as to whether Spark is a database or a query
tool. IMO it is a query tool, in the sense that Spark by itself does not have
its own storage concept or metastore. Thus it relies on others to provide
those services.
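The layering described above can be sketched as a toy model: a data tier that owns storage and table metadata, a query engine in the application tier that holds no data of its own and reads everything from the data tier, and a presentation tier that only formats results. This is an illustration of the view expressed here, not Spark's API; all class and function names are hypothetical, and the "sales" table is made-up sample data.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class DataTier:
    """Owns storage and table names (the role HDFS plus a Hive metastore play)."""

    def __init__(self) -> None:
        self._tables: Dict[str, List[Row]] = {}

    def save(self, table: str, rows: List[Row]) -> None:
        self._tables[table] = list(rows)

    def scan(self, table: str) -> List[Row]:
        return list(self._tables[table])


class QueryEngine:
    """Application tier: computes over data it does not store itself."""

    def __init__(self, data_tier: DataTier) -> None:
        self._data = data_tier  # every read is delegated to the data tier

    def query(self, table: str, predicate: Callable[[Row], bool],
              columns: List[str]) -> List[Row]:
        return [{c: row[c] for c in columns}
                for row in self._data.scan(table) if predicate(row)]


def render_report(engine: QueryEngine) -> List[str]:
    """Presentation tier (the role Tableau plays): formats results only."""
    rows = engine.query("sales", lambda r: r["amount"] > 100, ["region", "amount"])
    return [f"{r['region']}: {r['amount']}" for r in rows]


if __name__ == "__main__":
    storage = DataTier()
    storage.save("sales", [{"region": "EU", "amount": 150},
                           {"region": "US", "amount": 80},
                           {"region": "APAC", "amount": 200}])
    print(render_report(QueryEngine(storage)))
```

Swapping the storage layer means replacing only DataTier. The engine and the report are unaffected, which is the sense in which the query layer depends on others for storage and metadata.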
HTH


Dr Mich Talebzadeh
LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
On 29 March 2016 at 22:07, Ashok Kumar <ashok34...@yahoo.com.invalid> wrote:

Experts,
One of the terms I hear used is N-tier architecture within Big Data, used for
availability, performance, etc. I also hear that Spark, by means of its query
engine and in-memory caching, fits into the middle tier (application layer),
with HDFS and Hive possibly providing the data tier. Can someone elaborate on
the role of Spark here? For example, a Scala program that we write uses JDBC
to talk to databases, so in that sense is Spark a middle-tier application?
I hope someone can clarify this and, if so, describe what the best practice
would be for using Spark as a middle tier within Big Data.
Thanks
