Re: Beginner in Spark

King sami Tue, 10 Feb 2015 14:43:00 -0800

2015-02-06 17:28 GMT+00:00 King sami <[email protected]>:


> The purpose is to build a data processing system for door events. An event
> describes a door being unlocked with a badge system, and it distinguishes
> unlocking by somebody on the inside from unlocking by somebody on the
> outside.
>
> *Producing the events*:
> You will need a simulator capable of producing events at random intervals.
> Simulating 200 doors seems like a good number, but adapt it as you see fit
> to get relevant results. Make sure different doors have different patterns
> to make the analysis interesting.
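
For the simulator, here is a minimal sketch in plain Scala (no Spark needed at
this stage). Everything named here is my own assumption, since the task does
not define an event format: the DoorEvent fields, the "IN"/"OUT" labels, the
CSV layout, and the per-door profile (mean gap between events and share of
entrances). Random intervals are drawn as exponential gaps, which is one
common choice, not the only one.

```scala
import scala.util.Random

// Hypothetical event shape: the task description does not define one.
final case class DoorEvent(timestampMs: Long, doorId: Int, direction: String) {
  // One CSV line per event, for example "1423526400000,17,IN".
  def toCsv: String = s"$timestampMs,$doorId,$direction"
}

object DoorSimulator {
  // Per-door pattern: average gap between events and probability of an entrance.
  final case class DoorProfile(doorId: Int, meanGapMs: Double, entranceProb: Double)

  def profiles(doors: Int, rng: Random): Seq[DoorProfile] =
    (0 until doors).map { id =>
      // Mean gap between 30 seconds and 30 minutes, so some doors are far busier.
      val meanGapMs = 30000.0 + rng.nextDouble() * 1770000.0
      // Share of entrances between 20% and 80%.
      val entranceProb = 0.2 + rng.nextDouble() * 0.6
      DoorProfile(id, meanGapMs, entranceProb)
    }

  // Events for one door in [startMs, startMs + durationMs), with exponentially
  // distributed gaps between consecutive events.
  def eventsForDoor(p: DoorProfile, startMs: Long, durationMs: Long, rng: Random): Seq[DoorEvent] = {
    val endMs = startMs + durationMs
    val out = Vector.newBuilder[DoorEvent]
    var t = startMs.toDouble
    var done = false
    while (!done) {
      // nextDouble() is in [0, 1), so the argument of log is in (0, 1].
      t += -math.log(1.0 - rng.nextDouble()) * p.meanGapMs
      if (t >= endMs) done = true
      else {
        val dir = if (rng.nextDouble() < p.entranceProb) "IN" else "OUT"
        out += DoorEvent(t.toLong, p.doorId, dir)
      }
    }
    out.result()
  }

  // All doors merged and sorted by time; the seed makes a run reproducible.
  def simulate(doors: Int, startMs: Long, durationMs: Long, seed: Long): Seq[DoorEvent] = {
    val rng = new Random(seed)
    profiles(doors, rng)
      .flatMap(p => eventsForDoor(p, startMs, durationMs, rng))
      .sortBy(_.timestampMs)
  }
}
```

Calling DoorSimulator.simulate(200, startMs, 24L * 60 * 60 * 1000, seed) gives
one day of events for 200 doors; writing each event's toCsv on its own line
produces a log file. This sketch keeps a constant rate per door, so a
time-of-day factor would have to be added to get distinct busy and quiet
moments.
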
>
> *Processing the events:*
> After having accumulated a certain amount of events (for example: a day),
> you will calculate statistics. To do this, you will use Spark for your
> batch processing. You will extract:
> • most used door, least used door, door with most exits, door with most
> entrances
> • most and least busy moments (when people entered and exited a lot, or
> not at all)
> • least busy moment of the day
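
For the batch statistics, a rough sketch with the Spark RDD API could look
like the following. The input format is an assumption on my side (one CSV
line per event: epoch milliseconds, door id, "IN" or "OUT"), as are the
object name DoorStats and the choice of the hour of the day (UTC) as the
"moment". Treat it as a starting point, not a finished job.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; required before Spark 1.3

object DoorStats {
  // Assumed line format (not given in the task): "<epoch millis>,<door id>,<IN|OUT>"
  def parse(line: String): (Long, Int, String) = {
    val f = line.split(",")
    (f(0).toLong, f(1).toInt, f(2))
  }

  // Hour of the day (0 to 23, UTC), used as the "moment" bucket.
  def hourOfDay(timestampMs: Long): Int = ((timestampMs / 3600000L) % 24L).toInt

  // Keep the (key, count) pair with the larger or smaller count.
  def larger(a: (Int, Long), b: (Int, Long)): (Int, Long) = if (a._2 >= b._2) a else b
  def smaller(a: (Int, Long), b: (Int, Long)): (Int, Long) = if (a._2 <= b._2) a else b

  def main(args: Array[String]): Unit = {
    // The master URL is expected to be supplied by spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("DoorStats"))
    // args(0) may be a local path or an hdfs:// URI.
    val events = sc.textFile(args(0)).map(parse).cache()

    val perDoor = events.map(e => (e._2, 1L)).reduceByKey(_ + _)
    val exits = events.filter(_._3 == "OUT").map(e => (e._2, 1L)).reduceByKey(_ + _)
    val entrances = events.filter(_._3 == "IN").map(e => (e._2, 1L)).reduceByKey(_ + _)
    val perHour = events.map(e => (hourOfDay(e._1), 1L)).reduceByKey(_ + _)

    // reduce fails on an empty RDD, so this assumes at least one IN and one OUT event.
    // Doors or hours with no events at all never appear in these counts.
    println("most used door (id, count):      " + perDoor.reduce(larger))
    println("least used door (id, count):     " + perDoor.reduce(smaller))
    println("door with most exits:            " + exits.reduce(larger))
    println("door with most entrances:        " + entrances.reduce(larger))
    println("busiest hour (hour, count):      " + perHour.reduce(larger))
    println("least busy hour (hour, count):   " + perHour.reduce(smaller))

    sc.stop()
  }
}
```

Note that "least used" here only ranks doors (or hours) that have at least
one event; to report doors that were never used, the full list of door ids
would have to be joined in separately.
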
>
> *Hints:*
> • Spark is required: http://spark.apache.org
> • Coding in Scala is required.
> • Using HDFS for file storage is a plus.
>
> 2015-02-06 17:00 GMT+00:00 Nagesh sarvepalli <[email protected]>:
>
>> Hi,
>>
>> Here is the sequence I suggest. Feel free to ask if you need further help.
>>
>> 1) You need to decide whether you want to go with a particular
>> distribution of Hadoop (Cloudera / Hortonworks / MapR) or with the plain
>> Apache version. Downloading Hadoop from Apache and integrating it with
>> the various projects is laborious compared to using a distribution, and
>> you also have to take care of maintenance yourself, including version
>> compatibility between projects. Cloudera Manager is the best when it
>> comes to cluster installation and maintenance, but it is memory
>> intensive. Cloud offerings (e.g. from Microsoft) are even simpler and
>> more hassle-free when it comes to installation and maintenance.
>>
>> 2) Depending on the server resources and the data size, you need to
>> decide on the HDFS cluster size (number of nodes). Ensure you have the
>> right JDK version installed if you are installing Hadoop on your own.
>>
>> 3) Once Hadoop is installed, download Scala from scala-lang.org.
>>
>> 4) Then download and install Spark from
>> http://spark.apache.org/downloads.html
>>
>> Hope this helps you get started.
>>
>> Thanks & Regards
>> Nagesh
>> Cloudera Certified Hadoop Developer
>>
>> On Fri, Feb 6, 2015 at 4:09 PM, King sami <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I'm new to Spark and I'd like to install Spark with Scala. The aim is to
>>> build a data processing system for door events.
>>>
>>> The first step is to install Spark, Scala, HDFS and the other required
>>> tools. The second is to write the program in Scala that processes a file
>>> of my data logs (events).
>>>
>>> Could you please help me install the required tools (Spark, Scala, HDFS)
>>> and tell me how I can run my program on the input file?
>>>
>>>
>>> Best regards,
>>
>
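
On the question of how to build and run the program once Spark and Scala are
installed: one common route is an sbt project packaged into a jar and launched
with spark-submit. Below is a minimal build.sbt sketch (sbt build files use
Scala syntax). The project name, the version numbers (chosen to match releases
available at the time of this thread) and the main object name DoorStats are
all assumptions to adapt to your setup.

```scala
// build.sbt: hypothetical project settings, adjust names and versions as needed.
name := "door-events"

version := "0.1"

scalaVersion := "2.10.4"

// "provided" because spark-submit puts the Spark jars on the classpath at run time.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0" % "provided"

// Build the jar with:
//   sbt package
// Then run it locally (DoorStats is a placeholder for your own main object):
//   spark-submit --class DoorStats --master "local[*]" \
//     target/scala-2.10/door-events_2.10-0.1.jar /path/to/events.csv
```

The last argument is passed to the program as args(0); it can be a local file
or an hdfs:// URI if the events are stored in HDFS.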
