2015-02-06 17:28 GMT+00:00 King sami <[email protected]>:
> The purpose is to build a data processing system for door events. An event
> will describe a door unlocking with a badge system. This event will
> differentiate unlocking by somebody from the inside and by somebody from
> the outside.
>
> *Producing the events:*
> You will need a simulator capable of producing events at random intervals.
> Simulating 200 doors seems like a good number, but adapt it as you see fit
> to get relevant results. Make sure different doors have different patterns
> to make the analysis interesting.
>
> *Processing the events:*
> After having accumulated a certain amount of events (for example: a day),
> you will calculate statistics. To do this, you will use Spark for your
> batch processing. You will extract:
> • most used door, least used door, door with most exits, door with most
>   entrances
> • busiest and least busy moments (when people entered and exited a lot, or
>   not at all)
> • least busy moment of the day
>
> *Hints:*
> • Spark is required: http://spark.apache.org
> • Coding in Scala is required.
> • Using HDFS for file storage is a plus.
>
> 2015-02-06 17:00 GMT+00:00 Nagesh sarvepalli <[email protected]>:
>
>> Hi,
>>
>> Here is the sequence I suggest. Feel free to ask if you need further help.
>>
>> 1) You need to decide whether to go with a particular Hadoop distribution
>> (Cloudera / Hortonworks / MapR) or with the Apache version. Downloading
>> Hadoop from Apache and integrating it with various projects is laborious
>> (compared to distributions), and you also need to take care of
>> maintenance, including version compatibility between projects. Cloudera
>> Manager is the best when it comes to cluster installation and maintenance,
>> but it is memory intensive. Cloud offerings (e.g. from Microsoft) are even
>> simpler and more hassle-free to install and maintain.
>>
>> 2) Depending on the server resources and the data size, you need to
>> decide on the HDFS cluster size (number of nodes).
>> Ensure you have the right JDK version installed if you are installing
>> Hadoop on your own.
>>
>> 3) Once Hadoop is installed, download Scala from scala-lang.org.
>>
>> 4) Download and install Spark from http://spark.apache.org/downloads.html
>>
>> Hope this helps you get started.
>>
>> Thanks & Regards
>> Nagesh
>> Cloudera Certified Hadoop Developer
>>
>> On Fri, Feb 6, 2015 at 4:09 PM, King sami <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I'm new to Spark, and I'd like to install Spark with Scala. The aim is to
>>> build a data processing system for door events.
>>>
>>> The first step is to install Spark, Scala, HDFS and the other required
>>> tools. The second is to write the program implementing the algorithm in
>>> Scala, which will process a file of my data logs (events).
>>>
>>> Could you please help me install the required tools (Spark, Scala, HDFS)
>>> and tell me how I can run my program on the input file?
>>>
>>> Best regards,
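One possible shape for the event simulator, sketched in Scala under assumptions the requirements leave open: events are written as CSV lines `timestamp,doorId,IN|OUT` with epoch-second UTC timestamps, and each door gets a randomly drawn profile (peak rate, peak hour, share of entries) so that doors differ in both volume and timing. Random intervals come from a time-varying Poisson process generated by thinning. All names (`DoorEvent`, `DoorSimulator`, `Profile`) and the parameter ranges are hypothetical.

```scala
import scala.util.Random

// Direction of an unlock: from the outside (entry) or from the inside (exit).
sealed trait Direction
case object Entry extends Direction
case object Exit extends Direction

// One badge unlock; timestampSec is assumed to be epoch seconds (UTC).
final case class DoorEvent(timestampSec: Long, doorId: Int, direction: Direction) {
  // Hypothetical CSV layout: timestamp,doorId,IN|OUT
  def toCsv: String = s"$timestampSec,$doorId,${if (direction == Entry) "IN" else "OUT"}"
}

object DoorSimulator {
  // Hypothetical per-door profile: rate at the peak hour, the peak hour itself,
  // and the probability that an unlock is an entry.
  final case class Profile(peakPerHour: Double, peakHour: Double, entryProb: Double)

  def randomProfile(rng: Random): Profile =
    Profile(
      peakPerHour = 2.0 + rng.nextDouble() * 58.0, // 2 to 60 unlocks per hour at peak
      peakHour = 7.0 + rng.nextDouble() * 12.0,    // peak between 07:00 and 19:00
      entryProb = 0.2 + rng.nextDouble() * 0.6     // 20% to 80% entries
    )

  // Relative traffic in (0, 1] at a given hour: a Gaussian bump (sigma = 2 h) around
  // the peak hour, using circular distance on the 24 h clock, with a 5% night floor.
  def weight(p: Profile, hourOfDay: Double): Double = {
    val d = math.abs(hourOfDay - p.peakHour)
    val dist = math.min(d, 24.0 - d)
    math.max(0.05, math.exp(-dist * dist / 8.0))
  }

  // Simulates each door as a non-homogeneous Poisson process using thinning:
  // candidates arrive with exponential gaps at the peak rate and are kept with
  // probability weight(hour), which never exceeds 1.
  def simulate(doors: Int, startSec: Long, durationSec: Long, seed: Long): Vector[DoorEvent] = {
    val rng = new Random(seed)
    val out = Vector.newBuilder[DoorEvent]
    for (doorId <- 0 until doors) {
      val p = randomProfile(rng)
      val ratePerSec = p.peakPerHour / 3600.0
      def gap(): Double = -math.log(1.0 - rng.nextDouble()) / ratePerSec
      var t = gap()
      while (t < durationSec) {
        val ts = startSec + t.toLong
        val hour = (ts % 86400L) / 3600.0
        if (rng.nextDouble() < weight(p, hour)) {
          val dir = if (rng.nextDouble() < p.entryProb) Entry else Exit
          out += DoorEvent(ts, doorId, dir)
        }
        t += gap()
      }
    }
    out.result().sortBy(_.timestampSec)
  }
}
```

For instance, `DoorSimulator.simulate(200, 0L, 86400L, 42L).map(_.toCsv)` yields one day of events for 200 doors as CSV lines, which could be written to a file and, if HDFS is used, copied there with `hdfs dfs -put`.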

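A sketch of the daily statistics over such a file. To keep it runnable without a cluster it uses plain Scala collections instead of Spark, so it is a local prototype rather than the required Spark job; each keyed count corresponds to `map(k => (k, 1)).reduceByKey(_ + _)` on an RDD. It assumes a hypothetical CSV layout `timestamp,doorId,IN|OUT` (epoch seconds, UTC), door ids `0 until doorCount`, and reads "moment" as the hour of the day in UTC. `DoorStats`, `Ev` and `Report` are illustrative names.

```scala
import scala.io.Source
import scala.util.Try

object DoorStats {
  // One parsed event; assumes CSV lines "timestamp,doorId,IN|OUT" with epoch seconds (UTC).
  final case class Ev(ts: Long, door: Int, isEntry: Boolean)

  final case class Report(mostUsed: Int, leastUsed: Int, mostEntries: Int, mostExits: Int,
                          busiestHour: Int, quietestHour: Int)

  // Returns None for malformed lines instead of failing the whole batch.
  def parse(line: String): Option[Ev] = line.split(',').map(_.trim) match {
    case Array(ts, door, dir) if dir == "IN" || dir == "OUT" =>
      Try(Ev(ts.toLong, door.toInt, dir == "IN")).toOption
    case _ => None
  }

  // Counts per key; on an RDD this would be rdd.map(k => (k, 1)).reduceByKey(_ + _).
  def countBy[K](keys: Seq[K]): Map[K, Int] =
    keys.groupBy(identity).map { case (k, v) => k -> v.size }

  // Doors are assumed to be numbered 0 until doorCount; "moment" is taken to be the
  // hour of the day (UTC). Missing doors/hours count as zero; ties go to the lowest id.
  def report(events: Seq[Ev], doorCount: Int): Report = {
    require(events.nonEmpty, "no events to analyse")
    val perDoor = countBy(events.map(_.door))
    val entries = countBy(events.filter(_.isEntry).map(_.door))
    val exits = countBy(events.filterNot(_.isEntry).map(_.door))
    val perHour = countBy(events.map(e => ((e.ts % 86400L) / 3600L).toInt))
    val doors = ((0 until doorCount) ++ perDoor.keys).distinct
    def most(c: Map[Int, Int]): Int = doors.maxBy(d => (c.getOrElse(d, 0), -d))
    val hours = 0 until 24
    Report(
      mostUsed = most(perDoor),
      leastUsed = doors.minBy(d => (perDoor.getOrElse(d, 0), d)),
      mostEntries = most(entries),
      mostExits = most(exits),
      busiestHour = hours.maxBy(h => (perHour.getOrElse(h, 0), -h)),
      quietestHour = hours.minBy(h => (perHour.getOrElse(h, 0), h))
    )
  }

  // Local entry point: reads the event file given as the first argument.
  def main(args: Array[String]): Unit = {
    val src = Source.fromFile(args(0))
    try println(report(src.getLines().flatMap(l => parse(l)).toVector, doorCount = 200))
    finally src.close()
  }
}
```

Doors and hours with no events count as zero, so a never-used door is reported as the least used, and ties go to the lowest id. In a Spark version, the input would come from `sc.textFile(path)` (an `hdfs://` path when the file is stored in HDFS), and the packaged jar would be launched with `spark-submit --class <main class> <jar> <input path>`.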