If you want a single-machine 'cluster' to try all of these things, you don't strictly need a distribution, but it will probably save you a great deal of time and trouble compared to setting all of this up by hand.
Naturally I would promote CDH, as it contains Spark and Mahout and supports them all, but you can find other distributions that you can get working too.

I don't think using a distribution changes the question of whether to run a VM, and 4GB is small for running all of the processes of a Hadoop cluster while still leaving room to get work done. Setting it up by hand won't change that, although I find that a distribution makes it easy to turn off services you don't want and turn down memory settings, for example.

You do not have to consume a 1-machine cluster as a VM image. For example, in the case of CDH, Cloudera Manager is also the installer and can set up a cluster on any machine you like. (Note that you can run R or Play inside the VM or other instance you create.) Or you can connect to the instance you create as if it were a remote machine and access the data from R, for example.

Also consider running an instance in the cloud on Amazon EC2 or GCE, which you can pause and restart when you want to play with it.

In the case of Spark, you don't strictly need Hadoop at all. It's easy to play around locally on the local file system.

On Tue, Sep 30, 2014 at 5:32 PM, mohan <[email protected]> wrote:
> Sorry to ask another basic question.
>
> Could you point out what I should read to set up a pseudo-distributed
> Hadoop, Mahout and Spark cluster? Does it really need something like CDH?
>
> I want to access Mahout and Spark output and display it in Play (outside CDH). I
> also want to access Spark output from R. The VM may hinder it.
>
> I have a 4 GB Mac and want to avoid another VM if I can.
>
> Thanks,
> Mohan
