Re: Can we get a spark context inside a mapper

2014-07-15 Thread Rahul Bhojwani
Thanks a lot Sean, Daniel, Matei and Jerry. I really appreciate your replies. And I also understand that I should be a little more patient: when I myself am not able to reply within the next 5 hours, how can I expect my question to be answered in that time? And yes, the idea of using a separate Clusteri…

Re: Can we get a spark context inside a mapper

2014-07-14 Thread Jerry Lam
Hi there, I think the question is interesting; a spark of sparks = spark. I wonder if you can use the Spark Job Server (https://github.com/ooyala/spark-jobserver)? So in the Spark task that requires a new Spark context, instead of creating it in the task, contact the job server to create one and…
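
A minimal sketch of Jerry's suggestion, assuming a spark-jobserver instance is already running at jobserver:8090 and a job jar has already been uploaded under the hypothetical app name "myApp" with the hypothetical job class com.example.KMeansJob; the endpoint shapes follow the spark-jobserver README:

    import java.net.{HttpURLConnection, URL}
    import scala.io.Source

    // Fire an empty-bodied POST at the job server and return the response.
    def post(url: String): String = {
      val conn = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
      conn.setRequestMethod("POST")
      conn.setDoOutput(true)
      conn.getOutputStream.close()               // empty request body
      val body = Source.fromInputStream(conn.getInputStream).mkString
      conn.disconnect()
      body
    }

    // 1. Ask the job server to create a fresh, named context.
    post("http://jobserver:8090/contexts/kmeans-ctx?num-cpu-cores=2&memory-per-node=512m")

    // 2. Run a previously uploaded job class inside that context, synchronously.
    post("http://jobserver:8090/jobs?appName=myApp&classPath=com.example.KMeansJob&context=kmeans-ctx&sync=true")

The task itself then only does HTTP, so no SparkContext is ever created inside it.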

Re: Can we get a spark context inside a mapper

2014-07-14 Thread Matei Zaharia
You currently can't use SparkContext inside a Spark task, so in this case you'd have to call some kind of local K-means library. One example you can try to use is Weka (http://www.cs.waikato.ac.nz/ml/weka/). You can then load your text files as an RDD of strings with SparkContext.wholeTextFiles…
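
A minimal sketch of that approach, assuming a local[*] master, comma-separated feature rows, and a placeholder localKMeans() standing in for a real single-machine library call such as Weka's SimpleKMeans:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(
      new SparkConf().setAppName("per-file-clustering").setMaster("local[*]"))

    // Placeholder for a real local (non-Spark) clustering call, e.g. Weka's
    // SimpleKMeans; this stub just assigns a deterministic dummy label.
    def localKMeans(rows: Array[Array[Double]], k: Int): Array[Int] =
      rows.map(r => (math.abs(r.sum) % k).toInt)

    // wholeTextFiles yields one (path, contents) pair per file, so each
    // file can be clustered independently inside a single task.
    val clustered = sc.wholeTextFiles("input/*.txt").map { case (path, contents) =>
      val rows   = contents.split("\n").filter(_.nonEmpty).map(_.split(",").map(_.toDouble))
      val labels = localKMeans(rows, k = 3)
      val out    = rows.zip(labels).map { case (row, label) => row.mkString(",") + "," + label }
      (path, out.mkString("\n"))
    }

    clustered.saveAsTextFile("output")   // one record per input file

Because each file arrives as a single (path, contents) record, the ten files are clustered in parallel across tasks while the clustering itself stays local to each task.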

Re: Can we get a spark context inside a mapper

2014-07-14 Thread Daniel Siegmann
Rahul, I'm not sure what you mean by your question being "very unprofessional". Feel free to ask such questions here. You may or may not receive an answer, and you shouldn't necessarily expect to have your question answered within five hours. I've never tried to do anything like your ca…

Re: Can we get a spark context inside a mapper

2014-07-14 Thread Rahul Bhojwani
I understand that the question is very unprofessional, but I am a newbie. If not here, could you share some link where I can ask such questions? But please answer. On Mon, Jul 14, 2014 at 6:52 PM, Rahul Bhojwani wrote: > Hey, my question is for this situation: > Suppose we have 10 files…

Can we get a spark context inside a mapper

2014-07-14 Thread Rahul Bhojwani
Hey, my question is for this situation: Suppose we have 10 files, each containing a list of features in each row. The task is, for each file, to cluster the features in that file and write the corresponding cluster along with each row to a new file. So we have to generate 10 more files by applying clu…
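
For context, a sketch of the pattern the question is reaching for, with illustrative file names; as the replies above explain, the nested context on the marked line is not supported and fails at runtime:

    // One Spark job per file, launched from inside a loop over file names.
    val files = sc.parallelize(Seq("features1.txt", "features2.txt" /* ... */))
    files.foreach { file =>
      // Not supported: a SparkContext cannot be created or used inside a
      // Spark task, which is why the replies suggest a local clustering
      // library or an external job server instead.
      val inner = new org.apache.spark.SparkContext("local", "per-file-job")
      // ... cluster `file` with `inner` ...
    }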