Hi there,

I think the question is interesting; a Spark of Sparks is still Spark.
I wonder if you could use the Spark Job Server
(https://github.com/ooyala/spark-jobserver)?

So in the Spark task that requires a new Spark context, instead of creating
one inside the task, ask the job server to create it, and pass the task's
data as the data source via HDFS/Tachyon/S3. Wait until the sub-task is
done, then continue. Since the job server has a notion of job IDs, you
could use the ID as a reference to the sub-task.
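The submit-then-poll loop above could look roughly like this. This is a hedged sketch only: the endpoint shapes (POST /jobs, GET /jobs/<id>) follow the spark-jobserver README, but the base URL, app name, and class path below are hypothetical placeholders.

```python
# Sketch: driving spark-jobserver's REST API to run a sub-job and wait for it.
# Endpoint names follow the spark-jobserver README; names here are placeholders.
import json
import time
import urllib.parse
import urllib.request


def job_submit_url(base_url, app_name, class_path, context=None, sync=False):
    """Build the POST /jobs URL that asks the job server to start a job."""
    params = {"appName": app_name, "classPath": class_path,
              "sync": str(sync).lower()}
    if context:
        params["context"] = context
    return base_url.rstrip("/") + "/jobs?" + urllib.parse.urlencode(params)


def job_status_url(base_url, job_id):
    """Build the GET /jobs/<id> URL used to poll the sub-job by its job ID."""
    return base_url.rstrip("/") + "/jobs/" + job_id


def wait_for_job(base_url, job_id, poll_secs=2.0):
    """Poll the job server until the sub-job leaves the RUNNING state."""
    while True:
        with urllib.request.urlopen(job_status_url(base_url, job_id)) as resp:
            status = json.load(resp).get("status")
        if status != "RUNNING":
            return status  # e.g. "FINISHED" or "ERROR"
        time.sleep(poll_secs)
```

The task would write its input to HDFS/S3 first, submit the sub-job with a pointer to that path, then call wait_for_job before reading the result back.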

I don't know whether this is a good idea or a bad one. Maybe it's an
anti-pattern in Spark, but maybe not.
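For reference, Matei's suggestion below (wholeTextFiles plus a local clustering library applied per file, with no nested SparkContext) could be sketched roughly like this. The tiny pure-Python k-means is a stand-in for a real library such as Weka or MLlib's local routines, and the paths and k value are hypothetical:

```python
# Sketch: cluster each file locally inside a map, never touching SparkContext
# from within a task. The k-means here is a toy stand-in for a real library.
import random


def local_kmeans(points, k, iters=10, seed=0):
    """Plain single-machine k-means; returns a cluster id per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
        # Recompute each center as the mean of its assigned points.
        for c in range(k):
            members = [p for i, p in enumerate(points) if assign[i] == c]
            if members:
                centers[c] = tuple(sum(xs) / len(members)
                                   for xs in zip(*members))
    return assign


def cluster_file(name_and_text, k=2):
    """Mapper body: parse one file's rows of features, cluster them locally,
    and return the rows with their cluster id appended."""
    name, text = name_and_text
    lines = [line for line in text.splitlines() if line.strip()]
    points = [tuple(float(x) for x in line.split()) for line in lines]
    labels = local_kmeans(points, k)
    out = "\n".join(f"{line} {lab}" for line, lab in zip(lines, labels))
    return name, out


# Driver side (needs a running cluster; paths are hypothetical):
#   sc.wholeTextFiles("hdfs:///features/*").map(cluster_file).saveAsTextFile(...)
```

Each element of the wholeTextFiles RDD is a (filename, contents) pair, so one map call per file replaces the nested-SparkContext idea entirely.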

HTH,

Jerry



On Mon, Jul 14, 2014 at 3:09 PM, Matei Zaharia <matei.zaha...@gmail.com>
wrote:

> You currently can't use SparkContext inside a Spark task, so in this case
> you'd have to call some kind of local K-means library. One example you can
> try to use is Weka (http://www.cs.waikato.ac.nz/ml/weka/). You can then
> load your text files as an RDD of strings with SparkContext.wholeTextFiles
> and call Weka on each one.
>
> Matei
>
> On Jul 14, 2014, at 11:30 AM, Rahul Bhojwani <rahulbhojwani2...@gmail.com>
> wrote:
>
> I understand that the question may be very basic, but I am a newbie. If
> this isn't the right place, could you share a link to somewhere I can ask
> such questions?
>
> But please answer.
>
>
> On Mon, Jul 14, 2014 at 6:52 PM, Rahul Bhojwani <
> rahulbhojwani2...@gmail.com> wrote:
>
>> Hey, my question is about this situation:
>> Suppose we have 100,000 files, each containing a list of features per row.
>>
>> The task is, for each file, to cluster the features in that file and
>> write the corresponding cluster alongside them in a new file. So we have
>> to generate 100,000 more files by applying clustering to each file
>> individually.
>>
>> So can I do it this way: get an RDD of the list of files and apply a map,
>> and inside the mapper function, which handles each file, get another
>> Spark context and use MLlib k-means to produce the clustered output file?
>>
>> Please suggest an appropriate method to tackle this problem.
>>
>> Thanks,
>> Rahul Kumar Bhojwani
>> 3rd year, B.Tech
>> Computer Science Engineering
>> National Institute Of Technology, Karnataka
>> 9945197359
>>
>
>
>
> --
> Rahul K Bhojwani
> 3rd Year B.Tech
> Computer Science and Engineering
> National Institute of Technology, Karnataka
>
>
>
