problem with kafka createDirectStream ..

2016-12-09 Thread debasishg
Hello - I am facing some issues with the following snippet of code that reads from Kafka and creates DStream. I am using KafkaUtils.createDirectStream(..) with Kafka 0.10.1 and Spark 2.0.1. // get the data from kafka val stream: DStream[ConsumerRecord[Array[Byte], (String, String)]] = KafkaUti

parallelizing model training ..

2016-11-22 Thread debasishg
Hello - I have a question on parallelization of model training in Spark .. Suppose I have this code fragment for training a model with KMeans .. labeledData.foreachRDD { rdd => val normalizedData: RDD[Vector] = normalize(rdd) val trainedModel: KMeansModel = trainModel(normalizedData, noOfClu

using StreamingKMeans

2016-11-19 Thread debasishg
Hello - I am trying to implement an outlier detection application on streaming data. I am a newbie to Spark and hence would like some advice on the confusions that I have .. I am thinking of using StreamingKMeans - is this a good choice ? I have one stream of data and I need an online algorithm.

Incremental model update

2016-09-27 Thread debasishg
Hello - I have a question on how to handle incremental model updation in Spark ML .. We have a time series where we predict the future conditioned on the past. We can train a model offline based on historical data and then use that model during prediction. But say, if the underlying process is