Hello -
I am facing some issues with the following snippet of code, which reads from
Kafka and creates a DStream. I am using KafkaUtils.createDirectStream(..) with
Kafka 0.10.1 and Spark 2.0.1.
// get the data from kafka
val stream: DStream[ConsumerRecord[Array[Byte], (String, String)]] =
  KafkaUtils.createDirectStream(..)
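For context, here is a minimal sketch of how createDirectStream is typically wired up with the Kafka 0.10 integration. The broker address, topic, group id, batch interval, and deserializer choices below are illustrative assumptions, not taken from the original snippet:

```scala
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.{ByteArrayDeserializer, StringDeserializer}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val conf = new SparkConf().setAppName("kafka-direct-example").setMaster("local[*]")
val ssc  = new StreamingContext(conf, Seconds(5))  // illustrative batch interval

// Illustrative consumer configuration -- adjust for your cluster.
val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "localhost:9092",
  "key.deserializer"   -> classOf[ByteArrayDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "example-group",
  "auto.offset.reset"  -> "latest"
)

// createDirectStream returns an InputDStream of ConsumerRecord[K, V];
// K and V must match the deserializers configured above.
val stream: InputDStream[ConsumerRecord[Array[Byte], String]] =
  KafkaUtils.createDirectStream[Array[Byte], String](
    ssc,
    PreferConsistent,
    Subscribe[Array[Byte], String](Seq("example-topic"), kafkaParams)
  )
```

One thing worth checking against the snippet above: a value type of (String, String) has no built-in Kafka deserializer, so declaring ConsumerRecord[Array[Byte], (String, String)] would need a custom Deserializer[(String, String)] registered under "value.deserializer"; the bundled deserializers only cover types like String and Array[Byte].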
Hello -
I have a question about parallelizing model training in Spark.
Suppose I have this code fragment for training a model with KMeans:
labeledData.foreachRDD { rdd =>
  val normalizedData: RDD[Vector] = normalize(rdd)
  val trainedModel: KMeansModel = trainModel(normalizedData, noOfClusters)
  // ...
}
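In case it helps frame the question: KMeans.train is itself a distributed job, so calling it inside foreachRDD already parallelizes each training run across the cluster; what remains serial is that micro-batches are trained one after another. A sketch under that reading, with normalize and trainModel as hypothetical stand-ins for the helpers in the fragment (their bodies here are assumptions):

```scala
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

// Hypothetical stand-in: e.g. scale features with mllib.feature.StandardScaler.
def normalize(rdd: RDD[Vector]): RDD[Vector] = rdd

def trainModel(data: RDD[Vector], noOfClusters: Int): KMeansModel =
  // Each Lloyd iteration inside KMeans.train is parallelized over the
  // partitions of `data`, so per-batch training uses the whole cluster.
  KMeans.train(data, noOfClusters, 20 /* maxIterations, illustrative */)

def run(labeledData: DStream[Vector], noOfClusters: Int): Unit =
  labeledData.foreachRDD { rdd =>
    if (!rdd.isEmpty()) {                  // skip empty micro-batches
      val normalized = normalize(rdd)
      val model = trainModel(normalized, noOfClusters)
      // `model` is local to this batch; persist or broadcast it as needed.
    }
  }
```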
Hello -
I am trying to implement an outlier detection application on streaming data.
I am new to Spark, so I would appreciate advice on a few points that
confuse me.
I am thinking of using StreamingKMeans - is this a good choice? I have a
single stream of data and I need an online algorithm.
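StreamingKMeans does target exactly this setting: it updates cluster centers incrementally as batches arrive from one stream. A minimal sketch of how it is usually wired up; the cluster count, dimensionality, and decay factor below are illustrative assumptions, and trainingStream stands in for the single input DStream[Vector] from the question:

```scala
import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.streaming.dstream.DStream

def buildOutlierScorer(trainingStream: DStream[Vector]): Unit = {
  // decayFactor = 1.0 weights all history equally; lower it toward 0.0
  // to forget old data faster (useful if the process drifts).
  val model = new StreamingKMeans()
    .setK(2)                               // illustrative cluster count
    .setDecayFactor(1.0)
    .setRandomCenters(3, weight = 0.0)     // 3-dimensional points, assumed

  // Update centers online as each micro-batch arrives.
  model.trainOn(trainingStream)

  // Assign each point to its nearest current center; points far from
  // every center can then be flagged as outlier candidates.
  val assignments: DStream[Int] = model.predictOn(trainingStream)
  assignments.print()
}
```

One caveat for outlier detection: predictOn only returns the cluster index, so to score "distance from the nearest center" you would compare each point against model.latestModel().clusterCenters yourself.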
Hello -
I have a question about handling incremental model updates in Spark ML.
We have a time series where we predict the future conditioned on the past.
We can train a model offline based on historical data and then use that
model during prediction.
But say, if the underlying process is