Dear Jacek and Cody,
I receive a stream of JSON (exactly four JSON objects) once every 30
seconds from Kafka, as follows (I have changed my data source to include
more fields):

{"windspeed":4.23,"pressure":1008.39,"location":"Dundee","latitude":56.5,"longitude":-2.96667,"id":2650752, …}
Hi Jacek and Cody,
First of all, thanks for helping me out.
I started with combineByKey while testing with just one field. Of course
it worked fine, but I was worried that the code would become unreadable if
there were many fields, which is why I shifted to sqlContext: the code is
rather simple.

val df = sqlContext.read.json(rdd)
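A minimal sketch of the per-batch approach Siva describes, assuming `messages` is the Kafka direct stream (set up later in the thread) and `weather` is a stand-in table name:

import org.apache.spark.sql.SQLContext

val lines = messages.map(_._2)          // the JSON payload of each Kafka (key, value) pair
lines.foreachRDD { rdd =>
  if (!rdd.isEmpty) {
    val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
    val df = sqlContext.read.json(rdd)  // one DataFrame per 30-second batch
    df.registerTempTable("weather")     // table name is a placeholder
    sqlContext.sql("SELECT location, avg(windspeed) FROM weather GROUP BY location").show()
  }
}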
Regards,
Jacek Laskowski
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
On Wed, Jun 15, 2016 at 11:55 PM, Sivakumaran S wrote:
> Cody,
>
> Are you referring to the val lines = messages.map(_._2) line?
Hi,
That's one of my concerns with the code too. What concerned me the most is
that the RDDs were converted to DataFrames only to registerTempTable
and execute SQL queries. I think it'd perform better if DataFrame
operators were used instead. Wish I had numbers to back that up.
Regards,
Jacek Laskowski
https://medium.com/@jaceklaskowski/
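A sketch of what Jacek is suggesting: the same per-batch logic with DataFrame operators, no temp table and no SQL string (reusing lines and sqlContext from the sketch above; column names come from the sample JSON earlier in the thread):

import org.apache.spark.sql.functions.avg

lines.foreachRDD { rdd =>
  if (!rdd.isEmpty) {
    sqlContext.read.json(rdd)
      .groupBy("location")
      .agg(avg("windspeed"), avg("pressure"))  // operators instead of a SQL string
      .show()
  }
}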
Doesn't that result in consuming each RDD twice, in order to infer the
json schema?
On Wed, Jun 15, 2016 at 11:19 AM, Sivakumaran S wrote:
> Of course :)
>
> object sparkStreaming {
>   def main(args: Array[String]) {
>     StreamingExamples.setStreamingLogLevels() // Set reasonable logging levels for streaming if the user has not configured log4j.
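One way to address Cody's point, sketched under the assumption that the fields are known up front: give read.json an explicit schema so it skips the inference pass over the RDD (again reusing lines and sqlContext from the earlier sketch):

import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("windspeed", DoubleType),
  StructField("pressure", DoubleType),
  StructField("location", StringType),
  StructField("latitude", DoubleType),
  StructField("longitude", DoubleType),
  StructField("id", LongType)))

lines.foreachRDD { rdd =>
  if (!rdd.isEmpty) {
    val df = sqlContext.read.schema(schema).json(rdd)  // no extra pass to infer the schema
    df.show()
  }
}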
Cody,
Are you referring to the val lines = messages.map(_._2) line?
Regards,
Siva
> On 15-Jun-2016, at 10:32 PM, Cody Koeninger wrote:
>
> Doesn't that result in consuming each RDD twice, in order to infer the
> json schema?
>
> On Wed, Jun 15, 2016 at 11:19 AM, Sivakumaran S wrote:
>> Of course :)
Of course :)
object sparkStreaming {
  def main(args: Array[String]) {
    StreamingExamples.setStreamingLogLevels() // Set reasonable logging levels for streaming if the user has not configured log4j.
    val topics = "test"
    val brokers = "localhost:9092"
    val topicsSet = topics.split(",").toSet
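The archive cuts the snippet off here; a sketch of how the setup presumably continues, following the standard Spark 1.6 direct-stream example (the app name and the 30-second batch interval are assumptions):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val sparkConf = new SparkConf().setAppName("sparkStreaming")
val ssc = new StreamingContext(sparkConf, Seconds(30))
val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topicsSet)
val lines = messages.map(_._2)   // the line discussed above

ssc.start()
ssc.awaitTermination()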
Hi,
Good to hear so! Mind sharing a few snippets of your solution?
Regards,
Jacek Laskowski
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
On Wed, Jun 15, 2016 at 5:03 PM, Sivakumaran S wrote:
Thanks Jacek,
Job completed!! :) I just used data frames and a SQL query. Very clean and
functional code.
Siva
> On 15-Jun-2016, at 3:10 PM, Jacek Laskowski wrote:
>
> Ad Q1, yes. See stateful operators like mapWithState and windows.
Hi,
Ad Q1, yes. See stateful operators like mapWithState and windows.
Ad Q2, RDDs should be fine (and available out of the box), but I'd give
Datasets a try too, since they're just a .toDF away.
Jacek
On 14 Jun 2016 10:29 p.m., "Sivakumaran S" wrote:
Dear friends,
I have set up Kafka 0.9.0.0, Spark 1.6.1 and Scala 2.10.
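A minimal sketch of the mapWithState operator Jacek points to, here keeping a running average per key (the (location, windspeed) pairing is an assumption for illustration; mapWithState also requires a checkpoint directory):

import org.apache.spark.streaming.{State, StateSpec}

ssc.checkpoint("checkpoint")   // mapWithState needs checkpointing enabled

// Keep a running (count, sum) per key and emit the running average.
val spec = StateSpec.function(
  (location: String, windspeed: Option[Double], state: State[(Long, Double)]) => {
    val (count, sum) = state.getOption.getOrElse((0L, 0.0))
    val updated = (count + 1, sum + windspeed.getOrElse(0.0))
    state.update(updated)
    (location, updated._2 / updated._1)
  })

// pairs: DStream[(String, Double)] of (location, windspeed)
val runningAvg = pairs.mapWithState(spec)
runningAvg.print()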
Dear friends,
I have set up Kafka 0.9.0.0, Spark 1.6.1 and Scala 2.10. My source sends a
JSON string periodically to a topic in Kafka. I am able to consume this topic
using Spark Streaming and print it. The schema of the source JSON is as
follows:

{ "id": 121156, "ht": 42, "rotor_rpm": 1 …