Please do not confuse the old Spark Streaming (DStreams) with Structured Streaming. Structured Streaming's offset and checkpoint management is far more robust than that of DStreams. Take a look at my talk: https://spark-summit.org/2017/speakers/tathagata-das/
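
To make the checkpoint question from the thread below concrete, here is a minimal PySpark sketch of a Structured Streaming query that reads JSON from Kafka, parses it with from_json, and writes Parquet files with a checkpoint location set on the sink. Everything specific in it is an assumption for illustration, not something taken from the thread: the broker address, the topic name `events`, the JSON fields `id` and `name`, and both paths. It also assumes the spark-sql-kafka package is on the classpath. The Kafka offsets the query has processed are tracked in the checkpoint directory, so a restarted query with the same `checkpointLocation` resumes from where it left off.

```python
# Placeholder settings (assumptions): adjust the broker, topic and paths.
KAFKA_OPTIONS = {
    "kafka.bootstrap.servers": "localhost:9092",
    "subscribe": "events",
    "startingOffsets": "earliest",
}

SINK_OPTIONS = {
    # Output directory for the Parquet files.
    "path": "/tmp/events_parquet",
    # The checkpoint directory is given as an option on the streaming sink.
    "checkpointLocation": "/tmp/checkpoints/events",
}


def start_query(spark):
    """Start a Kafka -> JSON -> Parquet streaming query and return its handle.

    `spark` is an existing SparkSession. The pyspark imports are kept local
    so the settings above can be inspected without Spark installed.
    """
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import LongType, StringType, StructField, StructType

    # Hypothetical schema of the JSON payload carried in the Kafka value.
    schema = StructType([
        StructField("id", LongType()),
        StructField("name", StringType()),
    ])

    # The Kafka source exposes the payload as a binary `value` column.
    raw = spark.readStream.format("kafka").options(**KAFKA_OPTIONS).load()

    # Cast the payload to a string, parse it, and flatten the parsed struct.
    parsed = (
        raw.select(from_json(col("value").cast("string"), schema).alias("data"))
        .select("data.*")
    )

    # The file sink runs in append mode, which is the default output mode.
    return parsed.writeStream.format("parquet").options(**SINK_OPTIONS).start()
```

A caller would create a SparkSession, call `start_query(spark)`, and then call `awaitTermination()` on the returned query. Pointing a Hive external table at the output directory is one possible way to expose the data in Hive; that step is an assumption and is not shown here.
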

On Wed, Oct 25, 2017 at 9:29 PM, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> wrote:

> Thanks Subhash.
>
> Have you ever used the zero data loss concept with streaming? I am a bit
> worried about using streaming when it comes to data loss.
>
> https://blog.cloudera.com/blog/2017/06/offset-management-for-apache-kafka-with-apache-spark-streaming/
>
> Does Structured Streaming handle it internally?
>
> On Wed, Oct 25, 2017 at 3:10 PM, Subhash Sriram <subhash.sri...@gmail.com> wrote:
>
>> No problem! Take a look at this:
>>
>> http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#recovering-from-failures-with-checkpointing
>>
>> Thanks,
>> Subhash
>>
>> On Wed, Oct 25, 2017 at 4:08 PM, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> wrote:
>>
>>> Hi Sriram,
>>>
>>> Thanks. This is what I was looking for.
>>>
>>> One question: where do we need to specify the checkpoint directory in
>>> the case of Structured Streaming?
>>>
>>> Thanks,
>>> Asmath
>>>
>>> On Wed, Oct 25, 2017 at 2:52 PM, Subhash Sriram <subhash.sri...@gmail.com> wrote:
>>>
>>>> Hi Asmath,
>>>>
>>>> Here is an example of using Structured Streaming to read from Kafka:
>>>>
>>>> https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredKafkaWordCount.scala
>>>>
>>>> In terms of parsing the JSON, there is a from_json function that you
>>>> can use. The following might help:
>>>>
>>>> https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html
>>>>
>>>> I hope this helps.
>>>>
>>>> Thanks,
>>>> Subhash
>>>>
>>>> On Wed, Oct 25, 2017 at 2:59 PM, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Could anyone provide suggestions on how to parse JSON data from Kafka
>>>>> and load it into Hive?
>>>>>
>>>>> I have read about Structured Streaming but didn't find any examples.
>>>>> Is there any best practice for reading and parsing it with Structured
>>>>> Streaming for this use case?
>>>>>
>>>>> Thanks,
>>>>> Asmath