Please do not confuse old Spark Streaming (DStreams) with Structured
Streaming. Structured Streaming's offset and checkpoint management is far
more robust than that of DStreams.
Take a look at my talk:
https://spark-summit.org/2017/speakers/tathagata-das/
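
On the checkpoint question further down the thread, here is a minimal
PySpark sketch. It is not taken from the talk and is only one way to do it.
The broker address, topic name, JSON schema, and /tmp paths are all
hypothetical. It also assumes the spark-sql-kafka-0-10 package is on the
classpath and Spark 2.3+ for the DDL schema string. The checkpoint directory
is passed as the checkpointLocation option on the write side. The output is
plain Parquet files that a Hive external table could point at; that Hive
step is an assumption and is not shown.

```python
"""Sketch: Kafka JSON -> parsed columns -> Parquet files, with a checkpoint."""

# Hypothetical values; none of these come from the thread.
KAFKA_SERVERS = "localhost:9092"
TOPIC = "events"
OUTPUT_PATH = "/tmp/events_parquet"
CHECKPOINT_PATH = "/tmp/events_checkpoint"
# Assumed JSON layout, written as a DDL string (accepted by from_json in Spark 2.3+).
EVENT_SCHEMA = "id STRING, amount DOUBLE, ts TIMESTAMP"


def kafka_source_options(servers, topic):
    """Options for the built-in Kafka source."""
    return {"kafka.bootstrap.servers": servers, "subscribe": topic}


def file_sink_options(output_path, checkpoint_path):
    """Options for the file sink; the checkpoint directory is set here."""
    return {"path": output_path, "checkpointLocation": checkpoint_path}


def start_query(spark):
    """Start the streaming query on an existing SparkSession."""
    # Imported lazily so the option helpers can be used without Spark installed.
    from pyspark.sql.functions import col, from_json

    raw = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(KAFKA_SERVERS, TOPIC))
        .load()
    )
    # Kafka delivers the payload as bytes in the "value" column.
    parsed = (
        raw.select(from_json(col("value").cast("string"), EVENT_SCHEMA).alias("event"))
        .select("event.*")
    )
    return (
        parsed.writeStream.format("parquet")
        .options(**file_sink_options(OUTPUT_PATH, CHECKPOINT_PATH))
        .outputMode("append")
        .start()
    )
```

Calling start_query(spark).awaitTermination() keeps the query running.
Restarting it with the same checkpoint directory should resume from the
offsets recorded there.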

On Wed, Oct 25, 2017 at 9:29 PM, KhajaAsmath Mohammed <
mdkhajaasm...@gmail.com> wrote:

> Thanks Subhash.
>
> Have you ever used the zero data loss concept with streaming? I am a bit
> worried about using streaming when it comes to data loss.
>
> https://blog.cloudera.com/blog/2017/06/offset-management-for-apache-kafka-with-apache-spark-streaming/
>
>
> Does Structured Streaming handle it internally?
>
> On Wed, Oct 25, 2017 at 3:10 PM, Subhash Sriram <subhash.sri...@gmail.com>
> wrote:
>
>> No problem! Take a look at this:
>>
>> http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#recovering-from-failures-with-checkpointing
>>
>> Thanks,
>> Subhash
>>
>> On Wed, Oct 25, 2017 at 4:08 PM, KhajaAsmath Mohammed <
>> mdkhajaasm...@gmail.com> wrote:
>>
>>> Hi Sriram,
>>>
>>> Thanks. This is what I was looking for.
>>>
>>> One question: where do we need to specify the checkpoint directory in
>>> the case of Structured Streaming?
>>>
>>> Thanks,
>>> Asmath
>>>
>>> On Wed, Oct 25, 2017 at 2:52 PM, Subhash Sriram <
>>> subhash.sri...@gmail.com> wrote:
>>>
>>>> Hi Asmath,
>>>>
>>>> Here is an example of using Structured Streaming to read from Kafka:
>>>>
>>>> https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredKafkaWordCount.scala
>>>>
>>>> In terms of parsing the JSON, there is a from_json function that you
>>>> can use. The following might help:
>>>>
>>>> https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html
>>>>
>>>> I hope this helps.
>>>>
>>>> Thanks,
>>>> Subhash
>>>>
>>>> On Wed, Oct 25, 2017 at 2:59 PM, KhajaAsmath Mohammed <
>>>> mdkhajaasm...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Could anyone provide suggestions on how to parse JSON data from Kafka
>>>>> and load it into Hive?
>>>>>
>>>>> I have read about Structured Streaming but didn't find any examples.
>>>>> Is there any best practice for reading and parsing it with Structured
>>>>> Streaming for this use case?
>>>>>
>>>>> Thanks,
>>>>> Asmath
>>>>>
>>>>
>>>>
>>>
>>
>
