I would like to request a feature for reading data from the Kafka source based
on a timestamp, so that if an application needs to process data from a certain
point in time, it is able to do so. I do agree that checkpoints give us
continuation of a stream process, but what if I want to rewind?
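For what it's worth, the underlying Kafka consumer API can already answer "first offset at or after timestamp T" per partition (`offsetsForTimes`), and newer Spark versions expose a similar option on the Kafka source. The sketch below simulates only the core lookup, with a hypothetical sorted `(timestamp, offset)` index standing in for one partition's log:

```python
import bisect

def starting_offset_for_timestamp(index, target_ts):
    """Return the earliest offset whose record timestamp is >= target_ts.

    `index` is a hypothetical, timestamp-sorted list of (timestamp, offset)
    pairs for one partition -- a stand-in for what Kafka's offsetsForTimes
    lookup answers. Returns None when every record is older than target_ts.
    """
    timestamps = [ts for ts, _ in index]
    i = bisect.bisect_left(timestamps, target_ts)
    if i == len(index):
        return None  # no record at or after the requested time
    return index[i][1]

# One partition's (timestamp, offset) index, invented for illustration.
partition_index = [(1000, 0), (1500, 1), (2000, 2), (2600, 3)]

print(starting_offset_for_timestamp(partition_index, 1500))  # 1
print(starting_offset_for_timestamp(partition_index, 1600))  # 2
print(starting_offset_for_timestamp(partition_index, 3000))  # None
```

A "start from timestamp" feature would essentially run this lookup once per partition at query start, then begin consuming from the returned offsets instead of from the checkpoint.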
Hi,
We have a daily data pull which pulls almost 50 GB of data from an upstream
system. We use Spark SQL to process that 50 GB and finally insert it into a
Hive target table. We are now copying the whole Hive target table to SQL,
specifically a SQL staging table, and implementing a merge from staging.
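The staging-then-merge pattern itself is straightforward; here is a minimal, self-contained sketch of it, using SQLite's upsert in place of a SQL Server `MERGE`, with made-up table and column names:

```python
import sqlite3

# In-memory database standing in for the downstream SQL system; the
# `target` and `staging` tables and their columns are invented here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target  (id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE staging (id INTEGER PRIMARY KEY, amount INTEGER);
    INSERT INTO target  VALUES (1, 100), (2, 200);
    INSERT INTO staging VALUES (2, 250), (3, 300);  -- one update, one new row
""")

# Merge staging into target: update on key match, insert otherwise.
# SQLite's upsert plays the role MERGE would play on SQL Server.
# (The `WHERE true` disambiguates the SELECT for SQLite's parser.)
conn.execute("""
    INSERT INTO target (id, amount)
    SELECT id, amount FROM staging WHERE true
    ON CONFLICT(id) DO UPDATE SET amount = excluded.amount
""")

print(conn.execute("SELECT id, amount FROM target ORDER BY id").fetchall())
# [(1, 100), (2, 250), (3, 300)]
```

Note that copying the *whole* Hive table into staging each day makes the merge cost proportional to total table size; if the upstream pull can be restricted to changed rows, the staging table (and the merge) shrinks accordingly.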
Hi Jungtaek,
Sorry about the delay in my response, and thanks a ton for responding.
I am just trying to build a data pipeline which has a bunch of stages. The
goal is to use a Dataset to accumulate the transformation errors that may
happen in the stages of the pipeline. As a benefit, I can pass o
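The accumulate-errors idea can be sketched without any framework at all: each stage either transforms a record or reports an error, and errors from every stage are collected alongside the surviving records. Stage names and record shapes below are invented for illustration; in Spark this would map onto Datasets rather than plain lists:

```python
def run_pipeline(records, stages):
    """Run records through (name, fn) stages, accumulating per-stage errors.

    Returns (survivors, errors): records that passed every stage, plus a
    list describing each failure with the stage it occurred in.
    """
    errors = []
    current = records
    for name, fn in stages:
        next_batch = []
        for rec in current:
            try:
                next_batch.append(fn(rec))
            except Exception as exc:
                errors.append({"stage": name, "record": rec, "error": str(exc)})
        current = next_batch
    return current, errors

# Hypothetical stages: parse a string to int, then require it be positive.
def parse(rec):
    return int(rec)

def validate(n):
    if n <= 0:
        raise ValueError("non-positive value")
    return n

good, bad = run_pipeline(["1", "x", "-3"], [("parse", parse), ("validate", validate)])
print(good)  # [1]
print(bad)   # "x" failed at parse; -3 failed at validate
```

Keeping the error records as data (rather than failing the job) is exactly what makes them easy to pass onward to a later reporting or dead-letter stage.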