I would like to request a feature for reading data from the Kafka source based
on a timestamp, so that if an application needs to process data from a certain
point in time, it is able to do so. I do agree that checkpoints give us
continuation of a stream process, but what if I want to rewind?
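For what it's worth, the underlying Kafka consumer API can already answer "first offset at or after timestamp T" per partition (`offsetsForTimes`), and newer Spark versions expose a similar option on the Kafka source. The sketch below simulates only the core lookup, with a hypothetical sorted `(timestamp, offset)` index standing in for one partition's log:

```python
import bisect

def starting_offset_for_timestamp(index, target_ts):
    """Return the earliest offset whose record timestamp is >= target_ts.

    `index` is a hypothetical, timestamp-sorted list of (timestamp, offset)
    pairs for one partition -- a stand-in for what Kafka's offsetsForTimes
    lookup answers. Returns None when every record is older than target_ts.
    """
    timestamps = [ts for ts, _ in index]
    i = bisect.bisect_left(timestamps, target_ts)
    if i == len(index):
        return None  # no record at or after the requested time
    return index[i][1]

# One partition's (timestamp, offset) index, invented for illustration.
partition_index = [(1000, 0), (1500, 1), (2000, 2), (2600, 3)]

print(starting_offset_for_timestamp(partition_index, 1500))  # 1
print(starting_offset_for_timestamp(partition_index, 1600))  # 2
print(starting_offset_for_timestamp(partition_index, 3000))  # None
```

A "start from timestamp" feature would essentially run this lookup once per partition at query start, then begin consuming from the returned offsets instead of from the checkpoint.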
Hi,
We have a daily data pull which pulls almost 50 GB of data from an upstream
system. We use Spark SQL to process that 50 GB and finally insert it into a
Hive target table. We are now copying the whole Hive target table to SQL,
specifically a SQL staging table, and implementing a merge from staging.
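The staging-then-merge pattern itself is straightforward; here is a minimal, self-contained sketch of it, using SQLite's upsert in place of a SQL Server `MERGE`, with made-up table and column names:

```python
import sqlite3

# In-memory database standing in for the downstream SQL system; the
# `target` and `staging` tables and their columns are invented here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target  (id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE staging (id INTEGER PRIMARY KEY, amount INTEGER);
    INSERT INTO target  VALUES (1, 100), (2, 200);
    INSERT INTO staging VALUES (2, 250), (3, 300);  -- one update, one new row
""")

# Merge staging into target: update on key match, insert otherwise.
# SQLite's upsert plays the role MERGE would play on SQL Server.
# (The `WHERE true` disambiguates the SELECT for SQLite's parser.)
conn.execute("""
    INSERT INTO target (id, amount)
    SELECT id, amount FROM staging WHERE true
    ON CONFLICT(id) DO UPDATE SET amount = excluded.amount
""")

print(conn.execute("SELECT id, amount FROM target ORDER BY id").fetchall())
# [(1, 100), (2, 250), (3, 300)]
```

Note that copying the *whole* Hive table into staging each day makes the merge cost proportional to total table size; if the upstream pull can be restricted to changed rows, the staging table (and the merge) shrinks accordingly.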
Hi Jungtaek,
Sorry about the delay in my response, and thanks a ton for responding.
I am just trying to build a data pipeline which has a bunch of stages. The
goal is to use a Dataset to accumulate the transformation errors that may
happen in the stages of the pipeline. As a benefit, I can pass o
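The accumulate-errors idea can be sketched without any framework at all: each stage either transforms a record or reports an error, and errors from every stage are collected alongside the surviving records. Stage names and record shapes below are invented for illustration; in Spark this would map onto Datasets rather than plain lists:

```python
def run_pipeline(records, stages):
    """Run records through (name, fn) stages, accumulating per-stage errors.

    Returns (survivors, errors): records that passed every stage, plus a
    list describing each failure with the stage it occurred in.
    """
    errors = []
    current = records
    for name, fn in stages:
        next_batch = []
        for rec in current:
            try:
                next_batch.append(fn(rec))
            except Exception as exc:
                errors.append({"stage": name, "record": rec, "error": str(exc)})
        current = next_batch
    return current, errors

# Hypothetical stages: parse a string to int, then require it be positive.
def parse(rec):
    return int(rec)

def validate(n):
    if n <= 0:
        raise ValueError("non-positive value")
    return n

good, bad = run_pipeline(["1", "x", "-3"], [("parse", parse), ("validate", validate)])
print(good)  # [1]
print(bad)   # "x" failed at parse; -3 failed at validate
```

Keeping the error records as data (rather than failing the job) is exactly what makes them easy to pass onward to a later reporting or dead-letter stage.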