Re: retention policy for spark structured streaming dataset

2018-03-14 Thread Sunil Parmar
Can you use partitioning (by day)? That will make it easier to drop data older than x days outside the streaming job.

Sunil Parmar

On Wed, Mar 14, 2018 at 11:36 AM, Lian Jiang wrote:
> I have a spark structured streaming job which dumps data into a parquet
> file. To avoid the parque
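
A minimal Scala sketch of the day-partitioned sink plus an out-of-band retention sweep; the Kafka broker, topic, output path /data/events, and the seven-day cutoff are all assumptions for illustration:

    import java.time.LocalDate
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder.appName("partitioned-sink").getOrCreate()

    // Streaming query: derive a day column from the Kafka record timestamp
    // and partition the parquet output by it.
    spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // assumed broker
      .option("subscribe", "events")                    // assumed topic
      .load()
      .withColumn("day", to_date(col("timestamp")))
      .writeStream
      .format("parquet")
      .option("path", "/data/events")                   // assumed output path
      .option("checkpointLocation", "/data/events/_chk")
      .partitionBy("day")
      .start()

    // Separate maintenance job, run outside the streaming query: delete
    // day=... partition directories older than the cutoff.
    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
    val cutoff = LocalDate.now().minusDays(7)           // x = 7 days here
    fs.listStatus(new Path("/data/events"))
      .filter(_.getPath.getName.startsWith("day="))
      .filter(s => LocalDate.parse(s.getPath.getName.stripPrefix("day=")).isBefore(cutoff))
      .foreach(s => fs.delete(s.getPath, true))

One caveat with this sketch: the file sink's _spark_metadata log still references the deleted files, so Spark readers that consult the log may complain; readers that scan the directory directly (e.g. Impala or Hive) are unaffected.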

Re: [Beginner] How to save Kafka Dstream data to parquet ?

2018-03-05 Thread Sunil Parmar
We use Impala to access the parquet files in those directories. Any pointers on achieving at-least-once semantics with Spark Streaming, or on handling partial files?

Sunil Parmar

On Fri, Mar 2, 2018 at 2:57 PM, Tathagata Das wrote:
> Structured Streaming's file sink solves these problems by writing
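
For context, a minimal Scala sketch of the built-in file sink Tathagata is referring to; the broker, topic, and paths are assumptions. The sink records each completed file in a _spark_metadata log tied to the checkpoint, so Spark readers that honor the log never observe partial output:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("file-sink").getOrCreate()

    // Kafka -> parquet through the file sink. A file is added to the sink's
    // _spark_metadata log only after it is fully written, and the checkpoint
    // ties committed offsets to committed files.
    spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // assumed broker
      .option("subscribe", "events")                    // assumed topic
      .load()
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
      .writeStream
      .format("parquet")
      .option("path", "/warehouse/events")              // assumed path
      .option("checkpointLocation", "/warehouse/events/_chk")
      .start()

Note the guarantee applies to readers that consult _spark_metadata; Impala scans the directory directly, which is exactly the gap the question above is about.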

Re: [Beginner] How to save Kafka Dstream data to parquet ?

2018-03-02 Thread Sunil Parmar
We're trying to deal with partial files by writing .tmp files and renaming them as the last step. We only commit offsets after the rename is successful. This way we get at-least-once semantics and avoid the partial file write issue. Thoughts?

Sunil Parmar

On Wed, Feb 28, 2018 at 1:59 PM, Tathagata Das wrote:
> The
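
A minimal Scala sketch of this write-then-rename pattern with the spark-streaming-kafka-0-10 direct stream; the staging and table paths are assumptions, and `stream` is assumed to come from KafkaUtils.createDirectStream:

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.kafka.clients.consumer.ConsumerRecord
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.streaming.dstream.DStream
    import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

    def writeThenRename(spark: SparkSession,
                        stream: DStream[ConsumerRecord[String, String]]): Unit = {
      import spark.implicits._
      stream.foreachRDD { rdd =>
        if (!rdd.isEmpty) {
          val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
          val batchId = System.currentTimeMillis
          val staging = new Path(s"/data/staging/batch-$batchId") // assumed path
          val fin     = new Path(s"/data/table/batch-$batchId")   // assumed path

          // 1. Write the whole batch somewhere the table readers never scan.
          rdd.map(r => (r.key, r.value)).toDF("key", "value")
             .write.parquet(staging.toString)

          // 2. Move it into the table directory; an HDFS rename within one
          //    filesystem is atomic, so readers see all of the batch or none.
          val fs = FileSystem.get(rdd.sparkContext.hadoopConfiguration)
          fs.rename(staging, fin)

          // 3. Commit offsets only after the rename. A crash before this point
          //    replays the batch on restart: at-least-once, never partial.
          stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
        }
      }
    }

A replayed batch lands under a new batch id, so duplicates are possible; deduplication downstream (or a deterministic batch id plus an existence check) would tighten this toward exactly-once.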