Thanks. Is there an option to limit the number of records per batch? For example, process only 10,000 records, or whatever value we pass as a property? That way we could control how much data each batch handles.
Sent from my iPhone

> On Oct 21, 2020, at 12:11 AM, lec ssmi <[email protected]> wrote:
>
> Structured Streaming's bottom layer also uses a micro-batch mechanism.
> It seems that the first batch is slower than the later ones; I also often
> encounter this problem. It feels related to the division of batches.
> On the other hand, Spark's batch size is usually bigger than Flume's
> transaction batch size.
>
> KhajaAsmath Mohammed <[email protected]> wrote on Wed, Oct 21, 2020 at 12:19 PM:
>> Yes. Changing back to latest worked, but I still see the slowness compared to
>> Flume.
>>
>> Sent from my iPhone
>>
>>> On Oct 20, 2020, at 10:21 PM, lec ssmi <[email protected]> wrote:
>>>
>>> Does your application start by catching up on the earlier Kafka data?
>>>
>>> Lalwani, Jayesh <[email protected]> wrote on Wed, Oct 21, 2020 at 2:19 AM:
>>>> Are you getting any output? Streaming jobs typically run forever and keep
>>>> processing data as it arrives at the input. If a streaming job is working
>>>> well, it will typically generate output at a certain cadence.
>>>>
>>>> From: KhajaAsmath Mohammed <[email protected]>
>>>> Date: Tuesday, October 20, 2020 at 1:23 PM
>>>> To: "user @spark" <[email protected]>
>>>> Subject: [EXTERNAL] Spark Structured streaming - Kafka - slowness with
>>>> query 0
>>>>
>>>> Hi,
>>>>
>>>> I have started using Spark Structured Streaming for reading data from Kafka
>>>> and the job is very slow. The number of output rows keeps increasing in query
>>>> 0 and the job runs forever. Any suggestions for this, please?
>>>>
>>>> <image001.png>
>>>>
>>>> Thanks,
>>>> Asmath
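
On the question of capping records per batch: the Kafka source for Structured Streaming has a maxOffsetsPerTrigger option that rate-limits how many offsets are read per trigger. Below is a minimal sketch, assuming PySpark with the spark-sql-kafka package on the classpath. The helper names, the broker address, and the topic name are hypothetical and only for illustration.

```python
def kafka_source_options(bootstrap_servers, topic,
                         max_offsets_per_trigger=10000,
                         starting_offsets="latest"):
    """Build Kafka source options that cap the offsets read per micro-batch."""
    if max_offsets_per_trigger <= 0:
        raise ValueError("max_offsets_per_trigger must be positive")
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
        # Total offsets per trigger, split proportionally across topic partitions.
        "maxOffsetsPerTrigger": str(max_offsets_per_trigger),
    }


def read_kafka_stream(spark, options):
    """Apply the options to an existing SparkSession and return a streaming DataFrame."""
    return spark.readStream.format("kafka").options(**options).load()


# Hypothetical usage, given a SparkSession named `spark`:
# opts = kafka_source_options("broker1:9092", "events", max_offsets_per_trigger=10000)
# df = read_kafka_stream(spark, opts)
```

The limit is a total across all partitions of the topic, not a per-partition count. Without it, a query started with startingOffsets set to earliest tries to read the whole backlog in its first micro-batch, which may be why query 0 looked slow here.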
