Re: Structured Streaming Microbatch Semantics

2021-03-08 Thread Mich Talebzadeh
BTW, what you pick up when you start the job depends on the setting in readStream: .option("startingOffsets", "latest"). In my previous example I had it as "earliest", so setting it to "latest" will result in starting from the latest topic arrival, as shown below: None 0 DataFrame
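
A minimal PySpark sketch of the two startingOffsets settings discussed above; the broker address and topic name are placeholders, not values from this thread:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("offsets-demo").getOrCreate()

    # "latest" starts from messages that arrive after the query starts;
    # "earliest" replays the topic from the beginning. Either way the
    # option only applies to a fresh query: once a checkpoint exists,
    # offsets are resumed from the checkpoint instead.
    df = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
          .option("subscribe", "my_topic")                   # placeholder topic
          .option("startingOffsets", "latest")               # or "earliest"
          .load())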

Re: Structured Streaming Microbatch Semantics

2021-03-08 Thread Mich Talebzadeh
OK, thanks for the diagram. So you have roughly 30 seconds' duration for each batch (as in foreachBatch) and 60 rows per batch. Back to your question: "My question is now: Is it guaranteed by Spark that all output records of one event are always contained in a single batch, or can the records also be split in
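
One way to check this empirically is a small foreachBatch handler that counts rows per event in each micro-batch; this is a sketch, and the event_id column is an assumed identifier for the originating event, not something named in the thread:

    from pyspark.sql.functions import lit

    def inspect_batch(batch_df, batch_id):
        # Count the rows each event contributed to this micro-batch.
        # If an event's output records were ever split across batches,
        # its count here would be partial.
        (batch_df.groupBy("event_id").count()
                 .withColumn("batch_id", lit(batch_id))
                 .show(truncate=False))

    # query = df.writeStream.foreachBatch(inspect_batch).start()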

Re: Structured Streaming Microbatch Semantics

2021-03-05 Thread Mich Talebzadeh
Hi Rico, Would it be possible for you to provide a snapshot of the Structured Streaming tab (from the Spark GUI)? Thanks, Mich. LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Structured Streaming Microbatch Semantics

2021-03-05 Thread Dipl.-Inf. Rico Bergmann
Hi! Thanks for your reply! For several reasons we don't want to "pipe" the real data through Kafka. What problems might arise from this approach? Best, Rico. On 05.03.2021 at 09:18, Roland Johann wrote: Hi Rico, there is no way to defer records from one micro-batch to the next one.

Re: Structured Streaming Microbatch Semantics

2021-03-05 Thread Dipl.-Inf. Rico Bergmann
Hi! As abstract code, what I do in my streaming program is: readStream() // from Kafka .flatMap(readIngestionDatasetViaREST) // can return thousands of records for a single event .writeStream.outputMode("append").foreachBatch(upsertIntoDeltaTable).start() I don't use triggers, but I limit the
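
A runnable PySpark sketch of the pattern Rico describes; the REST helper, broker, topic, and Delta path are all hypothetical stand-ins, and the upsert body is a placeholder for the real Delta MERGE:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, explode, udf
    from pyspark.sql.types import ArrayType, StringType

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical stand-in for readIngestionDatasetViaREST: given one
    # trigger event, fetch the full dataset over HTTP and return its
    # records (possibly thousands per event).
    def read_ingestion_dataset_via_rest(event_value):
        return []  # REST call omitted in this sketch

    fetch_udf = udf(read_ingestion_dataset_via_rest, ArrayType(StringType()))

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
              .option("subscribe", "trigger_topic")              # placeholder
              .load())

    # Fan each trigger event out into its records; the flatMap from the
    # abstract code is approximated with an array-returning UDF + explode.
    records = events.select(
        explode(fetch_udf(col("value").cast("string"))).alias("record"))

    def upsert_into_delta_table(batch_df, batch_id):
        # Placeholder for the real Delta MERGE/upsert.
        batch_df.write.format("delta").mode("append").save("/tmp/delta/target")

    (records.writeStream
            .outputMode("append")
            .foreachBatch(upsert_into_delta_table)
            .start())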

Re: Structured Streaming Microbatch Semantics

2021-03-05 Thread Mich Talebzadeh
Hi Rico, Just to clarify: does your batch interval have a variable number of rows sent to the Kafka topic for each event? In your writeStream code: writeStream. \ outputMode('append'). \ option("truncate", "false"). \
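
The quoted writeStream call is cut off by the archive; a hedged completion in the same PySpark style might look like the following, where the sink, checkpoint path, and trigger interval are assumptions rather than values from the thread:

    # streaming_df is assumed to be the DataFrame returned by readStream.
    query = (streaming_df.writeStream
             .outputMode("append")
             .format("console")                                     # assumed sink
             .option("truncate", "false")
             .option("checkpointLocation", "/tmp/checkpoints/app")  # assumed path
             .trigger(processingTime="30 seconds")                  # assumed interval
             .start())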

Re: Structured Streaming Microbatch Semantics

2021-03-05 Thread Roland Johann
Hi Rico, there is no way to defer records from one micro-batch to the next one, so it's guaranteed that the data and the trigger event will be processed within the same batch. I assume that one trigger event leads to an unknown batch size of actual events pulled via HTTP. This bypasses throughput pro