HeartSaVioR commented on PR #50124: URL: https://github.com/apache/spark/pull/50124#issuecomment-2694352083
If it helps, I can also describe the edge case I have in mind. Suppose the query reads from Kafka. Kafka only supports fetching a batch of records starting from a specific offset, so a microbatch often has to fetch from Kafka multiple times. Say the 1st fetch retrieves records for offsets (N, M) and the 2nd fetch retrieves records for offsets (M + 1, K). When toJSON processes offset M, it requires offset M + 1 to be available, which can trigger the 2nd fetch before toJSON produces the output for offset M. That said, the PR title and description are actually correct. It's just that this is more of an edge case, and I now see a better rationale for this change.
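
To make the ordering concrete, here is a toy Python model of the edge case. It is not Spark or Kafka code: `chunked_source`, `lookahead_map`, `lazy_map`, and `trace` are hypothetical names, each inner list stands in for one fetch, and the lookahead behavior is assumed to be as described above (the mapper asks for the next record before emitting the current one).

```python
import json
from typing import Callable, Iterable, Iterator, List


def chunked_source(chunks: List[List[int]], log: List[str]) -> Iterator[int]:
    """Yield offsets chunk by chunk; each chunk models one fetch from the source."""
    for i, chunk in enumerate(chunks, start=1):
        log.append(f"fetch {i}")
        yield from chunk


def lookahead_map(records: Iterable[int], fn: Callable[[int], str],
                  log: List[str]) -> Iterator[str]:
    """Emit fn(record) only after the following record has already been pulled."""
    it = iter(records)
    try:
        current = next(it)
    except StopIteration:
        return
    for nxt in it:
        # nxt has been pulled at this point, which may have forced a new fetch.
        log.append(f"emit {current}")
        yield fn(current)
        current = nxt
    log.append(f"emit {current}")
    yield fn(current)


def lazy_map(records: Iterable[int], fn: Callable[[int], str],
             log: List[str]) -> Iterator[str]:
    """Emit fn(record) as soon as the record itself is available."""
    for record in records:
        log.append(f"emit {record}")
        yield fn(record)


def trace(mapper) -> List[str]:
    """Run a mapper over two simulated fetches and return the event order."""
    log: List[str] = []
    source = chunked_source([[0, 1], [2, 3]], log)
    list(mapper(source, lambda off: json.dumps({"offset": off}), log))
    return log


if __name__ == "__main__":
    print("lookahead:", trace(lookahead_map))
    print("lazy:     ", trace(lazy_map))
```

In this model offset 1 plays the role of M: with the lookahead mapper the second fetch is logged before offset 1 is emitted, while the lazy mapper emits offset 1 first and fetches afterwards.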