Re: [PR] [SPARK-51362][SQL] Change toJSON to use NextIterator API to eliminate adjacent record dependency [spark]

via GitHub Mon, 03 Mar 2025 05:13:27 -0800


HeartSaVioR commented on PR #50124:
URL: https://github.com/apache/spark/pull/50124#issuecomment-2694352083


   If it helps, I can also explain the edge case I have in mind.
   
   Let's say the query reads from Kafka. Kafka only supports fetching a batch of 
records starting from a specific offset, so in many cases a microbatch has to 
fetch from Kafka multiple times. Suppose the 1st fetch retrieves records for 
offsets (N, M), and the 2nd fetch retrieves records for offsets (M + 1, K). When 
toJSON processes offset M, it requires offset M + 1 to be available, which could 
trigger the 2nd fetch before toJSON provides its output for offset M.
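
To make the dependency concrete, here is a minimal Python sketch of the idea. It is not Spark's code: `BatchedSource`, `lookahead_to_json`, `lazy_to_json`, and `fetches_needed` are hypothetical names, and the source is only a stand-in for a Kafka-like reader that fetches in batches. It assumes the old behavior amounts to checking for a following record before emitting the current one.

```python
import json
from typing import Callable, Iterator, List


class BatchedSource:
    """Hypothetical stand-in for a source that fetches records batch by batch."""

    def __init__(self, batches: List[List[int]]) -> None:
        self._batches = batches
        self._buffer: List[int] = []
        self.fetch_count = 0  # number of fetches issued so far

    def has_next(self) -> bool:
        # Checking availability issues a new fetch when the buffer is empty.
        while not self._buffer and self.fetch_count < len(self._batches):
            self._buffer = list(self._batches[self.fetch_count])
            self.fetch_count += 1
        return bool(self._buffer)

    def next(self) -> int:
        if not self.has_next():
            raise StopIteration
        return self._buffer.pop(0)


def lookahead_to_json(source: BatchedSource) -> Iterator[str]:
    """Checks for a following record BEFORE emitting the current one."""
    while source.has_next():
        line = json.dumps({"offset": source.next()})
        source.has_next()  # adjacent-record dependency: may trigger a fetch
        yield line


def lazy_to_json(source: BatchedSource) -> Iterator[str]:
    """Emits the current record without looking at the next one."""
    while source.has_next():
        yield json.dumps({"offset": source.next()})


def fetches_needed(to_json: Callable[[BatchedSource], Iterator[str]], n: int) -> int:
    """Counts the fetches issued by the time the first n outputs are produced."""
    source = BatchedSource([[0, 1, 2], [3, 4, 5]])
    out = to_json(source)
    for _ in range(n):
        next(out)
    return source.fetch_count
```

With two batches of three records each, `fetches_needed(lookahead_to_json, 3)` counts two fetches while `fetches_needed(lazy_to_json, 3)` counts one: the lookahead variant cannot hand out the last record of the first batch until the second fetch has been issued.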
   
   That said, the PR title and description are actually correct. It's just that 
this is more of an edge case, and I see a better rationale for this change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
