Re: [PR] [SPARK-52780] Add StreamRows and Arrow Record Streaming [spark-connect-go]

via GitHub Sun, 01 Mar 2026 08:16:00 -0800


caldempsey commented on code in PR #152:
URL: https://github.com/apache/spark-connect-go/pull/152#discussion_r2869327265



##########
spark/sql/dataframe.go:
##########
@@ -936,6 +942,17 @@ func (df *dataFrameImpl) ToArrow(ctx context.Context) 
(*arrow.Table, error) {
        return &table, nil
 }
 
+func (df *dataFrameImpl) StreamRows(ctx context.Context) (types.RowPull2, 
error) {
+       responseClient, err := df.session.client.ExecutePlan(ctx, 
df.createPlan())
+       if err != nil {
+               return nil, sparkerrors.WithType(fmt.Errorf("failed to execute 
plan: %w", err), sparkerrors.ExecutionError)
+       }
+
+       seq2 := responseClient.ToRecordSequence(ctx)
+
+       return types.NewRowPull2(ctx, seq2), nil

Review Comment:
   So, as I mentioned briefly, this is my first time using Go iterators. I was 
originally inspired  to build a `Pull` abstraction from the [Go range functions 
blog post](https://go.dev/blog/range-functions#pull-iterators) to give callers 
on-demand 'pull' semantics over the tuple. I used that as my 'north star', so 
to speak. 
   
   But yeah, I think the `iter.Pull2` conversion here is overengineering: 
looking at this a few months down the line this is just creating a 
push→pull→push, and so a roundtrip that seems completely redundant and less 
concise. Seems the underlying gRPC stream is inherently single-use, so we don't 
need to guard termination either.
   
   I'll deal with this by taking the useful part, the EOF handler and bubble it 
directly into `NewRowSequence` and drop the `iter.Pull2` conversion.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-52780] Add StreamRows and Arrow Record Streaming [spark-connect-go]

Reply via email to