caldempsey commented on code in PR #152:
URL: https://github.com/apache/spark-connect-go/pull/152#discussion_r2869327265
##########
spark/sql/dataframe.go:
##########
@@ -936,6 +942,17 @@ func (df *dataFrameImpl) ToArrow(ctx context.Context)
(*arrow.Table, error) {
return &table, nil
}
+func (df *dataFrameImpl) StreamRows(ctx context.Context) (types.RowPull2,
error) {
+ responseClient, err := df.session.client.ExecutePlan(ctx,
df.createPlan())
+ if err != nil {
+ return nil, sparkerrors.WithType(fmt.Errorf("failed to execute
plan: %w", err), sparkerrors.ExecutionError)
+ }
+
+ seq2 := responseClient.ToRecordSequence(ctx)
+
+ return types.NewRowPull2(ctx, seq2), nil
Review Comment:
So, as I mentioned briefly, this is my first time using Go iterators. I was
originally inspired to build a `Pull` abstraction from the [Go range functions
blog post](https://go.dev/blog/range-functions#pull-iterators) to give callers
on-demand 'pull' semantics over the tuple. I used that as my 'north star', so
to speak.
But yeah, I think the `iter.Pull2` conversion here is overengineering:
looking at this a few months down the line this is just creating a
push→pull→push, and so a roundtrip that seems completely redundant and less
concise. Seems the underlying gRPC stream is inherently single-use, so we don't
need to guard termination either.
I'll deal with this by taking the useful part, the EOF handler and bubble it
directly into `NewRowSequence` and drop the `iter.Pull2` conversion.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]