[ 
https://issues.apache.org/jira/browse/BEAM-6887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135787#comment-17135787
 ] 

Niel Markwick commented on BEAM-6887:
-------------------------------------

https://github.com/apache/beam/pull/11532 changes the defaults for unbounded 
sources so that the high-latency grouping operation is disabled by default. 
In addition, https://github.com/apache/beam/pull/11529 simplifies the pipeline 
when batching is disabled.

Therefore there is no longer a need for a separate SimpleWrite/WriteBulk transform.

This FR still applies, though, for having meaningful output for successful 
mutations. 


> Streaming Spanner Writer transform
> ----------------------------------
>
>                 Key: BEAM-6887
>                 URL: https://issues.apache.org/jira/browse/BEAM-6887
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-gcp
>            Reporter: Niel Markwick
>            Priority: P3
>
> At present, the SpannerIO.Write/WriteGrouped transforms work by collecting an 
> entire bundle of elements, sorting them by table/key, splitting the sorted list 
> into batches (by size and number of cells modified), and then writing each 
> batch to Spanner in a single transaction.
> It returns a SpannerWriteResult containing:
>  # a PCollection<Void> (the main output), which will have no elements but 
> will be closed to signal when all the input elements have been written (which 
> never happens in streaming because the input is unbounded);
>  # a PCollection<MutationGroup> of elements that failed to write.
> This transform is useful as a bulk sink because it efficiently writes large 
> amounts of data, but it is not useful as an intermediate step in a streaming 
> pipeline because it has no useful output in streaming mode. 
> I propose that we have a separate Spanner Write transform which simply writes 
> each input Mutation to the database, and then pushes successful Mutations 
> onto its output. 
> This would allow use in the middle of a streaming pipeline, where the flow 
> would be:
>  * Some data streamed in
>  * Converted to Spanner Mutations
>  * Written to Spanner Database
>  * Further processing where the values written to the Spanner Database are 
> used.
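The sort-and-batch behavior described in the issue above could be sketched roughly as follows. This is a minimal, self-contained illustration, not Beam's actual implementation: the `Mutation` record and `batch` method are hypothetical stand-ins, and only the cells-per-batch limit is modeled (the real transform also limits by byte size and row count).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BatchingSketch {
    // Hypothetical stand-in for a Spanner mutation: target table, key, and
    // number of cells it modifies.
    record Mutation(String table, String key, int cells) {}

    // Sort a bundle by table/key, then split the sorted list into batches so
    // that no batch modifies more than maxCellsPerBatch cells.
    static List<List<Mutation>> batch(List<Mutation> bundle, int maxCellsPerBatch) {
        List<Mutation> sorted = new ArrayList<>(bundle);
        sorted.sort(Comparator.comparing(Mutation::table).thenComparing(Mutation::key));

        List<List<Mutation>> batches = new ArrayList<>();
        List<Mutation> current = new ArrayList<>();
        int cells = 0;
        for (Mutation m : sorted) {
            // Start a new batch when adding this mutation would exceed the limit.
            if (!current.isEmpty() && cells + m.cells() > maxCellsPerBatch) {
                batches.add(current);
                current = new ArrayList<>();
                cells = 0;
            }
            current.add(m);
            cells += m.cells();
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Mutation> bundle = List.of(
            new Mutation("Singers", "s2", 3),
            new Mutation("Albums", "a1", 2),
            new Mutation("Singers", "s1", 4));
        // Sorted order is Albums/a1, Singers/s1, Singers/s2; with a 5-cell limit
        // each mutation ends up in its own batch: three batches of one mutation.
        System.out.println(batch(bundle, 5).size()); // prints 3
    }
}
```

In the real transform, each resulting batch would then be written to Spanner in a single transaction; the proposed streaming variant would skip this grouping entirely and emit each successfully written mutation on its output.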



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
