Hello Everyone,
*For those new to Beam, even if this is your first day, consider yourselves
a welcome contributor to this conversation. Below are
definitions/references and a suggested learning path to understand this
email.*
Proposal | Java
For Java, I would like to add the following to PubsubClient [1].
import org.apache.beam.sdk.schemas.Schema;
public Schema getSchema(SchemaPath schemaPath) throws IOException;
public static class SchemaPath { /* Supports the
projects/<project>/schemas/<schema> resource path[2]. */ }
Additionally, I would like to propose two static helper methods to support
the PubsubGrpcClient [3], and PubsubJsonClient [4].
static Schema fromPubsubSchema(com.google.api.services.pubsub.model.Schema
pubsubSchema) { /* Converts Pub/Sub model Schema to Beam Schema; for use by
PubsubJsonClient. */ }
static Schema fromPubsubSchema(com.google.pubsub.v1.Schema pubsubSchema) {
/* Converts Pub/Sub model Schema to Beam Schema; for use by
PubsubGrpcClient. */ }
Finally, to support tests, I would like to add this new Schema feature
in PubsubTestClient's [5] private State class.
import org.apache.beam.sdk.schemas.Schema;
private static class State {
/** Expected Pub/Sub mapped Beam Schema. */
@Nullable Schema schema;
}
Proposal | Go
For Go, I would like to add the following to pubsubx [6].
func GetSchema(ctx context.Context, client *pubsub.SchemaClient, schemaId
string) (*pubsub.SchemaConfig, error) { ... }
func EncodeSchema(dst reflect.Type, src *pubsub.SchemaConfig) ([]byte,
error) { ... }
Rationale
Querying from and converting the Pub/Sub Schema [7] to a Beam Schema[8]
would allow us to validate that both schemas match to prevent potential
errors. This supports the design goals of Pub/Sub schemas to facilitate a
contract between publisher and subscriber and facilitate a single source of
truth for inter-team production and consumption. This feature surfaced
while implementing work related to Pub/Sub and Beam Schemas whose detail is
excluded from this email.
Definitions/References
[1] PubsubClient: An (abstract) helper class for talking to Pubsub via an
underlying transport.
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/pubsub/PubsubClient.html
[2] *Resource Path*: Google Cloud resource naming adheres to the Google API
design guide using a '/' delimited pattern.
https://cloud.google.com/apis/design/resource_names
[3] PubsubGrpcClient: An implementation of PubsubClient [1] using gRPC.
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/pubsub/PubsubGrpcClient.html
[4] *PubsubJsonClient*: An implementation of PubsubClient [1] using JSON
transport.
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/pubsub/PubsubJsonClient.html
[5] PubsubTestClient: A partial implementation of PubsubClient [1] for use
by unit tests.
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/pubsub/PubsubTestClient.html
[6] pubsubx: The Beam Go SDK's package contains utilities for working with
Pub/Sub.
https://pkg.go.dev/github.com/apache/beam/sdks/[email protected]/go/pkg/beam/util/pubsubx
[7] *Pub/Sub Schema*: A format to which Pub/Sub data must adhere. It
facilitates a contract between publisher and subscriber that Pub/Sub will
enforce.
https://cloud.google.com/pubsub/docs/schemas
[8] *Beam Schema*: An object that describes Beam data elements such as
field names and their data types.
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/Schema.html
Suggested Learning Path To Understand This Email
1. *What is Pub/Sub?* -
https://www.youtube.com/playlist?list=PLIivdWyY5sqKwVLe4BLJ-vlh9r9zCdOse
2. *What is Apache Beam?* -
https://www.youtube.com/watch?v=65lmwL7rSy4&t=223s
3. *Apache Beam Overview* -
https://beam.apache.org/documentation/programming-guide/#overview
4. *Transforms (Up to section 4.1)* -
https://beam.apache.org/documentation/programming-guide/#transforms
5. *Pipeline I/O* -
https://beam.apache.org/documentation/programming-guide/#pipeline-io
6. Schemas -
https://beam.apache.org/documentation/programming-guide/#schemas