[
https://issues.apache.org/jira/browse/BEAM-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786293#comment-16786293
]
Valentyn Tymofieiev commented on BEAM-6769:
-------------------------------------------
1. Per
[https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#bytes-type],
BQ supports the BYTES type, so Beam BQ IO should also support bytes, or
explicitly call out that we don't support them.
2. We should find out whether the ApiTools BQ client that Beam uses can accept
raw bytes. If it can, we should find out how to correctly pass raw bytes from
the user all the way to the BQ client. We may have to avoid json.dumps in this
codepath or do a workaround like: bytes -> base64 encode -> convert to str
using 'ascii' -> json.dumps -> decode from str -> decode from base64 -> pass to
BQ client. This may have performance implications.
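The workaround chain above could be sketched roughly as follows. This is only an illustration using the standard library: encode_row and decode_row are hypothetical helper names, not existing Beam or BQ client functions, and the sketch assumes the caller knows which fields hold bytes.

```python
import base64
import json


def encode_row(row):
    """Return a JSON string with bytes values replaced by base64 ASCII text."""
    safe = {
        key: base64.b64encode(value).decode('ascii')
        if isinstance(value, bytes) else value
        for key, value in row.items()
    }
    return json.dumps(safe)


def decode_row(payload, bytes_fields):
    """Parse the JSON string and restore raw bytes for the named fields."""
    row = json.loads(payload)
    for key in bytes_fields:
        row[key] = base64.b64decode(row[key])
    return row


# Round trip with a value that is not valid UTF-8.
original = {'name': 'example', 'blob': b'\xab\xac\xad'}
payload = encode_row(original)
restored = decode_row(payload, bytes_fields=['blob'])
assert restored == original
```

The extra base64 pass on both sides is where the possible performance cost would come from.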
We should have a test that takes a non-decodable byte-string, such as
b'\xab\xac\xad', and makes sure we can store and retrieve it without accidental
decoding by Beam, BQ or the BQ client.
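To illustrate why b'\xab\xac\xad' is a useful test input, the short check below (plain Python 3, no Beam or BQ involved) shows that it can neither be decoded as UTF-8 nor passed to json.dumps directly. fails_with is a hypothetical helper written only for this sketch.

```python
import json

raw = b'\xab\xac\xad'


def fails_with(exc_type, func):
    """Return True if calling func raises exc_type, False if it succeeds."""
    try:
        func()
    except exc_type:
        return True
    return False


# 0xab is not a valid UTF-8 start byte, so any implicit decode would fail.
not_utf8 = fails_with(UnicodeDecodeError, lambda: raw.decode('utf-8'))

# On Python 3, json.dumps rejects bytes values outright.
not_json = fails_with(TypeError, lambda: json.dumps({'test': raw}))
```

A real test would additionally need to write such a value through Beam BQ IO and read it back, which this sketch does not attempt.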
cc: [~pabloem], [~chamikara], [~altay]
> Write bytes to BigQuery in Python 3
> -----------------------------------
>
> Key: BEAM-6769
> URL: https://issues.apache.org/jira/browse/BEAM-6769
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: Juta Staes
> Assignee: Juta Staes
> Priority: Minor
>
> In Python 2 you could write bytes data to BigQuery. This is tested in
>
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py#L186]
> Python 3 does not support
> {noformat}
> json.dumps({'test': b'test'}){noformat}
> which is used to encode the data in
>
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L959]
>
> How should writing bytes to BigQuery be handled in Python 3?
> * Forbid writing bytes into BigQuery on Python 3
> * Guess the encoding (utf-8?)
> * Pass the encoding to BigQuery
> cc: [~tvalentyn]