Hi, I am working on adding schema support to BigQuery reads (BEAM-6673), and I am confused by two contradictory code paths that deal with ARRAY-type fields in TableRow objects.
The TableRowParser implementation in BigQueryIO ultimately calls BigQueryAvroUtils#convertRepeatedField ( https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryAvroUtils.java#L214), and that code simply treats ARRAY types as lists containing objects of the underlying element type. This is consistent with the documentation I have found [1].

However, when I look at the code that converts a TableRow to a Beam Row ( https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java#L315), it expects ARRAY-type fields to contain a List of Maps, where each Map has a single entry with the key "v" mapped to a value of the array's underlying element type. I believe this nested Map representation is not correct for arrays of scalar types, and I would appreciate it if someone knowledgeable about BigQuery internals could chime in to confirm whether I am right or wrong. (All the unit tests still pass even after I comment out the Map value extraction in line 323, but that is not a confirmation in itself.)

Thank you.

[1] I could not find any official documentation about the JSON format of BigQuery rows in the API docs, but this seems to be the best description of it: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#to_json_string. This description matches the JSON output produced by the BigQuery query editor.
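To make the discrepancy concrete, here is a small sketch of the two shapes the two code paths appear to assume for a REPEATED STRING field. The field name "tags" and the helper method names are hypothetical, and a TableRow is modeled as a plain Map here for self-containment; this is only an illustration of the two representations, not Beam code.

```java
import java.util.List;
import java.util.Map;

public class ArrayRepresentations {

    // Shape produced by BigQueryAvroUtils#convertRepeatedField:
    // the repeated field is a plain list of element values.
    public static Map<String, Object> avroUtilsShape() {
        return Map.of("tags", List.of("a", "b"));
    }

    // Shape expected by the TableRow-to-Beam-Row conversion in BigQueryUtils:
    // a list of single-entry maps, each wrapping the element value
    // under the key "v".
    public static Map<String, Object> beamUtilsShape() {
        return Map.of("tags", List.of(Map.of("v", "a"), Map.of("v", "b")));
    }

    public static void main(String[] args) {
        System.out.println(avroUtilsShape()); // {tags=[a, b]}
        System.out.println(beamUtilsShape()); // {tags=[{v=a}, {v=b}]}
    }
}
```

The first shape matches the TO_JSON_STRING output described in [1]; the second matches the "v"-wrapped structure that BigQueryUtils unpacks, which is what I am asking about.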