Batch load with BigQueryIO fails because of a few bad records.

2021-05-06 Thread Matthew Ouyang
I am loading a batch of records with BigQueryIO.Write, but because some records don't match the target table schema the entire write step fails and nothing gets written to the table. Is there a way for records that do match the target table schema to be inserted, and the records that

Re: Batch load with BigQueryIO fails because of a few bad records.

2021-05-07 Thread Matthew Ouyang
ses/javadoc/2.29.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html#skipInvalidRows-- > > On Thu, May 6, 2021 at 18:01 Matthew Ouyang > wrote: > >> I am loading a batch load of records with BigQueryIO.Write, but because >> some records don't match the target
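The reply points at BigQueryIO.Write#skipInvalidRows. A minimal Java sketch of how those options fit together, assuming Beam's BigQuery IO is on the classpath; "rows" and the table name are placeholders, and note the javadoc caveat that skipInvalidRows applies to streaming inserts, so a FILE_LOADS batch job may still fail atomically:

```java
// Sketch only (not runnable standalone). Caveat from the javadoc:
// skipInvalidRows applies to STREAMING_INSERTS; a FILE_LOADS (batch)
// job still fails atomically on bad records.
rows.apply(
    BigQueryIO.writeTableRows()
        .to("project:dataset.table")
        .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
        .skipInvalidRows()      // keep inserting valid rows when some are invalid
        .ignoreUnknownValues()  // drop fields that are not in the table schema
        .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()));
```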

RenameFields behaves differently in DirectRunner

2021-05-31 Thread Matthew Ouyang
I’m trying to use the RenameFields transform on nested fields prior to inserting into BigQuery. Insertion into BigQuery is successful with DirectRunner, but DataflowRunner has an issue with renamed nested fields. The error message I’m receiving is: Error while reading data, error message: JSON pars

Re: RenameFields behaves differently in DirectRunner

2021-06-01 Thread Matthew Ouyang
nestedField.field1_0, suggests that BigQuery is trying to use the >>> original name for the nested field and not the substitute name. >>> >>> Is there a stacktrace associated with this error? It would be helpful to >>> see where the error is coming from.

Re: RenameFields behaves differently in DirectRunner

2021-06-02 Thread Matthew Ouyang
the issue. If it does, let me know and I'll work > on a fix to RenameFields. > > On Tue, Jun 1, 2021 at 7:39 PM Reuven Lax wrote: > >> Aha, yes this is indeed another bug in the transform. The schema is set on >> the top-level Row but not on any nested rows. >

Merging two rows

2021-06-03 Thread Matthew Ouyang
I know there is a method to merge two Beam Schemas into a new Schema. ( https://beam.apache.org/releases/javadoc/2.26.0/org/apache/beam/sdk/schemas/SchemaUtils.html#mergeWideningNullable-org.apache.beam.sdk.schemas.Schema-org.apache.beam.sdk.schemas.Schema- ). Is there a similar method for Beam R
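SchemaUtils.mergeWideningNullable merges two Schemas, widening a field to nullable when the inputs disagree; Beam has no single-call equivalent for merging two Rows, so the row values have to be recombined against the merged schema by hand. A toy stdlib-only model of the schema-level merge, with fields as a name-to-type map and a trailing "?" marking nullable (all names here are illustrative, not Beam API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of "merge with widening nullability": a field becomes nullable
// if it appears in only one input or is nullable in either input.
// This mimics the idea behind SchemaUtils.mergeWideningNullable; it is a
// sketch, not Beam's implementation.
class SchemaMerge {
    // Type strings look like "STRING" or "STRING?" (nullable).
    public static Map<String, String> merge(Map<String, String> a, Map<String, String> b) {
        Map<String, String> out = new LinkedHashMap<>(a);
        for (Map.Entry<String, String> e : b.entrySet()) {
            String prev = out.get(e.getKey());
            if (prev == null) {
                out.put(e.getKey(), e.getValue());
            } else {
                boolean nullable = prev.endsWith("?") || e.getValue().endsWith("?");
                String base = prev.replace("?", "");
                out.put(e.getKey(), nullable ? base + "?" : base);
            }
        }
        return out;
    }
}
```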

Re: Merging two rows

2021-06-09 Thread Matthew Ouyang
021 at 4:18 PM Reuven Lax wrote: > Do you want them to be flattened, or as two subschemas of a top-level > schema? > > On Thu, Jun 3, 2021 at 12:28 PM Matthew Ouyang > wrote: > >> I know there is a method to merge two Beam Schemas into a new Schema. ( >> https://beam

Dev Setup - Windows 10 + Docker

2021-06-18 Thread Matthew Ouyang
I'm trying to set up a development environment on a Windows 10 machine with the provided Docker image ( https://beam.apache.org/contribute/#container-docker-based). However I'm getting the following error. It seems like it's unable to download the packages that it needs. Is there additional setup t

Building a Schema from a file

2021-06-18 Thread Matthew Ouyang
I was wondering if there were any tools that would allow me to build a Beam schema from a file? I looked for it in the SDK but I couldn't find anything that could do it.
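The replies in this thread suggest storing the schema as a proto descriptor or Avro avsc and parsing that; Beam has no generic schema-from-file loader. As a stdlib-only sketch, here is a loader for a hypothetical one-field-per-line "name: type" text format (the format and class names are assumptions for illustration, not a Beam convention):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: parse a hypothetical "name: type" schema file into an ordered
// field list. In practice you would parse a proto descriptor or an Avro
// .avsc instead, or use BigQueryUtils for a BigQuery TableSchema.
class SchemaFromFile {
    public static List<String[]> parse(List<String> lines) {
        List<String[]> fields = new ArrayList<>();
        for (String line : lines) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) continue; // skip blanks and comments
            String[] parts = line.split(":", 2);
            fields.add(new String[] { parts[0].trim(), parts[1].trim() });
        }
        return fields;
    }
}
```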

Re: Building a Schema from a file

2021-06-18 Thread Matthew Ouyang
beam.dataframe.convert.to_pcollection > > On Fri, Jun 18, 2021 at 7:50 AM Reuven Lax wrote: > >> There is a proto format for Beam schemas. You could define it as a proto >> in a file and then parse it. >> >> On Fri, Jun 18, 2021 at 7:28 AM Matthew Ouyang >> wrot

Re: Building a Schema from a file

2021-06-22 Thread Matthew Ouyang
M Brian Hulette wrote: > Are the files in some special format that you need to parse and > understand? Or could you opt to store the schemas as proto descriptors or > Avro avsc? > > On Fri, Jun 18, 2021 at 10:40 AM Matthew Ouyang > wrote: > >> Hello Brian. Thank you f

Re: Building a Schema from a file

2021-06-29 Thread Matthew Ouyang
our you observed). > > https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types > > Best, > -C > > On Tue, Jun 22, 2021 at 11:06 PM Matthew Ouyang > wrote: > >> I am currently using BigQueryUtils to convert a BigQuery TableSchema to a >> Beam Sch

Apply string trim to all fields including structs and arrays

2021-10-15 Thread Matthew Ouyang
I have a Row that has a struct with string fields and an array of structs which also have string fields. Is there any simple way to apply a whitespace trim to all string fields (like a ** wildcard)? I know FieldAccessDescriptor.withAllFields exists but will that work for anything beyond the to
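Since FieldAccessDescriptor has no recursive "**" wildcard, one approach is a recursive walk over the value tree. A stdlib-only sketch, modeling a Row as a Map, a struct as a nested Map, and an array as a List (the class name is illustrative; in a real pipeline this logic would live in a DoFn or schema transform):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Recursively trim every string in a nested Map/List structure,
// leaving non-string leaves untouched.
class DeepTrim {
    @SuppressWarnings("unchecked")
    public static Object trim(Object value) {
        if (value instanceof String) {
            return ((String) value).trim();
        } else if (value instanceof Map) {
            Map<String, Object> out = new LinkedHashMap<>();
            ((Map<String, Object>) value).forEach((k, v) -> out.put(k, trim(v)));
            return out;
        } else if (value instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object v : (List<Object>) value) out.add(trim(v));
            return out;
        }
        return value; // numbers, booleans, null: pass through
    }
}
```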

Re: Apply string trim to all fields including structs and arrays

2021-11-02 Thread Matthew Ouyang
, you might be able to find some examples to > start from: > > https://github.com/apache/beam/tree/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms > > https://github.com/apache/beam/tree/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils &g

Does JavaBeanUtils.getGetters work with List?

2021-12-17 Thread Matthew Ouyang
I needed to convert a POJO to a Row for a unit test (sample code below). Building a Schema from a POJO seems fine, but setting up the values seems to not accept a List of objects. To provide more detail, fieldValues outputs a List instead of a List. The third line in the sample is the same line us
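JavaBeanUtils.getGetters builds on bean-style reflection. A stdlib-only sketch of the same idea using java.beans.Introspector (the Person class is a hypothetical demo POJO): a List-typed property comes back as a raw List here, and Beam additionally needs the generic element type to map List<Foo> to an array-of-row field, which is roughly where a plain List of objects can trip up Row construction.

```java
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Extract getter values from a POJO via bean introspection,
// keyed by property name.
class BeanValues {
    public static Map<String, Object> values(Object bean) {
        Map<String, Object> out = new LinkedHashMap<>();
        try {
            for (PropertyDescriptor pd :
                    Introspector.getBeanInfo(bean.getClass(), Object.class).getPropertyDescriptors()) {
                if (pd.getReadMethod() != null) {
                    out.put(pd.getName(), pd.getReadMethod().invoke(bean));
                }
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return out;
    }

    // Hypothetical POJO for the demo.
    public static class Person {
        public String getName() { return "Ada"; }
        public List<Integer> getScores() { return Arrays.asList(1, 2); }
    }
}
```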

getFailedInsertsWithErr and Storage Write API

2023-03-01 Thread Matthew Ouyang
The documentation says WriteResult.getFailedInserts won’t return anything when used with the Storage Write API (https://beam.apache.org/documentation/io/built-in/google-bigquery/). Is it the same for WriteResult.getFailedInsertsWithErr?

Successful Inserts for Storage Write API?

2023-03-02 Thread Matthew Ouyang
Thank you to Ahmed and Reuven for the tip on WriteResult:: getFailedStorageApiInserts. When I tried to get the successful inserts through the Storage Write API, I received an error message saying that "Retrieving successful inserts is only supported for streaming inserts. Make sure withSuccessfulI
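A minimal Java sketch of the tip from this thread, assuming Beam's BigQuery IO is on the classpath; "rows" and the table name are placeholders, and the surrounding pipeline code is illustrative:

```java
// With the Storage Write API, failed rows surface through
// getFailedStorageApiInserts; getFailedInserts and getFailedInsertsWithErr
// stay empty for this write method.
WriteResult result = rows.apply(
    BigQueryIO.writeTableRows()
        .to("project:dataset.table")
        .withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API));

result.getFailedStorageApiInserts()
    .apply("ErrorMessages", MapElements
        .into(TypeDescriptors.strings())
        .via(err -> err.getErrorMessage()));
```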

Re: Successful Inserts for Storage Write API?

2023-03-03 Thread Matthew Ouyang
I was looking for something similar with Storage Write. On Thu, Mar 2, 2023 at 4:48 PM Reuven Lax via user wrote: > Are you trying to do this in order to use Wait.on? getSuccessfulInserts is > not currently supported for Storage Write API. > > On Thu, Mar 2, 2023 at 1:44 PM Matthe

Re: Full-time Job Opportunity - Canada

2024-11-25 Thread Matthew Ouyang
Hello, The link redirects to the general job postings page. I'm guessing the opportunity has been filled? I looked at similarly-titled postings but it looks like they are hybrid positions and I don't live near Kitchener or Vancouver. Are those positions open to people in Canada outside of those