Re: RenameFields behaves differently in DirectRunner

2021-06-07 Thread Reuven Lax
FYI - this should be fixed by https://github.com/apache/beam/pull/14960 On Thu, Jun 3, 2021 at 10:00 AM Reuven Lax wrote: > Correct. > > On Thu, Jun 3, 2021 at 9:51 AM Kenneth Knowles wrote: > >> I still don't quite grok the details of how this succeeds or fails in >> different situations. The

Re: RenameFields behaves differently in DirectRunner

2021-06-03 Thread Reuven Lax
Correct. On Thu, Jun 3, 2021 at 9:51 AM Kenneth Knowles wrote: > I still don't quite grok the details of how this succeeds or fails in > different situations. The invalid row succeeds in serialization because the > coder is not sensitive to the way in which it is invalid? > > Kenn > > On Wed, Ju

Re: RenameFields behaves differently in DirectRunner

2021-06-03 Thread Kenneth Knowles
I still don't quite grok the details of how this succeeds or fails in different situations. The invalid row succeeds in serialization because the coder is not sensitive to the way in which it is invalid? Kenn On Wed, Jun 2, 2021 at 2:54 PM Brian Hulette wrote: > > One thing that's been on the b

Re: RenameFields behaves differently in DirectRunner

2021-06-02 Thread Brian Hulette
> One thing that's been on the back burner for a long time is making CoderProperties into a CoderTester like Guava's EqualityTester. Reuven's point still applies here though. This issue is not due to a bug in SchemaCoder, it's a problem with the Row we gave SchemaCoder to encode. I'm assuming a Co

Re: RenameFields behaves differently in DirectRunner

2021-06-02 Thread Kenneth Knowles
Mutability checking might catch that. I meant to suggest not putting the check in the pipeline, but offering a testing discipline that will catch such issues. One thing that's been on the back burner for a long time is making CoderProperties into a CoderTester like Guava's EqualityTester. Then it

Re: RenameFields behaves differently in DirectRunner

2021-06-02 Thread Brian Hulette
Could the DirectRunner just do an equality check whenever it does an encode/decode? It sounds like it's already effectively performing a CoderProperties.coderDecodeEncodeEqual for every element, just omitting the equality check. On Wed, Jun 2, 2021 at 12:04 PM Reuven Lax wrote: > There is no bug

Re: RenameFields behaves differently in DirectRunner

2021-06-02 Thread Reuven Lax
There is no bug in the Coder itself, so that wouldn't catch it. We could insert CoderProperties.coderDecodeEncodeEqual in a subsequent ParDo, but if the Direct runner already does an encode/decode before that ParDo, then that would have fixed the problem before we could see it. On Wed, Jun 2, 2021

Re: RenameFields behaves differently in DirectRunner

2021-06-02 Thread Kenneth Knowles
Would it be caught by CoderProperties? Kenn On Wed, Jun 2, 2021 at 8:16 AM Reuven Lax wrote: > I don't think this bug is schema specific - we created a Java object that > is inconsistent with its encoded form, which could happen to any transform. > > This does seem to be a gap in DirectRunner t

Re: RenameFields behaves differently in DirectRunner

2021-06-02 Thread Reuven Lax
I don't think this bug is schema specific - we created a Java object that is inconsistent with its encoded form, which could happen to any transform. This does seem to be a gap in DirectRunner testing though. It also makes it hard to test using PAssert, as I believe that puts everything in a side

Re: RenameFields behaves differently in DirectRunner

2021-06-02 Thread Brian Hulette
+dev > I bet the DirectRunner is encoding and decoding in between, which fixes the object. Do we need better testing of schema-aware (and potentially other built-in) transforms in the face of fusion to root out issues like this? Brian On Wed, Jun 2, 2021 at 5:13 AM Matthew Ouyang wrote: > I

Re: RenameFields behaves differently in DirectRunner

2021-06-02 Thread Matthew Ouyang
I have some other work-related things I need to do this week, so I will likely report back on this over the weekend. Thank you for the explanation. It makes perfect sense now. On Tue, Jun 1, 2021 at 11:18 PM Reuven Lax wrote: > Some more context - the problem is that RenameFields outputs (in t

Re: RenameFields behaves differently in DirectRunner

2021-06-01 Thread Reuven Lax
Some more context - the problem is that RenameFields outputs (in this case) Java Row objects that are inconsistent with the actual schema. For example if you have the following schema: Row { field1: Row { field2: string } } And rename field1.field2 -> renamed, you'll get the followin

Re: RenameFields behaves differently in DirectRunner

2021-06-01 Thread Reuven Lax
Aha, yes this indeed another bug in the transform. The schema is set on the top-level Row but not on any nested rows. On Tue, Jun 1, 2021 at 6:37 PM Matthew Ouyang wrote: > Thank you everyone for your input. I believe it will be easiest to > respond to all feedback in a single message rather th

Re: RenameFields behaves differently in DirectRunner

2021-06-01 Thread Matthew Ouyang
Thank you everyone for your input. I believe it will be easiest to respond to all feedback in a single message rather than messages per person. - NeedsRunner - The tests are run eventually, so obviously all good on my end. I was trying to run the smallest subset of test cases possible and

Re: RenameFields behaves differently in DirectRunner

2021-06-01 Thread Reuven Lax
This transform is the same across all runners. A few comments on the test: - Using attachValues directly is error prone (per the comment on the method). I recommend using the withFieldValue builders instead. - I recommend capturing the RenameFields PCollection into a local variable of type PCo

Re: RenameFields behaves differently in DirectRunner

2021-06-01 Thread Kenneth Knowles
On Tue, Jun 1, 2021 at 12:42 PM Brian Hulette wrote: > Hi Matthew, > > > The unit tests also seem to be disabled for this as well and so I don’t > know if the PTransform behaves as expected. > > The exclusion for NeedsRunner tests is just a quirk in our testing > framework. NeedsRunner indicates

Re: RenameFields behaves differently in DirectRunner

2021-06-01 Thread Brian Hulette
Hi Matthew, > The unit tests also seem to be disabled for this as well and so I don’t know if the PTransform behaves as expected. The exclusion for NeedsRunner tests is just a quirk in our testing framework. NeedsRunner indicates that a test suite can't be executed with the SDK alone, it needs a