Adding @Yi Hu <ya...@google.com>, who might know more about the expected behavior.
If you see a gap, feel free to fix it with a PR. And thank you for your contributions!

On Wed, Oct 9, 2024 at 12:06 AM LDesire <two_som...@icloud.com> wrote:

> In the CsvIO.parseRows method, does it matter if the number of CSV headers
> is not the same as the number of fields in the Schema?
>
> I'm looking at that method and I don't see any logic anywhere that
> validates this.
>
> I've looked for related tests, but they don't seem to be validated
> properly.
>
> ```
> @Test
> public void givenMismatchedCsvFormatAndSchema_throws() {
>   Pipeline pipeline = Pipeline.create();
>   CSVFormat csvFormat =
>       CSVFormat.DEFAULT
>           .withHeader("a_string", "an_integer", "a_double")
>           .withAllowDuplicateHeaderNames(true);
>   Schema schema =
>       Schema.builder().addStringField("a_string").addDoubleField("a_double").build();
>   assertThrows(IllegalArgumentException.class, () -> CsvIO.parseRows(schema, csvFormat));
>   pipeline.run();
> }
> ```
>
> The above test always passes the assertThrows test because
> withAllowDuplicateHeaderNames is true.
>
> In other words, it doesn't seem to be validating properly because the
> exception is thrown in a different part of the test than intended.
>
> If this is unintended, would it be okay if I add logic to validate that
> the number of CSV headers is the same as the number of fields in the Schema?
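
For reference, the check proposed above could look roughly like the sketch below. It is a standalone illustration, not Beam's actual code: HeaderSchemaCheck and validateHeaderCount are hypothetical names, and the method takes plain lists so it compiles without Beam or Commons CSV on the classpath. In CsvIO the two inputs would presumably come from csvFormat.getHeader() and schema.getFieldNames(), but that wiring is an assumption here.

```java
import java.util.List;

/** Hypothetical helper for illustration only; not part of Apache Beam. */
final class HeaderSchemaCheck {
  private HeaderSchemaCheck() {}

  /**
   * Throws IllegalArgumentException when the number of CSV headers differs
   * from the number of schema fields. Names are included in the message so a
   * test can tell this failure apart from unrelated ones.
   */
  static void validateHeaderCount(List<String> csvHeaders, List<String> schemaFieldNames) {
    if (csvHeaders.size() != schemaFieldNames.size()) {
      throw new IllegalArgumentException(
          String.format(
              "CSV header count (%d) does not match schema field count (%d): "
                  + "headers=%s, fields=%s",
              csvHeaders.size(), schemaFieldNames.size(), csvHeaders, schemaFieldNames));
    }
  }
}
```

A test for such a check would likely also assert on the exception message, so that it cannot pass because of an exception thrown for a different reason, which is the concern raised about the existing test.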