>From my investigation the behaviour from 1.0 to 1.6 was to throw on duplicates, but ignore empty duplicates (so allowing pad columns in a CSV).
The allowDuplicates flag was added in 1.7 to allow duplicates. DEFAULT behaviour was true! So the behavioural compatibility was broken in release 1.7. Note there is also an allowMissingColumnNames flag. Logic was: - Store column headers as we go - For each new column header, check in the existing headers - If present: -- If not empty then throw if allowDuplicateHeaderNames=true -- If empty then throw if allowMissingColumnNames=false There is some logic failure here. It would allow the first missing column name for example; only the second empty missing column name would trigger an exception. The current master has updated this logic somewhat but I think the concept is still the same. There are tests to show that the default enum value is ALLOW_ALL. This is the same behaviour as 1.7, but not 1.0 - 1.6. Setting the boolean flag to true -> ALLOW_ALL. Setting the boolean flag to false -> ALLOW_EMPTY. So I think the default settings are correct, and the use of the old flag flips the behaviour correctly. But when investigating I found another issue. I looked at the CSVFormat tests and found two fixtures missing the @Test annotation: testDuplicateHeaderElementsTrue() testDuplicateHeaderElementsTrue_Deprecated() I added these to master. The tests for duplicate headers use the builder and setHeader("A", "A"). So there are no tests for empty headers. Trying this throws an exception in CSVFormat.validate for all of these: CSVFormat.DEFAULT.builder().setAllowDuplicateHeaderNames(false).setHeader("A", "", "B", "").build(); CSVFormat.DEFAULT.builder().setDuplicateHeaderMode(DuplicateHeaderMode.ALLOW_EMPTY).setHeader("A", "", "B", "").build(); CSVFormat.DEFAULT.builder().setAllowDuplicateHeaderNames(false) .setAllowMissingColumnNames(true).setHeader("A", "", "B", "").build(); So it seems the logic when building a CSVFormat with explicit headers is different from when parsing a format in CSVParser.createHeaders(). That method has a lot of logic to set up the header map. If a CSVFormat is created with an explicit header the behaviour is different with respect to duplicate headers, and allowMissingColumnNames which is not even considered in CSVFormat.validate(). This is the current behaviour in master for the parser: @Test public void testEmptyColumnsAndDuplicateHeadersNotAllowed() throws Exception { assertThrows(IllegalArgumentException.class, () -> CSVParser.parse("a,,c,\n1,2,3,4\nw,x,y,z", CSVFormat.DEFAULT.withHeader().withAllowDuplicateHeaderNames(false))); } @Test public void testEmptyColumnsAndDuplicateHeadersNotAllowedWithMissingColumnsAllowed() throws Exception { try (CSVParser parser = CSVParser.parse("a,,c,\n1,2,3,4\nw,x,y,z", CSVFormat.DEFAULT.withHeader().withAllowDuplicateHeaderNames(false).withAllowMissingColumnNames(true))) { // noop } } So the duplicate headers cannot be considered on its own, it requires use of the allow missing column names flag. Also, if a CSVFormat has an explicit header then the CSVParser will not construct a header Map<String, Integer> to allow column names to look up column indices. I am not sure of the best path to fix this. I would document the CSVParser header map as not being created if the header is not read from the CSV input (current behaviour). Then maybe change the logic in CSVFormat.validate() to respect the ALLOW_ALL, ALLOW_EMPTY, ALLOW_NONE duplicates choice, possibly also considering allowEmptyColumnNames which defaults to false. Ideally we should get the same exceptions when building a CSVFormat with explicitly set headers, and parsing a CSV file and getting the header from the file. Currently I do not think that is true. Alex On Fri, 21 Oct 2022 at 15:54, Gary Gregory <garydgreg...@gmail.com> wrote: > > Ok, I don't see any conflict anymore but the PR title does not match the > code: "CSVFormat.duplicateHeaderMode requires default DISALLOW" > > I am ok with the changes as they stand now but we have to resolve if the > default should be changed. > > Gary > > On Fri, Oct 21, 2022, 09:44 <sma...@outlook.de> wrote: > > > Hi Gary, > > > > the PR did not have conflicts 30 minutes ago. So we must have encountered > > a race condition between you updating master and me rebasing and > > force-pushing my pr. > > I've done that again just now. There are no conflicts. > > At the same time I see parts of my PR have dribbled into master already, > > so if you rather implement the changes yourself, please go ahead. I am fine > > with closing my pr. > > > > regards, > > Markus > > > > > > From: Gary Gregory <garydgreg...@gmail.com> > > Sent: Friday, October 21, 2022 15:17 > > To: Commons Developers List <dev@commons.apache.org> > > Subject: Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1 > > > > Hi Markus, > > > > The PR has conflicts and does not test its changes. > > > > I'd like feedback from the community on whether or not git master as it is > > now is OK for the duplicate header behavior or if changing the default is > > needed and if doing so is just ping-pongonging from one incompatibility to > > another, based on previous versions. > > > > TY! > > > > Gary > > > > On Fri, Oct 21, 2022, 08:52 <sma...@outlook.de> wrote: > > > > > I've removed serialization stuff from the PR and rebased it to master > > > https://github.com/apache/commons-csv/pull/276 > > > > > > kind regards, > > > Markus > > > > > > From: Gary D. Gregory <ggreg...@apache.org> > > > Sent: Friday, October 21, 2022 14:18 > > > To: dev@commons.apache.org <dev@commons.apache.org> > > > Subject: Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1 > > > > > > On 2022/10/20 22:56:05 Alex Herbert wrote: > > > > On Thu, 20 Oct 2022 at 23:43, Alex Herbert <alex.d.herb...@gmail.com> > > > wrote: > > > > > > > > > > I did not have time to track through whether this behaviour changed > > > > > after the initial implementation of the flag. I would think not as > > the > > > > > original behaviour is from 1.0. This would map to: > > > > > > > > > > true -> ALLOW_ALL > > > > > false -> ALLOW_EMPTY > > > > > new -> DISALLOW > > > > > > > > > > Which is what we currently have in 1.10.0 RC1. Thus the PR #276 [7] > > to > > > > > change the use of the flag to 'false -> DISALLOW' is not maintaining > > > > > behavioural compatibility (to 1.7, or back to 1.0). > > > > > > > > PS. I just verified that PR 276 changes the DuplicateHeaderMode value > > > > for allowDuplicates=false and does not change any tests. > > > > > > > > So the test suite is currently not enforcing behavioural > > > > compatibility. This seems like a glaring hole in the tests and should > > > > be addressed to prevent regressions. > > > > > > In git master, there are many tests for senders of > > > withAllowDuplicateHeaderNames(boolean) and > > > setAllowDuplicateHeaderNames(boolean). > > > > > > I filled in the coverage in testGetAllowDuplicateHeaderNames() for > > > getAllowDuplicateHeaderNames() to reflect the _current_ behavior, right > > or > > > wrong. > > > > > > Let's reassess now git master now please. If the PR matters still, it > > > should be rebased on git master. > > > > > > Gary > > > > > > > > > > > --------------------------------------------------------------------- > > > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > > > > For additional commands, e-mail: dev-h...@commons.apache.org > > > > > > > > > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > > > For additional commands, e-mail: dev-h...@commons.apache.org > > > > > > --------------------------------------------------------------------- > > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > > > For additional commands, e-mail: dev-h...@commons.apache.org > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > > For additional commands, e-mail: dev-h...@commons.apache.org > > > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org