Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1

Gary D. Gregory Sun, 23 Oct 2022 06:09:48 -0700

Hi All, Alex, more below:

On 2022/10/22 21:23:13 Alex Herbert wrote:
> On Sat, 22 Oct 2022 at 20:05, Gary D. Gregory <ggreg...@apache.org> wrote:
> >
> > Thank you for the new tests Alex!
> >
> > Here is one area that is easy to overlook: As Commons CSV has evolved, _not 
> > all settings in CSVFormat_ apply to both writing and parsing. To wit, the 
> > existing Javadoc for CSVFormat.getAllowMissingColumnNames(): "Gets whether 
> > missing column names are allowed when parsing the header line."
> >
> > I've updated the Javadoc for 
> > CSVFormat.Builder.setAllowMissingColumnNames(boolean) to state this is a 
> > parser setting.
> >
> > I've also updated the test since the commented-out test data is valid for 
> > parsing.
> >
> > In git master now.
> 
> OK, thanks for the explanation. If the CSVFormat ignores the
> allowEmptyColumnsNames in the validate() method it should act as if
> the setting is true, i.e. only the DuplicateHeaderMode should be used
> to validate the header.


That's how it is now, allowEmptyColumnsNames cannot be considered in the 
validate method because a CSVFormat does not know if it is going to be used for 
parsing or writing.

So the test can be updated to stream the full
> set of cases (reinstating those you commented out) and filter to those
> where the flag is set to true, e.g. using:

I already had added a new test data provider method to do that so, all I had to 
do now is make sure that when I deleted the commented out test elements, these 
were covered in the 2nd data provider called duplicateHeaderParseOnlyData().

> 
> ---
> 
> static Stream<Arguments> duplicateHeaderAllowsMissingColumnsNamesData() {
>     return duplicateHeaderData().filter(arg ->
> Boolean.TRUE.equals(arg.get()[1]));
> }

I did not do this because it seemed (to me) clearer (less magical) to create a 
duplicateHeaderParseOnlyData() and list explicitly test cases there. WDYT?

> 
> @ParameterizedTest
> @MethodSource(value = {"duplicateHeaderAllowsMissingColumnsNamesData"})
> public void testCSVFormat(final DuplicateHeaderMode duplicateHeaderMode, ...
> 
> ---
> 
> This finds the following failure:
> 
> Arguments.of(DuplicateHeaderMode.ALLOW_EMPTY, true,  new String[] {"
> ", " "}, true),
> 
> This is due to the difference in the CSVParser and CSVFormat in the
> use of the trim() method to detect an empty header:
> 
> CSVParser:
> final boolean emptyHeader = header == null || header.trim().isEmpty();
> 
> CSVFormat:
> final boolean empty = header == null || header.isEmpty();
> 
> This means that the definition of 'empty' is different for the
> CSVFormat and the CSVParser. Should this be corrected in CSVFormat to
> use trim()?

I refactored the logic in CSVFormat#isBlank(String) where the "blank" concept 
vs. "empty" is the same as in Apache Commons Lang where "empty" means size 0 
and "blank" means "trims to size 0"; see 
org.apache.commons.lang3.StringUtils.isBlank(CharSequence).

> 
> If so then I can push the updated test and fix to master. That would
> remove the requirement to comment out any of the test cases and moving
> those to a different test stream method.

Ah, well, let's have you review git master now and feel free to refactor. I 
think we are close if not done for another RC. WDYT?

TY!
Gary

> 
> Alex
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
> 
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [VOTE] Release Apache Commons CSV 1.10.0 based on RC1

Reply via email to