Re: svn commit: r1635052 - in /commons/proper/csv/trunk/src: changes/ main/java/org/apache/commons/csv/ test/java/org/apache/commons/csv/

Gary Gregory Wed, 29 Oct 2014 10:02:34 -0700

This should be something simple to use, it's just a parser for a 'simple'
file format. I get a format, a parser and my data. If each new feature is a
decorator I'm going to end up creating 5 classes to use 5 features, no
thanks. Like you said, it could be a matter of personal taste. Over at
HttpComponents HttpClient, the decorators have been deprecated in favor of
builders. Here at CSV, we have a builder for CSV formats. So maybe that
should be used, at least it would be consistent. Also, until someone tries
to create a decorator, it's just talk, and from one of the comments, it
might not be doable...


Gary

On Wed, Oct 29, 2014 at 12:39 PM, Adrian Crum <
adrian.c...@sandglass-software.com> wrote:

> Generally speaking, good designs are not based on class count.
>
> If we simply add methods to the parser every time a user comes up with a
> new use case, then eventually we will end up with a single class that tries
> to be all things to all people.
>
> The decorator pattern is ideally suited for situations like this:
>
> 1. I need a basic CSV parser: Use the basic parser.
> 2. I need a CSV parser that supports random access: Use basic parser
> decorated by Random Access decorator.
> 3. I need a CSV parser that supports foo: Use basic parser decorated by
> Foo decorator.
> 4. I need...
>
> From my perspective, that makes client code cleaner and simpler, not more
> complicated.
>
> But like many design discussions, that is a personal preference and others
> may feel differently.
>
> Adrian Crum
> Sandglass Software
> www.sandglass-software.com
>
> On 10/29/2014 4:08 PM, Gary Gregory wrote:
>
>> On Wed, Oct 29, 2014 at 4:13 AM, Benedikt Ritter <brit...@apache.org>
>> wrote:
>>
>>  Hi Gary,
>>>
>>> what do you think about the Decorator approach, suggested by Adrian Crum
>>> in
>>> JIRA?
>>>
>>>
>> In theory, it's OK, but in this specific case, it does not seem to me
>> worth
>> the cost (an extra class) and complexity (it makes the client code more
>> complicated). It is also not clear that it would be easy to do based on
>> the
>> other comments in the JIRA. I do not plan on researching this path but I
>> suppose I would examine a patch.
>>
>> Gary
>>
>>
>>
>>> Benedikt
>>>
>>> 2014-10-29 6:44 GMT+01:00 <ggreg...@apache.org>:
>>>
>>>  Author: ggregory
>>>> Date: Wed Oct 29 05:44:40 2014
>>>> New Revision: 1635052
>>>>
>>>> URL: http://svn.apache.org/r1635052
>>>> Log:
>>>> [CSV-131] Save positions of records to enable random access. The floor
>>>> is
>>>> open for code review and further discussion based on the comments in the
>>>> Jira.
>>>>
>>>> Modified:
>>>>      commons/proper/csv/trunk/src/changes/changes.xml
>>>>
>>>>
>>>>  commons/proper/csv/trunk/src/main/java/org/apache/commons/
>>> csv/CSVParser.java
>>>
>>>>
>>>>
>>>>  commons/proper/csv/trunk/src/main/java/org/apache/commons/
>>> csv/CSVRecord.java
>>>
>>>>
>>>>
>>>>  commons/proper/csv/trunk/src/test/java/org/apache/commons/
>>> csv/CSVParserTest.java
>>>
>>>>
>>>>
>>>>  commons/proper/csv/trunk/src/test/java/org/apache/commons/
>>> csv/CSVRecordTest.java
>>>
>>>>
>>>> Modified: commons/proper/csv/trunk/src/changes/changes.xml
>>>> URL:
>>>>
>>>>  http://svn.apache.org/viewvc/commons/proper/csv/trunk/src/
>>> changes/changes.xml?rev=1635052&r1=1635051&r2=1635052&view=diff
>>>
>>>>
>>>>
>>>>  ============================================================
>>> ==================
>>>
>>>> --- commons/proper/csv/trunk/src/changes/changes.xml (original)
>>>> +++ commons/proper/csv/trunk/src/changes/changes.xml Wed Oct 29
>>>> 05:44:40
>>>> 2014
>>>> @@ -39,6 +39,7 @@
>>>>     </properties>
>>>>     <body>
>>>>       <release version="1.1" date="2014-mm-dd" description="Feature and
>>>>
>>> bug
>>>
>>>> fix release">
>>>> +      <action issue="CSV-131" type="add" dev="ggregory" due-to="Holger
>>>> Stratmann">Save positions of records to enable random access</action>
>>>>         <action issue="CSV-130" type="fix" dev="ggregory" due-to="Sergei
>>>> Lebedev">CSVFormat#withHeader doesn't work well with #printComment, add
>>>> withHeaderComments(String...)</action>
>>>>         <action issue="CSV-128" type="fix"
>>>> dev="ggregory">CSVFormat.EXCEL
>>>> should ignore empty header names</action>
>>>>         <action issue="CSV-129" type="add" dev="ggregory">Add
>>>> CSVFormat#with 0-arg methods matching boolean arg methods</action>
>>>>
>>>> Modified:
>>>>
>>>>  commons/proper/csv/trunk/src/main/java/org/apache/commons/
>>> csv/CSVParser.java
>>>
>>>> URL:
>>>>
>>>>  http://svn.apache.org/viewvc/commons/proper/csv/trunk/src/
>>> main/java/org/apache/commons/csv/CSVParser.java?rev=
>>> 1635052&r1=1635051&r2=1635052&view=diff
>>>
>>>>
>>>>
>>>>  ============================================================
>>> ==================
>>>
>>>> ---
>>>>
>>>>  commons/proper/csv/trunk/src/main/java/org/apache/commons/
>>> csv/CSVParser.java
>>>
>>>> (original)
>>>> +++
>>>>
>>>>  commons/proper/csv/trunk/src/main/java/org/apache/commons/
>>> csv/CSVParser.java
>>>
>>>> Wed Oct 29 05:44:40 2014
>>>> @@ -219,6 +219,12 @@ public final class CSVParser implements
>>>>       private final List<String> record = new ArrayList<String>();
>>>>
>>>>       private long recordNumber;
>>>> +
>>>> +    /**
>>>> +     * Lexer offset if the parser does not start parsing at the
>>>>
>>> beginning
>>>
>>>> of the source. Usually used in combination
>>>> +     * with {@link #setNextRecordNumber(long)}
>>>> +     */
>>>> +    private long characterOffset;
>>>>
>>>>       private final Token reusableToken = new Token();
>>>>
>>>> @@ -296,6 +302,43 @@ public final class CSVParser implements
>>>>       }
>>>>
>>>>       /**
>>>> +     * Sets the record number to be assigned to the next record read.
>>>> +     * <p>
>>>> +     * Use this if the reader is not positioned at the first record
>>>> when
>>>> you create the parser. For example, the first
>>>> +     * record read might be the 51st record in the source file.
>>>> +     * </p>
>>>> +     * <p>
>>>> +     * If you want the records to also have the correct character
>>>> position referring to the underlying source, call
>>>> +     * {@link #setNextCharacterPosition(long)}.
>>>> +     * </p>
>>>> +     *
>>>> +     * @param nextRecordNumber
>>>> +     *            the next record number
>>>> +     * @since 1.1
>>>> +     */
>>>> +    public void setNextRecordNumber(long nextRecordNumber) {
>>>> +        this.recordNumber = nextRecordNumber - 1;
>>>> +    }
>>>> +
>>>> +    /**
>>>> +     * Sets the current position in the source stream regardless of
>>>>
>>> where
>>>
>>>> the parser and lexer start reading.
>>>> +     * <p>
>>>> +     * For example: We open a file and seek to position 5434 in order
>>>> to
>>>> start reading at record 42. In order to have
>>>> +     * the parser assign the correct characterPosition to records, we
>>>> call this method.
>>>> +     * </p>
>>>> +     * <p>
>>>> +     * If you want the records to also have the correct record numbers,
>>>> call {@link #setNextRecordNumber(long)}
>>>> +     * </p>
>>>> +     *
>>>> +     * @param position
>>>> +     *            the new character position
>>>> +     * @since 1.1
>>>> +     */
>>>> +    public void setNextCharacterPosition(long position) {
>>>> +        this.characterOffset = position - lexer.getCharacterPosition();
>>>> +    }
>>>> +
>>>> +    /**
>>>>        * Returns the current record number in the input stream.
>>>>        *
>>>>        * <p>
>>>> @@ -445,6 +488,7 @@ public final class CSVParser implements
>>>>           CSVRecord result = null;
>>>>           this.record.clear();
>>>>           StringBuilder sb = null;
>>>> +        final long startCharPosition = lexer.getCharacterPosition() +
>>>> this.characterOffset;
>>>>           do {
>>>>               this.reusableToken.reset();
>>>>               this.lexer.nextToken(this.reusableToken);
>>>> @@ -480,7 +524,7 @@ public final class CSVParser implements
>>>>               this.recordNumber++;
>>>>               final String comment = sb == null ? null : sb.toString();
>>>>               result = new CSVRecord(this.record.toArray(new
>>>> String[this.record.size()]), this.headerMap, comment,
>>>> -                    this.recordNumber);
>>>> +                    this.recordNumber, startCharPosition);
>>>>           }
>>>>           return result;
>>>>       }
>>>>
>>>> Modified:
>>>>
>>>>  commons/proper/csv/trunk/src/main/java/org/apache/commons/
>>> csv/CSVRecord.java
>>>
>>>> URL:
>>>>
>>>>  http://svn.apache.org/viewvc/commons/proper/csv/trunk/src/
>>> main/java/org/apache/commons/csv/CSVRecord.java?rev=
>>> 1635052&r1=1635051&r2=1635052&view=diff
>>>
>>>>
>>>>
>>>>  ============================================================
>>> ==================
>>>
>>>> ---
>>>>
>>>>  commons/proper/csv/trunk/src/main/java/org/apache/commons/
>>> csv/CSVRecord.java
>>>
>>>> (original)
>>>> +++
>>>>
>>>>  commons/proper/csv/trunk/src/main/java/org/apache/commons/
>>> csv/CSVRecord.java
>>>
>>>> Wed Oct 29 05:44:40 2014
>>>> @@ -36,6 +36,8 @@ public final class CSVRecord implements
>>>>
>>>>       private static final long serialVersionUID = 1L;
>>>>
>>>> +    private final long characterPosition;
>>>> +
>>>>       /** The accumulated comments (if any) */
>>>>       private final String comment;
>>>>
>>>> @@ -44,15 +46,16 @@ public final class CSVRecord implements
>>>>
>>>>       /** The record number. */
>>>>       private final long recordNumber;
>>>> -
>>>> +
>>>>       /** The values of the record */
>>>>       private final String[] values;
>>>>
>>>> -    CSVRecord(final String[] values, final Map<String, Integer>
>>>> mapping,
>>>> final String comment, final long recordNumber) {
>>>> +    CSVRecord(final String[] values, final Map<String, Integer>
>>>> mapping,
>>>> final String comment, final long recordNumber, long characterPosition) {
>>>>           this.recordNumber = recordNumber;
>>>>           this.values = values != null ? values : EMPTY_STRING_ARRAY;
>>>>           this.mapping = mapping;
>>>>           this.comment = comment;
>>>> +        this.characterPosition = characterPosition;
>>>>       }
>>>>
>>>>       /**
>>>> @@ -110,6 +113,16 @@ public final class CSVRecord implements
>>>>       }
>>>>
>>>>       /**
>>>> +     * Returns the start position of this record as a character
>>>> position
>>>> in the source stream. This may or may not
>>>> +     * correspond to the byte position depending on the character set.
>>>> +     *
>>>> +     * @return the position of this record in the source stream.
>>>> +     */
>>>> +    public long getCharacterPosition() {
>>>> +        return characterPosition;
>>>> +    }
>>>> +
>>>> +    /**
>>>>        * Returns the comment for this record, if any.
>>>>        *
>>>>        * @return the comment for this record, or null if no comment for
>>>> this record is available.
>>>>
>>>> Modified:
>>>>
>>>>  commons/proper/csv/trunk/src/test/java/org/apache/commons/
>>> csv/CSVParserTest.java
>>>
>>>> URL:
>>>>
>>>>  http://svn.apache.org/viewvc/commons/proper/csv/trunk/src/
>>> test/java/org/apache/commons/csv/CSVParserTest.java?rev=
>>> 1635052&r1=1635051&r2=1635052&view=diff
>>>
>>>>
>>>>
>>>>  ============================================================
>>> ==================
>>>
>>>> ---
>>>>
>>>>  commons/proper/csv/trunk/src/test/java/org/apache/commons/
>>> csv/CSVParserTest.java
>>>
>>>> (original)
>>>> +++
>>>>
>>>>  commons/proper/csv/trunk/src/test/java/org/apache/commons/
>>> csv/CSVParserTest.java
>>>
>>>> Wed Oct 29 05:44:40 2014
>>>> @@ -299,22 +299,23 @@ public class CSVParserTest {
>>>>           }
>>>>       }
>>>>
>>>> -//    @Test
>>>> -//    public void testStartWithEmptyLinesThenHeaders() throws
>>>> Exception
>>>>
>>> {
>>>
>>>> -//        final String[] codes = { "\r\n\r\n\r\nhello,\r\n\r\n\r\n",
>>>> "hello,\n\n\n", "hello,\"\"\r\n\r\n\r\n", "hello,\"\"\n\n\n" };
>>>> -//        final String[][] res = { { "hello", "" }, { "" }, // Excel
>>>> format does not ignore empty lines
>>>> -//                { "" } };
>>>> -//        for (final String code : codes) {
>>>> -//            final CSVParser parser = CSVParser.parse(code,
>>>> CSVFormat.EXCEL);
>>>> -//            final List<CSVRecord> records = parser.getRecords();
>>>> -//            assertEquals(res.length, records.size());
>>>> -//            assertTrue(records.size() > 0);
>>>> -//            for (int i = 0; i < res.length; i++) {
>>>> -//                assertArrayEquals(res[i], records.get(i).values());
>>>> -//            }
>>>> -//            parser.close();
>>>> -//        }
>>>> -//    }
>>>> +    // @Test
>>>> +    // public void testStartWithEmptyLinesThenHeaders() throws
>>>>
>>> Exception {
>>>
>>>> +    // final String[] codes = { "\r\n\r\n\r\nhello,\r\n\r\n\r\n",
>>>> "hello,\n\n\n", "hello,\"\"\r\n\r\n\r\n",
>>>> +    // "hello,\"\"\n\n\n" };
>>>> +    // final String[][] res = { { "hello", "" }, { "" }, // Excel
>>>> format
>>>> does not ignore empty lines
>>>> +    // { "" } };
>>>> +    // for (final String code : codes) {
>>>> +    // final CSVParser parser = CSVParser.parse(code, CSVFormat.EXCEL);
>>>> +    // final List<CSVRecord> records = parser.getRecords();
>>>> +    // assertEquals(res.length, records.size());
>>>> +    // assertTrue(records.size() > 0);
>>>> +    // for (int i = 0; i < res.length; i++) {
>>>> +    // assertArrayEquals(res[i], records.get(i).values());
>>>> +    // }
>>>> +    // parser.close();
>>>> +    // }
>>>> +    // }
>>>>
>>>>       @Test
>>>>       public void testEndOfFileBehaviorCSV() throws Exception {
>>>> @@ -475,6 +476,16 @@ public class CSVParserTest {
>>>>       }
>>>>
>>>>       @Test
>>>> +    public void testGetRecordPositionWithCRLF() throws Exception {
>>>> +        this.validateRecordPosition(CRLF);
>>>> +    }
>>>> +
>>>> +    @Test
>>>> +    public void testGetRecordPositionWithLF() throws Exception {
>>>> +        this.validateRecordPosition(String.valueOf(LF));
>>>> +    }
>>>> +
>>>> +    @Test
>>>>       public void testGetOneLine() throws IOException {
>>>>           final CSVParser parser = CSVParser.parse(CSV_INPUT_1,
>>>> CSVFormat.DEFAULT);
>>>>           final CSVRecord record = parser.getRecords().get(0);
>>>> @@ -902,4 +913,65 @@ public class CSVParserTest {
>>>>           parser.close();
>>>>       }
>>>>
>>>> +    private void validateRecordPosition(final String lineSeparator)
>>>> throws IOException {
>>>> +        final String nl = lineSeparator; // used as linebreak in values
>>>> for better distinction
>>>> +
>>>> +        String code = "a,b,c" + lineSeparator + "1,2,3" + lineSeparator
>>>>
>>> +
>>>
>>>> +        // to see if recordPosition correctly points to the enclosing
>>>> quote
>>>> +                "'A" + nl + "A','B" + nl + "B',CC" + lineSeparator +
>>>> +                // unicode test... not very relevant while operating on
>>>> strings instead of bytes, but for
>>>> +                // completeness...
>>>> +                "\u00c4,\u00d6,\u00dc" + lineSeparator + "EOF,EOF,EOF";
>>>> +
>>>> +        final CSVFormat format =
>>>>
>>>>  CSVFormat.newFormat(',').withQuote('\'').withRecordSeparator(
>>> lineSeparator);
>>>
>>>> +        CSVParser parser = CSVParser.parse(code, format);
>>>> +
>>>> +        CSVRecord record;
>>>> +        assertEquals(0, parser.getRecordNumber());
>>>> +
>>>> +        assertNotNull(record = parser.nextRecord());
>>>> +        assertEquals(1, record.getRecordNumber());
>>>> +        assertEquals(code.indexOf('a'), record.getCharacterPosition())
>>>> ;
>>>> +
>>>> +        assertNotNull(record = parser.nextRecord());
>>>> +        assertEquals(2, record.getRecordNumber());
>>>> +        assertEquals(code.indexOf('1'), record.getCharacterPosition())
>>>> ;
>>>> +
>>>> +        assertNotNull(record = parser.nextRecord());
>>>> +        final long positionRecord3 = record.getCharacterPosition();
>>>> +        assertEquals(3, record.getRecordNumber());
>>>> +        assertEquals(code.indexOf("'A"),
>>>> record.getCharacterPosition());
>>>> +        assertEquals("A" + lineSeparator + "A", record.get(0));
>>>> +        assertEquals("B" + lineSeparator + "B", record.get(1));
>>>> +        assertEquals("CC", record.get(2));
>>>> +
>>>> +        assertNotNull(record = parser.nextRecord());
>>>> +        assertEquals(4, record.getRecordNumber());
>>>> +        assertEquals(code.indexOf('\u00c4'),
>>>> record.getCharacterPosition());
>>>> +
>>>> +        assertNotNull(record = parser.nextRecord());
>>>> +        assertEquals(5, record.getRecordNumber());
>>>> +        assertEquals(code.indexOf("EOF"),
>>>>
>>> record.getCharacterPosition());
>>>
>>>> +
>>>> +        parser.close();
>>>> +
>>>> +        // now try to read starting at record 3
>>>> +        parser = CSVParser.parse(code.substring((int)
>>>> positionRecord3),
>>>> format);
>>>> +        parser.setNextRecordNumber(3);
>>>> +        parser.setNextCharacterPosition(positionRecord3);
>>>> +
>>>> +        assertNotNull(record = parser.nextRecord());
>>>> +        assertEquals(3, record.getRecordNumber());
>>>> +        assertEquals(code.indexOf("'A"),
>>>> record.getCharacterPosition());
>>>> +        assertEquals("A" + lineSeparator + "A", record.get(0));
>>>> +        assertEquals("B" + lineSeparator + "B", record.get(1));
>>>> +        assertEquals("CC", record.get(2));
>>>> +
>>>> +        assertNotNull(record = parser.nextRecord());
>>>> +        assertEquals(4, record.getRecordNumber());
>>>> +        assertEquals(code.indexOf('\u00c4'),
>>>> record.getCharacterPosition());
>>>> +        assertEquals("\u00c4", record.get(0));
>>>> +
>>>> +        parser.close();
>>>> +    }
>>>>   }
>>>>
>>>> Modified:
>>>>
>>>>  commons/proper/csv/trunk/src/test/java/org/apache/commons/
>>> csv/CSVRecordTest.java
>>>
>>>> URL:
>>>>
>>>>  http://svn.apache.org/viewvc/commons/proper/csv/trunk/src/
>>> test/java/org/apache/commons/csv/CSVRecordTest.java?rev=
>>> 1635052&r1=1635051&r2=1635052&view=diff
>>>
>>>>
>>>>
>>>>  ============================================================
>>> ==================
>>>
>>>> ---
>>>>
>>>>  commons/proper/csv/trunk/src/test/java/org/apache/commons/
>>> csv/CSVRecordTest.java
>>>
>>>> (original)
>>>> +++
>>>>
>>>>  commons/proper/csv/trunk/src/test/java/org/apache/commons/
>>> csv/CSVRecordTest.java
>>>
>>>> Wed Oct 29 05:44:40 2014
>>>> @@ -45,12 +45,12 @@ public class CSVRecordTest {
>>>>       @Before
>>>>       public void setUp() throws Exception {
>>>>           values = new String[] { "A", "B", "C" };
>>>> -        record = new CSVRecord(values, null, null, 0);
>>>> +        record = new CSVRecord(values, null, null, 0, -1);
>>>>           header = new HashMap<String, Integer>();
>>>>           header.put("first", Integer.valueOf(0));
>>>>           header.put("second", Integer.valueOf(1));
>>>>           header.put("third", Integer.valueOf(2));
>>>> -        recordWithHeader = new CSVRecord(values, header, null, 0);
>>>> +        recordWithHeader = new CSVRecord(values, header, null, 0, -1);
>>>>       }
>>>>
>>>>       @Test
>>>>
>>>>
>>>>
>>>>
>>>
>>> --
>>> http://people.apache.org/~britter/
>>> http://www.systemoutprintln.de/
>>> http://twitter.com/BenediktRitter
>>> http://github.com/britter
>>>
>>>
>>
>>
>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>


-- 
E-Mail: garydgreg...@gmail.com | ggreg...@apache.org
Java Persistence with Hibernate, Second Edition
<http://www.manning.com/bauer3/>
JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
Spring Batch in Action <http://www.manning.com/templier/>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory

Re: svn commit: r1635052 - in /commons/proper/csv/trunk/src: changes/ main/java/org/apache/commons/csv/ test/java/org/apache/commons/csv/

Reply via email to