This should be something simple to use, it's just a parser for a 'simple' file format. I get a format, a parser and my data. If each new feature is a decorator I'm going to end up creating 5 classes to use 5 features, no thanks. Like you said, it could be a matter of personal taste. Over at HttpComponents HttpClient, the decorators have been deprecated in favor of builders. Here at CSV, we have a builder for CSV formats. So maybe that should be used, at least it would be consistent. Also, until someone tries to create a decorator, it's just talk, and from one of the comments, it might not be doable...
Gary On Wed, Oct 29, 2014 at 12:39 PM, Adrian Crum < adrian.c...@sandglass-software.com> wrote: > Generally speaking, good designs are not based on class count. > > If we simply add methods to the parser every time a user comes up with a > new use case, then eventually we will end up with a single class that tries > to be all things to all people. > > The decorator pattern is ideally suited for situations like this: > > 1. I need a basic CSV parser: Use the basic parser. > 2. I need a CSV parser that supports random access: Use basic parser > decorated by Random Access decorator. > 3. I need a CSV parser that supports foo: Use basic parser decorated by > Foo decorator. > 4. I need... > > From my perspective, that makes client code cleaner and simpler, not more > complicated. > > But like many design discussions, that is a personal preference and others > may feel differently. > > Adrian Crum > Sandglass Software > www.sandglass-software.com > > On 10/29/2014 4:08 PM, Gary Gregory wrote: > >> On Wed, Oct 29, 2014 at 4:13 AM, Benedikt Ritter <brit...@apache.org> >> wrote: >> >> Hi Gary, >>> >>> what do you think about the Decorator approach, suggested by Adrian Crum >>> in >>> JIRA? >>> >>> >> In theory, it's OK, but in this specific case, it does not seem to me >> worth >> the cost (an extra class) and complexity (it makes the client code more >> complicated). It is also not clear that it would be easy to do based on >> the >> other comments in the JIRA. I do not plan on researching this path but I >> suppose I would examine a patch. >> >> Gary >> >> >> >>> Benedikt >>> >>> 2014-10-29 6:44 GMT+01:00 <ggreg...@apache.org>: >>> >>> Author: ggregory >>>> Date: Wed Oct 29 05:44:40 2014 >>>> New Revision: 1635052 >>>> >>>> URL: http://svn.apache.org/r1635052 >>>> Log: >>>> [CSV-131] Save positions of records to enable random access. The floor >>>> is >>>> open for code review and further discussion based on the comments in the >>>> Jira. >>>> >>>> Modified: >>>> commons/proper/csv/trunk/src/changes/changes.xml >>>> >>>> >>>> commons/proper/csv/trunk/src/main/java/org/apache/commons/ >>> csv/CSVParser.java >>> >>>> >>>> >>>> commons/proper/csv/trunk/src/main/java/org/apache/commons/ >>> csv/CSVRecord.java >>> >>>> >>>> >>>> commons/proper/csv/trunk/src/test/java/org/apache/commons/ >>> csv/CSVParserTest.java >>> >>>> >>>> >>>> commons/proper/csv/trunk/src/test/java/org/apache/commons/ >>> csv/CSVRecordTest.java >>> >>>> >>>> Modified: commons/proper/csv/trunk/src/changes/changes.xml >>>> URL: >>>> >>>> http://svn.apache.org/viewvc/commons/proper/csv/trunk/src/ >>> changes/changes.xml?rev=1635052&r1=1635051&r2=1635052&view=diff >>> >>>> >>>> >>>> ============================================================ >>> ================== >>> >>>> --- commons/proper/csv/trunk/src/changes/changes.xml (original) >>>> +++ commons/proper/csv/trunk/src/changes/changes.xml Wed Oct 29 >>>> 05:44:40 >>>> 2014 >>>> @@ -39,6 +39,7 @@ >>>> </properties> >>>> <body> >>>> <release version="1.1" date="2014-mm-dd" description="Feature and >>>> >>> bug >>> >>>> fix release"> >>>> + <action issue="CSV-131" type="add" dev="ggregory" due-to="Holger >>>> Stratmann">Save positions of records to enable random access</action> >>>> <action issue="CSV-130" type="fix" dev="ggregory" due-to="Sergei >>>> Lebedev">CSVFormat#withHeader doesn't work well with #printComment, add >>>> withHeaderComments(String...)</action> >>>> <action issue="CSV-128" type="fix" >>>> dev="ggregory">CSVFormat.EXCEL >>>> should ignore empty header names</action> >>>> <action issue="CSV-129" type="add" dev="ggregory">Add >>>> CSVFormat#with 0-arg methods matching boolean arg methods</action> >>>> >>>> Modified: >>>> >>>> commons/proper/csv/trunk/src/main/java/org/apache/commons/ >>> csv/CSVParser.java >>> >>>> URL: >>>> >>>> http://svn.apache.org/viewvc/commons/proper/csv/trunk/src/ >>> main/java/org/apache/commons/csv/CSVParser.java?rev= >>> 1635052&r1=1635051&r2=1635052&view=diff >>> >>>> >>>> >>>> ============================================================ >>> ================== >>> >>>> --- >>>> >>>> commons/proper/csv/trunk/src/main/java/org/apache/commons/ >>> csv/CSVParser.java >>> >>>> (original) >>>> +++ >>>> >>>> commons/proper/csv/trunk/src/main/java/org/apache/commons/ >>> csv/CSVParser.java >>> >>>> Wed Oct 29 05:44:40 2014 >>>> @@ -219,6 +219,12 @@ public final class CSVParser implements >>>> private final List<String> record = new ArrayList<String>(); >>>> >>>> private long recordNumber; >>>> + >>>> + /** >>>> + * Lexer offset if the parser does not start parsing at the >>>> >>> beginning >>> >>>> of the source. Usually used in combination >>>> + * with {@link #setNextRecordNumber(long)} >>>> + */ >>>> + private long characterOffset; >>>> >>>> private final Token reusableToken = new Token(); >>>> >>>> @@ -296,6 +302,43 @@ public final class CSVParser implements >>>> } >>>> >>>> /** >>>> + * Sets the record number to be assigned to the next record read. >>>> + * <p> >>>> + * Use this if the reader is not positioned at the first record >>>> when >>>> you create the parser. For example, the first >>>> + * record read might be the 51st record in the source file. >>>> + * </p> >>>> + * <p> >>>> + * If you want the records to also have the correct character >>>> position referring to the underlying source, call >>>> + * {@link #setNextCharacterPosition(long)}. >>>> + * </p> >>>> + * >>>> + * @param nextRecordNumber >>>> + * the next record number >>>> + * @since 1.1 >>>> + */ >>>> + public void setNextRecordNumber(long nextRecordNumber) { >>>> + this.recordNumber = nextRecordNumber - 1; >>>> + } >>>> + >>>> + /** >>>> + * Sets the current position in the source stream regardless of >>>> >>> where >>> >>>> the parser and lexer start reading. >>>> + * <p> >>>> + * For example: We open a file and seek to position 5434 in order >>>> to >>>> start reading at record 42. In order to have >>>> + * the parser assign the correct characterPosition to records, we >>>> call this method. >>>> + * </p> >>>> + * <p> >>>> + * If you want the records to also have the correct record numbers, >>>> call {@link #setNextRecordNumber(long)} >>>> + * </p> >>>> + * >>>> + * @param position >>>> + * the new character position >>>> + * @since 1.1 >>>> + */ >>>> + public void setNextCharacterPosition(long position) { >>>> + this.characterOffset = position - lexer.getCharacterPosition(); >>>> + } >>>> + >>>> + /** >>>> * Returns the current record number in the input stream. >>>> * >>>> * <p> >>>> @@ -445,6 +488,7 @@ public final class CSVParser implements >>>> CSVRecord result = null; >>>> this.record.clear(); >>>> StringBuilder sb = null; >>>> + final long startCharPosition = lexer.getCharacterPosition() + >>>> this.characterOffset; >>>> do { >>>> this.reusableToken.reset(); >>>> this.lexer.nextToken(this.reusableToken); >>>> @@ -480,7 +524,7 @@ public final class CSVParser implements >>>> this.recordNumber++; >>>> final String comment = sb == null ? null : sb.toString(); >>>> result = new CSVRecord(this.record.toArray(new >>>> String[this.record.size()]), this.headerMap, comment, >>>> - this.recordNumber); >>>> + this.recordNumber, startCharPosition); >>>> } >>>> return result; >>>> } >>>> >>>> Modified: >>>> >>>> commons/proper/csv/trunk/src/main/java/org/apache/commons/ >>> csv/CSVRecord.java >>> >>>> URL: >>>> >>>> http://svn.apache.org/viewvc/commons/proper/csv/trunk/src/ >>> main/java/org/apache/commons/csv/CSVRecord.java?rev= >>> 1635052&r1=1635051&r2=1635052&view=diff >>> >>>> >>>> >>>> ============================================================ >>> ================== >>> >>>> --- >>>> >>>> commons/proper/csv/trunk/src/main/java/org/apache/commons/ >>> csv/CSVRecord.java >>> >>>> (original) >>>> +++ >>>> >>>> commons/proper/csv/trunk/src/main/java/org/apache/commons/ >>> csv/CSVRecord.java >>> >>>> Wed Oct 29 05:44:40 2014 >>>> @@ -36,6 +36,8 @@ public final class CSVRecord implements >>>> >>>> private static final long serialVersionUID = 1L; >>>> >>>> + private final long characterPosition; >>>> + >>>> /** The accumulated comments (if any) */ >>>> private final String comment; >>>> >>>> @@ -44,15 +46,16 @@ public final class CSVRecord implements >>>> >>>> /** The record number. */ >>>> private final long recordNumber; >>>> - >>>> + >>>> /** The values of the record */ >>>> private final String[] values; >>>> >>>> - CSVRecord(final String[] values, final Map<String, Integer> >>>> mapping, >>>> final String comment, final long recordNumber) { >>>> + CSVRecord(final String[] values, final Map<String, Integer> >>>> mapping, >>>> final String comment, final long recordNumber, long characterPosition) { >>>> this.recordNumber = recordNumber; >>>> this.values = values != null ? values : EMPTY_STRING_ARRAY; >>>> this.mapping = mapping; >>>> this.comment = comment; >>>> + this.characterPosition = characterPosition; >>>> } >>>> >>>> /** >>>> @@ -110,6 +113,16 @@ public final class CSVRecord implements >>>> } >>>> >>>> /** >>>> + * Returns the start position of this record as a character >>>> position >>>> in the source stream. This may or may not >>>> + * correspond to the byte position depending on the character set. >>>> + * >>>> + * @return the position of this record in the source stream. >>>> + */ >>>> + public long getCharacterPosition() { >>>> + return characterPosition; >>>> + } >>>> + >>>> + /** >>>> * Returns the comment for this record, if any. >>>> * >>>> * @return the comment for this record, or null if no comment for >>>> this record is available. >>>> >>>> Modified: >>>> >>>> commons/proper/csv/trunk/src/test/java/org/apache/commons/ >>> csv/CSVParserTest.java >>> >>>> URL: >>>> >>>> http://svn.apache.org/viewvc/commons/proper/csv/trunk/src/ >>> test/java/org/apache/commons/csv/CSVParserTest.java?rev= >>> 1635052&r1=1635051&r2=1635052&view=diff >>> >>>> >>>> >>>> ============================================================ >>> ================== >>> >>>> --- >>>> >>>> commons/proper/csv/trunk/src/test/java/org/apache/commons/ >>> csv/CSVParserTest.java >>> >>>> (original) >>>> +++ >>>> >>>> commons/proper/csv/trunk/src/test/java/org/apache/commons/ >>> csv/CSVParserTest.java >>> >>>> Wed Oct 29 05:44:40 2014 >>>> @@ -299,22 +299,23 @@ public class CSVParserTest { >>>> } >>>> } >>>> >>>> -// @Test >>>> -// public void testStartWithEmptyLinesThenHeaders() throws >>>> Exception >>>> >>> { >>> >>>> -// final String[] codes = { "\r\n\r\n\r\nhello,\r\n\r\n\r\n", >>>> "hello,\n\n\n", "hello,\"\"\r\n\r\n\r\n", "hello,\"\"\n\n\n" }; >>>> -// final String[][] res = { { "hello", "" }, { "" }, // Excel >>>> format does not ignore empty lines >>>> -// { "" } }; >>>> -// for (final String code : codes) { >>>> -// final CSVParser parser = CSVParser.parse(code, >>>> CSVFormat.EXCEL); >>>> -// final List<CSVRecord> records = parser.getRecords(); >>>> -// assertEquals(res.length, records.size()); >>>> -// assertTrue(records.size() > 0); >>>> -// for (int i = 0; i < res.length; i++) { >>>> -// assertArrayEquals(res[i], records.get(i).values()); >>>> -// } >>>> -// parser.close(); >>>> -// } >>>> -// } >>>> + // @Test >>>> + // public void testStartWithEmptyLinesThenHeaders() throws >>>> >>> Exception { >>> >>>> + // final String[] codes = { "\r\n\r\n\r\nhello,\r\n\r\n\r\n", >>>> "hello,\n\n\n", "hello,\"\"\r\n\r\n\r\n", >>>> + // "hello,\"\"\n\n\n" }; >>>> + // final String[][] res = { { "hello", "" }, { "" }, // Excel >>>> format >>>> does not ignore empty lines >>>> + // { "" } }; >>>> + // for (final String code : codes) { >>>> + // final CSVParser parser = CSVParser.parse(code, CSVFormat.EXCEL); >>>> + // final List<CSVRecord> records = parser.getRecords(); >>>> + // assertEquals(res.length, records.size()); >>>> + // assertTrue(records.size() > 0); >>>> + // for (int i = 0; i < res.length; i++) { >>>> + // assertArrayEquals(res[i], records.get(i).values()); >>>> + // } >>>> + // parser.close(); >>>> + // } >>>> + // } >>>> >>>> @Test >>>> public void testEndOfFileBehaviorCSV() throws Exception { >>>> @@ -475,6 +476,16 @@ public class CSVParserTest { >>>> } >>>> >>>> @Test >>>> + public void testGetRecordPositionWithCRLF() throws Exception { >>>> + this.validateRecordPosition(CRLF); >>>> + } >>>> + >>>> + @Test >>>> + public void testGetRecordPositionWithLF() throws Exception { >>>> + this.validateRecordPosition(String.valueOf(LF)); >>>> + } >>>> + >>>> + @Test >>>> public void testGetOneLine() throws IOException { >>>> final CSVParser parser = CSVParser.parse(CSV_INPUT_1, >>>> CSVFormat.DEFAULT); >>>> final CSVRecord record = parser.getRecords().get(0); >>>> @@ -902,4 +913,65 @@ public class CSVParserTest { >>>> parser.close(); >>>> } >>>> >>>> + private void validateRecordPosition(final String lineSeparator) >>>> throws IOException { >>>> + final String nl = lineSeparator; // used as linebreak in values >>>> for better distinction >>>> + >>>> + String code = "a,b,c" + lineSeparator + "1,2,3" + lineSeparator >>>> >>> + >>> >>>> + // to see if recordPosition correctly points to the enclosing >>>> quote >>>> + "'A" + nl + "A','B" + nl + "B',CC" + lineSeparator + >>>> + // unicode test... not very relevant while operating on >>>> strings instead of bytes, but for >>>> + // completeness... >>>> + "\u00c4,\u00d6,\u00dc" + lineSeparator + "EOF,EOF,EOF"; >>>> + >>>> + final CSVFormat format = >>>> >>>> CSVFormat.newFormat(',').withQuote('\'').withRecordSeparator( >>> lineSeparator); >>> >>>> + CSVParser parser = CSVParser.parse(code, format); >>>> + >>>> + CSVRecord record; >>>> + assertEquals(0, parser.getRecordNumber()); >>>> + >>>> + assertNotNull(record = parser.nextRecord()); >>>> + assertEquals(1, record.getRecordNumber()); >>>> + assertEquals(code.indexOf('a'), record.getCharacterPosition()) >>>> ; >>>> + >>>> + assertNotNull(record = parser.nextRecord()); >>>> + assertEquals(2, record.getRecordNumber()); >>>> + assertEquals(code.indexOf('1'), record.getCharacterPosition()) >>>> ; >>>> + >>>> + assertNotNull(record = parser.nextRecord()); >>>> + final long positionRecord3 = record.getCharacterPosition(); >>>> + assertEquals(3, record.getRecordNumber()); >>>> + assertEquals(code.indexOf("'A"), >>>> record.getCharacterPosition()); >>>> + assertEquals("A" + lineSeparator + "A", record.get(0)); >>>> + assertEquals("B" + lineSeparator + "B", record.get(1)); >>>> + assertEquals("CC", record.get(2)); >>>> + >>>> + assertNotNull(record = parser.nextRecord()); >>>> + assertEquals(4, record.getRecordNumber()); >>>> + assertEquals(code.indexOf('\u00c4'), >>>> record.getCharacterPosition()); >>>> + >>>> + assertNotNull(record = parser.nextRecord()); >>>> + assertEquals(5, record.getRecordNumber()); >>>> + assertEquals(code.indexOf("EOF"), >>>> >>> record.getCharacterPosition()); >>> >>>> + >>>> + parser.close(); >>>> + >>>> + // now try to read starting at record 3 >>>> + parser = CSVParser.parse(code.substring((int) >>>> positionRecord3), >>>> format); >>>> + parser.setNextRecordNumber(3); >>>> + parser.setNextCharacterPosition(positionRecord3); >>>> + >>>> + assertNotNull(record = parser.nextRecord()); >>>> + assertEquals(3, record.getRecordNumber()); >>>> + assertEquals(code.indexOf("'A"), >>>> record.getCharacterPosition()); >>>> + assertEquals("A" + lineSeparator + "A", record.get(0)); >>>> + assertEquals("B" + lineSeparator + "B", record.get(1)); >>>> + assertEquals("CC", record.get(2)); >>>> + >>>> + assertNotNull(record = parser.nextRecord()); >>>> + assertEquals(4, record.getRecordNumber()); >>>> + assertEquals(code.indexOf('\u00c4'), >>>> record.getCharacterPosition()); >>>> + assertEquals("\u00c4", record.get(0)); >>>> + >>>> + parser.close(); >>>> + } >>>> } >>>> >>>> Modified: >>>> >>>> commons/proper/csv/trunk/src/test/java/org/apache/commons/ >>> csv/CSVRecordTest.java >>> >>>> URL: >>>> >>>> http://svn.apache.org/viewvc/commons/proper/csv/trunk/src/ >>> test/java/org/apache/commons/csv/CSVRecordTest.java?rev= >>> 1635052&r1=1635051&r2=1635052&view=diff >>> >>>> >>>> >>>> ============================================================ >>> ================== >>> >>>> --- >>>> >>>> commons/proper/csv/trunk/src/test/java/org/apache/commons/ >>> csv/CSVRecordTest.java >>> >>>> (original) >>>> +++ >>>> >>>> commons/proper/csv/trunk/src/test/java/org/apache/commons/ >>> csv/CSVRecordTest.java >>> >>>> Wed Oct 29 05:44:40 2014 >>>> @@ -45,12 +45,12 @@ public class CSVRecordTest { >>>> @Before >>>> public void setUp() throws Exception { >>>> values = new String[] { "A", "B", "C" }; >>>> - record = new CSVRecord(values, null, null, 0); >>>> + record = new CSVRecord(values, null, null, 0, -1); >>>> header = new HashMap<String, Integer>(); >>>> header.put("first", Integer.valueOf(0)); >>>> header.put("second", Integer.valueOf(1)); >>>> header.put("third", Integer.valueOf(2)); >>>> - recordWithHeader = new CSVRecord(values, header, null, 0); >>>> + recordWithHeader = new CSVRecord(values, header, null, 0, -1); >>>> } >>>> >>>> @Test >>>> >>>> >>>> >>>> >>> >>> -- >>> http://people.apache.org/~britter/ >>> http://www.systemoutprintln.de/ >>> http://twitter.com/BenediktRitter >>> http://github.com/britter >>> >>> >> >> >> > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > -- E-Mail: garydgreg...@gmail.com | ggreg...@apache.org Java Persistence with Hibernate, Second Edition <http://www.manning.com/bauer3/> JUnit in Action, Second Edition <http://www.manning.com/tahchiev/> Spring Batch in Action <http://www.manning.com/templier/> Blog: http://garygregory.wordpress.com Home: http://garygregory.com/ Tweet! http://twitter.com/GaryGregory