[ https://issues.apache.org/jira/browse/FLINK-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959022#comment-14959022 ]
Chesnay Schepler commented on FLINK-2692:
-----------------------------------------

Is there any reason that prevents the Scala API from using the CsvInputFormat class? The Java and Scala versions only differ in the createTuple method:

Java:
{code}
@Override
protected OUT createTuple(OUT reuse) {
	Tuple result = (Tuple) reuse;
	for (int i = 0; i < parsedValues.length; i++) {
		result.setField(parsedValues[i], i);
	}
	return reuse;
}
{code}

Scala:
{code}
@Override
protected OUT createTuple(OUT reuse) {
	Preconditions.checkNotNull(tupleSerializer, "The tuple serializer must be initialised."
		+ " It is not initialized if the given type was not a "
		+ TupleTypeInfoBase.class.getName() + ".");

	return tupleSerializer.createInstance(parsedValues);
}
{code}

> Untangle CsvInputFormat into PojoTypeCsvInputFormat and TupleTypeCsvInputFormat
> --------------------------------------------------------------------------------
>
>                 Key: FLINK-2692
>                 URL: https://issues.apache.org/jira/browse/FLINK-2692
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Till Rohrmann
>            Assignee: Chesnay Schepler
>            Priority: Minor
>
> The {{CsvInputFormat}} currently can return values either as a {{Tuple}} or as a
> {{Pojo}} type. As a consequence, the processing logic, which has to work for
> both types, is overly complex. For example, the {{CsvInputFormat}} contains
> fields which are only used when a POJO is returned. Moreover, the POJO field
> information is constructed by calling setter methods which have to be called
> in a very specific order, otherwise they fail. E.g. one first has to call
> {{setFieldTypes}} before calling {{setOrderOfPOJOFields}}, otherwise the
> number of fields might be different. Furthermore, some of the methods can
> only be called if the return type is a {{Pojo}} type, because they expect
> that a {{PojoTypeInfo}} is present.
> I think the {{CsvInputFormat}} should be refactored to make the code more
> easily maintainable.
> I propose to split it up into a {{PojoTypeCsvInputFormat}} and a
> {{TupleTypeCsvInputFormat}} which take all the required information via their
> constructors instead of using the {{setFields}} and {{setOrderOfPOJOFields}}
> approach.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
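A minimal sketch of the split proposed in the issue, in plain Java. Every class here (`SimpleTuple`, `AbstractCsvFormat`, `TupleTypeCsvFormat`, `PojoTypeCsvFormat`, `Person`) is a hypothetical stand-in, not Flink's actual API. The shared parsing lives in a base class and only the record-filling step differs, much like the `createTuple` method quoted in the comment. The POJO variant receives field types and field order together in its constructor and validates them there, so there is no `setFieldTypes`-before-`setOrderOfPOJOFields` ordering to get wrong.

```java
import java.lang.reflect.Field;

// Hypothetical sketch of the proposed split; none of these classes are Flink's real API.
class CsvSplitSketch {

    // Minimal mutable tuple stand-in (assumption: not Flink's Tuple class).
    static final class SimpleTuple {
        private final Object[] fields;
        SimpleTuple(int arity) { this.fields = new Object[arity]; }
        void setField(Object value, int pos) { fields[pos] = value; }
        Object getField(int pos) { return fields[pos]; }
    }

    // Shared logic: split a line on ',' and convert each token to its declared type.
    abstract static class AbstractCsvFormat<OUT> {
        private final Class<?>[] fieldTypes;
        protected final Object[] parsedValues;
        AbstractCsvFormat(Class<?>[] fieldTypes) {
            this.fieldTypes = fieldTypes.clone();
            this.parsedValues = new Object[fieldTypes.length];
        }
        OUT readRecord(OUT reuse, String line) {
            String[] tokens = line.split(",", -1);
            if (tokens.length != fieldTypes.length) {
                throw new IllegalArgumentException("Expected " + fieldTypes.length + " fields: " + line);
            }
            for (int i = 0; i < tokens.length; i++) {
                parsedValues[i] = parse(tokens[i].trim(), fieldTypes[i]);
            }
            return fillRecord(reuse);
        }
        private static Object parse(String token, Class<?> type) {
            if (type == Integer.class) return Integer.valueOf(token);
            if (type == String.class) return token;
            throw new IllegalArgumentException("Unsupported field type: " + type);
        }
        // The only type-specific step, playing the role of createTuple in the comment.
        protected abstract OUT fillRecord(OUT reuse);
    }

    // Tuple variant: the field types are all it needs.
    static final class TupleTypeCsvFormat extends AbstractCsvFormat<SimpleTuple> {
        TupleTypeCsvFormat(Class<?>... fieldTypes) { super(fieldTypes); }
        @Override
        protected SimpleTuple fillRecord(SimpleTuple reuse) {
            for (int i = 0; i < parsedValues.length; i++) {
                reuse.setField(parsedValues[i], i);
            }
            return reuse;
        }
    }

    // POJO variant: types and field order arrive together and are checked at construction.
    static final class PojoTypeCsvFormat<T> extends AbstractCsvFormat<T> {
        private final Field[] targetFields;
        PojoTypeCsvFormat(Class<T> pojoClass, Class<?>[] fieldTypes, String[] fieldOrder) {
            super(fieldTypes);
            if (fieldOrder.length != fieldTypes.length) {
                throw new IllegalArgumentException("fieldOrder and fieldTypes differ in length");
            }
            targetFields = new Field[fieldOrder.length];
            for (int i = 0; i < fieldOrder.length; i++) {
                try {
                    targetFields[i] = pojoClass.getDeclaredField(fieldOrder[i]);
                    targetFields[i].setAccessible(true);
                } catch (NoSuchFieldException e) {
                    throw new IllegalArgumentException("No field named " + fieldOrder[i], e);
                }
            }
        }
        @Override
        protected T fillRecord(T reuse) {
            try {
                for (int i = 0; i < targetFields.length; i++) {
                    targetFields[i].set(reuse, parsedValues[i]);
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            return reuse;
        }
    }

    // Example POJO used by the demo below.
    static final class Person { String name; Integer age; }

    public static void main(String[] args) {
        TupleTypeCsvFormat tuples = new TupleTypeCsvFormat(String.class, Integer.class);
        SimpleTuple t = tuples.readRecord(new SimpleTuple(2), "alice, 30");
        System.out.println(t.getField(0) + " " + t.getField(1)); // prints "alice 30"
        PojoTypeCsvFormat<Person> pojos = new PojoTypeCsvFormat<>(
                Person.class, new Class<?>[] {Integer.class, String.class}, new String[] {"age", "name"});
        Person p = pojos.readRecord(new Person(), "42,bob");
        System.out.println(p.name + " " + p.age); // prints "bob 42"
    }
}
```

Because every required piece of information is a constructor argument, a mismatch between field types and field order fails when the format is built rather than when records are read, and the Tuple variant never carries POJO-only state.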