Johann Kovacs created FLINK-2988:
------------------------------------

             Summary: Cannot load DataSet[Row] from CSV file
                 Key: FLINK-2988
                 URL: https://issues.apache.org/jira/browse/FLINK-2988
             Project: Flink
          Issue Type: Improvement
          Components: DataSet API, Table API
    Affects Versions: 0.10
            Reporter: Johann Kovacs
            Priority: Minor
The Tuple classes (both Java and Scala) only go up to arity 25, so a CSV file with more than 25 columns cannot be loaded directly as a DataSet\[TupleX\[...\]\].

An alternative to Tuples is the Table API's Row class, which allows arbitrary-length, arbitrary-type, runtime-supplied schemata (via RowTypeInfo) and index-based field access. However, trying to load a CSV file as a DataSet\[Row\] yields an exception:

{code}
val env = ExecutionEnvironment.createLocalEnvironment()
val filePath = "../someCsv.csv"
val typeInfo = new RowTypeInfo(
  Seq(BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO),
  Seq("word", "number"))
val source = env.readCsvFile(filePath)(ClassTag(classOf[Row]), typeInfo)
println(source.collect())
{code}

With someCsv.csv containing:

{code}
one,1
two,2
{code}

this yields:

{code}
Exception in thread "main" java.lang.ClassCastException: org.apache.flink.api.table.typeinfo.RowSerializer cannot be cast to org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase
	at org.apache.flink.api.scala.operators.ScalaCsvInputFormat.<init>(ScalaCsvInputFormat.java:46)
	at org.apache.flink.api.scala.ExecutionEnvironment.readCsvFile(ExecutionEnvironment.scala:282)
{code}

As a user I would like to be able to load a CSV file into a DataSet\[Row\], preferably with a convenience method for specifying the schema (RowTypeInfo), and without having to use the "explicit implicit parameters" syntax or supply the ClassTag.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
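Until readCsvFile can produce a DataSet\[Row\] directly, one possible workaround is to read the file as plain text (e.g. via readTextFile) and split each line into its fields in a map function. The per-line parsing step can be sketched without any Flink dependency, using only the Scala standard library; the object and method names below are hypothetical, and the two-column ("word": String, "number": Int) schema is taken from the reproduction above:

```scala
// Sketch of the manual per-line parsing one would perform in a
// readTextFile(path).map(parseLine) pipeline as a stopgap until
// readCsvFile supports Row. No quoting or escaping is handled.
object CsvRowSketch {
  // Split one CSV line into its typed fields per the assumed schema.
  def parseLine(line: String): (String, Int) = {
    val fields = line.split(",", -1)
    (fields(0), fields(1).trim.toInt)
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("one,1", "two,2")
    println(lines.map(parseLine)) // List((one,1), (two,2))
  }
}
```

In an actual Flink job the tuple construction would be replaced by building a Row (with the field types validated against the RowTypeInfo), which is exactly the convenience this issue asks readCsvFile to provide.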