Johann Kovacs created FLINK-2988:
------------------------------------

             Summary: Cannot load DataSet[Row] from CSV file
                 Key: FLINK-2988
                 URL: https://issues.apache.org/jira/browse/FLINK-2988
             Project: Flink
          Issue Type: Improvement
          Components: DataSet API, Table API
    Affects Versions: 0.10
            Reporter: Johann Kovacs
            Priority: Minor
The Tuple classes (both Java and Scala) only go up to arity 25, so a CSV file with more than 25 columns cannot be loaded directly as a DataSet\[TupleX\[...\]\].

An alternative to Tuples is the Table API's Row class, which allows arbitrary-length, arbitrary-type, runtime-supplied schemata (via RowTypeInfo) and index-based field access. However, trying to load a CSV file as a DataSet\[Row\] yields an exception:

{code}
val env = ExecutionEnvironment.createLocalEnvironment()
val filePath = "../someCsv.csv"
val typeInfo = new RowTypeInfo(
  Seq(BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO),
  Seq("word", "number"))
val source = env.readCsvFile(filePath)(ClassTag(classOf[Row]), typeInfo)
println(source.collect())
{code}

With someCsv.csv containing:

{code}
one,1
two,2
{code}

this yields:

{code}
Exception in thread "main" java.lang.ClassCastException: org.apache.flink.api.table.typeinfo.RowSerializer cannot be cast to org.apache.flink.api.java.typeutils.runtime.TupleSerializerBase
	at org.apache.flink.api.scala.operators.ScalaCsvInputFormat.<init>(ScalaCsvInputFormat.java:46)
	at org.apache.flink.api.scala.ExecutionEnvironment.readCsvFile(ExecutionEnvironment.scala:282)
{code}

As a user I would like to be able to load a CSV file into a DataSet\[Row\], preferably with a convenience method for specifying the schema (RowTypeInfo), and without having to use the "explicit implicit parameters" syntax or supply the ClassTag.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
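Until readCsvFile can produce a DataSet\[Row\] directly, one possible workaround is to read the file as plain text (e.g. via readTextFile) and split each line into its fields in a map function. The per-line parsing step can be sketched without any Flink dependency, using only the Scala standard library; the object and method names below are hypothetical, and the two-column ("word": String, "number": Int) schema is taken from the reproduction above:

```scala
// Sketch of the manual per-line parsing one would perform in a
// readTextFile(path).map(parseLine) pipeline as a stopgap until
// readCsvFile supports Row. No quoting or escaping is handled.
object CsvRowSketch {
  // Split one CSV line into its typed fields per the assumed schema.
  def parseLine(line: String): (String, Int) = {
    val fields = line.split(",", -1)
    (fields(0), fields(1).trim.toInt)
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("one,1", "two,2")
    println(lines.map(parseLine)) // List((one,1), (two,2))
  }
}
```

In an actual Flink job the tuple construction would be replaced by building a Row (with the field types validated against the RowTypeInfo), which is exactly the convenience this issue asks readCsvFile to provide.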