Hi Philip,

the CsvInputFormat does not support reading empty fields.
I see two ways to achieve this functionality:
- Use a TextInputFormat that returns each line as a String and do the parsing in a subsequent MapFunction
- Extend the CsvInputFormat to support empty fields

Cheers, Fabian

2015-10-26 10:43 GMT+01:00 Philip Lee <philjj...@gmail.com>:

> Thanks for your reply.
>
> What if I do not use the Table API?
> The error happens when using just env.readCsvFile().
>
> I heard that using a RowSerializer would handle these null values, but a
> TypeInformation error occurs when the data is converted.
>
> On Mon, Oct 26, 2015 at 10:26 AM, Maximilian Michels <m...@apache.org>
> wrote:
>
>> As far as I know, null support was removed from the Table API because
>> it was not consistently supported by all operations. See
>> https://issues.apache.org/jira/browse/FLINK-2236
>>
>> On Fri, Oct 23, 2015 at 7:18 PM, Shiti Saxena <ssaxena....@gmail.com>
>> wrote:
>>
>>> For a similar problem where we wanted to preserve and track null
>>> entries, we load the CSV as a DataSet[Array[Object]] and then transform it
>>> into DataSet[Row] using a custom RowSerializer
>>> (https://gist.github.com/Shiti/d0572c089cc08654019c) which handles null.
>>>
>>> The Table API (which supports null) can then be used on the resulting
>>> DataSet[Row].
>>>
>>> On Fri, Oct 23, 2015 at 7:38 PM, Maximilian Michels <m...@apache.org>
>>> wrote:
>>>
>>>> Hi Philip,
>>>>
>>>> How about making the empty field of type String? Then you can read the
>>>> CSV into a DataSet and treat the empty string as a null value. Not very
>>>> nice, but it is a workaround. As of now, Flink deliberately doesn't
>>>> support null values.
>>>>
>>>> Regards,
>>>> Max
>>>>
>>>> On Thu, Oct 22, 2015 at 4:30 PM, Philip Lee <philjj...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to load a dataset that contains some null values by using
>>>>> readCsvFile().
>>>>>
>>>>> // e.g. _date|_click|_sales|_item|_web_page|_user
>>>>>
>>>>> case class WebClick(_click_date: Long, _click_time: Long, _sales: Int,
>>>>>   _item: Int, _page: Int, _user: Int)
>>>>>
>>>>> private def getWebClickDataSet(env: ExecutionEnvironment):
>>>>>     DataSet[WebClick] = {
>>>>>
>>>>>   env.readCsvFile[WebClick](
>>>>>     webClickPath,
>>>>>     fieldDelimiter = "|",
>>>>>     includedFields = Array(0, 1, 2, 3, 4, 5)
>>>>>     // , lenient = true  (would drop malformed lines rather than read them)
>>>>>   )
>>>>> }
>>>>>
>>>>> I know there is an option to ignore malformed values, but I need to
>>>>> read the dataset even though it contains null values.
>>>>>
>>>>> For example, the dataset looks like this (the third column is null):
>>>>> 37794|24669||16705|23|54810
>>>>> I have to read the null values as well, because I need to use a filter
>>>>> or where function (_sales == null).
>>>>>
>>>>> Does anyone have a detailed suggestion for how to do this?
>>>>>
>>>>> Thanks,
>>>>> Philip
>>>>>
>>>>> --
>>>>>
>>>>> ==========================================================
>>>>>
>>>>> *Hae Joon Lee*
>>>>>
>>>>> Now, in Germany,
>>>>>
>>>>> M.S. Candidate, Interested in Distributed System, Iterative Processing
>>>>>
>>>>> Dept. of Computer Science, Informatik in German, TUB
>>>>>
>>>>> Technical University of Berlin
>>>>>
>>>>> In Korea,
>>>>>
>>>>> M.S. Candidate, Computer Architecture Laboratory
>>>>>
>>>>> Dept. of Computer Science, KAIST
>>>>>
>>>>> Rm# 4414 CS Dept. KAIST
>>>>>
>>>>> 373-1 Guseong-dong, Yuseong-gu, Daejon, South Korea (305-701)
>>>>>
>>>>> Mobile) 49) 015-251-448-278 in Germany, no cellular in Korea
>>>>>
>>>>> ==========================================================
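Fabian's first suggestion (read each line as a String via a TextInputFormat and parse it in a subsequent MapFunction), combined with Max's idea of treating an empty field as a missing value, could look roughly like this. It is a minimal sketch under assumptions: `WebClickParser`, `parse`, `optInt` and `optLong` are hypothetical names, and every column of Philip's `WebClick` is turned into an `Option`, so an empty field becomes `None` instead of `null` (Flink deliberately does not support null values). The parsing is plain Scala, so it can be checked without a Flink cluster.

```scala
// Hypothetical variant of Philip's WebClick: each column that may be empty
// is an Option, so an empty CSV field becomes None rather than a parse error.
case class WebClick(
    _click_date: Option[Long],
    _click_time: Option[Long],
    _sales: Option[Int],
    _item: Option[Int],
    _page: Option[Int],
    _user: Option[Int])

object WebClickParser {
  // Empty (or all-blank) field -> None; otherwise parse the number.
  // A non-empty, non-numeric field still throws NumberFormatException.
  private def optLong(s: String): Option[Long] =
    if (s.trim.isEmpty) None else Some(s.trim.toLong)

  private def optInt(s: String): Option[Int] =
    if (s.trim.isEmpty) None else Some(s.trim.toInt)

  // Parses one '|'-delimited line. The limit -1 keeps trailing empty
  // fields, so "1|2|3|4|5|" still yields six fields with the last one empty.
  def parse(line: String): WebClick = {
    val f = line.split("\\|", -1)
    require(f.length == 6, s"expected 6 fields, got ${f.length}: $line")
    WebClick(optLong(f(0)), optLong(f(1)), optInt(f(2)),
      optInt(f(3)), optInt(f(4)), optInt(f(5)))
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("37794|24669||16705|23|54810", "37794|24670|3|16705|23|")
    val clicks = lines.map(parse)
    // Equivalent of Philip's filter (_sales == null)
    val noSales = clicks.filter(_._sales.isEmpty)
    noSales.foreach(println)
  }
}
```

In a Flink job the parser would take the place of the MapFunction, e.g. `env.readTextFile(webClickPath).map(line => WebClickParser.parse(line)).filter(_._sales.isEmpty)` with `import org.apache.flink.api.scala._` in scope for the implicit TypeInformation. This assumes your Flink version's Scala API can derive type information for `Option` fields; if it cannot, Max's variant (keep the column as a `String` and test for `""`) works with the same parsing approach.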