I just tried it and it still cannot parse it. The input is still treated as a DataSet object rather than a String.
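For reference, a minimal Scala sketch of the fix being discussed: the parsing has to happen inside an operator, where each element really is a String, rather than on the DataSet handle itself. It assumes newline-delimited JSON and the scala.util.parsing.json parser used in the thread below; note that no triple-quote wrapping is needed, since triple quotes are Scala source syntax, not characters JSON.parseFull expects at runtime. Object and variable names are illustrative.

  import org.apache.flink.api.scala._
  import scala.util.parsing.json.JSON

  object ParseJsonInMap {
    def main(args: Array[String]): Unit = {
      val env = ExecutionEnvironment.getExecutionEnvironment

      // Each line of the input file becomes one String element.
      val data = env.readTextFile("file:///home/punit/vik-in")

      // Parse inside the operator, where `line` is a plain String.
      // JSON.parseFull returns Option[Any]; keep only lines that
      // parsed into a JSON object (a Map) and drop the rest.
      val parsed = data.flatMap { line =>
        JSON.parseFull(line) match {
          case Some(obj: Map[String, Any] @unchecked) => List(obj)
          case _                                      => Nil
        }
      }

      parsed.first(1).print()
    }
  }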
On Wed, Apr 27, 2016 at 12:36 PM, Punit Naik <naik.puni...@gmail.com> wrote:

> Okay. Thanks a lot, Fabian!
>
> On Wed, Apr 27, 2016 at 12:34 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>
>> You should do the parsing in a Map operator. Map applies the MapFunction
>> to each element in the DataSet.
>> So you can either implement another MapFunction or extend the one you
>> have to call the JSON parser.
>>
>> 2016-04-27 6:40 GMT+02:00 Punit Naik <naik.puni...@gmail.com>:
>>
>> > Hi
>> >
>> > So I managed to do the map part. I stuck with the "import
>> > scala.util.parsing.json._" library for parsing.
>> >
>> > First I read my JSON:
>> >
>> > val data = env.readTextFile("file:///home/punit/vik-in")
>> >
>> > Then I transformed it so that it can be parsed to a map:
>> >
>> > val j = data.map { x => ("\"\"\"").+(x).+("\"\"\"") }
>> >
>> > I checked it by printing "j"'s first value and it looks proper.
>> >
>> > But when I tried to parse "j" like this:
>> >
>> > JSON.parseFull(j.first(1))
>> >
>> > it did not parse, because the object "j.first(1)" is still a DataSet
>> > object and not a String object.
>> >
>> > So how can I get the underlying Java object from the DataSet object?
>> >
>> > On Tue, Apr 26, 2016 at 3:32 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>> >
>> > > Hi,
>> > >
>> > > you need to implement the MapFunction interface [1].
>> > > Inside the MapFunction you can use any JSON parser library, such as
>> > > Jackson, to parse the String.
>> > > The exact logic depends on your use case.
>> > >
>> > > However, you should be careful not to initialize a new parser in
>> > > each map() call, because that would be quite expensive.
>> > > I recommend extending RichMapFunction and instantiating the parser
>> > > in the open() method [2].
>> > >
>> > > Best, Fabian
>> > >
>> > > [1] https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/batch/dataset_transformations.html#map
>> > > [2] https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/common/index.html#specifying-transformation-functions
>> > >
>> > > 2016-04-26 10:44 GMT+02:00 Punit Naik <naik.puni...@gmail.com>:
>> > >
>> > > > Hi Fabian
>> > > >
>> > > > Thanks for the reply. Yes, my JSON is separated by new lines. It
>> > > > would have been great if you had explained the function that goes
>> > > > inside the map. I tried to use the 'scala.util.parsing.json._'
>> > > > library but had no luck.
>> > > >
>> > > > On Tue, Apr 26, 2016 at 1:11 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>> > > >
>> > > > > Hi Punit,
>> > > > >
>> > > > > JSON can be hard to parse in parallel due to its nested
>> > > > > structure. It depends on the schema and (textual) representation
>> > > > > of the JSON whether and how it can be done. The problem is that
>> > > > > a parallel input format needs to be able to identify record
>> > > > > boundaries without context information. This can be very easy
>> > > > > if your JSON data is a list of JSON objects which are separated
>> > > > > by a new line character. However, this is hard to generalize.
>> > > > > That's why Flink does not offer tooling for it (yet).
>> > > > >
>> > > > > If your JSON objects are separated by new line characters, the
>> > > > > easiest way is to read the file as a text file, where each line
>> > > > > results in a String, and to parse each object using a standard
>> > > > > JSON parser. This would look like:
>> > > > >
>> > > > > ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>> > > > >
>> > > > > DataSet<String> text = env.readTextFile("/path/to/jsonfile");
>> > > > > DataSet<YourObject> json = text.map(new YourMapFunctionWhichParsesJSON());
>> > > > >
>> > > > > Best, Fabian
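A sketch of the RichMapFunction pattern recommended above, assuming Jackson's databind module is on the classpath; the class name JsonParseFunction is illustrative. The point is that the parser is created once per parallel task in open(), not once per map() call.

  import org.apache.flink.api.common.functions.RichMapFunction
  import org.apache.flink.configuration.Configuration
  import com.fasterxml.jackson.databind.{JsonNode, ObjectMapper}

  // The ObjectMapper is instantiated once per parallel task in
  // open(), avoiding the cost of constructing it for every record.
  class JsonParseFunction extends RichMapFunction[String, JsonNode] {
    @transient private var mapper: ObjectMapper = _

    override def open(parameters: Configuration): Unit = {
      mapper = new ObjectMapper()
    }

    override def map(line: String): JsonNode = mapper.readTree(line)
  }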
>> > > > > 2016-04-26 8:06 GMT+02:00 Punit Naik <naik.puni...@gmail.com>:
>> > > > >
>> > > > > > Hi
>> > > > > >
>> > > > > > I am new to Flink. I was experimenting with the DataSet API
>> > > > > > and found out that there is no explicit method for loading a
>> > > > > > JSON file as input. Can anyone please suggest a workaround?
>> > > > > >
>> > > > > > --
>> > > > > > Thank You
>> > > > > >
>> > > > > > Regards
>> > > > > >
>> > > > > > Punit Naik

--
Thank You

Regards

Punit Naik
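For completeness, a Scala rendering of the Java skeleton from earlier in the thread, wired to the JsonParseFunction sketched above; the file path and the "id" field are placeholders.

  import org.apache.flink.api.scala._

  object LoadJson {
    def main(args: Array[String]): Unit = {
      val env = ExecutionEnvironment.getExecutionEnvironment

      // One JSON object per line, as discussed in the thread.
      val text = env.readTextFile("/path/to/jsonfile")
      val json = text.map(new JsonParseFunction)

      // Example: pull one (assumed) field out of each object.
      val ids = json.map(node => node.path("id").asText())
      ids.print()
    }
  }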