That's a great link Michael, thanks! For us it was around attempting to provide for dynamic schemas, which is a bit of an anti-pattern.
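For a fixed, known schema, the parse-it-yourself approach can be sketched in plain Python (the field names here are hypothetical; in Spark the same logic would typically run per record in a map over the raw JSON strings before building the dataframe):

```python
import json

# The fields we "own": the schema we commit to parsing out explicitly.
KNOWN_FIELDS = ["title", "isbn"]

def parse_record(raw):
    """Parse one JSON string into a flat row dict, keeping leftovers in 'extras'."""
    data = json.loads(raw)
    # Pull out the fields we know about; missing ones become None.
    row = {field: data.pop(field, None) for field in KNOWN_FIELDS}
    # Anything not in our schema is preserved verbatim for later analysis.
    row["extras"] = json.dumps(data) if data else None
    return row

raw = '{"title": "Calculus Theory", "isbn": "1234567890", "edition": 3}'
print(parse_record(raw))
```

Verbose, but predictable: every column is accounted for, and nothing unexpected is silently dropped.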
Ultimately it just comes down to owning your transforms; all the basic tools are there.

On 18 July 2017 at 11:03, Michael Armbrust <mich...@databricks.com> wrote:

> Here is an overview of how to work with complex JSON in Spark:
> https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html
> (works in streaming and batch)
>
> On Tue, Jul 18, 2017 at 10:29 AM, Riccardo Ferrari <ferra...@gmail.com> wrote:
>
>> What's against:
>>
>> df.rdd.map(...)
>>
>> or
>>
>> dataset.foreach()
>>
>> https://spark.apache.org/docs/2.0.1/api/scala/index.html#org.apache.spark.sql.Dataset@foreach(f:T=>Unit):Unit
>>
>> Best,
>>
>> On Tue, Jul 18, 2017 at 6:46 PM, lucas.g...@gmail.com <lucas.g...@gmail.com> wrote:
>>
>>> I've been wondering about this for a while.
>>>
>>> We wanted to do something similar for generically saving thousands of
>>> individual homogeneous events into well-formed Parquet.
>>>
>>> Ultimately I couldn't find something I wanted to own and pushed back on
>>> the requirements.
>>>
>>> It seems the canonical answer is that you need to 'own' the schema of
>>> the JSON and parse it out manually into your dataframe. There's nothing
>>> challenging about it, just verbose code. If your 'info' is a consistent
>>> schema then you'll be fine. For us it was 12 wildly diverging schemas and
>>> I didn't want to own the transforms.
>>>
>>> I also recommend persisting anything that isn't part of your schema in
>>> an 'extras' field, so when you parse out your JSON, if you've got
>>> anything left over, drop it in there for later analysis.
>>>
>>> I can provide some sample code, but I think it's pretty straightforward /
>>> you can google it.
>>>
>>> What you can't seem to do efficiently is dynamically generate a
>>> dataframe from arbitrary JSON.
>>>
>>> On 18 July 2017 at 01:57, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>>
>>>> Implicits tried - didn't work!
>>>>
>>>> from_json isn't supported in Spark 2.0.1; any alternate solution would
>>>> be welcome, please.
>>>>
>>>> On Tue, Jul 18, 2017 at 12:18 PM, Georg Heiler <georg.kf.hei...@gmail.com> wrote:
>>>>
>>>>> You need to have the Spark implicits in scope.
>>>>>
>>>>> Richard Xin <richardxin...@yahoo.com.invalid> wrote on Tue, 18 July 2017 at 08:45:
>>>>>
>>>>>> I believe you could use JOLT (bazaarvoice/jolt
>>>>>> <https://github.com/bazaarvoice/jolt>, a JSON-to-JSON transformation
>>>>>> library written in Java) to flatten it to a JSON string and then to
>>>>>> a dataframe or dataset.
>>>>>>
>>>>>> On Monday, July 17, 2017, 11:18:24 PM PDT, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>>>>>>
>>>>>> Explode is not working in this scenario; the error says a string
>>>>>> cannot be used in explode, only an array or map, in Spark.
>>>>>>
>>>>>> On Tue, Jul 18, 2017 at 11:39 AM, 刘虓 <ipf...@gmail.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>> have you tried to use explode?
>>>>>>
>>>>>> Chetan Khatri <chetan.opensou...@gmail.com> wrote on Tue, 18 July 2017 at 2:06 PM:
>>>>>>
>>>>>> Hello Spark Devs,
>>>>>>
>>>>>> Can you please guide me on how to flatten JSON to multiple columns
>>>>>> in Spark?
>>>>>>
>>>>>> *Example:*
>>>>>>
>>>>>> Sr No | Title           | ISBN       | Info
>>>>>> 1     | Calculus Theory | 1234567890 | (JSON below)
>>>>>>
>>>>>> [{"cert":[{
>>>>>>     "authSbmtr":"009415da-c8cd-418d-869e-0a19601d79fa",
>>>>>>     "certUUID":"03ea5a1a-5530-4fa3-8871-9d1ebac627c4",
>>>>>>     "effDt":"2016-05-06T15:04:56.279Z",
>>>>>>     "fileFmt":"rjrCsv",
>>>>>>     "status":"live"}],
>>>>>>   "expdCnt":"15",
>>>>>>   "mfgAcctNum":"531093",
>>>>>>   "oUUID":"23d07397-4fbe-4897-8a18-b79c9f64726c",
>>>>>>   "pgmRole":["RETAILER"],
>>>>>>   "pgmUUID":"1cb5dd63-817a-45bc-a15c-5660e4accd63",
>>>>>>   "regUUID":"cc1bd898-657d-40dc-af5d-4bf1569a1cc4",
>>>>>>   "rtlrsSbmtd":["009415da-c8cd-418d-869e-0a19601d79fa"]}]
>>>>>>
>>>>>> I want to get a single row with 11 columns.
>>>>>>
>>>>>> Thanks.
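For a structure like the sample above, the flattening step itself can be sketched outside Spark in pure Python (field names are taken from the sample; collapsing single-element arrays into their one element is an assumption about this data, and in Spark 2.0.1, where from_json is unavailable, something like this could run per record in a map over the JSON strings before calling toDF):

```python
import json

def flatten(obj, prefix=""):
    """Recursively flatten nested dicts/lists into one flat dict of columns."""
    row = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            row.update(flatten(value, prefix + key + "."))
    elif isinstance(obj, list) and len(obj) == 1:
        # Single-element arrays (like "cert" in the sample) collapse
        # into their only element -- an assumption about this data.
        row.update(flatten(obj[0], prefix))
    else:
        row[prefix.rstrip(".")] = obj
    return row

# A trimmed version of the sample "Info" payload from the question.
info = json.loads("""
[{"cert":[{"authSbmtr":"009415da-c8cd-418d-869e-0a19601d79fa",
           "effDt":"2016-05-06T15:04:56.279Z",
           "fileFmt":"rjrCsv","status":"live"}],
  "expdCnt":"15",
  "pgmRole":["RETAILER"]}]
""")
print(flatten(info))
```

Each flat dict then becomes one row, and nested keys become dotted column names such as cert.status.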