It's not JSON, per se, but data formats like Smile (http://en.wikipedia.org/wiki/Smile_%28data_interchange_format%29) support markers that can't be confused with content, so a reader can find record boundaries reliably, and they offer reasonably similar ergonomics.
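
To make the "confused with content" problem concrete, here is a minimal Python sketch of the failure mode Reynold describes in the quoted thread below. It is only an illustration under my own assumptions: the sample records and the helper names (naive_split_start, starts_valid_object) are made up for this example and are not taken from json-mapreduce or Spark.

```python
import json

# Two JSON records; the first one has a '{' inside a string value.
records = '{"id": 1, "note": "a { inside a string"}\n{"id": 2, "note": "plain"}\n'


def naive_split_start(data: str, offset: int) -> int:
    """Hypothetical heuristic: first '{' at or after an arbitrary split offset."""
    return data.find("{", offset)


def starts_valid_object(data: str, index: int) -> bool:
    """Check whether a complete JSON object can be decoded starting at index."""
    try:
        value, _ = json.JSONDecoder().raw_decode(data, index)
    except json.JSONDecodeError:
        return False
    return isinstance(value, dict)


# A split that happens to begin inside the first record.
candidate = naive_split_start(records, 12)

# The true start of the second record, found here via the newline.
real_start = records.index("\n") + 1

print(candidate, starts_valid_object(records, candidate))
print(real_start, starts_valid_object(records, real_start))
```

Validating the candidate position catches this particular case, but it is not a general fix: a string value that itself contains something like {} would still decode as an object, which is why an unambiguous marker is attractive.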
--
p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/

On Mon, May 4, 2015 at 5:43 AM, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:

> I was wondering if it's possible to use existing Hive SerDes for this?
>
> On Mon, May 4, 2015 at 08:36, Joe Halliwell <joe.halliw...@gmail.com> wrote:
>
> > I think Reynold’s argument shows the impossibility of the general case.
> >
> > But a “maximum object depth” hint could enable a new input format to do
> > its job both efficiently and correctly in the common case where the input
> > is an array of similarly structured objects! I’d certainly be interested in
> > an implementation along those lines.
> >
> > Cheers,
> > Joe
> >
> > http://www.joehalliwell.com
> > @joehalliwell
> >
> > On Mon, May 4, 2015 at 7:55 AM, Reynold Xin <r...@databricks.com> wrote:
> >
> >> I took a quick look at that implementation. I'm not sure if it actually
> >> handles JSON correctly, because it attempts to find the first { starting
> >> from a random point. However, that random point could be in the middle of
> >> a string, and thus the first { might just be part of a string, rather than
> >> a real JSON object starting position.
> >>
> >> On Sun, May 3, 2015 at 11:13 PM, Emre Sevinc <emre.sev...@gmail.com> wrote:
> >>
> >> > You can check out the following library:
> >> >
> >> > https://github.com/alexholmes/json-mapreduce
> >> >
> >> > --
> >> > Emre Sevinç
> >> >
> >> > On Sun, May 3, 2015 at 10:04 PM, Olivier Girardot <o.girar...@lateral-thoughts.com> wrote:
> >> >
> >> > > Hi everyone,
> >> > > Is there any way in Spark SQL to load multi-line JSON data efficiently? I
> >> > > think there was a reference on the mailing list to
> >> > > http://pivotal-field-engineering.github.io/pmr-common/ for its
> >> > > JSONInputFormat.
> >> > >
> >> > > But it's rather inaccessible, considering the dependency is not available
> >> > > in any public Maven repo (if you know of one, I'd be glad to hear it).
> >> > >
> >> > > Is there any plan to address this, or any public recommendation?
> >> > > (considering the documentation clearly states that sqlContext.jsonFile
> >> > > will not work for multi-line JSON)
> >> > >
> >> > > Regards,
> >> > >
> >> > > Olivier.
> >> >
> >> > --
> >> > Emre Sevinc