Thanks for the tip. I realized that, and I ended up using explode as
you said.
This is my attempt:
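// Explode the "rows" column: each inner list becomes its own row in column "r"
val res = (df.explode("rows", "r") {
    l: WrappedArray[ArrayBuffer[String]] => l.toList
  }).select("r")
  .map { m => m.getList[Row](0) }
// Turn each exploded list into a Row
val u = res.map { m => Row.fromSeq(m.toSeq) }
// getScheme(df) supplies the schema for the new DataFrame
val df1 = df.sqlContext.createDataFrame(u, getScheme(df))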
It works OK, but it throws an invalid cast to Integer if the schema has any
IntegerType fields. It looks like a spark-csv bug, but I can solve it anyway.
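One other possible cause of the cast error, and this is only an assumption since getScheme is not shown in the thread: the exploded values are Strings, while the schema passed to createDataFrame declares IntegerType for some fields. A minimal sketch in plain Scala (no Spark) of converting each value to its declared type before building the Row. FieldType, IntField, StrField and CastSketch are hypothetical names standing in for Spark's data types:

```scala
// Hypothetical stand-ins for Spark's DataType values; not part of any library.
sealed trait FieldType
case object IntField extends FieldType
case object StrField extends FieldType

object CastSketch {
  // Convert one raw string according to the declared field type.
  def convert(raw: String, t: FieldType): Any = t match {
    case IntField => raw.trim.toInt // throws NumberFormatException on bad input
    case StrField => raw
  }

  // Convert a whole exploded row, pairing each value with its declared type.
  def convertRow(values: Seq[String], types: Seq[FieldType]): Seq[Any] = {
    require(values.length == types.length, "row and schema differ in length")
    values.zip(types).map { case (v, t) => convert(v, t) }
  }

  def main(args: Array[String]): Unit = {
    // One Int field and one String field, as in the "a" and "c" columns.
    println(convertRow(Seq("1", "Field"), Seq(IntField, StrField)))
  }
}
```

In the attempt above, the same kind of conversion would be applied to each value before Row.fromSeq, matching on the field types of the target schema.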
Thanks for the help.
On Thu, Jan 28, 2016 at 7:43 PM, Mohammed Guller <[email protected]>
wrote:
> You don’t need Hive for that. The DataFrame class has a method named
> explode, which provides the same functionality.
>
>
>
> Here is an example from the Spark API documentation:
>
> df.explode("words", "word"){words: String => words.split(" ")}
>
>
>
> The first argument to the explode method is the name of the input column
> and the second argument is the name of the output column.
>
>
>
> Mohammed
>
> Author: Big Data Analytics with Spark
> <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>
>
>
>
> *From:* Andrés Ivaldi [mailto:[email protected]]
> *Sent:* Wednesday, January 27, 2016 7:17 PM
> *To:* Cheng, Hao
> *Cc:* Sahil Sareen; Al Pivonka; user
>
> *Subject:* Re: JSON to SQL
>
>
>
> I'm using DataFrames to read the JSON exactly as you say, and I can get
> the schema from there. Reading the documentation, I realized that it is
> possible to create a structure dynamically, so by applying some
> transformations to the DataFrame plus the new structure I'll be able to
> save the JSON in my RDBMS.
>
>
>
> For the flatten approach, you mentioned LateralView. Do I need a Hive DB
> for that, or just the Spark HiveContext? I saw some examples and that is
> exactly what I need. Can you explain it a little bit more?
>
>
>
> Thanks
>
>
>
> On Wed, Jan 27, 2016 at 10:29 PM, Cheng, Hao <[email protected]> wrote:
>
> Have you tried the DataFrame API, like
> sqlContext.read.json("/path/to/file.json")? Spark SQL will automatically
> infer the type/schema for you.
>
>
>
> Lateral view will also help with the flattening issues,
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView,
> as will expressions of the form “a.b[0].c”.
>
>
>
>
>
> *From:* Andrés Ivaldi [mailto:[email protected]]
> *Sent:* Thursday, January 28, 2016 3:39 AM
> *To:* Sahil Sareen
> *Cc:* Al Pivonka; user
> *Subject:* Re: JSON to SQL
>
>
>
> I'm brand new to Scala, but if I'm defining a case class, isn't that
> because I already know the JSON's structure beforehand?
>
> If I'm able to dynamically define a case class from the JSON structure,
> then even with Spark I will be able to extract the data.
>
>
>
> On Wed, Jan 27, 2016 at 4:01 PM, Sahil Sareen <[email protected]> wrote:
>
> Isn't this just about defining a case class and using
> parse(json).extract[CaseClassName] using Jackson?
>
> -Sahil
>
>
>
> On Wed, Jan 27, 2016 at 11:08 PM, Andrés Ivaldi <[email protected]>
> wrote:
>
> We don't have domain objects. It's a service like a pipeline: data is read
> from a source and saved in a relational database.
>
> I can read the structure from DataFrames and do some transformations. I
> would prefer to do it with Spark, to be consistent with the process.
>
>
>
> On Wed, Jan 27, 2016 at 12:25 PM, Al Pivonka <[email protected]> wrote:
>
> Are you using a relational database?
>
> If so, why not use a NoSQL DB, then pull from it into your relational one?
>
>
>
> Or use a library that understands JSON structure, like Jackson, to
> obtain the data from the JSON structure, then persist the domain objects?
>
>
>
> On Wed, Jan 27, 2016 at 9:45 AM, Andrés Ivaldi <[email protected]> wrote:
>
> Sure,
>
> The job is like an ETL, but without an interface, so I decide the rules
> for how the JSON will be saved into a SQL table.
>
>
>
> I need to flatten the hierarchies where possible, including lists.
> Nested objects won't be processed for now.
>
> {"a":1,"b":[2,3],"c":"Field", "d":[4,5,6,7,8] }
> {"a":11,"b":[22,33],"c":"Field1", "d":[44,55,66,77,88] }
> {"a":111,"b":[222,333],"c":"Field2", "d":[444,555,666,777,888] }
>
> I would like something like this in my SQL table:
>
> a    b        c       d
> 1    2,3      Field   4,5,6,7,8
> 11   22,33    Field1  44,55,66,77,88
> 111  222,333  Field2  444,555,666,777,888
>
> Right now this is all I need.
>
> Later I will add more intelligence, such as detecting lists or nested
> objects and creating relations in other tables.
>
>
>
>
>
>
>
> On Wed, Jan 27, 2016 at 11:25 AM, Al Pivonka <[email protected]> wrote:
>
> More detail is needed.
>
> Can you provide some context to the use-case ?
>
>
>
> On Wed, Jan 27, 2016 at 8:33 AM, Andrés Ivaldi <[email protected]> wrote:
>
> Hello, I'm trying to save a JSON file into a SQL table.
>
> If I try to do this directly, an IllegalArgumentException is raised. I
> suppose this is because the JSON has a hierarchical structure. Is that
> correct?
>
> If that is the problem, how can I flatten the JSON structure? The JSON
> structure to be processed will be unknown, so I need to do it
> programmatically.
>
> regards
>
> --
>
> Ing. Ivaldi Andres
>
>
>
>
>
> --
>
> Those who say it can't be done, are usually interrupted by those doing it.
--
Ing. Ivaldi Andres