Issue with nested JSON parsing into data frame

2018-04-04 Thread Ritesh Shah
Hello, I am using Apache Spark 2.2.1 with Scala. I am trying to load the JSON below from Kafka and to extract "JOBTYPE" and "LOADID" from the nested JSON object. I need help with the extraction logic. Code: val workRequests = new StructType().add("after", new StructType()
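
A minimal Scala sketch of one common approach, assuming the two fields sit directly under the "after" object as strings and that the Kafka source DataFrame is named kafkaDf (both assumptions, not taken from the thread):

    import org.apache.spark.sql.functions.{col, from_json}
    import org.apache.spark.sql.types._

    // Assumed schema: "after" carries the two fields as strings.
    val workRequests = new StructType()
      .add("after", new StructType()
        .add("JOBTYPE", StringType)
        .add("LOADID", StringType))

    val extracted = kafkaDf // hypothetical DataFrame read from the Kafka source
      .select(from_json(col("value").cast("string"), workRequests).as("j"))
      .select(col("j.after.JOBTYPE"), col("j.after.LOADID"))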

Dynamic Key JSON Parsing

2018-03-18 Thread Mahender Sarangam
Hi, I'm new to Spark and Scala and need help transforming nested JSON using Scala. We have an upstream system returning JSON like { "id": 100, "text": "Hello, world." Users : [ "User1": { "name": "Brett", "id": 200, "Type" : "Employee" "empid":"2" }, "Use
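
When the object keys themselves are dynamic ("User1", "User2", ...), one common pattern is to declare that level as a MapType and explode it. A hedged Scala sketch, assuming the inner user objects share the fixed shape shown above and that the raw JSON sits in a string column named raw:

    import org.apache.spark.sql.functions.{col, explode, from_json}
    import org.apache.spark.sql.types._

    val userSchema = new StructType()
      .add("name", StringType)
      .add("id", LongType)
      .add("Type", StringType)
      .add("empid", StringType)

    val schema = new StructType()
      .add("id", LongType)
      .add("text", StringType)
      .add("Users", MapType(StringType, userSchema)) // dynamic keys become map keys

    val flattened = df
      .select(from_json(col("raw"), schema).as("j"))
      .select(col("j.id"), explode(col("j.Users")).as(Seq("userKey", "user")))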

Re: Json Parsing.

2017-12-06 Thread satyajit vegesna
Thank you for the info. Is there a way to get all the keys of the JSON, so that I can create a dataframe with the JSON keys, as below: fieldsDataframe.withColumn("data", functions.get_json_object($"RecordString", "$.id")) This appends a single column to the dataframe for the id key. I would like to aut
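
One hedged way to get all the keys automatically is to let Spark infer the schema from the JSON strings, then fold the inferred field names into get_json_object calls. A sketch, reusing the RecordString column from the snippet above:

    import org.apache.spark.sql.functions.get_json_object
    import spark.implicits._

    // Infer the schema once from the raw JSON strings...
    val inferred = spark.read.json(fieldsDataframe.select("RecordString").as[String])

    // ...then project every top-level key as its own column.
    val widened = inferred.schema.fieldNames.foldLeft(fieldsDataframe) { (df, key) =>
      df.withColumn(key, get_json_object($"RecordString", s"$$.$key"))
    }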

Re: Json Parsing.

2017-12-06 Thread ayan guha
You can use get On Thu, 7 Dec 2017 at 10:39 am, satyajit vegesna wrote: > Does spark support automatic detection of schema from a json string in a > dataframe? > > I am trying to parse a json string and do some transformations on it > (would like to append new columns to the dataframe), from th
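
The reply (completed in the follow-up below as get_json) points at get_json_object; json_tuple is the closely related function for pulling several keys in one pass. A minimal illustrative sketch, with assumed column and key names:

    import org.apache.spark.sql.functions.json_tuple
    import spark.implicits._

    // Extract two keys from the JSON string column in a single pass.
    df.select(json_tuple($"RecordString", "id", "name").as(Seq("id", "name")))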

Re: Json Parsing.

2017-12-06 Thread ayan guha
On Thu, 7 Dec 2017 at 11:37 am, ayan guha wrote: > You can use get_json function > > On Thu, 7 Dec 2017 at 10:39 am, satyajit vegesna <satyajit.apas...@gmail.com> wrote: > >> Does spark support automatic detection of schema from a json string in a >> dataframe? >> >> I am trying to parse a jso

Json Parsing.

2017-12-06 Thread satyajit vegesna
Does Spark support automatic detection of schema from a JSON string in a dataframe? I am trying to parse a JSON string and do some transformations on it (I would like to append new columns to the dataframe), from the data I stream from Kafka. But I am not very sure how I can parse the JSON in str
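
For streaming data, one hedged pattern is to infer the schema once from a static sample and then reuse it with from_json on the stream, since re-inferring on every micro-batch is expensive. A sketch with assumed names (sampleJsonStrings and kafkaStreamDf are not from the thread):

    import org.apache.spark.sql.functions.from_json
    import spark.implicits._

    // Infer once from a sample of the same JSON shape...
    val sampleSchema = spark.read.json(spark.createDataset(sampleJsonStrings)).schema

    // ...then apply it to the Kafka stream without per-batch inference.
    val parsed = kafkaStreamDf
      .select(from_json($"value".cast("string"), sampleSchema).as("data"))
      .select("data.*")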

RE: How to increase the Json parsing speed

2015-08-28 Thread Ewan Leith
Quoting Gavin Yue: > 500 each with 8GB memory. I did the test again on the cluster. I have 6000 files which generate 6000 tasks. Each task takes 1.5 min to finish based on the Stats

Re: How to increase the Json parsing speed

2015-08-28 Thread Ewan Higgs
Hi Gavin, You can increase the speed by choosing a better encoding; a little bit of ETL goes a long way. E.g., as you're working with Spark SQL, you probably have a tabular format, so you could use CSV so you don't need to parse the field names on each entry (and it will also reduce the file s
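
A hedged sketch of that ETL idea: parse the JSON once, persist it in a parse-friendly format, and query that instead. Ewan suggests CSV; the sketch below uses Parquet, which was built into Spark at the time, and the paths are hypothetical:

    // One-off conversion pass.
    val raw = sqlContext.read.json("hdfs:///data/events/*.json")
    raw.write.parquet("hdfs:///data/events.parquet")

    // Later queries skip JSON parsing entirely.
    val events = sqlContext.read.parquet("hdfs:///data/events.parquet")
    events.registerTempTable("events")
    sqlContext.sql("SELECT count(*) FROM events")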

Re: How to increase the Json parsing speed

2015-08-28 Thread Gavin Yue
500 each with 8GB memory. I did the test again on the cluster. I have 6000 files which generate 6000 tasks. Each task takes 1.5 min to finish based on the Stats. So theoretically it should take roughly 15 mins. With some additional overhead, it takes 18 mins in total. Based on the local file pa

Re: How to increase the Json parsing speed

2015-08-27 Thread Sabarish Sasidharan
How many executors are you using when using Spark SQL? On Fri, Aug 28, 2015 at 12:12 PM, Sabarish Sasidharan < sabarish.sasidha...@manthan.com> wrote: > I see that you are not reusing the same mapper instance in the Scala > snippet. > > Regards > Sab > > On Fri, Aug 28, 2015 at 9:38 AM, Gavin Yue

Re: How to increase the Json parsing speed

2015-08-27 Thread Sabarish Sasidharan
I see that you are not reusing the same mapper instance in the Scala snippet. Regards Sab On Fri, Aug 28, 2015 at 9:38 AM, Gavin Yue wrote: > Just did some tests. > > I have 6000 files, each has 14K records with 900Mb file size. In spark > sql, it would take one task roughly 1 min to parse. >
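
Sab's point is the classic Jackson pitfall: constructing an ObjectMapper per record is expensive. A hedged sketch of the fix, building one mapper per partition (the Record shape and the lines RDD are assumptions):

    import com.fasterxml.jackson.databind.ObjectMapper
    import com.fasterxml.jackson.module.scala.DefaultScalaModule

    case class Record(id: Long, text: String)

    val parsed = lines.mapPartitions { iter =>
      val mapper = new ObjectMapper()           // one instance per partition,
      mapper.registerModule(DefaultScalaModule) // not one per record
      iter.map(line => mapper.readValue(line, classOf[Record]))
    }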

Re: How to increase the Json parsing speed

2015-08-27 Thread Gavin Yue
Just did some tests. I have 6000 files; each has 14K records, with 900Mb file size. In Spark SQL, it would take one task roughly 1 min to parse. On the local machine, I used the same Jackson lib that ships inside the Spark lib and just parsed it: FileInputStream fstream = new FileInputStream("testfile")

Re: How to increase the Json parsing speed

2015-08-27 Thread Sabarish Sasidharan
For your jsons, can you tell us what your benchmark is when running on a single machine using just plain Java (without Spark and Spark SQL)? Regards Sab On 28-Aug-2015 7:29 am, "Gavin Yue" wrote: > Hey > > I am using the Json4s-Jackson parser coming with spark and parsing roughly > 80m records w
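
A hedged single-machine baseline along the lines Sab asks for, written in Scala against plain Jackson (no Spark), with a hypothetical test file:

    import com.fasterxml.jackson.databind.ObjectMapper
    import scala.io.Source

    val mapper = new ObjectMapper()
    val start = System.nanoTime()
    val count = Source.fromFile("testfile").getLines()
      .map(line => mapper.readTree(line)) // parse only, no further work
      .size
    println(s"Parsed $count records in ${(System.nanoTime() - start) / 1e9} s")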

How to increase the Json parsing speed

2015-08-27 Thread Gavin Yue
Hey, I am using the Json4s-Jackson parser that comes with Spark and parsing roughly 80m records with a total size of 900mb. But the speed is slow. It took my 50 nodes (16-core CPU, 100GB mem) roughly 30 mins to parse the JSON for use in Spark SQL. Jackson's benchmarks say parsing should be at the millisecond level.
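
For reference, a minimal sketch of the json4s-jackson pattern being benchmarked here (the Record shape and the lines RDD are assumptions). Note that extract-based decoding adds reflection cost on top of raw Jackson parsing, which is one reason raw-Jackson numbers can look much faster:

    import org.json4s._
    import org.json4s.jackson.JsonMethods.parse

    case class Record(id: Long, text: String)
    implicit val formats: Formats = DefaultFormats

    val records = lines.map(line => parse(line).extract[Record])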

Re: Json parsing library for Spark Streaming?

2015-08-10 Thread pradyumnad
I use Play JSON; it may be the best-known option. If you would like to try it, below is the sbt dependency: "com.typesafe.play" % "play-json_2.10" % "2.2.1",
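
A minimal hedged illustration of using that dependency (the Event case class is made up for the example):

    import play.api.libs.json.Json

    case class Event(id: Long, name: String)
    implicit val eventReads = Json.reads[Event]

    val event = Json.parse("""{"id": 1, "name": "test"}""").as[Event]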

Re: Json parsing library for Spark Streaming?

2015-07-27 Thread Ted Yu
json4s is used by https://github.com/hammerlab/spark-json-relay See the other thread on 'Spree' FYI On Mon, Jul 27, 2015 at 6:07 PM, swetha wrote: > Hi, > > What is the proper Json parsing library to use in Spark Streaming? > Currently > I am trying to use Gson li

Json parsing library for Spark Streaming?

2015-07-27 Thread swetha
Hi, What is the proper JSON parsing library to use in Spark Streaming? Currently I am trying to use the Gson library in a Java class and calling the Java method from a Scala class as shown below. What are the advantages of using Json4s over using the Gson library in a Java class and calling it from
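
For reference, a hedged sketch of the Gson-from-Scala pattern the question describes (the Event class and jsonStream DStream are assumptions; Gson instances are not serializable, so the instance is created inside mapPartitions rather than captured in the closure):

    import com.google.gson.Gson

    class Event { var id: Long = 0L; var name: String = "" }

    val parsed = jsonStream.mapPartitions { iter =>
      val gson = new Gson() // built per partition, never shipped over the wire
      iter.map(line => gson.fromJson(line, classOf[Event]))
    }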

Re: PySpark Nested Json Parsing

2015-07-20 Thread Naveen Madhire
I had a similar issue with Spark 1.3. After migrating to Spark 1.4 and using sqlContext.read.json it worked well. I think you can look at the dataframe select and explode options to read the nested JSON elements, arrays etc. Thanks. On Mon, Jul 20, 2015 at 11:07 AM, Davies Liu wrote: > Could you tr
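
A hedged sketch of that select/explode combination (shown in Scala for consistency with the rest of this digest; the PySpark DataFrame API mirrors it, and the path and column names are assumptions):

    import org.apache.spark.sql.functions.explode

    val df = sqlContext.read.json("nested.json") // hypothetical path
    df.printSchema()                             // inspect the inferred nesting

    // Pull a top-level field and fan an array out into one row per element.
    val flat = df.select(df("label"), explode(df("items")).as("item"))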

Re: PySpark Nested Json Parsing

2015-07-20 Thread Davies Liu
Could you try SQLContext.read.json()? On Mon, Jul 20, 2015 at 9:06 AM, Davies Liu wrote: > Before using the json file as text file, can you make sure that each > json string can fit in one line? Because textFile() will split the > file by '\n' > > On Mon, Jul 20, 2015 at 3:26 AM, Ajay wrote: >>

Re: PySpark Nested Json Parsing

2015-07-20 Thread Davies Liu
Before using the json file as text file, can you make sure that each json string can fit in one line? Because textFile() will split the file by '\n' On Mon, Jul 20, 2015 at 3:26 AM, Ajay wrote: > Hi, > > I am new to Apache Spark. I am trying to parse nested json using pyspark. > Here is the code

PySpark Nested Json Parsing

2015-07-20 Thread Ajay
Hi, I am new to Apache Spark. I am trying to parse nested JSON using PySpark. Here is the code with which I am trying to parse the JSON. I am using Apache Spark 1.2.0 from Cloudera CDH 5.3.2.

    import json

    lines = sc.textFile(inputFile)

    def func(x):
        json_str = json.loads(x)
        if json_str['label']:

Re: json parsing with json4s

2014-06-12 Thread Tobias Pfeiffer
> How can I extract the values (i.e. without the JInt)? I tried returning (v1.toInt, v2.toInt) from the map but got a compilation error stating that toInt is not a valid operation. 2) I would also like to know how I can filter the above tuples based on the age values. For example, I added the following after the second map operation: p.filter(tup => tup._1 > 20) I got a compilation error: value > is not a member of org.json4s.JValue. Thanks for your help.
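
A hedged sketch of the usual fix for both problems: convert the JValues to plain Scala types (here via extract) before comparing, since json4s' JValue defines neither toInt nor >. The field names and the lines RDD are assumptions based on the question:

    import org.json4s._
    import org.json4s.jackson.JsonMethods.parse

    implicit val formats: Formats = DefaultFormats

    val people = lines.map(line => parse(line)).map { j =>
      ((j \ "age").extract[Int], (j \ "name").extract[String])
    }
    val adults = people.filter(_._1 > 20) // tup._1 is now a plain Int, so > compiles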

Re: json parsing with json4s

2014-06-11 Thread Michael Cutler
> er the above tuples based on the age values. For example, I added the following after the second map operation: p.filter(tup => tup._1 > 20) I got a compilation error: value > is not a member of org.json4s.JValue. Thanks for your help.

json parsing with json4s

2014-06-11 Thread SK
g after the second map operation: p.filter(tup => tup._1 > 20) I got a compilation error: value > is not a member of org.json4s.JValue. Thanks for your help.