Hello,
I am using Apache Spark 2.2.1 with Scala. I am trying to load the JSON below from
Kafka and to extract "JOBTYPE" and "LOADID" from the nested JSON object.
I need help with the extraction logic.
Code
val workRequests = new StructType().add("after", new StructType()
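A minimal sketch of one way to finish this, assuming the payload shape is {"after": {"JOBTYPE": ..., "LOADID": ...}}, that both fields are strings, and that kafkaDf is the DataFrame read from Kafka with spark as the active session (all assumptions, since the original schema is cut off above):

import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._
import spark.implicits._

// Schema for the nested "after" object; only JOBTYPE and LOADID come from the
// question, and their types are assumed to be strings.
val workRequests = new StructType()
  .add("after", new StructType()
    .add("JOBTYPE", StringType)
    .add("LOADID", StringType))

// Kafka delivers the value as bytes: cast to string, parse with the schema,
// then pull out the nested fields.
val extracted = kafkaDf
  .selectExpr("CAST(value AS STRING) AS json")
  .select(from_json($"json", workRequests).as("rec"))
  .select($"rec.after.JOBTYPE".as("jobType"), $"rec.after.LOADID".as("loadId"))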
Hi,
I'm new to Spark and Scala and need help transforming nested JSON using Scala.
Our upstream returns JSON like:
{
  "id": 100,
  "text": "Hello, world.",
  "Users": [ "User1": {
    "name": "Brett",
    "id": 200,
    "Type": "Employee",
    "empid": "2"
  },
  "Use
Thank you for the info. Is there a way to get all the keys of the JSON, so that I
can create a dataframe with the JSON keys, as below?
fieldsDataframe.withColumn("data",
functions.get_json_object($"RecordString", "$.id"))
This appends a single column to the dataframe for the id key.
I would like to aut
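A sketch of one way to do that, assuming every record shares the same top-level keys and that spark is the active SparkSession (both assumptions): infer the schema once from the JSON strings, then append one column per key.

import org.apache.spark.sql.functions.get_json_object
import spark.implicits._

// read.json over a Dataset[String] infers the schema of the JSON column.
val jsonSchema = spark.read.json(fieldsDataframe.select($"RecordString").as[String]).schema

// Append one column per top-level key; get_json_object returns each value as a string.
val withAllKeys = jsonSchema.fieldNames.foldLeft(fieldsDataframe) { (df, key) =>
  df.withColumn(key, get_json_object($"RecordString", s"$$.$key"))
}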
You can use get_json function
On Thu, 7 Dec 2017 at 10:39 am, satyajit vegesna
wrote:
> Does Spark support automatic detection of schema from a JSON string in a
> dataframe?
>
> I am trying to parse a JSON string and do some transformations on it
> (I would like to append new columns to the dataframe), from th
On Thu, 7 Dec 2017 at 11:37 am, ayan guha wrote:
> You can use get_json function
>
> On Thu, 7 Dec 2017 at 10:39 am, satyajit vegesna <
> satyajit.apas...@gmail.com> wrote:
>
>> Does Spark support automatic detection of schema from a JSON string in a
>> dataframe?
>>
>> I am trying to parse a jso
Does Spark support automatic detection of schema from a JSON string in a
dataframe?
I am trying to parse a JSON string and do some transformations on it
(I would like to append new columns to the dataframe), from the data I stream
from Kafka.
But I am not very sure how I can parse the JSON in str
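A rough sketch of one way to get that in Spark 2.2, assuming the JSON strings sit in a column named "json" of a DataFrame df that can be sampled as a batch (both assumptions): Spark won't infer a schema for a string column in place, but read.json over a Dataset[String] does infer one, and from_json can then reuse it.

import org.apache.spark.sql.functions.from_json
import spark.implicits._

// Infer the schema from the JSON strings themselves, then apply it.
val inferredSchema = spark.read.json(df.select($"json").as[String]).schema

// Append the parsed fields as new columns next to the original ones.
val withParsed = df
  .withColumn("parsed", from_json($"json", inferredSchema))
  .select($"*", $"parsed.*")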
Subject: Re: How to increase the Json parsing speed
Hi Gavin,
You can increase the speed by choosing a better encoding. A little bit
of ETL goes a long way.
For example, as you're working with Spark SQL you probably have a tabular
format, so you could use CSV so you don't need to parse the field names
on each entry (and it will also reduce the file s
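A rough sketch of that one-time ETL step (the paths are hypothetical; the built-in CSV writer and SparkSession assume Spark 2.0+, older releases need the spark-csv package):

// Parse the JSON once, then persist a flat, tabular encoding so later jobs
// don't re-parse field names on every record.
val parsed = spark.read.json("hdfs:///data/raw-json/")
parsed.write.option("header", "true").mode("overwrite").csv("hdfs:///data/as-csv/")

// Subsequent jobs read the cheaper encoding.
val fast = spark.read.option("header", "true").csv("hdfs:///data/as-csv/")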
500 each with 8GB memory.
I did the test again on the cluster.
I have 6000 files which generates 6000 tasks. Each task takes 1.5 min to
finish based on the Stats.
So theoretically it should take roughly 15 mins. With some additional
overhead, it takes 18 mins in total.
Based on the local file pa
How many executors are you using when using Spark SQL?
On Fri, Aug 28, 2015 at 12:12 PM, Sabarish Sasidharan <
sabarish.sasidha...@manthan.com> wrote:
> I see that you are not reusing the same mapper instance in the Scala
> snippet.
>
> Regards
> Sab
>
> On Fri, Aug 28, 2015 at 9:38 AM, Gavin Yue
I see that you are not reusing the same mapper instance in the Scala
snippet.
Regards
Sab
On Fri, Aug 28, 2015 at 9:38 AM, Gavin Yue wrote:
> Just did some tests.
>
> I have 6000 files, each has 14K records with 900Mb file size. In spark
> sql, it would take one task roughly 1 min to parse.
>
Just did some tests.
I have 6000 files, each has 14K records with 900Mb file size. In spark
sql, it would take one task roughly 1 min to parse.
On the local machine, I used the same Jackson lib that ships inside the Spark lib
and just parsed it:
FileInputStream fstream = new FileInputStream("testfile");
For your JSONs, can you tell us what your benchmark is when running on a
single machine using just plain Java (without Spark and Spark SQL)?
Regards
Sab
On 28-Aug-2015 7:29 am, "Gavin Yue" wrote:
> Hey
>
> I am using the Json4s-Jackson parser coming with spark and parsing roughly
> 80m records w
Hey
I am using the Json4s-Jackson parser that comes with Spark and parsing roughly 80m
records with a total size of 900mb.
But the speed is slow. It took my 50 nodes (16-core CPU, 100gb mem) roughly
30 mins to parse the JSON for use with Spark SQL.
Jackson's benchmarks say parsing should be at the millisecond level.
I use Play JSON, maybe it's very famous.
If you would like to try it, below is the sbt dependency:
"com.typesafe.play" % "play-json_2.10" % "2.2.1",
json4s is used by https://github.com/hammerlab/spark-json-relay
See the other thread on 'Spree'
FYI
On Mon, Jul 27, 2015 at 6:07 PM, swetha wrote:
> Hi,
>
> What is the proper Json parsing library to use in Spark Streaming?
> Currently
> I am trying to use Gson li
Hi,
What is the proper JSON parsing library to use in Spark Streaming? Currently
I am trying to use the Gson library in a Java class and calling the Java method
from a Scala class as shown below. What are the advantages of using Json4s
as against using the Gson library in a Java class and calling it from
I had a similar issue with Spark 1.3.
After migrating to Spark 1.4 and using sqlContext.read.json it worked well.
I think you can look at the DataFrame select and explode options to read the
nested JSON elements, arrays, etc.
Thanks.
On Mon, Jul 20, 2015 at 11:07 AM, Davies Liu wrote:
> Could you tr
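Following up on the select/explode suggestion above, a rough sketch (the file path and field names are hypothetical):

import org.apache.spark.sql.functions.explode
import sqlContext.implicits._

// read.json infers the nested schema from the file.
val df = sqlContext.read.json("events.json")

// select reaches into nested structs; explode turns an array field into one row per element.
val flat = df.select(
  $"id",
  $"user.name".as("userName"),
  explode($"tags").as("tag"))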
Could you try SQLContext.read.json()?
On Mon, Jul 20, 2015 at 9:06 AM, Davies Liu wrote:
> Before using the json file as text file, can you make sure that each
> json string can fit in one line? Because textFile() will split the
> file by '\n'
>
> On Mon, Jul 20, 2015 at 3:26 AM, Ajay wrote:
>>
Before using the JSON file as a text file, can you make sure that each
JSON string fits on one line? textFile() will split the
file by '\n'.
On Mon, Jul 20, 2015 at 3:26 AM, Ajay wrote:
> Hi,
>
> I am new to Apache Spark. I am trying to parse nested json using pyspark.
> Here is the code
Hi,
I am new to Apache Spark. I am trying to parse nested JSON using PySpark.
Here is the code with which I am trying to parse the JSON.
I am using Apache Spark 1.2.0 on Cloudera CDH 5.3.2.
lines = sc.textFile(inputFile)
import json
def func(x):
    json_str = json.loads(x)
    if json_str['label']:
How can I extract the values (i.e. without the JInt)? I tried
>> returning (v1.toInt, v2.toInt) from the map but got a compilation error
>> stating that toInt is not a valid operation.
>>
>> 2) I would also like to know how I can filter the above tuples based on
>> the age values. For example, I added the following after the second map
>> operation:
>>
>> p.filter(tup => tup._1 > 20)
>>
>> I got a compilation error: value > is not a member of org.json4s.JValue
>>
>> Thanks for your help.
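For reference, a minimal sketch of one way to get plain values out of json4s so both expressions compile; the "age" field comes from the question, the rest (field names, sample data) is assumed:

import org.json4s._
import org.json4s.jackson.JsonMethods.parse

implicit val formats: Formats = DefaultFormats

val people = Seq("""{"name": "A", "age": 25}""", """{"name": "B", "age": 18}""")

// extract[...] converts JValue nodes (JInt, JString, ...) into plain Scala values,
// so the resulting tuples can be compared and filtered normally.
val tuples = people
  .map(parse(_))
  .map(j => ((j \ "age").extract[Int], (j \ "name").extract[String]))

val adults = tuples.filter(_._1 > 20)   // no longer comparing org.json4s.JValue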