[mailto:yue.yuany...@gmail.com]
Sent: 28 August 2015 08:06
To: Sabarish Sasidharan
Cc: user
Subject: Re: How to increase the Json parsing speed
500 each with 8GB memory.
I did the test again on the cluster.
I have 6000 files, which generate 6000 tasks. Each task takes 1.5 min to
finish, based on the Stats.
Hi Gavin,
You can increase the speed by choosing a better encoding. A little bit
of ETL goes a long way.
For example, as you're working with Spark SQL you probably have a tabular
format, so you could use CSV and avoid parsing the field names
on each entry (and it will also reduce the file s
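A minimal sketch of the point about field names, in plain Java with no Spark involved. The three-column schema (id, name, score) and the sample row are made up for illustration, and the CSV handling assumes no quoted fields or embedded commas.

```java
import java.nio.charset.StandardCharsets;

class CsvVsJson {
    // Hypothetical schema, used only for this illustration.
    static final String[] FIELDS = {"id", "name", "score"};

    // Encode one row as a flat JSON object: the field names are repeated in every record.
    static String toJson(String[] row) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < FIELDS.length; i++) {
            if (i > 0) sb.append(',');
            sb.append('"').append(FIELDS[i]).append("\":\"").append(row[i]).append('"');
        }
        return sb.append('}').toString();
    }

    // Encode the same row as CSV: the field names would appear once, in a header line.
    static String toCsv(String[] row) {
        return String.join(",", row);
    }

    // Parse a CSV line by column position (assumes no quoting or embedded commas).
    static String[] parseCsv(String line) {
        return line.split(",", -1);
    }

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] row = {"42", "gavin", "0.97"};
        String json = toJson(row);
        String csv = toCsv(row);
        System.out.println(json + " -> " + utf8Length(json) + " bytes");
        System.out.println(csv + " -> " + utf8Length(csv) + " bytes");
        System.out.println("score column = " + parseCsv(csv)[2]);
    }
}
```

The CSV reader only has to split on a delimiter and index by position, while the JSON reader has to read and match every field name in every record.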
500 each with 8GB memory.
I did the test again on the cluster.
I have 6000 files, which generate 6000 tasks. Each task takes 1.5 min to
finish, based on the Stats.
So theoretically it should take roughly 15 mins. With some additional
overhead, it takes 18 mins in total.
Based on the local file pa
How many executors are you using when using Spark SQL?
On Fri, Aug 28, 2015 at 12:12 PM, Sabarish Sasidharan <
sabarish.sasidha...@manthan.com> wrote:
> I see that you are not reusing the same mapper instance in the Scala
> snippet.
>
> Regards
> Sab
>
> On Fri, Aug 28, 2015 at 9:38 AM, Gavin Yue
I see that you are not reusing the same mapper instance in the Scala
snippet.
Regards
Sab
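A minimal sketch of what reusing the mapper means, in plain Java. To keep it free of dependencies, the hypothetical StandInMapper class below stands in for Jackson's ObjectMapper and only counts how often it is constructed; its parse method is a placeholder, not real JSON parsing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReuseParser {
    // Hypothetical stand-in for an expensive-to-build parser such as ObjectMapper.
    static final class StandInMapper {
        static int created = 0;               // number of instances constructed so far
        StandInMapper() { created++; }
        int parse(String record) {            // placeholder for real parsing work
            return record.length();
        }
    }

    // Pattern to avoid: a fresh mapper is constructed for every record.
    static List<Integer> parseWithNewMapperPerRecord(List<String> records) {
        List<Integer> out = new ArrayList<>();
        for (String r : records) {
            out.add(new StandInMapper().parse(r));
        }
        return out;
    }

    // Suggested pattern: one mapper is constructed and reused for all records.
    static List<Integer> parseWithSharedMapper(List<String> records) {
        StandInMapper mapper = new StandInMapper();
        List<Integer> out = new ArrayList<>();
        for (String r : records) {
            out.add(mapper.parse(r));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("{\"a\":1}", "{\"a\":2}", "{\"a\":3}");

        StandInMapper.created = 0;
        parseWithNewMapperPerRecord(records);
        System.out.println("per-record mappers built: " + StandInMapper.created);

        StandInMapper.created = 0;
        parseWithSharedMapper(records);
        System.out.println("shared mappers built: " + StandInMapper.created);
    }
}
```

In a Spark job, the usual place to build the shared instance is inside mapPartitions, so that it is created once per partition rather than once per record.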
On Fri, Aug 28, 2015 at 9:38 AM, Gavin Yue wrote:
> Just did some tests.
>
> I have 6000 files, each with 14K records and a file size of 900Mb. In
> Spark SQL, one task takes roughly 1 min to parse a file.
>
Just did some tests.
I have 6000 files, each with 14K records and a file size of 900Mb. In
Spark SQL, one task takes roughly 1 min to parse a file.
On the local machine, I used the same Jackson lib that ships inside the
Spark lib and just parsed the file:
FileInputStream fstream = new FileInputStream("testfile");
For your JSON files, can you tell us what your benchmark is when running on
a single machine using just plain Java (without Spark and Spark SQL)?
Regards
Sab
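A rough sketch of how such a single-machine baseline could be timed in plain Java. It assumes one record per line; the file name "testfile" is only a default, and the parser is passed in as a function, with String::length used as a placeholder where the real JSON parser would go.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Function;

class ParseBenchmark {
    // Outcome of one run: number of lines parsed and elapsed wall-clock time.
    static final class Result {
        final long records;
        final long nanos;
        Result(long records, long nanos) {
            this.records = records;
            this.nanos = nanos;
        }
        double recordsPerSecond() {
            return nanos == 0 ? 0.0 : records * 1e9 / nanos;
        }
    }

    // Read the file line by line, apply the parser to each line, and time the whole loop.
    static Result run(Path file, Function<String, ?> parser) throws IOException {
        long count = 0;
        long start = System.nanoTime();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                parser.apply(line);
                count++;
            }
        }
        return new Result(count, System.nanoTime() - start);
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "testfile");
        // Placeholder parser; swap in the real JSON parsing call to measure it.
        Function<String, Integer> placeholder = String::length;
        Result r = run(file, placeholder);
        System.out.printf("%d records in %.1f ms (%.0f records/s)%n",
                r.records, r.nanos / 1e6, r.recordsPerSecond());
    }
}
```

A single run like this includes file I/O and JIT warm-up, so the number is only a ballpark figure to compare against the per-task time seen in Spark.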
On 28-Aug-2015 7:29 am, "Gavin Yue" wrote:
> Hey
>
> I am using the Json4s-Jackson parser that comes with Spark and parsing roughly
> 80m records w