I see that you are not reusing the same mapper instance in the Scala
snippet: you construct a new ObjectMapper (and re-register
DefaultScalaModule) on every iteration, which is expensive and likely
dominates your measured time.
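
A minimal sketch of the corrected loop, with the mapper hoisted out
(assuming the same "testfile" and imports as your snippet):

```scala
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import scala.io.Source

// Build the mapper once; ObjectMapper construction and module
// registration are costly, while readTree calls are cheap.
val mapper = new ObjectMapper()
mapper.registerModule(DefaultScalaModule)

val begin = System.currentTimeMillis()
for (line <- Source.fromFile("testfile").getLines()) {
  val s = mapper.readTree(line)
}
println(System.currentTimeMillis() - begin)
```

A configured ObjectMapper is thread-safe, so in Spark you can likewise
create one per partition (e.g. inside mapPartitions) instead of one per
record.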

Regards
Sab

On Fri, Aug 28, 2015 at 9:38 AM, Gavin Yue <[email protected]> wrote:

> Just did some tests.
>
> I have 6000 files, each with 14K records, about 900 MB in total. In Spark
> SQL, one task takes roughly 1 minute to parse a file.
>
> On my local machine, using the same Jackson library that ships inside the
> Spark lib, I just parse it:
>
>             ObjectMapper mapper = new ObjectMapper();
>             BufferedReader br = new BufferedReader(
>                 new InputStreamReader(new FileInputStream("testfile")));
>             String strLine;
>             long begin = System.currentTimeMillis();
>             while ((strLine = br.readLine()) != null) {
>                 JsonNode s = mapper.readTree(strLine);
>             }
>             br.close();
>             System.out.println(System.currentTimeMillis() - begin);
>
> In JDK 8, it took *6270 ms*.
>
> The same code in Scala takes *7486 ms*:
>     val begin = java.lang.System.currentTimeMillis()
>     for (line <- Source.fromFile("testfile").getLines()) {
>       val mapper = new ObjectMapper()
>       mapper.registerModule(DefaultScalaModule)
>       val s = mapper.readTree(line)
>     }
>     println(java.lang.System.currentTimeMillis() - begin)
>
>
> One JSON record contains two fields: an ID and a List[Event].
>
> I am guessing that materializing all the events into the List accounts
> for the remaining time.
>
> Any solution to speed this up?
>
> Thanks a lot!
>
>
> On Thu, Aug 27, 2015 at 7:45 PM, Sabarish Sasidharan <
> [email protected]> wrote:
>
>> For your JSONs, can you tell us your benchmark when running on a
>> single machine using plain Java (without Spark and Spark SQL)?
>>
>> Regards
>> Sab
>> On 28-Aug-2015 7:29 am, "Gavin Yue" <[email protected]> wrote:
>>
>>> Hey
>>>
>>> I am using the Json4s-Jackson parser that comes with Spark to parse
>>> roughly 80M records with a total size of 900 MB.
>>>
>>> But the speed is slow. It took my 50 nodes (16-core CPU, 100 GB memory
>>> each) roughly 30 minutes to parse the JSON for use in Spark SQL.
>>>
>>> Jackson's own benchmarks suggest parsing should be at the millisecond
>>> level.
>>>
>>> Any way to increase speed?
>>>
>>> I am using Spark 1.4 on Hadoop 2.7 with Java 8.
>>>
>>> Thanks a lot !
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [email protected]
>>> For additional commands, e-mail: [email protected]
>>>
>>>
>


-- 

Architect - Big Data
Ph: +91 99805 99458

Manthan Systems | *Company of the year - Analytics (2014 Frost and Sullivan
India ICT)*
