I see that you are not reusing the same mapper instance in the Scala snippet.
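Here is a minimal sketch of what I mean, assuming the same "testfile" input and the Jackson and jackson-module-scala classes already used in your snippet. The names ParseBenchmark and parseAll are made up for illustration. The mapper is built once and shared across all lines, so the construction and module registration cost is not paid per record. As far as I know, registering DefaultScalaModule should not matter for readTree, but I kept it to stay close to your code.

```scala
import scala.io.Source
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

// Hypothetical helper object, not part of Spark or Jackson.
object ParseBenchmark {
  // Build and configure the mapper once; it is reused for every line.
  private val mapper: ObjectMapper = {
    val m = new ObjectMapper()
    m.registerModule(DefaultScalaModule)
    m
  }

  // Parses each line with the shared mapper and returns how many lines were parsed.
  def parseAll(lines: Iterator[String]): Int =
    lines.foldLeft(0) { (count, line) =>
      mapper.readTree(line) // result discarded, as in the original timing loop
      count + 1
    }

  def main(args: Array[String]): Unit = {
    val source = Source.fromFile("testfile")
    try {
      val begin = System.currentTimeMillis()
      val parsed = parseAll(source.getLines())
      println(s"parsed $parsed lines in ${System.currentTimeMillis() - begin} ms")
    } finally {
      source.close()
    }
  }
}
```

If the Scala and Java timings converge after this change, the gap was mapper construction rather than the parsing itself.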

Regards
Sab

On Fri, Aug 28, 2015 at 9:38 AM, Gavin Yue <[email protected]> wrote:

> Just did some tests.
>
> I have 6000 files, each has 14K records with 900Mb file size. In spark
> sql, it would take one task roughly 1 min to parse.
>
> On the local machine, I used the same Jackson lib that ships inside the
> Spark lib and just parsed the file.
>
>     FileInputStream fstream = new FileInputStream("testfile");
>     BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
>     String strLine;
>     Long begin = System.currentTimeMillis();
>     while ((strLine = br.readLine()) != null) {
>         JsonNode s = mapper.readTree(strLine);
>     }
>     System.out.println(System.currentTimeMillis() - begin);
>
> In JDK8, it took *6270ms*.
>
> The same code in Scala took *7486ms*:
>
>     val begin = java.lang.System.currentTimeMillis()
>     for (line <- Source.fromFile("testfile").getLines())
>     {
>       val mapper = new ObjectMapper()
>       mapper.registerModule(DefaultScalaModule)
>       val s = mapper.readTree(line)
>     }
>     println(java.lang.System.currentTimeMillis() - begin)
>
> One Json record contains two fields: ID and List[Event].
>
> I am guessing that putting all the events into the List takes up the
> remaining time.
>
> Any solution to speed this up?
>
> Thanks a lot!
>
> On Thu, Aug 27, 2015 at 7:45 PM, Sabarish Sasidharan <
> [email protected]> wrote:
>
>> For your jsons, can you tell us what is your benchmark when running on a
>> single machine using just plain Java (without Spark and Spark sql)?
>>
>> Regards
>> Sab
>>
>> On 28-Aug-2015 7:29 am, "Gavin Yue" <[email protected]> wrote:
>>
>>> Hey
>>>
>>> I am using the Json4s-Jackson parser that comes with spark and parsing
>>> roughly 80m records with a total size of 900mb.
>>>
>>> But the speed is slow. It took my 50 nodes (16 cores cpu, 100gb mem)
>>> roughly 30 mins to parse the Json for use in spark sql.
>>>
>>> Jackson has a benchmark saying parsing should be at the ms level.
>>>
>>> Any way to increase speed?
>>>
>>> I am using spark 1.4 on Hadoop 2.7 with Java 8.
>>>
>>> Thanks a lot!
