Re: Questions regarding Jobs, Stages and Caching

2017-05-25 Thread Ram Navan
Thank you, Steffen and Nicholas. I passed an explicit schema to spark.read.json(), and the time to execute this statement dropped from the original 8 minutes to 4 minutes! I also see only two jobs created (instead of three when calling with no schema). Please refer to attachment job0 and job2 from the
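The change described above (passing a schema to spark.read.json() rather than letting Spark infer one) might look roughly like the sketch below. The helper names `read_json_with_schema` and `example_schema`, and the fields `id` and `name`, are hypothetical; the thread does not show the actual schema or code.

```python
def example_schema():
    """Build a small illustrative schema (field names are hypothetical)."""
    # Imported here so the helpers below can be loaded without a Spark install.
    from pyspark.sql.types import LongType, StringType, StructField, StructType

    return StructType([
        StructField("id", LongType(), True),
        StructField("name", StringType(), True),
    ])


def read_json_with_schema(spark, path, schema):
    """Read JSON files with a caller-supplied schema instead of inferring one.

    `spark` is an existing SparkSession and `path` points at the JSON input.
    """
    # With schema= given, Spark does not need the extra pass over the input
    # files that it otherwise makes to infer column names and types.
    return spark.read.json(path, schema=schema)
```

With a live session this would be called as `files_df = read_json_with_schema(spark, path, example_schema())`, where `path` is the input location used in the thread.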

Re: Questions regarding Jobs, Stages and Caching

2017-05-25 Thread Nicholas Hakobian
> tion will be faster. My next statement is files_df.count(). This operation took an entire 8.8 minutes and it looks like it read the files again from s3 and calculated the count. Please refer to attached count.jpg file for reference. count.jpg

Re: Questions regarding Jobs, Stages and Caching

2017-05-25 Thread Steffen Schmitz
> d what’s happening beneath the hood. Thanks in advance! Ram -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Questions-regarding-Jobs-Stages-and-Caching-tp28708.html Sent from the Apache Spark User List mailin

Re: Questions regarding Jobs, Stages and Caching

2017-05-25 Thread Ram Navan
> ference. count.jpg <http://apache-spark-user-list.1001560.n3.nabble.com/file/n28708/count.jpg> Why is this happening? If I call files_df.count() a second time, it comes back fast, within a few seconds. Can someone explain this? In gener
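The pattern asked about here (a slow first files_df.count(), a fast second one) is what one would expect if the DataFrame had been marked with cache() beforehand, since cache() is lazy and only the first action actually reads the source. The snippets above do not show whether cache() was called, so that is an assumption. A minimal sketch for measuring it, with the hypothetical helpers `timed_count` and `compare_counts`:

```python
import time


def timed_count(df):
    """Run df.count() and return (row_count, elapsed_seconds)."""
    start = time.perf_counter()
    rows = df.count()
    return rows, time.perf_counter() - start


def compare_counts(df):
    """Mark df for caching, then time two consecutive count() calls."""
    # cache() is lazy: it only marks the DataFrame, nothing is read yet.
    cached = df.cache()
    # The first action reads the source and populates the cache.
    first = timed_count(cached)
    # The second action can be served from the cached partitions.
    second = timed_count(cached)
    return first, second
```

Calling `compare_counts(files_df)` returns both (count, seconds) pairs, which can be checked against the Jobs and Storage tabs of the Spark UI.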

Re: Questions regarding Jobs, Stages and Caching

2017-05-25 Thread Steffen Schmitz
> for a good source to learn about Spark Internals and try to understand what’s happening beneath the hood. Thanks in advance! Ram -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Questions-regarding-Jobs-Stages-and-C

Questions regarding Jobs, Stages and Caching

2017-05-24 Thread ramnavan
meone explain this? In general, I am looking for a good source to learn about Spark Internals and try to understand what’s happening beneath the hood. Thanks in advance! Ram -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Questions-regarding-Jobs-Stages-