Re: Spark Doubts

2022-06-25 Thread Tufan Rakshit
Please find the answers inline. 1) Can I apply predicate pushdown filters if I have data stored in S3, or should it be used only while reading from DBs? It can be applied on S3 if you store the data in Parquet, CSV, JSON, or Avro format. It does not depend on the DB; it is supported in object store li
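
To illustrate the pushdown point above, here is a minimal pure-Python sketch of how a filter can skip data at the file level. It is a toy model, not Spark code: it assumes a Parquet-style layout in which each row group stores min/max statistics for a column, and the names `RowGroup`, `make_row_group`, `scan_with_pushdown`, the `(name, age)` schema, and the sample data are all hypothetical. Plain-text formats such as CSV carry no such statistics, so this models only the Parquet-like case.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """A chunk of rows plus min/max statistics for the age column (hypothetical schema)."""
    rows: List[Tuple[str, int]]  # (name, age)
    min_age: int
    max_age: int


def make_row_group(rows: List[Tuple[str, int]]) -> RowGroup:
    # Statistics are computed once, at write time, and stored with the data.
    ages = [age for _, age in rows]
    return RowGroup(rows=rows, min_age=min(ages), max_age=max(ages))


def scan_with_pushdown(row_groups: List[RowGroup], threshold: int):
    """Return rows with age > threshold and the number of row groups actually read."""
    matched = []
    groups_read = 0
    for rg in row_groups:
        if rg.max_age <= threshold:
            # The statistics prove that no row here can match, so skip the read.
            continue
        groups_read += 1
        matched.extend(row for row in rg.rows if row[1] > threshold)
    return matched, groups_read


# Stand-in for one file sitting in an object store (assumed sample data).
file_in_object_store = [
    make_row_group([("ann", 21), ("bob", 25)]),
    make_row_group([("cid", 38), ("dee", 44)]),
    make_row_group([("eve", 61), ("fay", 70)]),
]

rows, groups_read = scan_with_pushdown(file_in_object_store, 40)
print(rows, groups_read)
```

Only two of the three row groups are read for the filter `age > 40`; the first is skipped from its statistics alone, which is the saving that pushdown aims for regardless of whether the file lives in a database or an object store.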

Re: Spark Doubts

2022-06-25 Thread Sid
Hi Tufan, Thanks for the answers. However, on the second point, what I meant to ask is: where would my code reside? Will it be copied to all the executors, since the code size would be small, or will it be maintained on the driver's side? I know that the driver converts the code to a DAG and when an action is calle

Re: Spark Doubts

2022-06-25 Thread russell . spitzer
Code is always distributed for any operations on a DataFrame or RDD. The size of your code is irrelevant except with respect to JVM memory limits. For most jobs, the entire application jar and all dependencies are put on the classpath of every executor. There are some exceptions, but generally you should thi
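
As a rough analogy for the point that code is shipped to the executors rather than kept on the driver, the sketch below serializes a function to bytes on a "driver" and rebuilds and runs it on each "executor". This is a toy model in plain Python and not Spark's actual mechanism (Spark on the JVM distributes the application jar, as described above); `double_all`, `driver_serialize`, `executor_run`, and the sample partitions are hypothetical names and data introduced only for illustration.

```python
import builtins
import marshal
import types


def double_all(partition):
    """User code written once, on the driver side (hypothetical example)."""
    return [x * 2 for x in partition]


def driver_serialize(func):
    # Turn the function's compiled code into bytes that could travel over a network.
    return marshal.dumps(func.__code__)


def executor_run(payload, partition):
    # Rebuild the function from the received bytes, then run it locally
    # against the partition that this executor holds.
    code = marshal.loads(payload)
    func = types.FunctionType(code, {"__builtins__": builtins})
    return func(partition)


# Each inner list stands in for a partition held by a different executor.
partitions = [[1, 2], [3, 4], [5]]

payload = driver_serialize(double_all)  # done once on the driver
results = [executor_run(payload, p) for p in partitions]
print(results)
```

The data never moves to the driver in this model; the same small payload of code is sent to every partition, which is why the size of the code matters little compared with the size of the data.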