When I run this job in local mode (spark-submit --master local[4]) with

spark = SparkSession.builder \
    .appName("tests") \
    .enableHiveSupport() \
    .getOrCreate()
spark.conf.set("spark.sql.adaptive.enabled", "true")
df3.explain(extended=True)

and no caching, I see this p
Hi, Mich:
Thanks for your reply, but maybe I didn't make my question clear.
I am looking for a way to compute the count of each element in an array
without "exploding" the array, and to output a Map structure as a column.
For example, for an array such as ('a', 'b', 'a'), I want to output a column
containing the map ('a' -> 2, 'b' -> 1).
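One way to sketch this (an assumption on my part, not necessarily the only or best approach) is a plain Python function built on collections.Counter, which could then be wrapped as a PySpark UDF returning a MapType column. The counting logic itself runs per row, so no explode is needed:

```python
from collections import Counter

def count_elements(arr):
    # Build an {element: count} dict from one array value,
    # mirroring the desired Map column, without exploding.
    return dict(Counter(arr)) if arr is not None else None

# Hypothetical PySpark wiring (df and column name "arr" are assumptions):
# from pyspark.sql import functions as F
# from pyspark.sql.types import MapType, StringType, IntegerType
# count_udf = F.udf(count_elements, MapType(StringType(), IntegerType()))
# df = df.withColumn("counts", count_udf(F.col("arr")))

print(count_elements(['a', 'b', 'a']))  # → {'a': 2, 'b': 1}
```

A UDF does cost serialization overhead; on Spark 2.4+ the same counting could likely be expressed with the built-in higher-order array/map functions instead, at the price of a more involved SQL expression.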
I do not think InMemoryFileIndex means the data is being cached. Caches
show up as InMemoryTableScan; InMemoryFileIndex is just for partition
discovery and partition pruning.
Any read will always show up as a scan from an InMemoryFileIndex. It is not
cached data. It is a cached file index. Please
When you run this in YARN mode, it uses a Broadcast Hash Join for the join
operation, as shown in the following output. The datasets here are the same
size, so Spark broadcasts one dataset to all of the executors, then streams
the other dataset through a hash join.
That is typical of joins. No surprises here.
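The mechanics behind that plan can be sketched in plain Python (a simplified, single-process sketch; the row shapes and column name are assumptions): the broadcast side is materialized as a hash table, and the other side is streamed and probed against it.

```python
from collections import defaultdict

def broadcast_hash_join(broadcast_rows, streamed_rows, key):
    """Simplified sketch of a broadcast hash join over lists of dicts."""
    # Build phase: hash the broadcast side by the join key
    # (in Spark this table is shipped to every executor).
    table = defaultdict(list)
    for row in broadcast_rows:
        table[row[key]].append(row)

    # Probe phase: stream the other side and look up matches.
    for row in streamed_rows:
        for match in table.get(row[key], []):
            yield {**match, **row}

left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
right = [{"id": 1, "b": "p"}, {"id": 3, "b": "q"}]
print(list(broadcast_hash_join(left, right, "id")))
# → [{'id': 1, 'a': 'x', 'b': 'p'}]
```

In real PySpark you would not write this yourself; hinting with F.broadcast(df2) in df1.join(F.broadcast(df2), "id") is the usual way to request this strategy when the optimizer does not pick it automatically.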