Re: No matter how many instances and cores configured for spark on k8s, only one executor is reading file

2020-12-21 Thread 沈俊
Hi Finally, I found a configuration parameter: spark.default.parallelism. Changing this parameter changes the number of executors running in parallel, although the log file still says "first 15 tasks ...". Anyway, my problem is solved. -- Original
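A minimal sketch of setting this parameter, assuming PySpark; the value 16 and the app name are illustrative. Note that spark.default.parallelism applies to RDD operations such as parallelize(), while DataFrame shuffles are governed by spark.sql.shuffle.partitions.

    # Sketch, assuming PySpark; 16 is an illustrative partition count.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("pcap-analysis")                    # hypothetical app name
             .config("spark.default.parallelism", "16")   # default partitions for RDD ops
             .getOrCreate())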

Re: providing a list parameter for sum function

2020-12-21 Thread nick.gustafson
You’ll need to “unpack” the array using an asterisk in Python, like so: df = df.groupBy(groupBy_cols).sum(*agged_cols_list)
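A self-contained sketch of that unpacking, with hypothetical column names; the * turns the list into separate arguments, i.e. .sum("a", "b") rather than .sum(["a", "b"]).

    # Sketch with made-up columns; * unpacks agged_cols_list into
    # individual string arguments for sum().
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [("x", 1, 10), ("x", 2, 20), ("y", 3, 30)],
        ["grp", "a", "b"])

    groupBy_cols = ["grp"]
    agged_cols_list = ["a", "b"]
    df = df.groupBy(groupBy_cols).sum(*agged_cols_list)
    df.show()  # columns: grp, sum(a), sum(b)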

Kubernetes spark insufficient cpu error

2020-12-21 Thread rajat kumar
Hey All, I am facing this error while running Spark on Kubernetes; can anyone suggest what can be corrected here? I am using minikube and Spark 2.4 to run a spark-submit in cluster mode. default-scheduler: 0/1 nodes are available: 1 Insufficient cpu. Regards, Rajat
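That scheduler message usually means the pod's CPU request exceeds the single minikube node's allocatable CPU. A hedged sketch of lowering the requests, assuming PySpark and illustrative values; with spark-submit these would instead be --conf flags, and alternatively minikube can be given more CPUs at startup (minikube start --cpus 4).

    # Sketch, assuming Spark on Kubernetes; values are illustrative.
    # "Insufficient cpu" means the requested cores exceed what the node offers.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .config("spark.executor.instances", "1")                     # fewer executor pods
             .config("spark.executor.cores", "1")                         # smaller request per executor
             .config("spark.kubernetes.executor.request.cores", "500m")   # fractional CPU request
             .getOrCreate())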

Re: No matter how many instances and cores configured for spark on k8s, only one executor is reading file

2020-12-21 Thread Sean Owen
Pass more partitions to the second argument of parallelize()? On Mon, Dec 21, 2020 at 7:39 AM 沈俊 wrote: > Hi > > I am now trying to use spark to do tcpdump pcap file analysis. The first > step is to read the file and parse the content to dataframe according to > analysis requirements. > > I've
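A minimal sketch of that suggestion; the element count and partition count are made up.

    # Sketch: the second argument to parallelize() is the number of
    # partitions, which caps how many tasks can run concurrently.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    rdd = sc.parallelize(range(1000), 60)  # 60 partitions instead of the default
    print(rdd.getNumPartitions())          # 60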

No matter how many instances and cores configured for spark on k8s, only one executor is reading file

2020-12-21 Thread 沈俊
Hi I am now trying to use Spark to do tcpdump pcap file analysis. The first step is to read the file and parse the content into a DataFrame according to the analysis requirements. I've made a public folder for all executors so that they can access it directly like a local file system. Here is the
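The message is truncated, so the exact read path is not shown; a hedged sketch of one way such a shared file might be read, and why a single large pcap can end up on one executor (the path is hypothetical).

    # Sketch: binaryFiles() returns one (filename, bytes) record per file,
    # so one big pcap lands in one partition unless repartitioned.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    raw = sc.binaryFiles("/shared/pcap/capture.pcap")  # hypothetical shared path
    print(raw.getNumPartitions())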

Re: Spark 3.0.1 fails to insert into Hive Parquet table but Spark 2.11.12 used to work

2020-12-21 Thread Mich Talebzadeh
Hi, This sounds like a bug. It works if I put an *arbitrary limit on the insert*:

    INSERT INTO TABLE test.randomData
    SELECT ID, CLUSTERED, SCATTERED, RANDOMISED,
           RANDOM_STRING, SMALL_VC, PADDING
    FROM tmp
    LIMIT 1000

This works fi