Re: How can I config hive.metastore.warehouse.dir

2021-08-11 Thread eab...@163.com
Hi, I think you should set up hive-site.xml before initializing the SparkSession. Spark will then connect to the metastore and log something like this: == 2021-08-12 09:21:21 INFO HiveUtils:54 - Initializing HiveMetastoreConnection version 1.2.1 using Spark classes. 2021-08-12 09:21:22
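
A minimal sketch of the same idea done in code rather than through hive-site.xml, assuming PySpark is installed: the warehouse location is passed as `spark.sql.warehouse.dir` on the builder before the first session is created (Spark's documentation notes that `hive.metastore.warehouse.dir` in hive-site.xml is deprecated since Spark 2.0.0 in favour of this property). The helper names and the `/user/hive/warehouse` path are hypothetical; only `nameservice1` comes from the thread.

```python
def warehouse_conf(warehouse_dir):
    """Return the Spark SQL setting that fixes the warehouse location."""
    if not warehouse_dir:
        raise ValueError("warehouse_dir must be a non-empty path or URI")
    return {"spark.sql.warehouse.dir": warehouse_dir}


def build_session(conf, app_name="hive-write-sketch"):
    """Create a Hive-enabled session with the given settings applied first."""
    # Imported lazily so warehouse_conf() can be used without PySpark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).enableHiveSupport()
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Hypothetical warehouse path; only the nameservice comes from the thread.
conf = warehouse_conf("hdfs://nameservice1/user/hive/warehouse")
```

Calling `build_session(conf)` on a machine with PySpark and a reachable metastore would then create the session. A session that is already running would not pick up the new warehouse location, which is why the setting has to come first.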

Register an Aggregator as an UDAF for Spark SQL 3

2021-08-11 Thread AlstonWilliams
Hi all, I use Spark 3.0.2. I have written an Aggregator function and want to register it with Spark SQL so that I can call it through ThriftServer. In Spark 2.4, I could extend `UserDefinedAggregateFunction` and use the following statement to register it in the Spark SQL shell: ``` CREATE F

How can I config hive.metastore.warehouse.dir

2021-08-11 Thread igyu
I need to write data to Hive with Spark:
val proper = new Properties
proper.setProperty("fs.defaultFS", "hdfs://nameservice1")
proper.setProperty("dfs.nameservices", "nameservice1")
proper.setProperty("dfs.ha.namenodes.nameservice1", "namenode337,namenode369")
proper.setProperty("dfs.namenode.rpc-add

Spark Structured Streaming Dynamic Allocation

2021-08-11 Thread Zhenyu Hu
Hey folks: does Spark Structured Streaming have any plans for dynamic scaling? Currently, Spark only has a dynamic scaling mechanism for batch jobs.

Spark DStream Dynamic Allocation

2021-08-11 Thread Zhenyu Hu
1. First of all, is dynamic scaling for Spark DStream available now? It is not mentioned in the Spark documentation.
2. Spark DStream dynamic scaling will randomly kill a non-receiver executor when the average processing delay divided by the batch processing interval i

about ShellBasedUnixGroupsMapping question

2021-08-11 Thread igyu
When I read from Hive I get this warning: 21/08/12 10:01:29 WARN ShellBasedUnixGroupsMapping: unable to return groups for user jztwk PartialGroupNameException Does not support partial group name resolution on Windows. GetLocalGroupsForUser error (1332): ? at org.apache.hadoop.security.Shel

Dynamic Allocation & ExecutorMonitor Shuffle Timeout & CacheTimeout

2021-08-11 Thread Zhenyu Hu
In the private class Tracker of org.apache.spark.scheduler.dynalloc.ExecutorMonitor, the method `updateTimeout` will take the min of `_cach

Re: Performance of PySpark jobs on the Kubernetes cluster

2021-08-11 Thread David Diebold
Hi Mich, I don't quite understand why the driver node is using so much CPU, but it may be unrelated to your executors being underused. As for the executors being underused, I would first check that your job generates enough tasks. Then I would check the spark.executor.cores and spark.task.cpus parameters t
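
The check suggested above can be done with simple arithmetic. Each executor can run floor(spark.executor.cores / spark.task.cpus) tasks at once (the property is spelled `spark.task.cpus`), so a stage with fewer tasks than the total number of slots leaves cores idle. The sketch below is only an illustration: the function names and the example numbers are hypothetical and are not taken from the thread.

```python
def task_slots(num_executors, executor_cores, task_cpus=1):
    """Total tasks the cluster can run concurrently.

    executor_cores mirrors spark.executor.cores and
    task_cpus mirrors spark.task.cpus (Spark's default is 1).
    """
    if num_executors < 0 or executor_cores < 1 or task_cpus < 1:
        raise ValueError("executor and core counts must be positive")
    # Each executor runs as many tasks as whole multiples of task_cpus fit.
    return num_executors * (executor_cores // task_cpus)


def idle_slots(num_tasks, slots):
    """Slots left unused when a stage has fewer tasks than slots."""
    return max(slots - num_tasks, 0)


# Hypothetical cluster: 3 executors with 4 cores each, 1 CPU per task.
slots = task_slots(num_executors=3, executor_cores=4, task_cpus=1)
unused = idle_slots(num_tasks=8, slots=slots)
```

With these made-up numbers there are 12 slots, so a stage of 8 tasks would leave 4 of them unused. The number of tasks in a stage normally follows the number of partitions, which is visible in the Spark UI.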

Spark Issues while upgrade to 2.4 from 1.6 in Parcels

2021-08-11 Thread Harsh Sharma
Hi team, we are upgrading our Cloudera parcels from 5.x to 6.x, and have therefore upgraded Spark from 1.6 to 2.4. While executing a Spark program we are getting the error below. Please help us resolve it in Cloudera parcels. There are suggestions to install Spark gateway roles