I was able to query data from an Impala table. Here is my git repo for anyone
who would like to check it out:
https://github.com/morfious902002/impala-spark-jdbc-kerberos
Did you ever find a solution to this? If so, can you share it? I am running
into a similar issue in YARN cluster mode connecting to an Impala table.
The issue seems to be with the primordial class loader. I cannot place the
driver jars on all the nodes at the same location, but I have loaded the jars
to HDFS. I have tried SPARK_YARN_DIST_FILES as well as SPARK_CLASSPATH on the
edge node with no luck. Is there another way to load these jars through the
primordial class loader?
Hi,
I am trying to create a DataFrame by querying an Impala table. It works fine
in my local environment, but when I try to run it on the cluster I get either
Error: java.lang.ClassNotFoundException: com.cloudera.impala.jdbc41.Driver
or
"No suitable driver found".
Can someone help me or point me to how to fix this?
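A minimal sketch of what the read can look like on Spark 1.6, assuming the
Impala JDBC41 driver jar has been shipped to both driver and executors (for
example via spark-submit --jars together with spark.driver.extraClassPath /
spark.executor.extraClassPath); the host, port, realm and table name below
are placeholders:

import java.util.Properties;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class ImpalaJdbcRead {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("impala-jdbc-read");
    JavaSparkContext sc = new JavaSparkContext(conf);
    SQLContext sqlContext = new SQLContext(sc);

    // Placeholder URL: host, port and Kerberos settings depend on the cluster.
    String url = "jdbc:impala://impala-host:21050/default;AuthMech=1;"
        + "KrbRealm=EXAMPLE.COM;KrbHostFQDN=impala-host;KrbServiceName=impala";

    Properties props = new Properties();
    // Naming the driver class explicitly helps when the jar is on the
    // classpath but DriverManager still reports "No suitable driver".
    props.setProperty("driver", "com.cloudera.impala.jdbc41.Driver");

    DataFrame df = sqlContext.read().jdbc(url, "my_table", props);
    df.show();

    sc.stop();
  }
}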
We are using Spark 1.6.1 on a CDH 5.5 cluster. The job worked fine with
Kerberos, but when we implemented Encryption at Rest we ran into an issue
with the following write:
Df.write().mode(SaveMode.Append).partitionBy("Partition").parquet(path);
I have already tried setting these values with no success:
I am using Spark 1.6.1 and writing to HDFS. In some cases it seems like all
the work is being done by one thread. Why is that?
Also, I need parquet.enable.summary-metadata to register the Parquet files
with Impala.
Df.write().partitionBy("COLUMN").parquet(outputFileLocation);
It also seems li
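The post is cut off above, but in case a sketch helps: two things I would
check on Spark 1.6 are the Hadoop-level summary-metadata flag and how many
partitions the DataFrame has before the write (a single partition means a
single write task). The class below is only an illustration; the column
name, path and partition count are placeholders:

import org.apache.spark.sql.DataFrame;

public class ParquetWriteSketch {
  // df and outputFileLocation stand in for the objects in the post above.
  public static void writeWithSummaryMetadata(DataFrame df, String outputFileLocation) {
    // Ask Parquet to emit the _metadata/_common_metadata summary files,
    // which the post needs in order to register the files with Impala.
    df.sqlContext().sparkContext().hadoopConfiguration()
        .set("parquet.enable.summary-metadata", "true");

    // If all the work lands on one thread, the data is probably sitting in a
    // single partition; repartitioning spreads the write across tasks.
    df.repartition(48, df.col("COLUMN"))
        .write()
        .partitionBy("COLUMN")
        .parquet(outputFileLocation);
  }
}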
Hi all,
I have searched a bit before posting this query.
Using Spark 1.6.1
Dataframe.write().format("parquet").mode(SaveMode.Append).save("location");
Note: the data in that folder can be deleted, and most of the time the folder
doesn't even exist.
Which SaveMode is best, if one is necessary at all?
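For what it's worth, a short sketch of the four modes as I understand them,
assuming a plain file-based target (Dataframe and location are the objects
from the post above):

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SaveMode;

public class SaveModeSketch {
  // SaveMode semantics for a file-based target:
  //   ErrorIfExists (default) - fail if the folder already exists
  //   Append                  - create the folder if missing, otherwise add new part files
  //   Overwrite               - delete the folder if present, then rewrite it
  //   Ignore                  - do nothing if the folder already exists
  public static void write(DataFrame dataframe, String location) {
    // For a folder that is routinely deleted (or missing), Overwrite keeps
    // each run deterministic; Append would accumulate part files across runs.
    dataframe.write().format("parquet").mode(SaveMode.Overwrite).save(location);
  }
}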
I have a Spark job that creates 6 million rows in RDDs. I convert the RDDs
into DataFrames and write them to HDFS. Currently it takes 3 minutes to write
to HDFS.
I am using Spark 1.5.1 with YARN.
Here is the snippet:
RDDList.parallelStream().forEach(mapJavaRDD -> {
if (mapJava
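The snippet is cut off above, but in case it helps: issuing one write per RDD
from parallelStream() launches many small jobs from the driver. A minimal
alternative sketch, assuming the rows share one schema (rddList, schema and
outputPath are placeholders):

import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.StructType;

public class UnionThenWrite {
  public static void write(JavaSparkContext sc, SQLContext sqlContext,
                           List<JavaRDD<Row>> rddList, StructType schema, String outputPath) {
    // Union all the RDDs first; union only records lineage, so this is cheap.
    JavaRDD<Row> all = sc.emptyRDD();
    for (JavaRDD<Row> rdd : rddList) {
      all = all.union(rdd);
    }
    // One DataFrame, one write: Spark parallelises it across its tasks instead
    // of the driver launching many small jobs from a parallel stream.
    DataFrame df = sqlContext.createDataFrame(all, schema);
    df.write().parquet(outputPath);
  }
}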
Hi,
I am trying to create a Spark cluster using the spark-ec2 script that will
support 2.5.0-cdh5.3.2 for HDFS as well as Hive. I created a cluster by
adding --hadoop-major-version=2.5.0, which solved some of the errors I was
getting. But now when I run a select query on Hive I get the following error:
Hi,
I created a cluster using the spark-ec2 script, but it installs HDFS version
1.0. I would like to use this cluster to connect to Hive installed on a
Cloudera CDH 5.3 cluster, but I am getting the following error:
org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot
communicate with