You probably need to take a look at your hive-site.xml and see what the
location is for the Hive Metastore. As for Beeline, you can explicitly use an
instance of HiveServer2 by passing the JDBC URL to the server when you
launch the client; e.g. beeline -u "jdbc:hive2://example.com:10000"
(10000 is the HiveServer2 default port).
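For reference, the metastore location is typically the hive.metastore.uris
property in hive-site.xml; a sketch with a placeholder host (9083 is the usual
metastore Thrift port):

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-host:9083</value>
</property>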
Try ta
I don't think SQLContext is "deprecated" in that sense. It's still accessible
for backwards compatibility with earlier versions of Spark.
But yes, at first glance it looks like you are correct: I don't see a
recordWriter method for Parquet outside of the SQL package.
https://spark.apache.org/docs/latest/api/scala/index.html
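For what it's worth, writing Parquet normally goes through the DataFrameWriter
in the SQL package; a minimal sketch (the toy DataFrame and the output path are
made up):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(10)  # toy DataFrame, just something to write
df.write.mode("overwrite").parquet("/tmp/example_parquet")  # the writer lives in the SQL package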
+1
AFAIK,
vCores are not the same as Cores in AWS.
https://samrueby.com/2015/01/12/what-are-amazon-aws-vcpus/
I’ve always understood it as cores = num concurrent threads
These posts might help with your research into why exceeding 5 cores per
executor doesn't make sense.
https://stackover
Might sound silly, but are you using a Hive context?
What errors do the Hive query results return?
from pyspark.sql import SparkSession  # import needed for the line below
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
On the second part of your question: you are creating a temp table and then
subsequently creating another table from that temp view. Doe
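For reference, a minimal sketch of that pattern (the view and table names are
made up):

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
df = spark.range(10)  # stand-in for your source data
df.createOrReplaceTempView("tmp_view")  # session-scoped temp view
spark.sql("CREATE TABLE new_table AS SELECT * FROM tmp_view")  # the second, persistent table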
Spark cannot read from S3 locally without the S3A protocol; you'll more than
likely need a local copy of the data, or you'll need to utilize the proper jars
to enable S3 communication from the edge node to the datacenter.
https://stackoverflow.com/questions/30385981/how-to-access-s3a-files-from-apache-
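Once the S3A jars are on the classpath, the read itself is just a matter of the
URI scheme; a sketch (bucket and path are placeholders, and AWS credentials are
assumed to be configured):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("s3a://my-bucket/path/to/data")  # note s3a://, not s3:// or a local path
df.show()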
Joren,
Anytime there is a shuffle across the network, Spark moves to a new stage. It
seems like you are having issues either pre- or post-shuffle. Have you looked
at a resource monitoring tool like Ganglia to determine whether this is a
memory or thread related issue? The Spark UI?
You are using groupBy, which triggers exactly that kind of shuffle; see the
sketch below for a quick way to spot the stage boundary.
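Sketched with toy data (explain() prints the physical plan; the Exchange node
is the shuffle, i.e. the stage boundary):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(100).withColumnRenamed("id", "key")  # toy data
df.groupBy("key").count().explain()  # look for Exchange in the plan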
Alcon,
You can most certainly do this. I've done benchmarking with Spark SQL and the
TPC-DS queries using S3 as the filesystem.
Zeppelin and Livy Server work well for the dashboarding and concurrent-query
issues: https://hortonworks.com/blog/livy-a-rest-interface-for-apache-spark/
Sounds like an S3 bug. Can you replicate it locally with HDFS?
Try using the S3A protocol too; there is a jar you can leverage, like so:
spark-submit \
  --packages com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.3 \
  my_spark_program.py
EMR can sometimes be buggy. :/
You could also try le
This might help; I've built a REST API with Livy Server:
https://livy.incubator.apache.org/
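As a sketch of what the submission side can look like (host, port, and file
path are placeholders; 8998 is Livy's default port and /batches is its
batch-submission endpoint):

import json
import requests  # assumes the requests library is installed

resp = requests.post(
    "http://livy-host:8998/batches",
    headers={"Content-Type": "application/json"},
    data=json.dumps({"file": "s3a://my-bucket/jobs/my_spark_program.py"}),
)
print(resp.json())  # Livy returns the batch id and its state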
+1. What is the executor memory? You may need to adjust executor memory and
cores. For the sake of simplicity: each executor can handle 5 concurrent tasks
and should have 5 cores. So if your cluster has 100 cores, you'd have 20
executors. And if your cluster memory is 500GB, each executor would have 25GB.
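A sketch of that sizing expressed as Spark configs (the numbers come from the
example above, not a universal rule; on YARN you would often pass these to
spark-submit instead):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.executor.instances", "20")  # 100 cores / 5 cores per executor
         .config("spark.executor.cores", "5")
         .config("spark.executor.memory", "25g")  # 500GB / 20 executors
         .getOrCreate())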