It looks like your problem is related to not setting up a hive-site.xml file
properly. The standard Spark distribution doesn't include a hive-site.xml
template file in the conf directory, so you will have to create one
yourself. Please refer to the Spark user doc and the Hive metastore
configuration guide for details.
Thanks
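For what it's worth, a minimal conf/hive-site.xml usually just needs to point Spark at the metastore. The host, port, and warehouse path below are placeholders, not values from this thread:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Placeholder metastore URI; replace with your metastore host/port -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
  <!-- Placeholder warehouse location on HDFS -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
```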
It sounds like you experimented on-prem with HDFS and Spark sharing the same
host nodes for data affinity. I am not sure that is something I can sell in
a banking environment, so to speak. Bottom line, it will boil down to
procuring more tin boxes on-prem to give Spark more memory, assuming that
i
In spark-shell I can run:
val url = "hdfs://nameservice1/user/jztwk/config.json"
spark.sparkContext.addFile(url)
val json_str = readLocalFile(SparkFiles.get(url.split("/").last))
but when I make a jar package and run
spark-submit --master yarn --deploy-mode cluster --principal
jztwk/had...@join.com --key
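One thing worth double-checking when moving from spark-shell to spark-submit: SparkContext.addFile registers the file under its base name, so SparkFiles.get must be called with just the file name, never the full URL. A small sketch of that extraction (the Spark calls themselves are shown in comments because they need a live session; `readLocalFile` is the poster's own helper):

```scala
object SparkFilesName {
  // addFile("hdfs://.../config.json") registers the file as "config.json",
  // so SparkFiles.get must be given that base name only.
  def fileName(url: String): String = url.split("/").last

  def main(args: Array[String]): Unit = {
    val url = "hdfs://nameservice1/user/jztwk/config.json"
    println(fileName(url)) // prints "config.json"
    // In a live Spark session (not runnable here without Spark):
    //   spark.sparkContext.addFile(url)
    //   val localPath = org.apache.spark.SparkFiles.get(fileName(url))
  }
}
```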
Yes, this is very much "use at your own risk". That said, at Yahoo we did
something very similar to this on all of the YARN nodes and saw a decent
performance uplift. This was even with HDFS running on the same nodes. I
think we just changed the time to flush to 30 minutes, but it was a long
time ago.
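For reference, the "time to flush" knobs this likely refers to are the kernel's dirty-page writeback sysctls. The values below are illustrative only (a 30-minute expiry, as mentioned above), not a recommendation:

```shell
# /etc/sysctl.d/99-pagecache.conf -- illustrative values only
# How old dirty pagecache data may get before it must be written back
# (centisecs; 180000 = 30 minutes)
vm.dirty_expire_centisecs = 180000
# How often the kernel wakes up to write dirty pages back
vm.dirty_writeback_centisecs = 180000
# Allow a larger fraction of RAM to hold dirty pages before forcing writes
vm.dirty_ratio = 60
vm.dirty_background_ratio = 50
```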
Hi Bobby,
On this statement of yours if I may:
... If you really want to you can configure the pagecache to not spill to
disk until absolutely necessary. That should get you really close to pure
in-memory processing, so long as you have enough free memory on the host to
support it.
I would not p
On the data path, Spark will write to a local disk when it runs out of
memory and needs to spill or when doing a shuffle with the default shuffle
implementation. The spilling is a good thing because it lets you process
data that is too large to fit in memory. It is not great because the
processin
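To illustrate the point above, where and how Spark spills is ordinary configuration; the property names below are standard Spark settings, though the values and the local path are arbitrary examples:

```shell
# Illustrative spark-submit flags (values are examples, not recommendations):
# spark.local.dir          -- where spill and shuffle files land on each node
# spark.memory.fraction    -- share of the heap for execution and storage
# spark.shuffle.*.compress -- compress shuffle/spill data written to disk
spark-submit \
  --conf spark.local.dir=/mnt/fast-disk/spark-tmp \
  --conf spark.memory.fraction=0.6 \
  --conf spark.shuffle.compress=true \
  --conf spark.shuffle.spill.compress=true \
  ...
```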
Hi Dev,
Environment details:
Hadoop 3.2
Hive 3.1
Spark 3.0.3
Cluster: Kerberized.
1) Hive server is running fine.
2) Spark SQL, spark-shell, and spark-submit are all working as expected.
3) Connecting to Hive through beeline is working fine (after kinit):
beeline -u "jdbc:hive2://:/default;princip
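For comparison, a Kerberized beeline connection string normally carries the HiveServer2 service principal in the JDBC URL; everything below (host, port, principal, realm) is a placeholder, not taken from this thread:

```shell
# After kinit, connect with the HiveServer2 Kerberos principal in the URL.
# hs2-host, 10000, and the principal/realm are placeholders.
beeline -u "jdbc:hive2://hs2-host:10000/default;principal=hive/hs2-host@EXAMPLE.COM"
```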
Well, I don't know what having an "in-memory Spark only" setup is going to
achieve. The Spark GUI shows the amount of disk usage pretty well. By
default, memory is used first anyway.
Spark is no different from any predominantly in-memory application.
Effectively it is doing the classical disk-based had
Hello Jacek,
On 20/8/21 2:49 p.m., Jacek Laskowski wrote:
Hi,
I've been exploring BlockManager and the stores for a while now and am
tempted to say that a memory-only Spark setup would be possible
(except shuffle blocks). Is this correct?
Correct.
What about shuffle blocks? Do they have to
Hi,
I've been exploring BlockManager and the stores for a while now and am
tempted to say that a memory-only Spark setup would be possible (except
shuffle blocks). Is this correct?
What about shuffle blocks? Do they have to be stored on disk (in DiskStore)?
I think broadcast variables are in-mem