I want to read data from Hive cluster1
and write data to Hive cluster2.
How can I do it?
Note: cluster1 and cluster2 both have Kerberos enabled.
igyu
Has anyone had any experience with running Spark-Rapids on a GPU-powered
cluster (https://github.com/NVIDIA/spark-rapids)? I am very interested
in knowing:
1. What is the hardware/software platform and the type of Spark cluster
you are using to run Spark-Rapids?
2. How easy was the installa
You may recall that I raised a few questions here and in Stacktrace
regarding two items, both related to running PySpark inside Kubernetes.
The challenges were:
1. Loading third-party packages like tensorflow, numpy, pyyaml in a running
job in k8s
2. How to read from a yaml file to load initiali
Yes indeed very good points by the Artemis User.
Just to add, if I may: why choose Spark? Generally, a parallel architecture
comes into play when the data is too large to be handled on a single
machine; that is when the use of Spark becomes meaningful. In
cases where (the generate
PySpark still uses the Spark DataFrame underneath (it wraps Java code). Use
PySpark when you have to deal with big data ETL and analytics so you can
leverage the distributed architecture in Spark. If your job is simple, the
dataset is relatively small, and it doesn't require distributed processing,
use Pa
Can you please post the error log/exception messages? There is not
enough info to help diagnose what the real problem is.
On 7/29/21 8:55 AM, Big data developer need help relat to spark gateway
roles in 2.0 wrote:
Hi Team,
We are facing an issue in production where we are getting frequent
Hello team
Someone asked me about well-developed Python code using Pandas DataFrames and
how that compares to PySpark.
Under what situations would one choose PySpark instead of Python and Pandas?
Appreciate
AK
Hi Team, we are facing an issue in production where we are getting frequent "Still have 1 request outstanding when connection with the hostname was closed" and "connection reset by peer" errors, as well as warnings such as "failed to remove cache rdd" or "failed to remove broadcast variable". Please help us how to
Hi Renganathan,
Not quite. It strongly depends on whether you use UDFs, defined in any
manner, as UDF objects or just lambdas. If you have any, they may and
will be called on executors too.
On 21/07/29 05:17, Renganathan Mutthiah wrote:
> Hi,
>
> I have read in many materials (including from the
Hi,
I have read in many materials (including from the book: Spark - The
Definitive Guide) that Spark is a compiler.
As I understand it, our program is used only up to the point of DAG generation.
This portion can be written in any supported language: Java, Scala, R, or Python.
Post that (executing the DAG), the eng