Another possible option would be to create a partitioned table in Hive and use
dynamic partitioning while inserting. This would not require Spark to do an
explicit partitionBy; a rough sketch is below.
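A hedged sketch of that approach in PySpark (the table name, column names, and
the flattened_df variable are placeholders, and it assumes a SparkSession built
with Hive support):

from pyspark.sql import SparkSession

# Assumes Hive support is available; all names below are made up for illustration.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Let Hive assign partitions dynamically at insert time.
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

spark.sql("""
    CREATE TABLE IF NOT EXISTS events_by_date (payload STRING)
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
""")

# flattened_df stands in for the already-exploded DataFrame from the job.
flattened_df.createOrReplaceTempView("flattened")
spark.sql("""
    INSERT INTO TABLE events_by_date PARTITION (event_date)
    SELECT payload, event_date FROM flattened
""")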
On Tue, 26 Sep 2017 at 12:39 pm, Ankur Srivastava <
ankur.srivast...@gmail.com> wrote:
> Hi Amit,
>
> Spark keeps the partition
Hi Amit,
Spark keeps the partition that it is working on in memory (and does not
spill to disk even if it is running out of memory). Also, since you are getting OOM
when using partitionBy (and not when you just use flatMap), there is probably
one date (or a few) for which your partition size is bigger than the heap
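One quick way to check for that kind of date skew (a sketch only; the DataFrame
variable and the "date" column name are assumptions, not from the original job):

# df stands in for the flattened DataFrame before the partitioned write.
# Shows whether one or a few dates dominate the row counts.
df.groupBy("date").count().orderBy("count", ascending=False).show(20)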
Hi Amit,
Maybe you can change the configuration spark.sql.shuffle.partitions.
The default is 200; changing this property changes the number of tasks when you
are using the DataFrame API.
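For example (a sketch only; the value 400 is arbitrary), the property can be set
when building the session or changed on a live one:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.sql.shuffle.partitions", "400")  # default is 200
         .getOrCreate())

# or on an existing session:
spark.conf.set("spark.sql.shuffle.partitions", "400")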
> On 26 Sep 2017, at 1:25 AM, Amit Sela wrote:
>
> I'm trying to run a simple pyspark application that reads fr
Is there a way to unpersist all DataFrames, Datasets, and/or RDDs in Spark
2.2 in a single call?
Thanks
--
Cesar Flores
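A hedged sketch of one approach (not from the thread): spark.catalog.clearCache()
drops everything cached through the SQL/DataFrame layer in one call, and explicitly
persisted RDDs can be unpersisted by walking the JVM-side context. The _jsc handle
is internal, so treat that part as an assumption rather than a stable public API.

# Clears all cached tables/DataFrames in one call.
spark.catalog.clearCache()

# Unpersist any explicitly persisted RDDs via the internal _jsc handle.
for (rdd_id, rdd) in spark.sparkContext._jsc.getPersistentRDDs().items():
    rdd.unpersist()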
The Spark on Kubernetes development community is pleased to announce
release 0.4.0 of Apache Spark with a native Kubernetes scheduler back-end!
The dev community is planning to use this release as the reference for
upstreaming the native Kubernetes capability over the Spark 2.3 release cycle.
This rele
Thanks for the reply. I forgot to mention that our batch ETL jobs are in
core Spark.
On Sep 22, 2017, at 3:13 PM, Vadim Semenov
wrote:
1. 40s is pretty negligible unless you run your job very frequently; there
can be many factors that influence that.
2. Try to compare the CPU time instead of th
I'm trying to run a simple pyspark application that reads from a file (JSON),
flattens it (explode), and writes back to a file (JSON) partitioned by date
using DataFrameWriter.partitionBy(*cols).
I keep getting OOMEs like:
java.lang.OutOfMemoryError: Java heap space
at org.apache.spark.util.collection.
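For reference, a hedged reconstruction of the kind of job described above (the
paths, the exploded column "items", and the partition column "date" are all
assumptions, not taken from the original code):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.appName("flatten-json").getOrCreate()

df = spark.read.json("in/path")                                  # source JSON
flat = df.select(col("date"), explode(col("items")).alias("item"))
flat.write.partitionBy("date").json("out/path")                  # partitioned JSON output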
Can anyone provide a code snippet / steps to write a DataFrame to a Kafka
topic in a Spark Streaming application using pyspark with Spark 2.1.1 and
Kafka 0.8 (direct stream approach)?
Thanks,
Umar
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
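Spark 2.1.x has no built-in Kafka sink for DataFrames, so one common workaround
is to serialize each row and send it with a plain producer inside
foreachPartition. A hedged sketch only; the topic, brokers, and the external
kafka-python package are assumptions, not part of Spark:

from kafka import KafkaProducer  # external kafka-python package, an assumption

def send_partition(rows):
    # One producer per partition so nothing non-serializable crosses the cluster.
    producer = KafkaProducer(bootstrap_servers="broker1:9092")
    for row in rows:
        producer.send("my_topic", row.encode("utf-8"))
    producer.flush()
    producer.close()

# df stands in for the DataFrame built inside foreachRDD of the direct stream.
df.toJSON().foreachPartition(send_partition)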
---
Hi,
I'm using Hive 2.3.0, Spark 2.1.1, and Zeppelin 0.7.2.
When I submit a query in the Hive interpreter, it works fine.
I could see exactly the same query in the Zeppelin notebook and the HiveServer2 web UI.
However, when I submitted the query using Spark SQL, the query seemed wrong.
For example, every column is with do
Just build a fat jar and do not apply --packages.
serkan taş wrote on Mon, 25 Sep 2017 at 09:24:
> Hi,
>
> Every time I submit a Spark job, it checks the dependent jars from the remote
> maven repo.
>
> Is it possible to make Spark load the cached jars first rather than
> looking for an internet connection?
Hi,
Every time I submit a Spark job, it checks the dependent jars from the remote maven repo.
Is it possible to make Spark load the cached jars first rather than looking for
an internet connection?