Another possible option would be to create a partitioned table in Hive and use
dynamic partitioning while inserting. This would not require Spark to do an
explicit partitionBy; a rough sketch is below.
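A hedged sketch of that approach in PySpark (the table name, column names, and
the flattened_df variable are placeholders, and it assumes a SparkSession built
with Hive support):

from pyspark.sql import SparkSession

# Assumes Hive support is available; all names below are made up for illustration.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Let Hive assign partitions dynamically at insert time.
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

spark.sql("""
    CREATE TABLE IF NOT EXISTS events_by_date (payload STRING)
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
""")

# flattened_df stands in for the already-exploded DataFrame from the job.
flattened_df.createOrReplaceTempView("flattened")
spark.sql("""
    INSERT INTO TABLE events_by_date PARTITION (event_date)
    SELECT payload, event_date FROM flattened
""")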
On Tue, 26 Sep 2017 at 12:39 pm, Ankur Srivastava <
ankur.srivast...@gmail.com> wrote:
> Hi Amit,
>
> Spark keeps the partition
Hi Amit,
Spark keeps the partition that it is working on in memory (and does not
spill to disk even if it is running out of memory). Also, since you are getting OOM
when using partitionBy (and not when you just use flatMap), there is probably
one date (or a few) for which your partition size is bigger than the heap
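One quick way to check for that kind of date skew (a sketch only; the DataFrame
variable and the "date" column name are assumptions, not from the original job):

# df stands in for the flattened DataFrame before the partitioned write.
# Shows whether one or a few dates dominate the row counts.
df.groupBy("date").count().orderBy("count", ascending=False).show(20)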
Hi Amit,
Maybe you can change the configuration spark.sql.shuffle.partitions.
The default is 200; changing this property changes the number of tasks when you
are using the DataFrame API.
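For example (a sketch only; the value 400 is arbitrary), the property can be set
when building the session or changed on a live one:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.sql.shuffle.partitions", "400")  # default is 200
         .getOrCreate())

# or on an existing session:
spark.conf.set("spark.sql.shuffle.partitions", "400")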
> On 26 Sep 2017, at 1:25 AM, Amit Sela wrote:
>
> I'm trying to run a simple pyspark application that reads fr
Is there a way to unpersist all DataFrames, Datasets, and/or RDDs in Spark
2.2 in a single call?
Thanks
--
Cesar Flores
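A hedged sketch of one approach (not from the thread): spark.catalog.clearCache()
drops everything cached through the SQL/DataFrame layer in one call, and explicitly
persisted RDDs can be unpersisted by walking the JVM-side context. The _jsc handle
is internal, so treat that part as an assumption rather than a stable public API.

# Clears all cached tables/DataFrames in one call.
spark.catalog.clearCache()

# Unpersist any explicitly persisted RDDs via the internal _jsc handle.
for (rdd_id, rdd) in spark.sparkContext._jsc.getPersistentRDDs().items():
    rdd.unpersist()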
The Spark on Kubernetes development community is pleased to announce
release 0.4.0 of Apache Spark with a native Kubernetes scheduler back-end!
The dev community is planning to use this release as the reference for
upstreaming the native Kubernetes capability over the Spark 2.3 release cycle.
This rele
Thanks for the reply. I forgot to mention that our batch ETL jobs are in
core Spark.
On Sep 22, 2017, at 3:13 PM, Vadim Semenov
wrote:
1. 40s is pretty negligible unless you run your job very frequently; there
can be many factors that influence that.
2. Try to compare the CPU time instead of th
I'm trying to run a simple pyspark application that reads from a file (JSON),
flattens it (explode), and writes back to a file (JSON) partitioned by date
using DataFrameWriter.partitionBy(*cols).
I keep getting OOMEs like:
java.lang.OutOfMemoryError: Java heap space
at org.apache.spark.util.collection.
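For reference, a hedged reconstruction of the kind of job described above (the
paths, the exploded column "items", and the partition column "date" are all
assumptions, not taken from the original code):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode

spark = SparkSession.builder.appName("flatten-json").getOrCreate()

df = spark.read.json("in/path")                                  # source JSON
flat = df.select(col("date"), explode(col("items")).alias("item"))
flat.write.partitionBy("date").json("out/path")                  # partitioned JSON output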
Can anyone provide a code snippet / steps to write a DataFrame to a Kafka
topic in a Spark Streaming application using pyspark with Spark 2.1.1 and
Kafka 0.8 (direct stream approach)?
Thanks,
Umar
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
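Spark 2.1.x has no built-in Kafka sink for DataFrames, so one common workaround
is to serialize each row and send it with a plain producer inside
foreachPartition. A hedged sketch only; the topic, brokers, and the external
kafka-python package are assumptions, not part of Spark:

from kafka import KafkaProducer  # external kafka-python package, an assumption

def send_partition(rows):
    # One producer per partition so nothing non-serializable crosses the cluster.
    producer = KafkaProducer(bootstrap_servers="broker1:9092")
    for row in rows:
        producer.send("my_topic", row.encode("utf-8"))
    producer.flush()
    producer.close()

# df stands in for the DataFrame built inside foreachRDD of the direct stream.
df.toJSON().foreachPartition(send_partition)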
---
Hi,
I'm using Hive 2.3.0, Spark 2.1.1, and Zeppelin 0.7.2.
When I submit a query in the Hive interpreter, it works fine.
I could see exactly the same query in the Zeppelin notebook and the HiveServer2 web UI.
However, when I submitted the query using Spark SQL, the query seemed wrong.
For example, every column is with do
Just build a fat jar and do not apply --packages.
serkan taş wrote on Mon, 25 Sep 2017 at 09:24:
> Hi,
>
> Every time I submit a Spark job, it checks the dependent jars from the remote
> maven repo.
>
> Is it possible to make Spark load the cached jars first rather than
> looking for an internet connection?
Hi,
Every time I submit a Spark job, it checks the dependent jars from the remote maven repo.
Is it possible to make Spark load the cached jars first rather than looking for
an internet connection?