Can anyone please help me with this issue?
On Fri, 3 Aug 2018 at 11:27, Shuporno Choudhury <
shuporno.choudh...@gmail.com> wrote:
> Can anyone please help me with this issue?
>
> On Wed, 1 Aug 2018 at 12:50, Shuporno Choudhury [via Apache Spark User
> List] wrote:
>
>> Hi everyone,
>> I am runn
Hi there,
You may want to set the memory overhead higher. Spark will then start
containers with a higher memory limit (spark.executor.memory +
spark.executor.memoryOverhead, to be exact) while the heap is still capped at
spark.executor.memory. There's some memory used by the JVM outside the heap
(VM overheads, interned strings, native buffers), and the overhead setting is
what accounts for it.
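For reference, a minimal sketch of where such a setting would go (the 2g value is
just an assumed example, not a recommendation; the app name is hypothetical):

from pyspark.sql import SparkSession

# Sketch only: raise the off-heap overhead so the container limit becomes
# spark.executor.memory + spark.executor.memoryOverhead (7g + 2g here),
# while the JVM heap stays at spark.executor.memory.
spark = (
    SparkSession.builder
    .appName("memory-overhead-example")             # hypothetical app name
    .config("spark.executor.memory", "7g")
    .config("spark.executor.memoryOverhead", "2g")  # assumed example value
    .getOrCreate()
)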
I am trying to insert overwrite multiple partitions into an existing
partitioned Hive/Parquet table. The table was created using sparkSession.
I have a table 'mytable' with partitions P1 and P2.
I have the following set on the sparkSession object:
.config("hive.exec.dynamic.partition", true)
.config("
Thanks for the clarification.
The processing time on both systems seems to be fine: (a) based on the
pattern of the batch processing time chart, i.e. the batch processing time is
not becoming longer and longer (see charts attached below); (b) the input
data on each Spark stage of every batch remains t
Yes, I am loading a text file from my local machine into a Kafka topic using
the script below, and I'd like to calculate the number of samples per second
consumed by the Kafka consumer.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

if __name__ == "__main__":
    print("hello spark")
    sc = SparkContext(appName="STALTA")
    ssc = StreamingContext(sc, 1)  # batch interval in seconds (assumed; the original value is cut off)
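As a rough illustration (not the original script, which is cut off above), one way
to estimate the consumed records per second inside the streaming job, assuming a
DStream called 'lines' and a known batch interval (both assumptions):

# Hypothetical sketch: count records per micro-batch and divide by the batch interval.
BATCH_SECONDS = 1  # assumed batch interval

def log_rate(rdd):
    count = rdd.count()
    print("records this batch: %d (~%.1f records/sec)" % (count, count / float(BATCH_SECONDS)))

# Assuming 'lines' is the DStream read from the Kafka topic:
# lines.foreachRDD(log_rate)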
What is different between the 2 systems? If one system processes records
faster than the other, simply because it does less processing, then you can
expect the first system to have a higher throughput than the second. It's
hard to say why one system has double the throughput of another without
kno
Hello,
I just had a question. Could you refer me to a link or tell me how you
calculated these figures, such as: *300K msg/sec to a Kafka broker, 220 bytes per
message*. I'm loading a text file with 36000 records into a Kafka topic and
I'd like to calculate the data rate (#samples per sec) in Kafka.
Tha
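A back-of-the-envelope way to get such numbers (this is my own sketch, not how the
quoted figures were actually measured): time the producer loop and divide the record
count by the elapsed seconds; multiplying by the average message size gives bytes/sec.

import time

record_count = 36000      # number of records in the text file (from the question)
avg_msg_bytes = 220       # average message size; 220 bytes is the quoted example figure

start = time.time()
# ... send the 36000 records to the Kafka topic here ...
elapsed = time.time() - start

msgs_per_sec = record_count / elapsed
print("%.0f msg/sec, %.0f bytes/sec" % (msgs_per_sec, msgs_per_sec * avg_msg_bytes))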
We are running Spark 2.3 on a Kubernetes cluster. We have set the following
spark configuration options
"spark.executor.memory": "7g",
"spark.driver.memory": "2g",
"spark.memory.fraction": "0.75"
What we see is:
a) In the Spark UI, 5G has been allocated to each executor, which makes
sense
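That ~5G figure is consistent with Spark's unified memory formula,
(heap - 300MB reserved) * spark.memory.fraction. A quick check, assuming the
standard 300MB reserved memory:

# Back-of-the-envelope check of the ~5G shown in the Spark UI.
executor_heap_mb = 7 * 1024   # spark.executor.memory = 7g
reserved_mb = 300             # Spark's fixed reserved memory (assumed standard value)
memory_fraction = 0.75        # spark.memory.fraction

usable_mb = (executor_heap_mb - reserved_mb) * memory_fraction
print("%.0f MB (~%.2f GB)" % (usable_mb, usable_mb / 1024.0))   # ~5151 MB, ~5.03 GB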
I tried the following to explicitly specify partition columns in the SQL statement
and also tried different cases (upper and lower) for the partition columns.
insert overwrite table $tableName PARTITION(P1, P2) select A, B, C, P1, P2
from updateTable.
Still getting:
Caused by:
org.apache.hadoop.hive.ql.meta
Hello there,
I'm new to Spark Streaming and have trouble understanding Spark batch
"composition" (a Google search keeps giving me an older Spark Streaming
concept). I would appreciate any help and clarifications.
I'm using Spark 2.2.1 for a streaming workload (see quoted code in (a)
below). The general
Thanks Koert. I'll check that out when we can update to 2.3
Meanwhile, I am trying a Hive SQL (INSERT OVERWRITE) statement to overwrite
multiple partitions (without losing the existing ones).
It's giving me issues around partition columns.
dataFrame.createOrReplaceTempView("updateTable") /
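For what it's worth, a sketch of the flow being described, using the table and
column names from this thread; whether this resolves the partition-column error
is an assumption on my part:

# 'dataFrame' is the DataFrame holding the updated rows (from the thread above);
# 'spark' is a SparkSession configured with the dynamic-partition settings shown earlier
# (hive.exec.dynamic.partition=true, hive.exec.dynamic.partition.mode=nonstrict).
dataFrame.createOrReplaceTempView("updateTable")

spark.sql("""
    INSERT OVERWRITE TABLE mytable PARTITION (P1, P2)
    SELECT A, B, C, P1, P2
    FROM updateTable
""")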
Hi Lehak,
You can make a Scala project with an Oozie class and one runner class that
will ship your Python file to the cluster.
Define an Oozie coordinator with a Spark action or a shell action.
We are deploying PySpark-based machine learning code.
Sent from my iPhone
> On Aug 2, 2018, at 8:46 AM, Lehak D
We are trying to deploy a Python script on a Spark cluster. However, as per the
documentation, it is not possible to deploy Python applications on a
cluster. Is there any alternative?
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/