Clearing usercache on EMR [pyspark]

2018-08-01 Thread Shuporno Choudhury
Hi everyone, I am running Spark jobs on EMR (using pyspark). I noticed that after running jobs, the size of the usercache (basically the filecache folder) keeps increasing (with directory names 1,2,3,4,5,...). Directory location: /mnt/yarn/usercache/hadoop/filecache/ Is there a way to
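
One way to bound that cache (a sketch only, not EMR-specific advice) is YARN's localizer cache settings in yarn-site.xml: the NodeManager's cleanup service periodically evicts localized files that are no longer referenced by running containers once the cache exceeds the target size. Values below are examples, not recommendations.

    <!-- yarn-site.xml: cap the local filecache and let the NodeManager clean it up. -->
    <property>
      <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
      <value>5120</value>   <!-- example value; default is 10240 (10 GB) -->
    </property>
    <property>
      <name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
      <value>300000</value> <!-- example value; default is 600000 (10 min) -->
    </property>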

How to make Yarn dynamically allocate resources for Spark

2018-08-01 Thread Anton Puzanov
Hi everyone, I have a cluster managed with Yarn that runs Spark jobs; the components were installed using Ambari (2.6.3.0-235). I have 6 hosts, each with 6 cores, and I use the Fair scheduler. I want Yarn to automatically add/remove executor cores, but no matter what I do it doesn't work. Relevant Spark confi
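
For reference, a minimal sketch of the settings dynamic allocation usually needs (assuming the YARN spark_shuffle aux service is configured on the NodeManagers; min/max values below are made up for a 6x6-core cluster). Note that Spark's dynamic allocation adds and removes whole executors rather than resizing an executor's cores.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("dynamic-allocation-sketch")
             .config("spark.dynamicAllocation.enabled", "true")
             .config("spark.shuffle.service.enabled", "true")  # external shuffle service must run on each node
             .config("spark.dynamicAllocation.minExecutors", "1")
             .config("spark.dynamicAllocation.maxExecutors", "12")
             .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
             .getOrCreate())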

How to use window method with direct kafka streaming?

2018-08-01 Thread fat.wei
Hi everyone, I have the following scenario, and I tried to use the window method with direct kafka streaming. The program runs, but not correctly! 1. The data is stored in kafka. 2. Every single item of the data has its primary key. 3. Every single item of the data will be split into multipe
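
For context, a minimal sketch of windowing over a direct Kafka stream with the old Kafka 0.8 DStream API (topic, broker and checkpoint path below are made up). The window and slide durations must be multiples of the batch interval.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils  # Kafka 0.8 direct API (Spark 2.x)

    sc = SparkContext(appName="kafka-window-sketch")
    ssc = StreamingContext(sc, 10)                  # 10s batch interval
    ssc.checkpoint("/tmp/kafka-window-checkpoint")  # hypothetical path; recommended for windowed ops

    stream = KafkaUtils.createDirectStream(
        ssc, ["events"], {"metadata.broker.list": "broker1:9092"})  # hypothetical topic/broker

    # 60s windows sliding every 20s, over the message values.
    windowed = stream.map(lambda kv: kv[1]).window(60, 20)
    windowed.count().pprint()

    ssc.start()
    ssc.awaitTermination()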

Data quality measurement for streaming data with apache spark

2018-08-01 Thread Uttam
Hello, I have a very general question about Apache Spark. I want to know if it is possible (and where to start, if so) to implement a data quality measurement prototype for streaming data using Apache Spark. Let's say I want to work on Timeliness or Completeness as data quality metrics, is si
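
One possible starting point (a sketch only, with a made-up socket source and a naive notion of Completeness as the share of non-empty records seen so far) is a streaming aggregation in Structured Streaming:

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    spark = SparkSession.builder.appName("dq-sketch").getOrCreate()

    # Hypothetical source; any streaming source yields the same pattern.
    lines = (spark.readStream.format("socket")
             .option("host", "localhost").option("port", 9999).load())

    # Completeness: non-empty records as a fraction of all records.
    quality = lines.agg(
        F.count("*").alias("total"),
        (F.count(F.when(F.trim("value") != "", True)) / F.count("*")).alias("completeness"))

    # Aggregations without watermark require complete output mode.
    query = (quality.writeStream.outputMode("complete")
             .format("console").start())
    query.awaitTermination()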

Re: How to add a new source to an existing structured streaming application, like a kafka source

2018-08-01 Thread Robb Greathouse
How to unsubscribe? On Mon, Jul 30, 2018 at 3:13 AM 杨浩 wrote: > How to add a new source to an existing structured streaming application, like a kafka source -- Robb Greathouse, Middleware BU, 505-507-4906

Re: How to add a new source to an existing structured streaming application, like a kafka source

2018-08-01 Thread David Rosenstrauch
On 08/01/2018 12:36 PM, Robb Greathouse wrote: > How to unsubscribe? List-Unsubscribe: - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org

Re: Saving dataframes with partitionBy: append partitions, overwrite within each

2018-08-01 Thread Nirav Patel
Hi Peay, Have you found a better solution yet? I am having the same issue. The following says it works with Spark 2.1 onward, but only when you use sqlContext and not Dataframe: https://medium.com/@anuvrat/writing-into-dynamic-partitions-using-spark-2e2b818a007a Thanks, Nirav On Mon, Oct 2, 2017 at 4:37 AM,

Overwrite only specific partition with hive dynamic partitioning

2018-08-01 Thread Nirav Patel
Hi, I have a Hive partitioned table created using sparkSession. I would like to insert/overwrite Dataframe data into a specific set of partitions without losing any other partition. In each run I have to update a set of partitions, not just one. E.g. I have a dataframe with bid=1, bid=2, bid=3 in first time
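
For what it's worth, a hedged sketch of the Hive dynamic-partitioning route (table name and data are made up, and insertInto matches columns by position, partition column last): with nonstrict dynamic partition mode, an overwrite insert into a Hive table should replace only the partitions present in the dataframe.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-dynamic-partition-sketch")
             .config("hive.exec.dynamic.partition", "true")
             .config("hive.exec.dynamic.partition.mode", "nonstrict")
             .enableHiveSupport()
             .getOrCreate())

    # Hypothetical data for a table partitioned by bid.
    df = spark.createDataFrame([("a", 1), ("b", 2)], ["payload", "bid"])

    # Overwrites only partitions bid=1 and bid=2; other bid partitions remain.
    df.write.mode("overwrite").insertInto("mydb.events_by_bid")  # hypothetical table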

RE: Split a row into multiple rows Java

2018-08-01 Thread nookala
Pivot seems to do the opposite of what I want, converting rows to columns. I was able to get this done in python, but would like to do this in Java: idfNew = idf.rdd.flatMap(lambda row: [(row.Name, row.Id, row.Date, "0100", row["0100"]), (row.Name, row.Id, row.Date, "0200", row["0200"]), (row.Name, row.Id, row

Re: Split a row into multiple rows Java

2018-08-01 Thread Anton Puzanov
You can always use array+explode; I don't know if it's the most elegant/optimal solution (would be happy to hear from the experts). Code example: // create data Dataset<Row> test = spark.createDataFrame(Arrays.asList(new InternalData("bob", "b1", 1, 2, 3), new InternalData("alive", "c1", 3, 4, 6),
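
To make the suggestion concrete, a sketch of array+explode in PySpark (the thread wants Java, but the same functions exist on the Java Dataset API; column names are guessed from the python snippet above): pack the per-slot values into an array of structs, then explode so each struct becomes its own row.

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    spark = SparkSession.builder.appName("explode-sketch").getOrCreate()
    idf = spark.createDataFrame(
        [("bob", "b1", "2018-08-01", 10, 20)],
        ["Name", "Id", "Date", "0100", "0200"])  # hypothetical schema

    exploded = (idf.select(
        "Name", "Id", "Date",
        F.explode(F.array(
            F.struct(F.lit("0100").alias("slot"), F.col("0100").alias("value")),
            F.struct(F.lit("0200").alias("slot"), F.col("0200").alias("value")))).alias("s"))
        .select("Name", "Id", "Date", "s.slot", "s.value"))
    exploded.show()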

Spark Memory Requirement

2018-08-01 Thread msbreuer
Many threads talk about memory requirements, and most often the answer is to add more memory to Spark. My understanding of Spark is that it is a scalable analytics engine, able to utilize the assigned resources and to calculate the correct answer. So assigning cores and memory may speed up a task. I am usi
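
For anyone landing here, these are the knobs that "add more memory" usually means (values below are arbitrary examples, not recommendations):

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("memory-sketch")
             .config("spark.executor.memory", "4g")            # JVM heap per executor
             .config("spark.executor.memoryOverhead", "512m")  # off-heap overhead (spark.yarn.executor.memoryOverhead before 2.3)
             .config("spark.memory.fraction", "0.6")           # heap share for execution + storage
             .getOrCreate())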

Re: Saving dataframes with partitionBy: append partitions, overwrite within each

2018-08-01 Thread Koert Kuipers
This works for dataframes with Spark 2.3 by changing a global setting, and will be configurable per write in 2.4. See: https://issues.apache.org/jira/browse/SPARK-20236 https://issues.apache.org/jira/browse/SPARK-24860 On Wed, Aug 1, 2018 at 3:11 PM, Nirav Patel wrote: > Hi Peay, > > Have you fin
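
Concretely, a sketch of the Spark 2.3 global setting from SPARK-20236 (path, column names and data below are placeholders): with partitionOverwriteMode set to dynamic, an overwrite only replaces the partitions the dataframe actually contains.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("dynamic-overwrite-sketch").getOrCreate()
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")  # Spark 2.3+ global setting

    df = spark.createDataFrame([(1, "a"), (2, "b")], ["bid", "payload"])  # hypothetical data

    # Only partitions bid=1 and bid=2 are rewritten; bid=3 from an earlier run survives.
    (df.write.mode("overwrite")
       .partitionBy("bid")
       .parquet("/data/events_by_bid"))  # hypothetical path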

unsubscribe

2018-08-01 Thread Eco Super
Hi User, unsubscribe me