Re: freeing up memory occupied by processed Stream Blocks

2017-01-19 Thread Takeshi Yamamuro
Hi, AFAIK, the blocks of minibatch RDDs are checked after every job finishes, and older blocks are automatically removed (See: https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L463 ). You can control this behaviour by StreamingContext#rem
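A minimal sketch of the suggestion, assuming the truncated method name above is StreamingContext#remember (the app name and retention duration are illustrative, not from the thread):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

val conf = new SparkConf().setAppName("block-retention-sketch")
val ssc = new StreamingContext(conf, Seconds(10))

// Retain generated minibatch RDDs (and their blocks) for at least 5 minutes,
// rather than letting them be cleared as soon as each job finishes.
ssc.remember(Minutes(5))
```

Note that a longer remember duration trades memory for the ability to reprocess recent batches; the default window is derived from the batch and checkpoint intervals.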

Re: How to do dashboard reporting in spark

2017-01-19 Thread Jörn Franke
You can use Zeppelin if you want to interact directly with Spark. For traditional tools you have the right ideas (any of them works, depending on requirements). See also: lambda architecture. > On 20 Jan 2017, at 08:18, Gaurav1809 wrote: > > Hi All, > > > Once data is stored in data frames? Wha

How to do dashboard reporting in spark

2017-01-19 Thread Gaurav1809
Hi All, Once data is stored in data frames, what's next? Where do we go from there? Do we store the data in Hive or an RDBMS (Oracle, MySQL, Teradata)? How do we do dashboard reporting based on the data present in dataframes? If there is any BI tool available in the Spark ecosystem, please suggest. Tha

Re: spark 2.02 error when writing to s3

2017-01-19 Thread Palash Gupta
Hi, You need to add the overwrite save mode option to avoid this error. //P.Gupta Sent from Yahoo Mail on Android On Fri, 20 Jan, 2017 at 2:15 am, VND Tremblay, Paul wrote: I have come across a problem when writing CSV files to S3 in Spark 2.02. The problem does not exist in Spark 1.6.   19:0
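A sketch of the suggested fix (bucket names and input format are hypothetical placeholders, not from the thread):

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("s3-overwrite-sketch").getOrCreate()
val df = spark.read.json("s3://my-bucket/input/") // hypothetical input path

// SaveMode.Overwrite replaces any existing output files (e.g. leftovers from a
// previous failed attempt) instead of failing with "File already exists".
df.write.mode(SaveMode.Overwrite).csv("s3://my-bucket/output/")
```

Be aware that overwrite deletes whatever is already at the output path, so it masks rather than explains the underlying leftover-file issue.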

Re: Will be in around 12:30pm due to some personal stuff

2017-01-19 Thread Oshadha Gunawardena
On Fri, Jan 20, 2017 at 11:26 AM, Gavin Yue wrote: > PST or est ? > > On Jan 19, 2017, at 21:55, ayan guha wrote: > > Sure...we will wait :) :) > > Just kidding > > On Fri, Jan 20, 2017 at 4:48 PM, Manohar753 wrote: > >> Get Outlook for Android >> --

Re: Will be in around 12:30pm due to some personal stuff

2017-01-19 Thread Gavin Yue
PST or est ? > On Jan 19, 2017, at 21:55, ayan guha wrote: > > Sure...we will wait :) :) > > Just kidding > >> On Fri, Jan 20, 2017 at 4:48 PM, Manohar753 >> wrote: >> Get Outlook for Android >> Happiest Minds Disclaimer >> This message is for the sole use of the intended recipient(s) a

Re: Will be in around 12:30pm due to some personal stuff

2017-01-19 Thread ayan guha
Sure...we will wait :) :) Just kidding On Fri, Jan 20, 2017 at 4:48 PM, Manohar753 wrote: > Get Outlook for Android > -- > Happiest Minds Disclaimer > > This message is for the sole use of the intended recipient(s) and may > contain confid

Will be in around 12:30pm due to some personal stuff

2017-01-19 Thread Manohar753
Get Outlook for Android Happiest Minds Disclaimer This message is for the sole use of the intended recipient(s) and may contain confidential, proprietary or legally privileged information. Any unauthorized review, use, disclosure or distri

Re: Spark Source Code Configuration

2017-01-19 Thread Deepu Raj
Thanks Kai, I am getting the following message and it gets stuck when I run sbt. Any idea? "Set current project to spark-parent (in build file:/home/cloudera/spark/)" Details attached. Regards, Deepu Raj +61 414 707 319 On Fri, 20 Jan 2017 10:27:16 +1100, Kai Jiang wrote: Hi Deepu, Hope this page can

Re: spark 2.02 error when writing to s3

2017-01-19 Thread Takeshi Yamamuro
Hi, Do you get the same exception also in v2.1.0? Anyway, I saw another guy reporting the same error, I think. https://www.mail-archive.com/user@spark.apache.org/msg60882.html // maropu On Fri, Jan 20, 2017 at 5:15 AM, VND Tremblay, Paul wrote: > I have come across a problem when writing CSV

Re: Spark streaming app that processes Kafka DStreams produces no output and no error

2017-01-19 Thread shyla deshpande
There was an issue connecting to Kafka; once that was fixed, the Spark app works. Hope this helps someone. Thanks On Mon, Jan 16, 2017 at 7:58 AM, shyla deshpande wrote: > Hello, > I checked the log file on the worker node and don't see any error there. > This is the first time I am asked to run

Non-linear (curved?) regression line

2017-01-19 Thread Ganesh
Has anyone worked on non-linear/curved regression lines with Apache Spark? This seems to be such a trivial issue but I have given up after experimenting for nearly two weeks. The plot line is as below and the raw data in the table at the end. I just can't get Spark ML to give decent predictio
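One common way to fit a curved trend with Spark ML (not necessarily what the original poster tried — the column names, polynomial degree, and data layout here are assumptions) is to expand the feature vector polynomially and then fit an ordinary linear regression on the expanded features:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{PolynomialExpansion, VectorAssembler}
import org.apache.spark.ml.regression.LinearRegression

// Assumes df is a DataFrame with a numeric input column "x" and a "label" column.
val assembler = new VectorAssembler()
  .setInputCols(Array("x"))
  .setOutputCol("features")

// Expand [x] into [x, x^2, x^3] so a linear model can fit a cubic curve.
val poly = new PolynomialExpansion()
  .setInputCol("features")
  .setOutputCol("polyFeatures")
  .setDegree(3)

val lr = new LinearRegression()
  .setFeaturesCol("polyFeatures")
  .setLabelCol("label")

val model = new Pipeline().setStages(Array(assembler, poly, lr)).fit(df)
```

The degree is a modelling choice: too low underfits the curve, too high overfits the raw data, so it is worth tuning on held-out data.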

Re: Executors - running out of memory

2017-01-19 Thread sanat kumar Patnaik
Please try and play with spark-defaults.conf for EMR. Dynamic allocation = true is there by default for EMR 4.4 and above. What is the EMR version you are using? http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html#d0e20458 On Thu, Jan 19, 2017 at 5:02 PM, Venkata D wrote:

Re: Spark Source Code Configuration

2017-01-19 Thread Kai Jiang
Hi Deepu, Hope this page can give you some help. http://spark.apache.org/developer-tools.html Best, Kai On Thu, Jan 19, 2017, 14:51 Deepu Raj wrote: > Hi, > > Is there any article/Docs/support to set up Apache Spark source code on > Eclipse/IntelliJ. > > I have tried setting up the source code,

Spark Source Code Configuration

2017-01-19 Thread Deepu Raj
Hi, Is there any article/docs/support for setting up the Apache Spark source code in Eclipse/IntelliJ? I have tried setting up the source code by importing from Git and using Maven, but I am getting a lot of compilation errors. #suggestions Regards, Deepu Raj +61 414 707 319 ---

Re: Executors - running out of memory

2017-01-19 Thread Venkata D
blondowski, How big is your JSON file? Is it possible to post the Spark params or configuration here? That might give some idea about the issue. Thanks On Thu, Jan 19, 2017 at 4:21 PM, blondowski wrote: > Please bear with me..I'm fairly new to spark. Running pyspark 2.0.1 on AWS > EM

Re: How to save spark-ML model in Java?

2017-01-19 Thread Xiaomeng Wan
cv.fit is going to give you a CrossValidatorModel; if you want to extract the real model built, you need to do:
val cvModel = cv.fit(data)
val plmodel = cvModel.bestModel.asInstanceOf[PipelineModel]
val model = plmodel.stages(2).asInstanceOf[whatever_model]
then you can model.save O
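Filling out that extraction pattern as a self-contained sketch — the stage index (2) and the concrete model class are assumptions that depend on how the pipeline was built, and the save path is hypothetical:

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.classification.RandomForestClassificationModel

// cv: CrossValidator and data: DataFrame are assumed to already exist.
val cvModel = cv.fit(data)

// bestModel is typed as Model[_]; cast it back to the pipeline type.
val plModel = cvModel.bestModel.asInstanceOf[PipelineModel]

// stages(2) assumes the estimator was the third stage of the pipeline;
// adjust the index and the target class to match your own pipeline.
val model = plModel.stages(2).asInstanceOf[RandomForestClassificationModel]

model.write.overwrite().save("/tmp/best-model") // hypothetical path
```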

Executors - running out of memory

2017-01-19 Thread blondowski
Please bear with me... I'm fairly new to Spark. Running pyspark 2.0.1 on AWS EMR (6 node cluster with 475GB of RAM). We have a job that creates a dataframe from json files, then does some manipulation (adds columns) and then calls a UDF. The job fails on the UDF call with Container killed by YARN f
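When YARN kills containers for exceeding memory limits, the usual first remedy is to raise executor memory and the off-heap overhead. A sketch of the relevant settings (the values are illustrative, not a recommendation for this cluster, and the property name `spark.yarn.executor.memoryOverhead` is the one used in Spark 2.0.x):

```scala
import org.apache.spark.SparkConf

// These must be set before the application's executors launch, e.g. here
// via SparkConf or equivalently with --conf flags on spark-submit.
val conf = new SparkConf()
  .setAppName("udf-job-sketch")
  .set("spark.executor.memory", "8g")
  // Off-heap headroom per executor, in MB. UDFs in pyspark run Python worker
  // processes whose memory counts against this overhead, not the JVM heap.
  .set("spark.yarn.executor.memoryOverhead", "2048")
```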

dataset aggregators with kryo encoder very slow

2017-01-19 Thread Koert Kuipers
We just converted a job from RDD to Dataset. The job does a single map-reduce phase using aggregators. We are seeing very bad performance for the Dataset version, about 10x slower. In the Dataset version we use kryo encoders for some of the aggregators. Based on some basic profiling of Spark in local
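For context, a minimal sketch of the pattern being described — an `Aggregator` whose intermediate buffer uses a kryo encoder (the buffer type and aggregation here are illustrative, not the original job's). Because `Encoders.kryo` stores the buffer as an opaque binary blob, Spark must serialize and deserialize it around updates and merges, which can plausibly account for a large slowdown versus the RDD version:

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

case class Buf(var sum: Long, var count: Long)

// Computes the mean of a Dataset[Long] via a kryo-encoded buffer.
object MeanAgg extends Aggregator[Long, Buf, Double] {
  def zero: Buf = Buf(0L, 0L)
  def reduce(b: Buf, a: Long): Buf = { b.sum += a; b.count += 1; b }
  def merge(b1: Buf, b2: Buf): Buf = { b1.sum += b2.sum; b1.count += b2.count; b1 }
  def finish(b: Buf): Double = if (b.count == 0) 0.0 else b.sum.toDouble / b.count

  // The kryo buffer encoder: generic, but serialized on every update/merge.
  def bufferEncoder: Encoder[Buf] = Encoders.kryo[Buf]
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}
```

Switching the buffer to a product/bean encoder (`Encoders.product`) lets Spark keep it in its internal row format and typically avoids most of this overhead.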

spark 2.02 error when writing to s3

2017-01-19 Thread VND Tremblay, Paul
I have come across a problem when writing CSV files to S3 in Spark 2.02. The problem does not exist in Spark 1.6. 19:09:20 Caused by: java.io.IOException: File already exists:s3://stx-apollo-pr-datascience-internal/revenue_model/part-r-00025-c48a0d52-9600-4495-913c-64ae6bf888bd.csv My code is

Re: How to save spark-ML model in Java?

2017-01-19 Thread Minudika Malshan
Hi, Thanks Rezaul and Asher Krim. The method suggested by Rezaul works fine for NaiveBayes but still fails for RandomForest and the multilayer perceptron classifier. Everything is saved properly up to this stage. CrossValidator cv = new CrossValidator() .setEstimator(pipeline) .setE

freeing up memory occupied by processed Stream Blocks

2017-01-19 Thread Andrew Milkowski
Hello, using Spark 2.0.2, while running a sample streaming app with Kinesis I noticed (in the admin UI Storage tab) that "Stream Blocks" for each worker keeps climbing up; then also (on the same UI page) in the Blocks section I see blocks such as input-0-1484753367056 below that are marked as Memory Serialized

Re: Spark-submit: where do --files go?

2017-01-19 Thread jeff saremi
Thanks Sidney From: Sidney Feiner Sent: Thursday, January 19, 2017 9:52 AM To: jeff saremi Cc: user@spark.apache.org Subject: Re: Spark-submit: where do --files go? Every executor creates a directory with your submitted files and you can access every file's ab

Re: Spark-submit: where do --files go?

2017-01-19 Thread jeff saremi
I wish someone added this to the documentation From: jeff saremi Sent: Thursday, January 19, 2017 9:56 AM To: Sidney Feiner Cc: user@spark.apache.org Subject: Re: Spark-submit: where do --files go? Thanks Sidney From: Sidney

Re: Spark-submit: where do --files go?

2017-01-19 Thread Sidney Feiner
Every executor creates a directory with your submitted files, and you can access every file's absolute path with the following: val fullFilePath = SparkFiles.get(fileName) On Jan 19, 2017 19:35, jeff saremi wrote: I'd like to know how -- From within Java/spark -- I can access the dependen
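A small end-to-end sketch of that answer (the file name `config.properties` is a hypothetical example, not from the thread):

```scala
import org.apache.spark.SparkFiles
import scala.io.Source

// Assumes the job was submitted with something like:
//   spark-submit --files config.properties ...
// SparkFiles.get resolves the file's absolute path inside the per-executor
// staging directory where --files payloads are copied.
val fullFilePath: String = SparkFiles.get("config.properties")

val lines: List[String] = Source.fromFile(fullFilePath).getLines().toList
```

On the driver the same call works too, since the files are also downloaded to the driver's SparkFiles root directory.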

Spark-submit: where do --files go?

2017-01-19 Thread jeff saremi
I'd like to know how -- From within Java/spark -- I can access the dependent files which i deploy using "--files" option on the command line?

[SparkStreaming] SparkStreaming not allowing to do parallelize within a transform operation to generate a new RDD

2017-01-19 Thread Nipun Arora
Hi All, Can anyone suggest a way to create and "add to an RDD", as I describe below, in a transform operation? I found that the error I observe goes away if I comment out "ssc.checkpoint()". However, I need checkpointing in later stages. I would really appreciate any help. Thanks Nipun --

Re: Fw: Yarn resource management for Spark with IBM Platform Symphony

2017-01-19 Thread Mich Talebzadeh
Thanks Kuan for the insight. Much appreciated. Mich Dr Mich Talebzadeh LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw http://talebzadehmich.wordpress.com Disclaim

Re: Fw: Yarn resource management for Spark with IBM Platform Symphony

2017-01-19 Thread Kuan Feng
Greetings, Dr Mich Talebzadeh, This is Kuan from the IBM Platform team. Thank you for your interest in the Platform Symphony and Spark product. I'm writing this mail to clarify the "EGO-YARN" in that blog post you were referring to. EGO is an enterprise resource orchestration component u

Re: Old version of Spark [v1.2.0]

2017-01-19 Thread Luciano Resende
Download page has been updated, hopefully will make things easier in the future http://spark.apache.org/downloads.html On Mon, Jan 16, 2017 at 1:52 AM, Jacek Laskowski wrote: > Hi Ayan, > > Although my first reaction was "Why would anyone ever want to download > older versions" after a brief th

Yarn resource management for Spark with IBM Platform Symphony

2017-01-19 Thread Mich Talebzadeh
Hi, IBM states that when Yarn is integrated with IBM Platform Symphony, you have more control over your Spa

Re: how to dynamic partition dataframe

2017-01-19 Thread Michal Šenkýř
Hi, You can pass Seqs as varargs in Scala using this syntax: df.partitionBy(seq: _*) Michal On 18.1.2017 03:23, lk_spark wrote: hi,all: I want partition data by reading a config file who tells me how to partition current input data. DataFrameWriter have a method named with : partiti
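A short sketch of that varargs expansion in the partitioning context (the column names, config source, and output path are assumptions for illustration):

```scala
// Assumes df: DataFrame exists and the partition columns were read from a
// config file into a Seq, e.g.:
val partitionCols: Seq[String] = Seq("year", "month")

// The `: _*` ascription expands the Seq into the String* varargs that
// DataFrameWriter.partitionBy expects.
df.write
  .partitionBy(partitionCols: _*)
  .parquet("/tmp/partitioned-output") // hypothetical path
```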

Re: Writing Spark SQL output in Local and HDFS path

2017-01-19 Thread smartzjp
The csv method is available since Spark version 2.0.0; if you are using an earlier version, you can try the code below. result.write.format("csv").save(path) -- Hi, I tried the below code, as result.write.csv(home/Prasad/) It is not working, It says Error: value csv is not member of org.apache.
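To spell out both variants (the output paths here are hypothetical; for Spark 1.x, CSV support comes from the external `com.databricks:spark-csv` package, which must be on the classpath):

```scala
// result: DataFrame is assumed to exist.

// Spark 2.0+: the csv shortcut on DataFrameWriter.
result.write.csv("hdfs:///user/prasad/output")

// Spark 1.x: use the spark-csv package's full data source name.
result.write.format("com.databricks.spark.csv").save("hdfs:///user/prasad/output")
```

In both cases the path must be quoted as a string literal; the unquoted `result.write.csv(home/Prasad/)` from the question would not compile.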

Re: Writing Spark SQL output in Local and HDFS path

2017-01-19 Thread Ravi Prasad
Hi, I tried the below code: result.write.csv(home/Prasad/) It is not working; it says Error: value csv is not member of org.apache.spark.sql.DataFrameWriter. Regards Prasad On Thu, Jan 19, 2017 at 4:35 PM, smartzjp wrote: > Beacause the reduce number will be not one, so it will out put

Re: anyone from bangalore wants to work on spark projects along with me

2017-01-19 Thread Chetan Khatri
Connect with Bangalore - Spark Meetup group. On Thu, Jan 19, 2017 at 3:07 PM, Deepak Sharma wrote: > Yes. > I will be there before 4 PM . > Whats your contact number ? > Thanks > Deepak > > On Thu, Jan 19, 2017 at 2:38 PM, Sirisha Cheruvu > wrote: > >> Are we meeting today?! >> >> On Jan 18, 20

Re: Writing Spark SQL output in Local and HDFS path

2017-01-19 Thread smartzjp
Because the number of reducers will not be one, the output will be a folder on HDFS. You can use “result.write.csv(foldPath)”. -- Hi, Can anyone please let us know how to write the output of the Spark SQL in Local and HDFS path using Scala code. Code :- scala> val result = s

Re: "Unable to load native-hadoop library for your platform" while running Spark jobs

2017-01-19 Thread Md. Rezaul Karim
Thanks, Sean. I will explore online more. Regards, _ *Md. Rezaul Karim*, BSc, MSc PhD Researcher, INSIGHT Centre for Data Analytics National University of Ireland, Galway IDA Business Park, Dangan, Galway, Ireland Web: http://www.reza-analytics.eu/index.html

Re: "Unable to load native-hadoop library for your platform" while running Spark jobs

2017-01-19 Thread Sean Owen
It's a message from the Hadoop libs, not Spark. It can be safely ignored. It's just saying you haven't installed the additional (non-Apache-licensed) native libs that can accelerate some operations. This is something you can easily read more about online. On Thu, Jan 19, 2017 at 10:57 AM Md. Reza

"Unable to load native-hadoop library for your platform" while running Spark jobs

2017-01-19 Thread Md. Rezaul Karim
Hi All, I'm getting the following WARNING while running Spark jobs in standalone mode: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Please note that I have configured the native path and the other ENV variables as follows: export JAVA_

Writing Spark SQL output in Local and HDFS path

2017-01-19 Thread Ravi Prasad
Hi, Can anyone please let us know how to write the output of Spark SQL to a local and an HDFS path using Scala code. *Code :-* scala> val result = sqlContext.sql("select empno , name from emp"); scala > result.show(); If I give the command result.show() then it will print the output in the c

Re: anyone from bangalore wants to work on spark projects along with me

2017-01-19 Thread Deepak Sharma
Yes. I will be there before 4 PM. What's your contact number? Thanks Deepak On Thu, Jan 19, 2017 at 2:38 PM, Sirisha Cheruvu wrote: > Are we meeting today?! > > On Jan 18, 2017 8:32 AM, "Sirisha Cheruvu" wrote: > >> Hi , >> >> Just thought of keeping my intention of working together with spark

Re: anyone from bangalore wants to work on spark projects along with me

2017-01-19 Thread Sirisha Cheruvu
Are we meeting today?! On Jan 18, 2017 8:32 AM, "Sirisha Cheruvu" wrote: > Hi , > > Just thought of keeping my intention of working together with spark > developers who are also from bangalore so that we can brainstorm > together and work out solutions on our projects? > > > what say? > > expecti

Re: is partitionBy of DataFrameWriter supported in 1.6.x?

2017-01-19 Thread Takeshi Yamamuro
Hi, In v1.6.0, it seems spark has supported `partitionBy` for JSON, text, ORC and avro. So, this is a bug of documents. Actually, this bug was fixed in v1.6.1 (See: https://github.com/apache/spark/commit/1005ee396f74dc4fcf127613b65e1abdb7f1934c ) Also, AFAIK, this document only describes datasour