Re: freeing up memory occupied by processed Stream Blocks

2017-01-19 Thread Takeshi Yamamuro
Hi, AFAIK, the blocks of minibatch RDDs are checked after every job finishes, and older blocks are automatically removed (See: https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L463 ). You can control this behaviour by StreamingContext#rem
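A minimal sketch of the suggestion, assuming the truncated method name above is StreamingContext#remember (the app name and retention duration are illustrative, not from the thread):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

val conf = new SparkConf().setAppName("block-retention-sketch")
val ssc = new StreamingContext(conf, Seconds(10))

// Retain generated minibatch RDDs (and their blocks) for at least 5 minutes,
// rather than letting them be cleared as soon as each job finishes.
ssc.remember(Minutes(5))
```

Note that a longer remember duration trades memory for the ability to reprocess recent batches; the default window is derived from the batch and checkpoint intervals.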

Re: How to do dashboard reporting in spark

2017-01-19 Thread Jörn Franke
You can use Zeppelin if you want to interact directly with Spark. For traditional tools you have the right ideas (any of them works, depending on requirements). See also: lambda architecture. > On 20 Jan 2017, at 08:18, Gaurav1809 wrote: > > Hi All, > > > Once data is stored in data frames? Wha

How to do dashboard reporting in spark

2017-01-19 Thread Gaurav1809
Hi All, Once data is stored in data frames, what's next? Where do we go from there? Do we store the data in Hive or an RDBMS (Oracle, MySQL, Teradata)? How do we do dashboard reporting based on the data present in dataframes? If there is any BI tool available in the Spark ecosystem, please suggest. Tha

Re: spark 2.02 error when writing to s3

2017-01-19 Thread Palash Gupta
Hi, You need to add the overwrite save mode option to avoid this error. //P.Gupta Sent from Yahoo Mail on Android On Fri, 20 Jan, 2017 at 2:15 am, VND Tremblay, Paul wrote: I have come across a problem when writing CSV files to S3 in Spark 2.02. The problem does not exist in Spark 1.6.   19:0
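A sketch of the suggested fix (bucket names and input format are hypothetical placeholders, not from the thread):

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("s3-overwrite-sketch").getOrCreate()
val df = spark.read.json("s3://my-bucket/input/") // hypothetical input path

// SaveMode.Overwrite replaces any existing output files (e.g. leftovers from a
// previous failed attempt) instead of failing with "File already exists".
df.write.mode(SaveMode.Overwrite).csv("s3://my-bucket/output/")
```

Be aware that overwrite deletes whatever is already at the output path, so it masks rather than explains the underlying leftover-file issue.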

Re: Will be in around 12:30pm due to some personal stuff

2017-01-19 Thread Oshadha Gunawardena
On Fri, Jan 20, 2017 at 11:26 AM, Gavin Yue wrote: > PST or est ? > > On Jan 19, 2017, at 21:55, ayan guha wrote: > > Sure...we will wait :) :) > > Just kidding > > On Fri, Jan 20, 2017 at 4:48 PM, Manohar753 wrote: > >> Get Outlook for Android >> --

Re: Will be in around 12:30pm due to some personal stuff

2017-01-19 Thread Gavin Yue
PST or est ? > On Jan 19, 2017, at 21:55, ayan guha wrote: > > Sure...we will wait :) :) > > Just kidding > >> On Fri, Jan 20, 2017 at 4:48 PM, Manohar753 >> wrote: >> Get Outlook for Android >> Happiest Minds Disclaimer >> This message is for the sole use of the intended recipient(s) a

Re: Will be in around 12:30pm due to some personal stuff

2017-01-19 Thread ayan guha
Sure...we will wait :) :) Just kidding On Fri, Jan 20, 2017 at 4:48 PM, Manohar753 wrote: > Get Outlook for Android > -- > Happiest Minds Disclaimer > > This message is for the sole use of the intended recipient(s) and may > contain confid

Will be in around 12:30pm due to some personal stuff

2017-01-19 Thread Manohar753
Get Outlook for Android Happiest Minds Disclaimer This message is for the sole use of the intended recipient(s) and may contain confidential, proprietary or legally privileged information. Any unauthorized review, use, disclosure or distri

Re: Spark Source Code Configuration

2017-01-19 Thread Deepu Raj
Thanks Kai, I am getting the following message and it gets stuck when I run sbt. Any idea? "Set current project to spark-parent (in build file:/home/cloudera/spark/)" Details attached. Regards, Deepu Raj +61 414 707 319 On Fri, 20 Jan 2017 10:27:16 +1100, Kai Jiang wrote: Hi Deepu, Hope this page can

Re: spark 2.02 error when writing to s3

2017-01-19 Thread Takeshi Yamamuro
Hi, Do you get the same exception also in v2.1.0? Anyway, I saw another guy reporting the same error, I think. https://www.mail-archive.com/user@spark.apache.org/msg60882.html // maropu On Fri, Jan 20, 2017 at 5:15 AM, VND Tremblay, Paul wrote: > I have come across a problem when writing CSV

Re: Spark streaming app that processes Kafka DStreams produces no output and no error

2017-01-19 Thread shyla deshpande
There was an issue connecting to Kafka; once that was fixed, the Spark app works. Hope this helps someone. Thanks On Mon, Jan 16, 2017 at 7:58 AM, shyla deshpande wrote: > Hello, > I checked the log file on the worker node and don't see any error there. > This is the first time I am asked to run

Non-linear (curved?) regression line

2017-01-19 Thread Ganesh
Has anyone worked on non-linear/curved regression lines with Apache Spark? This seems to be such a trivial issue but I have given up after experimenting for nearly two weeks. The plot line is as below and the raw data in the table at the end. I just can't get Spark ML to give decent predictio
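One common way to fit a curved trend with Spark ML (not necessarily what the original poster tried — the column names, polynomial degree, and data layout here are assumptions) is to expand the feature vector polynomially and then fit an ordinary linear regression on the expanded features:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{PolynomialExpansion, VectorAssembler}
import org.apache.spark.ml.regression.LinearRegression

// Assumes df is a DataFrame with a numeric input column "x" and a "label" column.
val assembler = new VectorAssembler()
  .setInputCols(Array("x"))
  .setOutputCol("features")

// Expand [x] into [x, x^2, x^3] so a linear model can fit a cubic curve.
val poly = new PolynomialExpansion()
  .setInputCol("features")
  .setOutputCol("polyFeatures")
  .setDegree(3)

val lr = new LinearRegression()
  .setFeaturesCol("polyFeatures")
  .setLabelCol("label")

val model = new Pipeline().setStages(Array(assembler, poly, lr)).fit(df)
```

The degree is a modelling choice: too low underfits the curve, too high overfits the raw data, so it is worth tuning on held-out data.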

Re: Executors - running out of memory

2017-01-19 Thread sanat kumar Patnaik
Please try and play with spark-defaults.conf for EMR. Dynamic allocation = true is there by default for EMR 4.4 and above. What is the EMR version you are using? http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html#d0e20458 On Thu, Jan 19, 2017 at 5:02 PM, Venkata D wrote:

Re: Spark Source Code Configuration

2017-01-19 Thread Kai Jiang
Hi Deepu, Hope this page can give you some help. http://spark.apache.org/developer-tools.html Best, Kai On Thu, Jan 19, 2017, 14:51 Deepu Raj wrote: > Hi, > > Is there any article/Docs/support to set up Apache Spark source code on > Eclipse/IntelliJ. > > I have tried setting up the source code,

Spark Source Code Configuration

2017-01-19 Thread Deepu Raj
Hi, Is there any article/docs/support for setting up the Apache Spark source code in Eclipse/IntelliJ? I have tried setting up the source code by importing from Git and using Maven, but I am getting a lot of compilation errors. #suggestions Regards, Deepu Raj +61 414 707 319 ---

Re: Executors - running out of memory

2017-01-19 Thread Venkata D
blondowski, How big is your JSON file? Is it possible to post the Spark params or configuration here? That might give some idea about the issue. Thanks On Thu, Jan 19, 2017 at 4:21 PM, blondowski wrote: > Please bear with me..I'm fairly new to spark. Running pyspark 2.0.1 on AWS > EM

Re: How to save spark-ML model in Java?

2017-01-19 Thread Xiaomeng Wan
cv.fit is going to give you a CrossValidatorModel; if you want to extract the real model built, you need to do:
val cvModel = cv.fit(data)
val plmodel = cvModel.bestModel.asInstanceOf[PipelineModel]
val model = plmodel.stages(2).asInstanceOf[whatever_model]
then you can model.save O
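Filling out that extraction pattern as a self-contained sketch — the stage index (2) and the concrete model class are assumptions that depend on how the pipeline was built, and the save path is hypothetical:

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.classification.RandomForestClassificationModel

// cv: CrossValidator and data: DataFrame are assumed to already exist.
val cvModel = cv.fit(data)

// bestModel is typed as Model[_]; cast it back to the pipeline type.
val plModel = cvModel.bestModel.asInstanceOf[PipelineModel]

// stages(2) assumes the estimator was the third stage of the pipeline;
// adjust the index and the target class to match your own pipeline.
val model = plModel.stages(2).asInstanceOf[RandomForestClassificationModel]

model.write.overwrite().save("/tmp/best-model") // hypothetical path
```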

Executors - running out of memory

2017-01-19 Thread blondowski
Please bear with me... I'm fairly new to Spark. Running pyspark 2.0.1 on AWS EMR (6 node cluster with 475GB of RAM). We have a job that creates a dataframe from json files, then does some manipulation (adds columns) and then calls a UDF. The job fails on the UDF call with Container killed by YARN f
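When YARN kills containers for exceeding memory limits, the usual first remedy is to raise executor memory and the off-heap overhead. A sketch of the relevant settings (the values are illustrative, not a recommendation for this cluster, and the property name `spark.yarn.executor.memoryOverhead` is the one used in Spark 2.0.x):

```scala
import org.apache.spark.SparkConf

// These must be set before the application's executors launch, e.g. here
// via SparkConf or equivalently with --conf flags on spark-submit.
val conf = new SparkConf()
  .setAppName("udf-job-sketch")
  .set("spark.executor.memory", "8g")
  // Off-heap headroom per executor, in MB. UDFs in pyspark run Python worker
  // processes whose memory counts against this overhead, not the JVM heap.
  .set("spark.yarn.executor.memoryOverhead", "2048")
```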

dataset aggregators with kryo encoder very slow

2017-01-19 Thread Koert Kuipers
We just converted a job from RDD to Dataset. The job does a single map-reduce phase using aggregators. We are seeing very bad performance for the Dataset version, about 10x slower. In the Dataset version we use kryo encoders for some of the aggregators. Based on some basic profiling of Spark in local
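For context, a minimal sketch of the pattern being described — an `Aggregator` whose intermediate buffer uses a kryo encoder (the buffer type and aggregation here are illustrative, not the original job's). Because `Encoders.kryo` stores the buffer as an opaque binary blob, Spark must serialize and deserialize it around updates and merges, which can plausibly account for a large slowdown versus the RDD version:

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

case class Buf(var sum: Long, var count: Long)

// Computes the mean of a Dataset[Long] via a kryo-encoded buffer.
object MeanAgg extends Aggregator[Long, Buf, Double] {
  def zero: Buf = Buf(0L, 0L)
  def reduce(b: Buf, a: Long): Buf = { b.sum += a; b.count += 1; b }
  def merge(b1: Buf, b2: Buf): Buf = { b1.sum += b2.sum; b1.count += b2.count; b1 }
  def finish(b: Buf): Double = if (b.count == 0) 0.0 else b.sum.toDouble / b.count

  // The kryo buffer encoder: generic, but serialized on every update/merge.
  def bufferEncoder: Encoder[Buf] = Encoders.kryo[Buf]
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}
```

Switching the buffer to a product/bean encoder (`Encoders.product`) lets Spark keep it in its internal row format and typically avoids most of this overhead.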

spark 2.02 error when writing to s3

2017-01-19 Thread VND Tremblay, Paul
I have come across a problem when writing CSV files to S3 in Spark 2.02. The problem does not exist in Spark 1.6. 19:09:20 Caused by: java.io.IOException: File already exists:s3://stx-apollo-pr-datascience-internal/revenue_model/part-r-00025-c48a0d52-9600-4495-913c-64ae6bf888bd.csv My code is

Re: How to save spark-ML model in Java?

2017-01-19 Thread Minudika Malshan
Hi, Thanks Rezaul and Asher Krim. The method suggested by Rezaul works fine for NaiveBayes but still fails for RandomForest and the multilayer perceptron classifier. Everything is saved properly up to this stage. CrossValidator cv = new CrossValidator() .setEstimator(pipeline) .setE

freeing up memory occupied by processed Stream Blocks

2017-01-19 Thread Andrew Milkowski
Hello, using Spark 2.0.2, while running a sample streaming app with Kinesis I noticed (in the admin UI Storage tab) that "Stream Blocks" for each worker keeps climbing up; then also (on the same UI page) in the Blocks section I see blocks such as input-0-1484753367056 below that are marked as Memory Serialized

Re: Spark-submit: where do --files go?

2017-01-19 Thread jeff saremi
Thanks Sidney From: Sidney Feiner Sent: Thursday, January 19, 2017 9:52 AM To: jeff saremi Cc: user@spark.apache.org Subject: Re: Spark-submit: where do --files go? Every executor creates a directory with your submitted files and you can access every file's ab

Re: Spark-submit: where do --files go?

2017-01-19 Thread jeff saremi
I wish someone added this to the documentation From: jeff saremi Sent: Thursday, January 19, 2017 9:56 AM To: Sidney Feiner Cc: user@spark.apache.org Subject: Re: Spark-submit: where do --files go? Thanks Sidney From: Sidney

Re: Spark-submit: where do --files go?

2017-01-19 Thread Sidney Feiner
Every executor creates a directory with your submitted files, and you can access every file's absolute path with the following: val fullFilePath = SparkFiles.get(fileName) On Jan 19, 2017 19:35, jeff saremi wrote: I'd like to know how -- From within Java/spark -- I can access the dependen
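A small end-to-end sketch of that answer (the file name `config.properties` is a hypothetical example, not from the thread):

```scala
import org.apache.spark.SparkFiles
import scala.io.Source

// Assumes the job was submitted with something like:
//   spark-submit --files config.properties ...
// SparkFiles.get resolves the file's absolute path inside the per-executor
// staging directory where --files payloads are copied.
val fullFilePath: String = SparkFiles.get("config.properties")

val lines: List[String] = Source.fromFile(fullFilePath).getLines().toList
```

On the driver the same call works too, since the files are also downloaded to the driver's SparkFiles root directory.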

Spark-submit: where do --files go?

2017-01-19 Thread jeff saremi
I'd like to know how -- From within Java/spark -- I can access the dependent files which i deploy using "--files" option on the command line?

[SparkStreaming] SparkStreaming not allowing to do parallelize within a transform operation to generate a new RDD

2017-01-19 Thread Nipun Arora
Hi All, Can anyone suggest a way to create and "add to an RDD", as I describe below, in a transform operation? I found that the error I observe goes away if I comment out "ssc.checkpoint()". However, I need checkpointing in later stages. I would really appreciate any help. Thanks Nipun --

Re: Fw: Yarn resource management for Spark with IBM Platform Symphony

2017-01-19 Thread Mich Talebzadeh
Thanks Kuan for the insight. Much appreciated. Mich Dr Mich Talebzadeh LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw http://talebzadehmich.wordpress.com Disclaim

Re: Fw: Yarn resource management for Spark with IBM Platform Symphony

2017-01-19 Thread Kuan Feng
Greetings, Dr Mich Talebzadeh, This is Kuan from the IBM Platform team. Thank you for your interest in the Platform Symphony and Spark product. I'm writing this mail to clarify the "EGO-YARN" in that blog post you were referring to. EGO is an enterprise resource orchestration component u

Re: Old version of Spark [v1.2.0]

2017-01-19 Thread Luciano Resende
Download page has been updated, hopefully will make things easier in the future http://spark.apache.org/downloads.html On Mon, Jan 16, 2017 at 1:52 AM, Jacek Laskowski wrote: > Hi Ayan, > > Although my first reaction was "Why would anyone ever want to download > older versions" after a brief th

Yarn resource management for Spark with IBM Platform Symphony

2017-01-19 Thread Mich Talebzadeh
Hi, IBM states that when Yarn is integrated with IBM Platform Symphony, you have more control over your Spa

Re: how to dynamic partition dataframe

2017-01-19 Thread Michal Šenkýř
Hi, You can pass Seqs as varargs in Scala using this syntax: df.partitionBy(seq: _*) Michal On 18.1.2017 03:23, lk_spark wrote: hi,all: I want partition data by reading a config file who tells me how to partition current input data. DataFrameWriter have a method named with : partiti
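A short sketch of that varargs expansion in the partitioning context (the column names, config source, and output path are assumptions for illustration):

```scala
// Assumes df: DataFrame exists and the partition columns were read from a
// config file into a Seq, e.g.:
val partitionCols: Seq[String] = Seq("year", "month")

// The `: _*` ascription expands the Seq into the String* varargs that
// DataFrameWriter.partitionBy expects.
df.write
  .partitionBy(partitionCols: _*)
  .parquet("/tmp/partitioned-output") // hypothetical path
```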

Re: Writing Spark SQL output in Local and HDFS path

2017-01-19 Thread smartzjp
The csv method is available since Spark version 2.0.0; if you are using an earlier version, you can try the code below. result.write.format("csv").save(path) -- Hi, I tried the below code, as result.write.csv(home/Prasad/) It is not working, It says Error: value csv is not member of org.apache.
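To spell out both variants (the output paths here are hypothetical; for Spark 1.x, CSV support comes from the external `com.databricks:spark-csv` package, which must be on the classpath):

```scala
// result: DataFrame is assumed to exist.

// Spark 2.0+: the csv shortcut on DataFrameWriter.
result.write.csv("hdfs:///user/prasad/output")

// Spark 1.x: use the spark-csv package's full data source name.
result.write.format("com.databricks.spark.csv").save("hdfs:///user/prasad/output")
```

In both cases the path must be quoted as a string literal; the unquoted `result.write.csv(home/Prasad/)` from the question would not compile.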

Re: Writing Spark SQL output in Local and HDFS path

2017-01-19 Thread Ravi Prasad
Hi, I tried the below code: result.write.csv(home/Prasad/) It is not working; it says Error: value csv is not member of org.apache.spark.sql.DataFrameWriter. Regards Prasad On Thu, Jan 19, 2017 at 4:35 PM, smartzjp wrote: > Beacause the reduce number will be not one, so it will out put

Re: anyone from bangalore wants to work on spark projects along with me

2017-01-19 Thread Chetan Khatri
Connect with Bangalore - Spark Meetup group. On Thu, Jan 19, 2017 at 3:07 PM, Deepak Sharma wrote: > Yes. > I will be there before 4 PM . > Whats your contact number ? > Thanks > Deepak > > On Thu, Jan 19, 2017 at 2:38 PM, Sirisha Cheruvu > wrote: > >> Are we meeting today?! >> >> On Jan 18, 20

Re: Writing Spark SQL output in Local and HDFS path

2017-01-19 Thread smartzjp
Because the number of reducers will not be one, the output will be a folder on HDFS. You can use “result.write.csv(foldPath)”. -- Hi, Can anyone please let us know how to write the output of the Spark SQL in Local and HDFS path using Scala code. Code :- scala> val result = s

Re: "Unable to load native-hadoop library for your platform" while running Spark jobs

2017-01-19 Thread Md. Rezaul Karim
Thanks, Sean. I will explore online more. Regards, _ *Md. Rezaul Karim*, BSc, MSc PhD Researcher, INSIGHT Centre for Data Analytics National University of Ireland, Galway IDA Business Park, Dangan, Galway, Ireland Web: http://www.reza-analytics.eu/index.html

Re: "Unable to load native-hadoop library for your platform" while running Spark jobs

2017-01-19 Thread Sean Owen
It's a message from the Hadoop libs, not Spark. It can be safely ignored. It's just saying you haven't installed the additional (non-Apache-licensed) native libs that can accelerate some operations. This is something you can easily read more about online. On Thu, Jan 19, 2017 at 10:57 AM Md. Reza

"Unable to load native-hadoop library for your platform" while running Spark jobs

2017-01-19 Thread Md. Rezaul Karim
Hi All, I'm getting the following WARNING while running Spark jobs in standalone mode: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Please note that I have configured the native path and the other ENV variables as follows: export JAVA_

Writing Spark SQL output in Local and HDFS path

2017-01-19 Thread Ravi Prasad
Hi, Can anyone please let us know how to write the output of Spark SQL to a local and an HDFS path using Scala code. *Code :-* scala> val result = sqlContext.sql("select empno , name from emp"); scala > result.show(); If I give the command result.show() then it will print the output in the c

Re: anyone from bangalore wants to work on spark projects along with me

2017-01-19 Thread Deepak Sharma
Yes. I will be there before 4 PM. What's your contact number? Thanks Deepak On Thu, Jan 19, 2017 at 2:38 PM, Sirisha Cheruvu wrote: > Are we meeting today?! > > On Jan 18, 2017 8:32 AM, "Sirisha Cheruvu" wrote: > >> Hi , >> >> Just thought of keeping my intention of working together with spark

Re: anyone from bangalore wants to work on spark projects along with me

2017-01-19 Thread Sirisha Cheruvu
Are we meeting today?! On Jan 18, 2017 8:32 AM, "Sirisha Cheruvu" wrote: > Hi , > > Just thought of keeping my intention of working together with spark > developers who are also from bangalore so that we can brainstorm > together and work out solutions on our projects? > > > what say? > > expecti

Re: is partitionBy of DataFrameWriter supported in 1.6.x?

2017-01-19 Thread Takeshi Yamamuro
Hi, In v1.6.0, it seems spark has supported `partitionBy` for JSON, text, ORC and avro. So, this is a bug of documents. Actually, this bug was fixed in v1.6.1 (See: https://github.com/apache/spark/commit/1005ee396f74dc4fcf127613b65e1abdb7f1934c ) Also, AFAIK, this document only describes datasour