Re: Spark 1.4 release date

2015-06-12 Thread Guru Medasani
Here is the Spark 1.4 release blog post by Databricks: https://databricks.com/blog/2015/06/11/announcing-apache-spark-1-4.html Guru Medasani gdm...@gmail.com > On Jun 12, 2015, at 7:08 AM, ayan guha wrote: >

Re: SparkR 1.4.0: read.df() function fails

2015-06-16 Thread Guru Medasani
r. Pointing to the right HDFS path should help here. > Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does > not exist: hdfs://smalldata13.hdp:8020/home/esten/ami/usaf.json > Guru Medasani gdm...@gmail.com > On Jun 16, 2015, at 10:39 AM, Shivaram Ve
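The error above shows the input path being looked up on HDFS. A path given without a scheme is resolved against the cluster's default filesystem, which is one way to end up with an `hdfs://` path that does not exist. The Python sketch below is a simplified illustration of that resolution, not Hadoop's actual code: the `qualify` helper is hypothetical, it only handles absolute paths, and whether the file really lives on local disk is not confirmed by the snippet.

```python
from urllib.parse import urlparse

def qualify(path, default_fs):
    """Mimic how a scheme-less absolute path is resolved against the default filesystem."""
    if urlparse(path).scheme:
        # The path already names its filesystem (hdfs://, file://, ...); keep it as-is.
        return path
    return default_fs.rstrip("/") + "/" + path.lstrip("/")

# Default filesystem taken from the error message above.
DEFAULT_FS = "hdfs://smalldata13.hdp:8020"

# A bare path ends up on HDFS, matching the path in the exception.
print(qualify("/home/esten/ami/usaf.json", DEFAULT_FS))

# An explicit file:// scheme would point at the local filesystem instead.
print(qualify("file:///home/esten/ami/usaf.json", DEFAULT_FS))
```

Passing a fully qualified URI removes any doubt about which filesystem is being read.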

Re: Is there programmatic way running Spark job on Yarn cluster without using spark-submit script ?

2015-06-17 Thread Guru Medasani
http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/What-dependencies-to-submit-Spark-jobs-programmatically-not-via/td-p/24721 Guru Medasani gdm...@gmail.com > On Jun 17, 2015, at 6:01 PM, Elkhan Dadashov wrote: > > This is not independen

Re: How to verify that the worker is connected to master in CDH5.4

2015-07-07 Thread Guru Medasani
thread? Guru Medasani gdm...@gmail.com > On Jul 7, 2015, at 10:42 PM, Ashish Dutt wrote: > > Hi, > I have CDH 5.4 installed on a linux server. It has 1 cluster in which spark > is deployed as a history server. > I am trying to connect my laptop to the spark history server.

Re: How to verify that the worker is connected to master in CDH5.4

2015-07-07 Thread Guru Medasani
server where Spark history server is running. Guru Medasani gdm...@gmail.com > On Jul 8, 2015, at 12:01 AM, Ashish Dutt wrote: > > Hello Guru, > Thank you for your quick response. > This is what i get when I try executing spark-shell :port number > > C:\spark-1

Re: Building a REST Service with Spark back-end

2016-03-02 Thread Guru Medasani
is roughly 3.14004\nNUM_SAMPLES: Int = 100000\ncount: Int = 78501'}, u'execution_count': 1, u'status': u'ok'}, u'state': u'available'} Guru Medasani gdm...@gmail.com > On Mar 2, 2016, at 7:47 AM, Todd
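The numbers in the quoted output are consistent with the usual Monte Carlo estimate of Pi, `4 * count / NUM_SAMPLES`. The Python sketch below only reproduces that arithmetic; the snippet above does not show the code that was run, so the formula is an assumption and `estimate_pi` is a hypothetical helper.

```python
# Values copied from the quoted output above.
NUM_SAMPLES = 100000
count = 78501

def estimate_pi(inside, total):
    """Monte Carlo estimate of Pi from the number of samples that fell inside the unit circle."""
    return 4.0 * inside / total

pi_estimate = estimate_pi(count, NUM_SAMPLES)
print("Pi is roughly %.5f" % pi_estimate)  # prints "Pi is roughly 3.14004"
```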

Re: Building a REST Service with Spark back-end

2016-03-02 Thread Guru Medasani
Hi Yanlin, This is a fairly new effort and is not officially released/supported by Cloudera yet. I believe those numbers will be out once it is released. Guru Medasani gdm...@gmail.com > On Mar 2, 2016, at 10:40 AM, yanlin wang wrote: > > Did any one use Livy in real world high co

Re: MLLib + Streaming

2016-03-05 Thread Guru Medasani
monitored to see how it is performing. Hope this helps in understanding offline learning vs. online learning and which algorithms you can choose for online learning in MLlib. Guru Medasani gdm...@gmail.com > On Mar 5, 2016, at 7:37 PM, Lan Jiang wrote: > > Hi, there > > I h

Re: AVRO vs Parquet

2016-03-09 Thread Guru Medasani
nge the column metadata in the metastore and not have that be reflected in the Avro schema as well. Guru Medasani gdm...@gmail.com > On Mar 4, 2016, at 7:36 AM, Paul Leclercq wrote: > > > > Nice article about Parquet with Avro : > https://dzone.com/articles/understandi

Re: AVRO vs Parquet

2016-03-10 Thread Guru Medasani
Thanks Michael for clarifying this. My response is inline. Guru Medasani gdm...@gmail.com > On Mar 10, 2016, at 12:38 PM, Michael Armbrust wrote: > > A few clarifications: > > 1) High memory and cpu usage. This is because Parquet files can't be streamed > into as

Re: Error in load hbase on spark

2015-10-09 Thread Guru Medasani
a.com/blog/2015/08/apache-spark-comes-to-apache-hbase-with-hbase-spark-module/> HBase Jira link: https://issues.apache.org/jira/browse/HBASE-13992 Guru Medasani gdm...@gmail.com > On Oct 8, 2015, at 9:29 PM, Roy Wang wrote:

Re: Topology.py -- Cannot run on Spark Gateway on Cloudera 5.4.4.

2015-08-03 Thread Guru Medasani
Hi Upen, Did you deploy the client configs after assigning the gateway roles? You should be able to do this from Cloudera Manager. Can you try this and let us know what you see when you run spark-shell? Guru Medasani gdm...@gmail.com > On Aug 3, 2015, at 9:10 PM, Upen N wrote: >

Re: Spark-Submit error

2015-08-03 Thread Guru Medasani
Hi Satish, Can you add more error or log info to the email? Guru Medasani gdm...@gmail.com > On Jul 31, 2015, at 1:06 AM, satish chandra j > wrote: > > HI, > I have submitted a Spark Job with options jars,class,master as local but i am > getting an error as below >

Re: Spark-Submit error

2015-08-03 Thread Guru Medasani
Thanks Satish. I only see the INFO messages and don’t see any error messages in the output you pasted. Can you paste the log with the error messages? Guru Medasani gdm...@gmail.com > On Aug 3, 2015, at 11:12 PM, satish chandra j > wrote: > > Hi Guru, > I am executing t

Re: Spark + Jupyter (IPython Notebook)

2015-08-18 Thread Guru Medasani
http://blog.cloudera.com/blog/2014/08/how-to-use-ipython-notebook-with-apache-spark/ Guru Medasani gdm...@gmail.com > On Aug 18, 2015, at 8:35 AM, Jerry Lam wrote: > > Hi spark users and developers, > > Did anyone have IPython Notebook (Jupyter) deployed in

Re: Spark + Jupyter (IPython Notebook)

2015-08-18 Thread Guru Medasani
t/jupyter-scala> Guru Medasani gdm...@gmail.com > On Aug 18, 2015, at 12:29 PM, Jerry Lam wrote: > > Hi Guru, > > Thanks! Great to hear that someone tried it in production. How do you like it > so far? > > Best Regards, > > Jerry > > > On Tue, A

Re: Problem while loading saved data

2015-09-02 Thread Guru Medasani
mary file found under > file:/home/ubuntu/ipython/people.parquet2. Guru Medasani gdm...@gmail.com > On Sep 2, 2015, at 8:25 PM, Amila De Silva wrote: > > Hi All, > > I have a two node spark cluster, to which I'm connecting using IPython > notebook. > To see h

Re: Change protobuf version or any other third party library version in Spark application

2015-09-15 Thread Guru Medasani
or executors. The old option was deprecated, and aliased to the new one (spark.executor.userClassPathFirst). The existing "child-first" class loader also had to be fixed. It didn't handle resources, and it was also doing some things that ended up causing JVM errors depending on h
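The option named above can be passed to `spark-submit` with its `--conf` flag. The Python sketch below only assembles such a command line. The jar name, main class, and the `build_submit_command` helper are hypothetical. The Spark documentation marks this property as experimental, so check it against the Spark version in use.

```python
def build_submit_command(app_jar, main_class, user_classpath_first=True):
    """Assemble a spark-submit argument list that sets spark.executor.userClassPathFirst."""
    flag = "true" if user_classpath_first else "false"
    return [
        "spark-submit",
        "--class", main_class,
        # Ask executors to prefer classes from the user's jars over Spark's own copies.
        "--conf", "spark.executor.userClassPathFirst=" + flag,
        app_jar,
    ]

# Hypothetical application jar and entry point, for illustration only.
cmd = build_submit_command("my-app.jar", "com.example.MyApp")
print(" ".join(cmd))
```

Building the command as a list keeps each argument separate, so it can be handed to `subprocess.run` without shell quoting problems.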

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Guru Medasani
Hi Antony, What is the total amount of memory, in MB, that can be allocated to containers on your NodeManagers (yarn.nodemanager.resource.memory-mb)? Can you check this setting in the yarn-site.xml used by the NodeManager process? -Guru Medasani From: Sandy Ryza Date
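One way to check the setting mentioned above is to read it straight out of yarn-site.xml, which uses the standard Hadoop `<configuration>/<property>/<name>/<value>` layout. The Python sketch below is a minimal example: the helper name is hypothetical, and the sample XML and its value of 8192 are illustrative only, not taken from the thread.

```python
import xml.etree.ElementTree as ET

def read_hadoop_property(xml_text, name):
    """Return the value of the named property in a Hadoop-style config, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None

# Hypothetical yarn-site.xml content; the value is an illustrative placeholder.
SAMPLE_YARN_SITE = """<configuration>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
</configuration>"""

memory_mb = read_hadoop_property(SAMPLE_YARN_SITE, "yarn.nodemanager.resource.memory-mb")
print(memory_mb)
```

On a real node, the text would come from the yarn-site.xml file that the NodeManager process actually loads, whose location depends on the installation.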

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-27 Thread Guru Medasani
Can you attach the logs where this is failing? From: Sven Krasser Date: Tuesday, January 27, 2015 at 4:50 PM To: Guru Medasani Cc: Sandy Ryza , Antony Mayi , "user@spark.apache.org" Subject: Re: java.lang.OutOfMemoryError: GC overhead limit exceeded Since it's an executor

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-01-28 Thread Guru Medasani
Hi Antony, Did you get past this error by repartitioning your job with smaller tasks as Sven Krasser pointed out? From: Antony Mayi Reply-To: Antony Mayi Date: Tuesday, January 27, 2015 at 5:24 PM To: Guru Medasani , Sven Krasser Cc: Sandy Ryza , "user@spark.apache.org" Su

Re: Nightly builds/releases?

2015-05-04 Thread Guru Medasani
I see a Jira for this one, but it is unresolved. https://issues.apache.org/jira/browse/SPARK-1517 > On May 4, 2015, at 10:25 PM, Ankur Chauhan wrote: > > Hi, > > Does anyone know if spark has any nightly builds or equivalent that provides > bi

Re: Spark-submit not running

2014-08-28 Thread Guru Medasani
Can you copy the exact spark-submit command that you are running? You should be able to run it locally without installing Hadoop. Here is an example of how to run the job locally. # Run application locally on 8 cores ./bin/spark-submit \ --class org.apache.spark.examples.SparkPi \ --master

Re: Spark-submit not running

2014-08-28 Thread Guru Medasani
e moment there is still a dependency on Hadoop even > when not using it. See https://issues.apache.org/jira/browse/SPARK-2356 > > > On Thu, Aug 28, 2014 at 2:14 PM, Guru Medasani wrote: > Can you copy the exact spark-submit command that you are running? > > You should be a

Re: Programatically running of the Spark Jobs.

2014-09-04 Thread Guru Medasani
I am able to run Spark jobs and Spark Streaming jobs successfully via YARN on a CDH cluster. When you say YARN isn’t quite there yet, do you mean for submitting jobs programmatically, or just in general? On Sep 4, 2014, at 1:45 AM, Matt Chu wrote: > https://github.com/spark-jobserver/spark-jo

RE: Spark Installation Maven PermGen OutOfMemoryException

2014-12-23 Thread Guru Medasani
Hi Vladimir, From the link Sean posted, there is the following note if you use Java 8. Note: For Java 8 and above this step is not required. So if you have no problems using Java 8, give it a shot. Best Regards, Guru Medasani > From: so...@cloudera.com > Date: Tue, 23 Dec 2014

RE: Spark Installation Maven PermGen OutOfMemoryException

2014-12-23 Thread Guru Medasani
Thanks for the clarification, Sean. Best Regards, Guru Medasani > From: so...@cloudera.com > Date: Tue, 23 Dec 2014 15:39:59 + > Subject: Re: Spark Installation Maven PermGen OutOfMemoryException > To: gdm...@outlook.com > CC: protsenk...@gmail.com; user@spark.apache.or