Re: Spark 1.3.0: how to let Spark history load old records?

2015-06-02 Thread Otis Gospodnetic
I think Spark doesn't keep historical metrics. You can use something like SPM for that - http://blog.sematext.com/2014/01/30/announcement-apache-storm-monitoring-in-spm/ Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.co

Re: How to monitor Spark Streaming from Kafka?

2015-06-01 Thread Otis Gospodnetic
I think you can use SPM - http://sematext.com/spm - it will give you all Spark and all Kafka metrics, including offsets broken down by topic, etc. out of the box. I see more and more people using it to monitor various components in data processing pipelines, a la http://blog.sematext.com/2015/04/2

Re: RE: ElasticSearch for Spark times out

2015-04-22 Thread Otis Gospodnetic
Hi, If you get an ES response back in 1-5 seconds, that's pretty slow. Are these ES aggregation queries? Costin may be right about GC possibly causing timeouts. SPM can give you all Spark and all key Elasticsearch metrics, including various JVM metrics. If the problem is

Re: Spark @ EC2: Futures timed out & Ask timed out

2015-03-17 Thread Otis Gospodnetic
> Best Regards > > On Tue, Mar 17, 2015 at 3:26 AM, Otis Gospodnetic < > otis.gospodne...@gmail.com> wrote: > >> Hi, >> >> I've been trying to run a simple SparkWordCount app on EC2, but it looks >> like my apps are not succeeding/completing. I'm

Spark @ EC2: Futures timed out & Ask timed out

2015-03-16 Thread Otis Gospodnetic
Hi, I've been trying to run a simple SparkWordCount app on EC2, but it looks like my apps are not succeeding/completing. I'm suspecting some sort of communication issue. I used the SparkWordCount app from http://blog.cloudera.com/blog/2014/04/how-to-run-a-simple-apache-spark-app-in-cdh-5/ Digg

Re: throughput in the web console?

2015-02-25 Thread Otis Gospodnetic
Hi Josh, SPM will show you this info. I see you use Kafka, too, whose numerous metrics you can also see in SPM side by side with your Spark metrics. It sounds like trends are what you are after, so I hope this helps. See http://sematext.com/spm Otis > On Feb 24, 2015, at 11:59, Josh J wrote:

Spark job for demoing Spark metrics monitoring?

2015-01-21 Thread Otis Gospodnetic
Hi, I'll be showing our Spark monitoring at the upcoming Spark Summit in NYC. I'd like to run some/any Spark job that really exercises Spark and makes it emit all its various metrics (so the metrics charts are full of data and not bla

Re: monitoring for spark standalone

2014-12-11 Thread Otis Gospodnetic
Hi Judy, SPM monitors Spark. Here are some screenshots: http://blog.sematext.com/2014/10/07/apache-spark-monitoring/ Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On Mon, Dec 8, 2014 at 2:35 AM, Judy Nash wro

Re: Monitoring Spark

2014-12-02 Thread Otis Gospodnetic
Hi Isca, I think SPM can do that for you: http://blog.sematext.com/2014/10/07/apache-spark-monitoring/ Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On Tue, Dec 2, 2014 at 11:57 PM, Isca Harmatz wrote: > hell

[ANN] Spark resources searchable

2014-11-04 Thread Otis Gospodnetic
Hi everyone, We've recently added indexing of all Spark resources to http://search-hadoop.com/spark . Everything is nicely searchable: * user & dev mailing lists * JIRA issues * web site * wiki * source code * javadoc. Maybe it's worth adding to http://spark.apache.org/community.html ? Enjoy!

Re: Measuring Performance in Spark

2014-10-31 Thread Otis Gospodnetic
Hi Mahsa, Use SPM. See http://blog.sematext.com/2014/10/07/apache-spark-monitoring/ . Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On Fri, Oct 31, 2014 at 1:00 PM, mahsa wrote: >

Re: Spark Monitoring with Ganglia

2014-10-08 Thread Otis Gospodnetic
Hi, If using Ganglia is not an absolute requirement, check out SPM for Spark -- http://blog.sematext.com/2014/10/07/apache-spark-monitoring/ It monitors all Spark metrics (i.e. you don't need to figure out what you need to monitor, how to get it, how to graph it, etc.)

Re: Larger heap leads to perf degradation due to GC

2014-10-06 Thread Otis Gospodnetic
Hi, The other option to consider is using G1 GC, which should behave better with large heaps. But pointers are not compressed in heaps > 32 GB in size, so you may be better off staying under 32 GB. Otis -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearc
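A minimal sketch of how the two suggestions above (G1 GC, and keeping the heap under 32 GB) might be turned into Spark settings. The helper names and the 30 GB example are illustrative assumptions, not from the thread; the 32 GB figure is the one quoted in the message, and the exact cutoff for compressed pointers depends on the JVM. `spark.executor.memory` and `spark.executor.extraJavaOptions` are standard Spark properties, and `-XX:+UseG1GC` is the standard HotSpot flag for G1.

```python
# Threshold taken from the message above; the precise JVM cutoff may differ slightly.
COMPRESSED_POINTER_LIMIT_GB = 32


def stays_under_compressed_pointer_limit(heap_gb: int) -> bool:
    """True if the heap is small enough that pointers should stay compressed."""
    return heap_gb < COMPRESSED_POINTER_LIMIT_GB


def executor_conf(heap_gb: int) -> dict:
    """Build Spark conf entries for an executor heap of heap_gb using G1 GC.

    Hypothetical helper: it only assembles key/value pairs; it does not
    talk to Spark.
    """
    return {
        "spark.executor.memory": f"{heap_gb}g",
        "spark.executor.extraJavaOptions": "-XX:+UseG1GC",
    }


if __name__ == "__main__":
    heap_gb = 30  # illustrative value, chosen to stay under the limit
    if not stays_under_compressed_pointer_limit(heap_gb):
        print("warning: heap is large enough that pointers may not be compressed")
    # Print the entries in the form accepted by spark-submit's --conf flag.
    for key, value in executor_conf(heap_gb).items():
        print(f'--conf "{key}={value}"')
```

The printed `--conf` pairs can be passed to `spark-submit`; whether G1 actually helps with a given workload is something to measure rather than assume.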

Re: JMXSink for YARN deployment

2014-09-13 Thread Otis Gospodnetic
Hi, Jerry said "I'm guessing", so maybe the thing to try is to check if his guess is correct. What about running sudo lsof | grep metrics.properties ? I imagine you should be able to see it if the file was found and read. If Jerry is right, then I think you will NOT see it. Next, how about try
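The `lsof | grep metrics.properties` check suggested above could be scripted roughly as follows. This is only a sketch: the function name and the sample output are hypothetical, and it assumes the usual `lsof` layout in which the first two columns are COMMAND and PID. Note that `lsof` lists files that are open at the moment it runs, so a file that was read and then closed would not appear.

```python
def processes_with_open_file(lsof_output: str, needle: str) -> list:
    """Return (command, pid) pairs for lsof lines whose text contains needle."""
    matches = []
    for line in lsof_output.splitlines():
        if needle not in line:
            continue
        fields = line.split()
        # COMMAND and PID are assumed to be the first two columns.
        if len(fields) >= 2 and fields[1].isdigit():
            matches.append((fields[0], int(fields[1])))
    return matches


# Hypothetical sample output, for illustration only (not from the thread).
SAMPLE_LSOF_OUTPUT = """\
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
java     4242 yarn   12r   REG  253,0     1024 9001 /etc/spark/conf/metrics.properties
bash     1000 otis  cwd    DIR  253,0     4096    2 /home/otis
"""

if __name__ == "__main__":
    for command, pid in processes_with_open_file(SAMPLE_LSOF_OUTPUT, "metrics.properties"):
        print(f"{command} (pid {pid}) has metrics.properties open")
```

On a real node, the sample string would be replaced with the captured output of `lsof`, for example via `subprocess.run(["lsof"], capture_output=True, text=True).stdout`.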

Deployment model popularity - Standard vs. YARN vs. Mesos vs. SIMR

2014-09-07 Thread Otis Gospodnetic
Hi, I'm trying to determine which Spark deployment models are the most popular - Standalone, YARN, Mesos, or SIMR. Does anyone know? I thought I'd use search-hadoop.com to help me figure this out and this is what I found: 1) Standalone http://search-hadoop.com/?q=standalone&fc_project=Spark&fc_typ