Re: Dynamic metric names

2019-05-07 Thread Roberto Coluccio
It would be a dream to have an easy-to-use dynamic metric system AND a reliable counting system (accumulator-like) in Spark... Thanks, Roberto
On Tue, May 7, 2019 at 3:54 AM Saisai Shao wrote:
> I think the main reason why that was not merged is that Spark itself
> doesn't have such requirement, …
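
For readers landing on this thread: the built-in counting primitive being referenced is the named accumulator, which does show up in the Spark UI. A minimal spark-shell-style sketch using the 2.x API (the input path and accumulator name are assumptions):

```scala
// Named accumulators appear per-stage in the Spark UI, the closest
// built-in thing to a "reliable counting system".
val malformed = sc.longAccumulator("malformedRecords")

sc.textFile("hdfs:///data/input")                  // path is an assumption
  .foreach(line => if (line.isEmpty) malformed.add(1L))

println(s"malformed records: ${malformed.value}")  // read on the driver
```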

userClassPathFirst=true prevents SparkContext from being initialized

2017-01-30 Thread Roberto Coluccio
Hello folks, I'm trying to work around an issue with some dependencies by specifying at spark-submit time that I want my (user) classpath to be resolved and taken into account first (against the jars received through the System Classpath, which is /data/cloudera/parcels/CDH/jars/). In order…
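
The switches under discussion are standard Spark configs; a minimal sketch of the relevant settings (whether they cause or cure the initialization failure is exactly what this thread is about):

```scala
// Usually passed as --conf flags to spark-submit; shown here in code form.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.driver.userClassPathFirst", "true")   // driver resolves user jars first
  .set("spark.executor.userClassPathFirst", "true") // executors do the same
```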

Re: [Spark 1.5+] ReceiverTracker seems not to stop Kinesis receivers

2016-02-23 Thread Roberto Coluccio
Any chance anyone has taken a look at this? Thanks!
On Wed, Feb 10, 2016 at 10:46 AM, Roberto Coluccio <roberto.coluc...@gmail.com> wrote:
> Thanks Shixiong!
>
> I'm attaching the thread dumps (I printed the Spark UI after expanding all
> the elements, hope that's fine…

Re: [Spark 1.5+] ReceiverTracker seems not to stop Kinesis receivers

2016-02-09 Thread Roberto Coluccio
… don't use Kinesis and open any Receivers. Thank you! Roberto
On Tue, Feb 2, 2016 at 4:40 PM, Roberto Coluccio wrote:
> Hi,
>
> I've been struggling with an issue ever since I tried to upgrade my Spark
> Streaming solution from 1.4.1 to 1.5+.
>
> I have a Spark Streaming app…

[Spark 1.5+] ReceiverTracker seems not to stop Kinesis receivers

2016-02-02 Thread Roberto Coluccio
Hi, I've been struggling with an issue ever since I tried to upgrade my Spark Streaming solution from 1.4.1 to 1.5+. I have a Spark Streaming app which creates 3 ReceiverInputDStreams leveraging the KinesisUtils.createStream API. I used to leverage a timeout to terminate my app (StreamingContext.awaitTe…
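
A sketch of the setup being described, using the 1.5-era Kinesis ASL API, including the timeout-then-stop pattern whose receiver shutdown is in question (app/stream names, region, intervals, and the timeout value are all assumptions):

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream

val ssc = new StreamingContext(sc, Seconds(10))
val streams = (1 to 3).map { _ =>
  KinesisUtils.createStream(ssc, "myApp", "myStream",
    "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
    InitialPositionInStream.LATEST, Seconds(10),
    StorageLevel.MEMORY_AND_DISK_2)
}
ssc.union(streams).count().print()

ssc.start()
ssc.awaitTerminationOrTimeout(60 * 60 * 1000L)            // run for an hour, then...
ssc.stop(stopSparkContext = true, stopGracefully = true)  // ...receivers should stop here
```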

Spark 1.5.2 streaming driver in YARN cluster mode on Hadoop 2.6 (on EMR 4.2) restarts after stop

2016-01-14 Thread Roberto Coluccio
Hi there, I'm facing a weird issue when upgrading from a Spark 1.4.1 streaming driver on EMR 3.9 (hence Hadoop 2.4.0) to Spark 1.5.2 on EMR 4.2 (hence Hadoop 2.6.0). Basically, the very same driver that used to terminate after a timeout as expected now does not. In particular, as long as the driv…
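
One knob commonly involved when YARN relaunches an application after its driver exits is the application-attempt limit. Whether it is the culprit here is an assumption, but the setting itself is a standard Spark-on-YARN config:

```scala
// Sketch: cap YARN application attempts so the application master is not
// restarted after a failure-like exit (value and relevance are assumptions).
import org.apache.spark.SparkConf
val conf = new SparkConf().set("spark.yarn.maxAppAttempts", "1")
```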

Spark Streaming - print accumulator values every period as logs

2015-12-24 Thread Roberto Coluccio
Hello, I have a batch driver and a streaming driver using the same functions (Scala). I use accumulators (passed to the functions' constructors) to count stuff. In the batch driver, doing so at the right point of the pipeline, I'm able to retrieve the accumulator value and print it as a log4j log. In the streamin…
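
A hedged sketch of the usual streaming counterpart: read the accumulator on the driver inside foreachRDD, once per batch, after an action has forced the work (the DStream, logger name, and predicate are assumptions):

```scala
// Assumes sc, a DStream `events`, and log4j on the classpath.
import org.apache.log4j.Logger
val log = Logger.getLogger("myApp")                 // logger name is an assumption

val errorCount = sc.accumulator(0L, "errorCount")   // 1.x accumulator API

events.foreachRDD { rdd =>
  rdd.foreach(e => if (e == null) errorCount += 1)  // action forces the batch
  // This block runs on the driver each interval, so reading .value is safe:
  log.info(s"errors so far: ${errorCount.value}")
}
```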

Re: Spark on EMR: out-of-the-box solution for real-time application logs monitoring?

2015-12-11 Thread Roberto Coluccio
…monitor application logs (in an automated fashion) for long-running processes like a streaming driver, and whether there are out-of-the-box solutions. Thanks, Roberto
On Thu, Dec 10, 2015 at 3:06 PM, Steve Loughran wrote:
>
> On 10 Dec 2015, at 14:52, Roberto Coluccio wrote:
> …

Spark on EMR: out-of-the-box solution for real-time application logs monitoring?

2015-12-10 Thread Roberto Coluccio
Hello, I'm investigating a solution to monitor, in real time, the Spark logs produced by my EMR cluster, in order to collect statistics and trigger alarms. Being on EMR, I found the CloudWatch Logs + Lambda combination pretty straightforward and, since I'm on AWS, those services are pretty well integrated together... but…

Re: Unable to catch SparkContext methods exceptions

2015-08-24 Thread Roberto Coluccio
> …the exception there.
>
> Best,
> Burak
>
> On Mon, Aug 24, 2015 at 9:09 AM, Roberto Coluccio <roberto.coluc...@gmail.com> wrote:
>
>> Hello folks,
>>
>> I'm experiencing an unexpected behaviour that makes me think I'm missing
>> some notions about how Spark works. …

Unable to catch SparkContext methods exceptions

2015-08-24 Thread Roberto Coluccio
Hello folks, I'm experiencing an unexpected behaviour that makes me think I'm missing some notions about how Spark works. Let's say I have a Spark driver that invokes a function like:
- in myDriver -
val sparkContext = new SparkContext(mySparkConf)
val inputPath = "file://home/myUser…
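
The likely crux, sketched under assumptions (that the failing call involves an input path, and that the try/catch wraps only the SparkContext call): RDD transformations are lazy, so the exception surfaces at the first action, which is where the handler must sit.

```scala
// Hedged sketch; the concrete exception type is an assumption.
try {
  val lines = sparkContext.textFile(inputPath) // lazy: no I/O, nothing thrown here
  val n = lines.count()                        // action: a bad path fails here
  println(s"read $n lines")
} catch {
  case e: org.apache.hadoop.mapred.InvalidInputException =>
    // Only catchable around the action, not around textFile itself.
    System.err.println(s"input path problem: ${e.getMessage}")
}
```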

Fwd: [Spark + Hive + EMR + S3] Issue when reading from Hive external table backed on S3 with a large amount of small files

2015-08-07 Thread Roberto Coluccio
Please, community, I'd really appreciate your opinion on this topic. Best regards, Roberto
-- Forwarded message --
From: Roberto Coluccio
Date: Sat, Jul 25, 2015 at 6:28 PM
Subject: [Spark + Hive + EMR + S3] Issue when reading from Hive external table backed on S3 with…

[Spark + Hive + EMR + S3] Issue when reading from Hive external table backed on S3 with a large amount of small files

2015-07-25 Thread Roberto Coluccio
Hello Spark community, I currently have a Spark 1.3.1 batch driver, deployed in YARN-cluster mode on an EMR cluster (AMI 3.7.0), that reads input data through a HiveContext, in particular SELECTing data from an EXTERNAL TABLE backed on S3. Such table has dynamic partitions and contains hundreds o…
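
A minimal sketch of the read path being described (table, partition column, and sizes are assumptions). The practical point with very many small S3 objects is that the cost concentrates in listing and per-file opens during the scan, which repartitioning after the read cannot recover:

```scala
// Spark 1.3.x style; assumes sc is in scope.
import org.apache.spark.sql.hive.HiveContext
val hiveContext = new HiveContext(sc)

val df = hiveContext.sql(
  "SELECT * FROM my_external_table WHERE dt = '2015-07-25'") // names assumed
val compacted = df.rdd.coalesce(64) // helps downstream stages, not the scan itself
```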

Spark 1.3.1 + Hive: write output to CSV with header on S3

2015-07-17 Thread Roberto Coluccio
Hello community, I'm currently using Spark 1.3.1 with Hive support for outputting processed data to an external Hive table backed on S3. I'm specifying the delimiter manually, but I'd like to know whether there is any "clean" way to write in CSV format:
val sparkConf = new SparkConf()…
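
For the header question specifically: the 1.3.x line had no built-in CSV writer, so a common, deliberately naive trick was to prepend a header line on plain RDDs. A hedged sketch, where the column names, output path, and the lack of quoting/escaping are all assumptions; the external spark-csv package is the cleaner route when adding a dependency is acceptable:

```scala
// Assumes a DataFrame `df` and sc in scope; does no quoting or escaping.
val header = sc.parallelize(Seq("id,name,value"))  // assumed column names
val rows   = df.rdd.map(_.mkString(","))           // naive CSV rendering
(header union rows)
  .coalesce(1)                                     // single part file, header first
  .saveAsTextFile("s3://my-bucket/output-csv")     // path is an assumption
```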

Re: Spark 1.4 RDD to DF fails with toDF()

2015-06-26 Thread Roberto Coluccio
I hit a similar issue. Might yours as well be related to https://issues.apache.org/jira/browse/SPARK-8368 ?
On Fri, Jun 26, 2015 at 2:00 PM, Akhil Das wrote:
> Are those provided Spark libraries compatible with Scala 2.11?
>
> Thanks
> Best Regards
>
> On Fri, Jun 26, 2015 at 4:48 PM, Srikanth…

Re: java.lang.OutOfMemoryError: PermGen space

2015-06-25 Thread Roberto Coluccio
> …this. It wouldn't work with anything
> less than 256m for a simple piece of code.
> 1.3.1 used to work with the default (64m, I think).
>
> Srikanth
>
> On Wed, Jun 24, 2015 at 12:47 PM, Roberto Coluccio <roberto.coluc...@gmail.com> wrote:
>
>> Did you try to pass it…

Re: java.lang.OutOfMemoryError: PermGen space

2015-06-24 Thread Roberto Coluccio
Did you try to pass it with --driver-java-options -XX:MaxPermSize=256m as a spark-shell input argument? Roberto
On Wed, Jun 24, 2015 at 5:57 PM, stati wrote:
> Hello,
>
> I moved from 1.3.1 to 1.4.0 and started receiving
> "java.lang.OutOfMemoryError: PermGen space" when I use spark-shell.
> …
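
The same flag in its config-key form, as a sketch (the value is an assumption; PermGen sizing only applies on Java 7 and earlier, and the driver-side option must be supplied before the driver JVM starts, i.e. via spark-defaults.conf or the spark-submit command line rather than in application code):

```scala
import org.apache.spark.SparkConf
val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions", "-XX:MaxPermSize=256m")
// spark.driver.extraJavaOptions exists too, but only takes effect when set
// outside the already-running driver (spark-defaults.conf / --conf).
```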

Re: [Spark 1.3.1 on YARN on EMR] Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

2015-06-20 Thread Roberto Coluccio
I confirm, Christopher was very kind in helping me out here. The solution presented in the linked doc worked perfectly. IMO it should be linked from the official Spark documentation. Thanks again, Roberto
> On 20 Jun 2015, at 19:25, Bozeman, Christopher wrote:
>
> We worked it out. There was m…

[Spark 1.3.1 on YARN on EMR] Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

2015-06-10 Thread Roberto Coluccio
Hi! I'm struggling with an issue with Spark 1.3.1 running on YARN on an AWS EMR cluster. The cluster is based on AMI 3.7.0 (hence Amazon Linux 2015.03; Hive 0.13 already installed and configured on the cluster, Hadoop 2.4, etc...). I make use of the AWS emr-bootstrap-action "install-spa…

Re: Spark SQL weird exception after upgrading from 1.1.1 to 1.2.x

2015-03-18 Thread Roberto Coluccio
Hey Cheng, thank you so much for your suggestion, the problem was actually a column/field called "timestamp" in one of the case classes!! Once I changed its name, everything worked out fine again. Let me say it was kinda frustrating... Roberto
On Wed, Mar 18, 2015 at 4:07 PM, Robert…
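
A hedged reconstruction of the clash, inferred from the thread: "timestamp" collides with a keyword in the 1.2.x SQL parser, which https://github.com/apache/spark/pull/5078 later relaxed for 1.3.1.

```scala
// Sketch, assuming the 1.2.x SQLContext with case-class reflection.
case class Event(timestamp: Long, value: String)   // field name hits the keyword
case class EventOk(eventTime: Long, value: String) // renamed field avoids the clash

// With Event registered as a table, queries fail to parse on 1.2.x; with
// EventOk, the equivalent query runs:
//   sqlContext.sql("SELECT eventTime, value FROM events WHERE eventTime > 0")
```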

Re: Spark SQL weird exception after upgrading from 1.1.1 to 1.2.x

2015-03-18 Thread Roberto Coluccio
> …0, it depends on the actual contents of your query.
>
> Yin had opened a PR for this; although not merged yet, it should be a
> valid fix: https://github.com/apache/spark/pull/5078
>
> This fix will be included in 1.3.1.
>
> Cheng
>
> On 3/18/15 10:04 PM, Roberto Coluccio…

Re: Spark SQL weird exception after upgrading from 1.1.1 to 1.2.x

2015-03-18 Thread Roberto Coluccio
…an 22) String fields. Hope the situation is a bit clearer. Thanks to anyone who can help me out here. Roberto
On Wed, Mar 18, 2015 at 12:09 PM, Cheng Lian wrote:
> Would you mind providing the query? If it's confidential, could you
> please help construct a query that reprodu…

Spark SQL weird exception after upgrading from 1.1.1 to 1.2.x

2015-03-18 Thread Roberto Coluccio
Hi everybody, when trying to upgrade from Spark 1.1.1 to Spark 1.2.x (I tried both 1.2.0 and 1.2.1), I encounter a weird error that never occurred before, about which I'd kindly ask for any possible help. In particular, all my Spark SQL queries fail with the following exception: java.lang.RuntimeException…

Spark UI port issue when deploying Spark driver on YARN in yarn-cluster mode on EMR

2014-12-23 Thread Roberto Coluccio
Hello folks, I'm trying to deploy a Spark driver on Amazon EMR in yarn-cluster mode, expecting to be able to access the Spark UI at the :4040 address (the default port). The problem here is that the Spark UI port is always assigned randomly at runtime, although I also tried to specify it in the spark-…
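
The knob in question, as a sketch (values are assumptions). Two behaviors are worth separating: Spark retries on successive ports when the requested one is busy, and in yarn-cluster mode the UI lives on whichever node hosts the driver and is normally reached through the YARN proxy rather than directly on :4040.

```scala
import org.apache.spark.SparkConf
val conf = new SparkConf()
  .set("spark.ui.port", "4040")      // requested UI port
  .set("spark.port.maxRetries", "0") // fail fast rather than drift to 4041, 4042, ...
```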

Access resources from jar-local resources folder

2014-09-23 Thread Roberto Coluccio
Hello folks, I have a Spark Streaming application built with Maven (as a jar) and deployed with the spark-submit script. The application project has the following (main) structure:
myApp
  src
    main
      scala
        com.mycompany.package
          MyApp.scala
          DoSomething.scala
          ...
      resources
        aPerlScript.pl
        ...
    test…
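
A hedged sketch of the usual way to reach such a bundled file at runtime: everything under src/main/resources lands at the classpath root inside the jar, so it must be opened as a stream rather than as a java.io.File (the copy-to-temp step, for handing the script to an external interpreter, is an assumption about the goal):

```scala
import java.io.{File, FileOutputStream}

val in  = getClass.getResourceAsStream("/aPerlScript.pl") // classpath-root lookup
val tmp = File.createTempFile("aPerlScript", ".pl")
val out = new FileOutputStream(tmp)
Iterator.continually(in.read()).takeWhile(_ != -1).foreach(b => out.write(b))
out.close(); in.close()
// tmp.getAbsolutePath can now be handed to e.g. scala.sys.process.Process
```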