It would be a dream to have an easy-to-use dynamic metric system AND a
reliable counting system (accumulator-like) in Spark...
Thanks
Roberto
On Tue, May 7, 2019 at 3:54 AM Saisai Shao wrote:
> I think the main reason why that was not merged is that Spark itself
> doesn't have such a requirement,
Hello folks,
I'm trying to work around an issue with some dependencies by specifying at
spark-submit time that I want my (user) classpath to be resolved and taken
into account first, ahead of the jars received through the system classpath
(which is /data/cloudera/parcels/CDH/jars/).
In orde
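For what it's worth, a minimal sketch of how that precedence could be requested
through configuration, assuming Spark's (experimental) userClassPathFirst
settings are the intended mechanism; the app name is a placeholder:

    import org.apache.spark.SparkConf

    // ask Spark to resolve user-provided jars ahead of the cluster-provided ones
    val sparkConf = new SparkConf()
      .setAppName("myApp") // placeholder
      .set("spark.driver.userClassPathFirst", "true")
      .set("spark.executor.userClassPathFirst", "true")

The same two settings can also be passed as --conf options at spark-submit time
(the driver-side one generally has to be set that way, since it applies before
the driver JVM starts).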
Any chance anyone had a look at this?
Thanks!
On Wed, Feb 10, 2016 at 10:46 AM, Roberto Coluccio <
roberto.coluc...@gmail.com> wrote:
> Thanks Shixiong!
>
> I'm attaching the thread dumps (I printed the Spark UI after expanding all
> the elements, hope that's fine).
don't use
Kinesis and open any Receivers.
Thank you!
Roberto
On Tue, Feb 2, 2016 at 4:40 PM, Roberto Coluccio wrote:
> Hi,
>
> I've been struggling with an issue ever since I tried to upgrade my Spark
> Streaming solution from 1.4.1 to 1.5+.
>
> I have a Spark Streaming app
Hi,
I've been struggling with an issue ever since I tried to upgrade my Spark
Streaming solution from 1.4.1 to 1.5+.
I have a Spark Streaming app which creates 3 ReceiverInputDStreams
leveraging the KinesisUtils.createStream API.
I used to leverage a timeout to terminate my app
(StreamingContext.awaitTermination).
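For context, a rough sketch of that setup (names, endpoint, intervals and
timeout are placeholders; the exact createStream signature changed a bit across
1.x versions):

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kinesis.KinesisUtils
    import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream

    val ssc = new StreamingContext(new SparkConf().setAppName("myApp"), Seconds(10))

    // three receiver-based Kinesis streams, unioned into a single DStream
    val streams = (1 to 3).map { _ =>
      KinesisUtils.createStream(ssc, "myApp", "myStream",
        "https://kinesis.us-east-1.amazonaws.com", "us-east-1",
        InitialPositionInStream.LATEST, Seconds(10), StorageLevel.MEMORY_AND_DISK_2)
    }
    val unified = ssc.union(streams)
    // ... transformations and output operations on `unified` ...

    ssc.start()
    // stop after a fixed timeout instead of blocking forever
    val timeoutMs = 10 * 60 * 1000L // placeholder: 10 minutes
    ssc.awaitTerminationOrTimeout(timeoutMs)
    ssc.stop(stopSparkContext = true, stopGracefully = true)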
Hi there,
I'm facing a weird issue when upgrading from Spark 1.4.1 streaming driver
on EMR 3.9 (hence Hadoop 2.4.0) to Spark 1.5.2 on EMR 4.2 (hence Hadoop
2.6.0).
Basically, the very same driver, which used to terminate after a timeout as
expected, now does not. In particular, as long as the driv
Hello,
I have a batch and a streaming driver using the same functions (Scala). I use
accumulators (passed to the functions' constructors) to count stuff.
In the batch driver, doing so at the right point of the pipeline, I'm able
to retrieve the accumulator value and print it as a log4j log.
In the streamin
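A rough sketch of the shape of this, with made-up names (Spark 1.x accumulator
API, not the actual code):

    import org.apache.log4j.Logger
    import org.apache.spark.{Accumulator, SparkContext}
    import org.apache.spark.streaming.dstream.DStream

    // hypothetical function object that receives the accumulator in its constructor
    class DoSomething(counter: Accumulator[Long]) extends Serializable {
      def process(line: String): Unit = {
        if (line.contains("ERROR")) counter += 1L
      }
    }

    // batch driver: the value is readable once the action has run
    def runBatch(sc: SparkContext, path: String): Unit = {
      val errorCount = sc.accumulator(0L, "errorCount")
      val doSomething = new DoSomething(errorCount)
      sc.textFile(path).foreach(doSomething.process)
      Logger.getLogger("myApp").info(s"errors = ${errorCount.value}")
    }

    // streaming driver: read the value per batch, e.g. inside foreachRDD
    def wireStreaming(lines: DStream[String], errorCount: Accumulator[Long]): Unit = {
      val doSomething = new DoSomething(errorCount)
      lines.foreachRDD { rdd =>
        rdd.foreach(doSomething.process)
        Logger.getLogger("myApp").info(s"errors so far = ${errorCount.value}")
      }
    }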
tor
application logs (in an automated fashion) for long-running processes like a
streaming driver, and whether there are out-of-the-box solutions.
Thanks,
Roberto
On Thu, Dec 10, 2015 at 3:06 PM, Steve Loughran
wrote:
>
> > On 10 Dec 2015, at 14:52, Roberto Coluccio
> wrote:
> >
Hello,
I'm investigating a solution to monitor, in real time, the Spark logs produced
by my EMR cluster in order to collect statistics and trigger alarms. Being on
EMR, I found the CloudWatch Logs + Lambda combination pretty straightforward
and, since I'm on AWS, those services are pretty well integrated together..b
e exception there.
>
> Best,
> Burak
>
> On Mon, Aug 24, 2015 at 9:09 AM, Roberto Coluccio <
> roberto.coluc...@gmail.com> wrote:
>
>> Hello folks,
>>
>> I'm experiencing an unexpected behaviour, which makes me think I'm
>> missing
Hello folks,
I'm experiencing an unexpected behaviour, which makes me think I'm missing
some notions of how Spark works. Let's say I have a Spark driver that
invokes a function like:
- in myDriver -
val sparkContext = new SparkContext(mySparkConf)
val inputPath = "file://home/myUser
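A bare-bones sketch of such a driver/function split, with placeholder names and
paths (not the real ones):

    import org.apache.spark.{SparkConf, SparkContext}

    object MyDriver {
      def main(args: Array[String]): Unit = {
        val mySparkConf = new SparkConf().setAppName("myApp") // placeholder
        val sparkContext = new SparkContext(mySparkConf)
        val inputPath = "file:///home/myUser/input"           // placeholder path
        doSomething(sparkContext, inputPath)
        sparkContext.stop()
      }

      // the function invoked by the driver
      def doSomething(sc: SparkContext, path: String): Unit = {
        val lines = sc.textFile(path)
        println(s"line count: ${lines.count()}")
      }
    }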
Please, community, I'd really appreciate your opinion on this topic.
Best regards,
Roberto
-- Forwarded message --
From: Roberto Coluccio
Date: Sat, Jul 25, 2015 at 6:28 PM
Subject: [Spark + Hive + EMR + S3] Issue when reading from Hive external
table backed on S3 with
Hello Spark community,
I currently have a Spark 1.3.1 batch driver, deployed in YARN-cluster mode
on an EMR cluster (AMI 3.7.0), that reads input data through a HiveContext,
in particular SELECTing data from an EXTERNAL TABLE backed by S3. That
table has dynamic partitions and contains hundreds o
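A minimal sketch of that read path, with made-up table and partition names
(Spark 1.3-style HiveContext):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("myBatchDriver"))
    val hiveContext = new HiveContext(sc)

    // the table is EXTERNAL, backed by S3, and partitioned (e.g. by dt)
    val df = hiveContext.sql("SELECT * FROM my_external_table WHERE dt = '2015-07-25'")
    println(df.count())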
Hello community,
I'm currently using Spark 1.3.1 with Hive support for writing processed
data to an external Hive table backed by S3. I'm specifying the delimiter
manually, but I'd like to know whether there is any "clean" way to write in
CSV format:
val sparkConf = new SparkConf()
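For reference, a rough sketch of the kind of manual-delimiter route being
referred to; table names, schema, S3 location and the source DataFrame are all
placeholders:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("csvWriter"))
    val hiveContext = new HiveContext(sc)

    // external table whose row format spells out the CSV delimiter by hand
    hiveContext.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS my_output (id STRING, value STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE
      LOCATION 's3://my-bucket/my-output/'
    """)

    // placeholder for the processed DataFrame to be written out
    val resultDF = hiveContext.table("my_source_table")
    resultDF.registerTempTable("result")
    hiveContext.sql("INSERT OVERWRITE TABLE my_output SELECT id, value FROM result")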
I ran into a similar issue. Might yours be related to this as well:
https://issues.apache.org/jira/browse/SPARK-8368 ?
On Fri, Jun 26, 2015 at 2:00 PM, Akhil Das
wrote:
> Are those provided Spark libraries compatible with Scala 2.11?
>
> Thanks
> Best Regards
>
> On Fri, Jun 26, 2015 at 4:48 PM, Srikan
this. It wouldn't work with anything
> less than 256m for a simple piece of code.
> 1.3.1 used to work with the default (64m, I think)
>
> Srikanth
>
> On Wed, Jun 24, 2015 at 12:47 PM, Roberto Coluccio <
> roberto.coluc...@gmail.com> wrote:
>
>> Did you try to pass i
Did you try to pass it with
--driver-java-options -XX:MaxPermSize=256m
as a spark-shell input argument?
Roberto
On Wed, Jun 24, 2015 at 5:57 PM, stati wrote:
> Hello,
>
> I moved from 1.3.1 to 1.4.0 and started receiving
> "java.lang.OutOfMemoryError: PermGen space" when I use spark-shell.
>
I confirm:
Christopher was very kind in helping me out here. The solution presented in
the linked doc worked perfectly. IMO it should be linked in the official
Spark documentation.
Thanks again,
Roberto
> On 20 Jun 2015, at 19:25, Bozeman, Christopher wrote:
>
> We worked it out. There was m
Hi!
I'm struggling with an issue with Spark 1.3.1 running on YARN on an AWS EMR
cluster. The cluster is based on AMI 3.7.0 (hence Amazon Linux 2015.03, with
Hive 0.13 already installed and configured on the cluster, Hadoop 2.4,
etc.). I make use of the AWS emr-bootstrap-action "install-spark"
Hey Cheng, thank you so much for your suggestion. The problem was actually
a column/field called "timestamp" in one of the case classes! Once I
changed its name, everything worked out fine again. Let me say, it was kinda
frustrating ...
Roberto
On Wed, Mar 18, 2015 at 4:07 PM, Robert
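In case it saves someone else the frustration, the change was of this shape
(the case class and field names here are made up); TIMESTAMP being a HiveQL
type/keyword is presumably why it tripped the parser:

    // before: a field literally named "timestamp"
    // case class Event(timestamp: Long, value: String)

    // after: renaming the field made the queries work again
    case class Event(eventTimestamp: Long, value: String)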
0, it depends on the
> actual contents of your query.
>
> Yin had opened a PR for this, although not merged yet, it should be a
> valid fix https://github.com/apache/spark/pull/5078
>
> This fix will be included in 1.3.1.
>
> Cheng
>
> On 3/18/15 10:04 PM, Roberto Coluc
an 22) String fields.
Hope the situation is a bit clearer now. Thanks to anyone who can help me out
here.
Roberto
On Wed, Mar 18, 2015 at 12:09 PM, Cheng Lian wrote:
> Would you mind providing the query? If it's confidential, could you
> please help construct a query that reprodu
Hi everybody,
When trying to upgrade from Spark 1.1.1 to Spark 1.2.x (I tried both 1.2.0
and 1.2.1), I encounter a weird error that never occurred before, about which
I'd kindly ask for any possible help.
In particular, all my Spark SQL queries fail with the following exception:
java.lang.RuntimeExcepti
Hello folks,
I'm trying to deploy a Spark driver on Amazon EMR in yarn-cluster mode,
expecting to be able to access the Spark UI from the :4040
address (the default port). The problem here is that the Spark UI port is
always assigned randomly at runtime, even though I also tried to specify it in
the spark-
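A minimal sketch of pinning the port through configuration, assuming
spark.ui.port is the relevant setting (note that in yarn-cluster mode the UI
lives with the ApplicationMaster and is usually reached via the YARN proxy, so
the port alone may not be the whole story):

    import org.apache.spark.SparkConf

    // ask for a fixed UI port instead of letting Spark pick one at runtime
    val sparkConf = new SparkConf()
      .setAppName("myApp") // placeholder
      .set("spark.ui.port", "4040")

The same setting can also be passed as --conf spark.ui.port=4040 at
spark-submit time, or put in spark-defaults.conf.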
Hello folks,
I have a Spark Streaming application built with Maven (as a jar) and deployed
with the spark-submit script. The application project has the following
(main) structure:
myApp
  src
    main
      scala
        com.mycompany.package
          MyApp.scala
          DoSomething.scala
          ...
      resources
        aPerlScript.pl
        ...
    test
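For what it's worth, a minimal sketch of reading such a bundled resource from
the application jar at runtime (assuming the script ends up at the jar root,
which is where Maven puts files from src/main/resources):

    import scala.io.Source

    // files under src/main/resources are packaged at the root of the jar
    val in = getClass.getResourceAsStream("/aPerlScript.pl")
    val script = Source.fromInputStream(in).mkString
    in.close()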