Hello,
I am using sbt and created a unit test where I create a `HiveContext`,
execute some query, and then return. Each time I run the unit test, the JVM
increases its memory usage until I get the error:
Internal error when running tests: java.lang.OutOfMemoryError: PermGen space
Exception
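A sketch of one common mitigation, assuming the tests are run from sbt: fork the test JVM so PermGen is reclaimed between runs, and raise the PermGen cap. The sizes below are only examples.

// build.sbt
// Run tests in a separate, freshly started JVM so PermGen does not accumulate
// across repeated `test` invocations inside the same sbt session.
fork in Test := true
javaOptions in Test ++= Seq("-Xmx2g", "-XX:MaxPermSize=512m")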
Hi All,
I would like some clarification regarding window functions in Apache Spark 1.4.0:
-
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
In particular, the "rowsBetween"
* {{{
* val w = Window.partitionBy("name").orderBy("id")
* df.se
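For reference, a sketch of how rowsBetween is typically combined with an aggregate over the window; df is assumed to be the DataFrame from the snippet above, and the "value" column and moving-average use case are only placeholders:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.avg

// Frame covering the previous row, the current row and the next row,
// within each "name" partition ordered by "id".
val w = Window.partitionBy("name").orderBy("id").rowsBetween(-1, 1)

// Moving average of "value" over that three-row frame.
df.select(df("name"), df("id"), avg(df("value")).over(w).as("moving_avg")).show()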
Hi All,
I have an RDD of case class objects.
scala> case class Entity(
| value: String,
| identifier: String
| )
defined class Entity
scala> Entity("hello", "id1")
res25: Entity = Entity(hello,id1)
During a map operation, I'd like to return a new RDD that contains all of
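A minimal sketch of a map over an RDD of these Entity objects; the copy-based transformation is only illustrative, and sc is assumed to be the usual SparkContext:

// Build an RDD[Entity] and derive a new RDD from it.
val entities = sc.parallelize(Seq(Entity("hello", "id1"), Entity("world", "id2")))

// map produces a new RDD; copy() builds a modified Entity without mutating the original.
val transformed = entities.map(e => e.copy(value = e.value.toUpperCase))
transformed.collect().foreach(println)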
I'd like to understand why the column used in the where clause must also
exist in the select clause.
For example, the following select statement works fine:
- df.select("field1", "filter_field").filter(df("filter_field") === "value").show()
However, the next one fails with the error "in operator !Filter
(filter_fie
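For comparison, a sketch of both forms; presumably the failing variant is the one that projects only field1, and filtering before the projection avoids carrying filter_field through the select:

// Works: the filter column is still present in the projected DataFrame.
df.select("field1", "filter_field").filter(df("filter_field") === "value").show()

// Workaround for the failing case: filter on the original DataFrame first, then project.
df.filter(df("filter_field") === "value").select("field1").show()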
Hello,
I'd like to understand how other people have been aggregating metrics
using Spark Streaming and a Cassandra database. Currently I have designed
some data models that will store the rolled-up metrics. There are two
models that I am considering:
CREATE TABLE rollup_using_counters (
metric_1
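Whichever model wins, the write path from Spark Streaming would look roughly like the sketch below, assuming the DataStax spark-cassandra-connector is on the classpath; the events DStream, the rollupKey helper and the keyspace/table/column names are all placeholders, and whether the target columns are counters or plain values depends on the table definition:

import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._

// Roll each micro-batch up to one row per metric key before writing to Cassandra.
events
  .map(event => (rollupKey(event), 1L))   // rollupKey: hypothetical bucketing helper
  .reduceByKey(_ + _)
  .saveToCassandra("metrics_ks", "rollup_using_counters",
    SomeColumns("metric_key", "metric_1"))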
> since they are usually foreground processes
> with master it's a bit more complicated: ./sbin/start-master.sh goes to the
> background, which is not good for supervisor, but anyway I think it's
> doable (going to set it up too in a few days)
Hi All,
I am curious to know if anyone has successfully deployed a Spark cluster
using supervisord?
- http://supervisord.org/
Currently I am using the cluster launch scripts, which are working great;
however, every time I reboot my VM or development environment I need to
re-launch the cluster
> An executor is simply a JVM instance and
> as such it can be granted any number of cores and RAM.
>
> So check how many cores you have per executor.
I guess each receiver occupies an executor, so there was only one executor
available for processing the job.
Hi All,
I have a cluster of four nodes (three workers and one master, with one core
each) which consumes data from Kinesis at 15-second intervals using two
streams (i.e. receivers). The job simply grabs the latest batch and pushes
it to MongoDB. I believe that the problem is that all tasks are execu
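A sketch of that receiver setup, assuming the Spark 1.3-era KinesisUtils API and an existing SparkContext sc; the stream name and endpoint are placeholders. The key point is that each of the two receivers permanently occupies one of the three worker cores, leaving only one core for processing:

import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kinesis.KinesisUtils

val ssc = new StreamingContext(sc, Seconds(15))

// Two receivers: each one pins an executor core for the lifetime of the job.
val streams = (1 to 2).map { _ =>
  KinesisUtils.createStream(ssc, "my-stream", "kinesis.us-east-1.amazonaws.com",
    Seconds(15), InitialPositionInStream.LATEST, StorageLevel.MEMORY_AND_DISK_2)
}

// Union the receivers into a single DStream for downstream processing,
// and make sure the cluster has more cores than receivers.
val unified = ssc.union(streams)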
> relevant Spark Streaming logs that are
> generated when you do this?
>
> I saw a lot of "lease not owned by this Kinesis Client" type of errors,
> from what I remember.
>
> lemme know!
>
> -Chris
If you see errors, you may need to manually delete
the DynamoDB table.
Hi All,
I am submitting the assembled fat jar file using the command:
bin/spark-submit --jars /spark-streaming-kinesis-asl_2.10-1.3.0.jar --class
com.xxx.Consumer -0.1-SNAPSHOT.jar
It reads the data from Kinesis using the stream name defined in a
configuration file. It turns out that it re
> 'lib' directory
> has an uber jar spark-assembly-1.3.0-hadoop1.0.4.jar. At one point in Spark
> 1.2 I found a conflict between httpclient versions that my uber jar pulled
> in for AWS libraries and the one bundled in the Spark uber jar. I hand
> patched the Spark uber jar to rem
roduce an "uber jar."
>
> Fyi, I've been having trouble consuming data out of Kinesis with Spark
> with no success :(
> Would be curious to know if you got it working.
>
> Vadim
Hi All,
I am having trouble building a fat jar file through sbt-assembly.
[warn] Merging 'META-INF/NOTICE.txt' with strategy 'rename'
[warn] Merging 'META-INF/NOTICE' with strategy 'rename'
[warn] Merging 'META-INF/LICENSE.txt' with strategy 'rename'
[warn] Merging 'META-INF/LICENSE' with strat
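A sketch of the usual fixes, assuming sbt-assembly 0.13.x syntax: mark Spark itself as "provided" so the assembly does not drag a second copy of its jars in, and pick an explicit merge strategy for the duplicated META-INF entries:

// build.sbt
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.0" % "provided"

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard  // NOTICE/LICENSE duplicates
  case _                             => MergeStrategy.first
}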
It's because your tests are running in parallel and you can only have one
context running at a time.
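If separate suites each need their own context, a one-line sketch for sbt is to stop running test suites in parallel:

// build.sbt - run test suites sequentially so only one SparkContext exists at a time
parallelExecution in Test := false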
> ...batch interval).
>
> this goes for any Spark Streaming implementation - not just Kinesis.
>
> lemme know if that works for you.
>
> thanks!
>
> -Chris
Hi All,
I am pushing data from a Kinesis stream to S3 using Spark Streaming and
noticed that during testing (i.e. master=local[2]) the batches (1-second
intervals) were falling behind the incoming data stream at about 5-10
events / second. It seems that the rdd.saveAsTextFile(s3n://...) is taking
at
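One thing that sometimes helps (a sketch only; dstreamData, the bucket and the prefix are placeholders) is coalescing each batch before the write, so S3 receives one larger file per interval instead of one small file per partition:

dstreamData.foreachRDD { (rdd, time) =>
  if (!rdd.isEmpty()) {
    // One output file per batch, written under a directory named after the batch time.
    rdd.coalesce(1).saveAsTextFile(s"s3n://my-bucket/events/${time.milliseconds}")
  }
}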
Please ignore my question; you can simply specify the root directory and it
looks like Redshift takes care of the rest.
copy mobile
from 's3://BUCKET_NAME/'
credentials
json 's3://BUCKET_NAME/jsonpaths.json'
Hi All,
I am receiving data from AWS Kinesis using Spark Streaming and am writing
the data collected in the dstream to S3 using the output function:
dstreamData.saveAsTextFiles("s3n://XXX:XXX@/")
After running the application for several seconds, I end up with a sequence
of directories in S3 tha
Hi All,
I am looking at integrating a data stream from AWS Kinesis into AWS Redshift,
and since I am already ingesting the data through Spark Streaming, it seems
convenient to also push that data to AWS Redshift at the same time.
I have taken a look at the AWS Kinesis connector, although I am not sur
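A sketch of the S3-staging approach that matches the COPY command shown earlier; dstreamData, the bucket, table, jsonpaths file, JDBC URL and credentials are placeholders, and the Redshift JDBC driver is assumed to be on the classpath:

import java.sql.DriverManager

dstreamData.foreachRDD { (rdd, time) =>
  if (!rdd.isEmpty()) {
    // Stage the batch in S3 (Hadoop writes usually go through s3n://),
    // then ask Redshift to COPY the same objects via s3://.
    val stagePrefix = s"staging/${time.milliseconds}"
    rdd.saveAsTextFile(s"s3n://my-bucket/$stagePrefix")

    val conn = DriverManager.getConnection(
      "jdbc:redshift://example.cluster.redshift.amazonaws.com:5439/db", "user", "password")
    try {
      conn.createStatement().execute(
        s"copy mobile from 's3://my-bucket/$stagePrefix/' " +
        "credentials '...' json 's3://my-bucket/jsonpaths.json'")
    } finally {
      conn.close()
    }
  }
}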
Hi All,
I have Spark Streaming set up to write data to a replicated MongoDB database
and would like to understand if there would be any issues using the
Reactive Mongo library to write directly to MongoDB? My stack is Apache
Spark sitting on top of Cassandra for the datastore, so my thinking is
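Whichever driver ends up doing the write, it usually sits inside a foreachRDD/foreachPartition block so the client is created on the executors rather than serialized from the driver; dstream is a placeholder for the input stream and writeBatch is a hypothetical stand-in for the actual Reactive Mongo insert calls:

dstream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    // Create (or fetch from a pool) a Mongo client here, inside the partition:
    // driver/connection objects generally are not serializable, so they cannot
    // be closed over from the driver program.
    // writeBatch is a hypothetical helper wrapping the Reactive Mongo insert calls.
    writeBatch(records.toSeq)
  }
}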
Hi,
I am considering implementing Apache Spark on top of a Cassandra database after
listening to a related talk and reading through the slides from DataStax. It
seems to fit well with our time-series data and reporting requirements.
http://www.slideshare.net/patrickmcfadin/apache-cassandra-apache-spark-fo