Spark standalone is not multi-tenant; you need one cluster per job. Maybe
you can try fair scheduling and use one cluster, but I doubt it will be prod
ready...
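(If you do go down that road, here is a rough sketch of the relevant settings, with purely illustrative values: cap the cores each application takes on the standalone cluster, and enable FAIR scheduling for jobs inside a single application.)

  import org.apache.spark.SparkConf;

  SparkConf conf = new SparkConf()
      .setAppName("shared-standalone-app")      // illustrative name
      .set("spark.cores.max", "4")              // leave cores free for other apps on the cluster
      .set("spark.scheduler.mode", "FAIR");     // fair scheduling across jobs within this app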
On 27 Apr 2017 5:28 AM, "anna stax" wrote:
> Thanks Cody,
>
> As I already mentioned, I am running spark streaming on an EC2 cluster in
>
Thanks Cody,
As I already mentioned, I am running spark streaming on an EC2 cluster in
standalone mode. Now, in addition to streaming, I want to be able to run
spark batch jobs hourly and ad hoc queries using Zeppelin.
Can you please confirm that a standalone cluster is OK for this? Please
provide me so
The standalone cluster manager is fine for production. Don't use Yarn
or Mesos unless you already have another need for it.
On Wed, Apr 26, 2017 at 4:53 PM, anna stax wrote:
> Hi Sam,
>
> Thank you for the reply.
>
> What do you mean by
> I doubt people run spark in a single EC2 instance, certa
I am using Spark 2.1 BTW.
On Wed, Apr 26, 2017 at 3:22 PM, kant kodali wrote:
> Hi All,
>
> I am wondering how to create a SparkSession using a SparkConf object? Although
> I can see that most of the key-value pairs we set in SparkConf we can also
> set in SparkSession or SparkSession.Builder, howev
Fellow Spark users,
The Spark Summit Program Committee requested that I share with this Spark user
group a few sessions and events they have added this year:
Hackathon
1-day and 2-day training courses
3 new tracks: Technical Deep Dive, Streaming and Machine Learning
and more…
If you are planning to at
Hi All,
I am wondering how to create a SparkSession using a SparkConf object? Although
I can see that most of the key-value pairs we set in SparkConf we can also
set in SparkSession or SparkSession.Builder, however I don't see
sparkConf.setJars, which is required, right? Because we want the driver jar
t
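(For reference, a minimal sketch of one way this can be wired up: SparkSession.builder() accepts a pre-built SparkConf via config(conf), so setJars can still go on the conf before building the session. The jar path below is made up.)

  import org.apache.spark.SparkConf;
  import org.apache.spark.sql.SparkSession;

  SparkConf conf = new SparkConf()
      .setAppName("my-app")
      .setJars(new String[]{"/path/to/driver-deps.jar"});  // hypothetical jar path

  SparkSession spark = SparkSession.builder()
      .config(conf)        // entries set on the SparkConf are carried over
      .getOrCreate();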
Hi Sam,
Thank you for the reply.
What do you mean by
I doubt people run spark in a single EC2 instance, certainly not in
production I don't think
What is wrong in having a data pipeline on EC2 that reads data from kafka,
processes using spark and outputs to cassandra? Please explain.
Thanks
-A
Hi Anna
There are a variety of options for launching spark clusters. I doubt people
run spark in a single EC2 instance, certainly not in production I don't
think
I don't have enough information about what you are trying to do, but if you are
just trying to set things up from scratch then I think you
I need to set up a spark cluster for Spark streaming, scheduled batch
jobs and ad hoc queries.
Please give me some suggestions. Can this be done in standalone mode?
Right now we have a spark cluster in standalone mode on AWS EC2 running a
spark streaming application. Can we run spark batch jobs and
Hi,
Good progress!
Can you remove metastore_db directory and start ./bin/pyspark over? I
don't think starting from ~ is necessary.
Regards,
Jacek Laskowski
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/
have you read
http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#kafka-itself
On Wed, Apr 26, 2017 at 1:17 PM, Dominik Safaric
wrote:
> The reason why I want to obtain this information, i.e. the <…, timestamp>
> tuples, is to relate the consumption with the production rates using
Sorry about that, hangouts on air broke in the first one :(
On Wed, Apr 26, 2017 at 8:41 AM, Marco Mistroni wrote:
> Uh, I stayed online in the other link but nobody joined. Will follow
> transcript
> Kr
>
> On 26 Apr 2017 9:35 am, "Holden Karau" wrote:
>
>> And the recording of our discussion
The reason why I want to obtain this information, i.e. the <…, timestamp>
tuples, is to relate the consumption with the production rates using
the __consumer_offsets Kafka internal topic. Interestingly, Spark's
KafkaConsumer implementation does not auto-commit the offsets upon offset
commit expiration, because
Kicking off the process from the ~ directory makes the message go away. I guess the
metastore_db created is relative to the path where it's executed.
FIX: kick off from the ~ directory:
./spark-2.1.0-bin-hadoop2.7/bin/pyspark
From: "Afshin, Bardia"
Date: Wednesday, April 26, 2017 at 9:47 AM
To: Jacek Lasko
What is it you're actually trying to accomplish?
You can get topic, partition, and offset bounds from an offset range like
http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#obtaining-offsets
Timestamp isn't really a meaningful idea for a range of offsets.
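For example, a minimal sketch along the lines of the linked docs, assuming a 0-10 direct stream named stream (stream setup omitted):

  import org.apache.spark.streaming.kafka010.HasOffsetRanges;
  import org.apache.spark.streaming.kafka010.OffsetRange;

  stream.foreachRDD(rdd -> {
    // offset bounds for each Kafka partition in this batch
    OffsetRange[] ranges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
    for (OffsetRange r : ranges) {
      System.out.println(r.topic() + " " + r.partition()
          + ": " + r.fromOffset() + " -> " + r.untilOffset());
    }
  });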
On Tue, Apr 25
Hi,
One common situation I run across is that I want to compact my data and
select the mode (most frequent value) in several columns for each group.
Even calculating mode for one column in SQL is a bit tricky. The ways I've
seen usually involve a nested sub-select with a group by + count and then
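For what it's worth, here is a sketch of that count-then-take-the-top approach for a single column, using the DataFrame API rather than nested sub-selects (the column names group_col and val are made up):

  import static org.apache.spark.sql.functions.*;
  import org.apache.spark.sql.Dataset;
  import org.apache.spark.sql.Row;
  import org.apache.spark.sql.expressions.Window;
  import org.apache.spark.sql.expressions.WindowSpec;

  // count occurrences of each value within each group
  Dataset<Row> counts = df.groupBy(col("group_col"), col("val")).count();
  // rank values by frequency within each group and keep the top one
  WindowSpec byCount = Window.partitionBy(col("group_col")).orderBy(col("count").desc());
  Dataset<Row> modes = counts
      .withColumn("rn", row_number().over(byCount))
      .where(col("rn").equalTo(1))
      .select(col("group_col"), col("val").alias("val_mode"));

Repeating that per column and joining the results back on the group key would give one compacted row per group.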
Thanks for the hint, but I don't think that's it. I thought it was a permission issue, that it
cannot read or write to ~/metastore_db, but the directory is definitely there:
drwxrwx--- 5 ubuntu ubuntu 4.0K Apr 25 23:27 metastore_db
Just re-ran the command from within the root spark folder, ./bin/pyspark, and the
same
ApacheCon is just three weeks away, in Miami, Florida, May 15th - 18th.
http://apachecon.com/
There's still time to register and attend. ApacheCon is the best place
to find out about tomorrow's software, today.
ApacheCon is the official convention of The Apache Software Foundation,
and includes t
Uh, I stayed online in the other link but nobody joined. Will follow
transcript
Kr
On 26 Apr 2017 9:35 am, "Holden Karau" wrote:
> And the recording of our discussion is at
> https://www.youtube.com/watch?v=2q0uAldCQ8M
> A few of us have follow up things and we will try and do another meeting
Hi Devender,
I have always gone with the 2nd approach, only so I don't have to chain a bunch
of option() calls together. You should be able to use either.
Thanks,
Subhash
Sent from my iPhone
> On Apr 26, 2017, at 3:26 AM, Devender Yadav
> wrote:
>
> Hi All,
>
>
> I am using Spark 1.6.2
Michael Gummelt, thanks!!! I forgot about debug logging!
On Mon, Apr 24, 2017 at 9:30 PM Michael Gummelt
wrote:
> Have you run with debug logging? There are some hints in the debug logs:
> https://github.com/apache/spark/blob/branch-2.1/mesos/src/main/scala/org/apache/spark/scheduler/cluster/
And the recording of our discussion is at
https://www.youtube.com/watch?v=2q0uAldCQ8M
A few of us have follow up things and we will try and do another meeting in
about a month or two :)
On Tue, Apr 25, 2017 at 1:04 PM, Holden Karau wrote:
> Urgh hangouts did something frustrating, updated link
>
Explain it and you'll know what happens under the covers,
i.e. use explain on the Dataset.
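For example, on any Dataset ds:

  // prints the parsed, analyzed, optimized and physical plans
  ds.explain(true);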
Jacek
On 25 Apr 2017 12:46 a.m., "Lavelle, Shawn" wrote:
> Hello Spark Users!
>
>Does the Spark Optimization engine reduce overlapping column ranges?
> If so, should it push this down to a Data Sourc
Hi,
You've got two spark sessions up and running (and given Spark SQL uses a
Derby-managed Hive metastore, hence the issue).
Please don't start spark-submit from inside bin. Rather, bin/spark-submit...
Jacek
On 26 Apr 2017 1:57 a.m., "Afshin, Bardia"
wrote:
I’m having issues when I fire up pyspar
Hi All,
I am using Spark 1.6.2
Which is the suitable way to create a DataFrame from an RDBMS table?
DataFrame df =
sqlContext.read().format("jdbc").options(options).load();
or
DataFrame df = sqlContext.read().jdbc(url, table, properties);
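For reference, here are both forms filled in with made-up connection details:

  import java.util.HashMap;
  import java.util.Map;
  import java.util.Properties;

  Map<String, String> options = new HashMap<>();
  options.put("url", "jdbc:postgresql://dbhost:5432/mydb");   // made-up URL
  options.put("dbtable", "public.my_table");
  options.put("user", "user");
  options.put("password", "secret");
  DataFrame viaOptions = sqlContext.read().format("jdbc").options(options).load();

  Properties props = new Properties();
  props.setProperty("user", "user");
  props.setProperty("password", "secret");
  DataFrame viaJdbc = sqlContext.read().jdbc("jdbc:postgresql://dbhost:5432/mydb", "public.my_table", props);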
Regards,
Devender