We are using Spark on Kubernetes on AWS (it's a long story), and it does
work. It's still on the raw side, but we've been pretty successful.
We configured our cluster primarily with Kube-AWS and auto scaling groups.
There are gotchas there, but so far we've been quite successful.
Gary Lucas
On 1
Thanks, everyone, for your suggestions. Do any of you take care of automatic
scale-up and scale-down of your underlying Spark clusters on AWS?
On Nov 14, 2017 10:46 AM, "lucas.g...@gmail.com" wrote:
Hi Ashish, bear in mind that EMR has some additional tooling available that
smooths out some S3 problems that you may / almost certainly will
encounter.
We are using Spark / S3 not on EMR and have encountered issues with file
consistency. You can deal with it, but be aware it's additional technica
Another option that we are trying internally is to use Mesos for isolating
different jobs or groups. Within a single group, using Livy to create
different Spark contexts also works.
- Affan
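A minimal sketch of the Livy approach mentioned above, assuming a Livy server is reachable over its REST API. The server URL, the group name, and the `build_session_request` helper are hypothetical, chosen only for illustration. Each `POST /sessions` call asks Livy to start a separate Spark context, so each user or job within a group can get its own. The sketch only builds the request and does not send it.

```python
import json
from urllib import request

# Hypothetical Livy endpoint; replace with your own server address.
LIVY_URL = "http://livy.example.internal:8998"


def build_session_request(group, max_cores=4, executor_memory="4g"):
    """Build (but do not send) a POST /sessions request for Livy.

    Each session created this way is backed by its own Spark context.
    """
    payload = {
        "kind": "pyspark",               # interactive PySpark session
        "name": f"{group}-session",      # hypothetical naming scheme
        "executorMemory": executor_memory,
        # Cap the cores this context may claim on a coarse-grained cluster.
        "conf": {"spark.cores.max": str(max_cores)},
    }
    return request.Request(
        url=f"{LIVY_URL}/sessions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_session_request("analytics", max_cores=8)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually create the session, send it: request.urlopen(req)
```

Capping `spark.cores.max` per session is one way to keep a single context from taking the whole cluster; the right limits depend on your workload.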
On Tue, Nov 14, 2017 at 8:43 AM, ashish rawat wrote:
> Thanks Sky Yin. This really helps.
>
> On Nov 14,
Thanks Sky Yin. This really helps.
On Nov 14, 2017 12:11 AM, "Sky Yin" wrote:
We are running Spark on AWS EMR as a data warehouse. All data is in S3, and
the metadata is in the Hive metastore.
We have internal tools to create Jupyter notebooks on the dev cluster. I
guess you can use Zeppelin instead, or Livy?
We run Genie as a job server for the prod cluster, so users have to submit
thei
ing it for exploratory analysis.
>> Spark is great for this ☺
>>
>> -Pat
>>
>> *From: *Vadim Semenov
>> *Date: *Sunday, November 12, 2017 at 1:06 PM
>> *To: *Gourav Sengupta
>> *Cc: *Phillip He
mber 12, 2017 at 1:06 PM
> *To: *Gourav Sengupta
> *Cc: *Phillip Henry , ashish rawat <
> dceash...@gmail.com>, Jörn Franke , Deepak Sharma <
> deepakmc...@gmail.com>, spark users
> *Subject: *Re: Spark based Data Warehouse
>
>
>
> It's actually quite simple to
It's actually quite simple to answer
> 1. Are Spark SQL and UDFs able to handle all the workloads?
Yes
> 2. What user interface did you provide for data scientists, data engineers,
> and analysts?
Home-grown platform, EMR, Zeppelin
> What are the challenges in running concurrent queries, by many users
Dear Ashish,
what you are asking for involves at least a few weeks dedicated to
understanding your use case, and then it takes at least 3 to 4 months to
even propose a solution. You can even build a fantastic data warehouse just
using C++. The matter depends on lots of conditions. I just think t
Hi, Ashish.
You are correct in saying that not *all* Spark functionality can spill to
disk, but I am not sure how this pertains to a "concurrent user
scenario". Each executor runs in its own JVM and is therefore isolated
from others. That is, if the JVM of one user dies, this should not effec
Thanks Jorn and Phillip. My question was specifically for anyone who has
tried building a data warehouse with Spark SQL. I wanted to check whether
someone has tried it and can share which kinds of workloads worked and
which ones had problems.
Regarding spill to disk,
Agree with Jorn. The answer is: it depends.
In the past, I've worked with data scientists who are happy to use the
Spark CLI. Again, the answer is "it depends" (in this case, on the skills
of your customers).
Regarding sharing resources, different teams were limited to their own
queue so they cou
What do you mean by all possible workloads?
You cannot prepare any system to do all possible processing.
We do not know the requirements of your data scientists now or in the future so
it is difficult to say. How do they work currently without the new solution? Do
they all work on the same data? I
I am looking for a similar solution, more aligned to the data scientist group.
The concern I have is about supporting complex aggregations at runtime.
Thanks
Deepak
On Nov 12, 2017 12:51, "ashish rawat" wrote:
> Hello Everyone,
>
> I was trying to understand if anyone here has tried a data warehouse