Hi Sachin,
The idea of using Spark on RDBMS data to do complex queries is interesting and
will mature as Spark SQL gets closer to ANSI compliance.
There are a number of challenges here:
1. The application owners prefer to stay on RDBMS
2. The application backend is based on a primary DB and multiple
Another use case for Spark is to apply its in-memory, parallel processing
to RDBMS data.
This may sound a bit strange, but you can read your RDBMS tables from
Spark over JDBC in parallel and use Spark's speed to
accelerate the queries.
To do this you may need to parallelise
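To make the parallel-JDBC idea concrete, here is a small Python sketch of how a read can be split into range predicates, one per parallel task. This mirrors the idea behind Spark's `partitionColumn`/`lowerBound`/`upperBound`/`numPartitions` JDBC options, but it is illustrative only, not Spark's exact partitioning algorithm, and the column name and bounds are hypothetical:

```python
# Sketch: split a JDBC read into N range predicates, one per parallel task.
# Each predicate becomes the WHERE clause of one query that a worker runs
# concurrently against the RDBMS. Illustrative only, not Spark's exact logic.

def partition_predicates(column, lower, upper, num_partitions):
    """Return WHERE-clause fragments covering the range [lower, upper)."""
    stride = (upper - lower) // num_partitions
    preds = []
    for i in range(num_partitions):
        lo = lower + i * stride
        if i == num_partitions - 1:
            # Last partition is open-ended so it absorbs any remainder.
            preds.append(f"{column} >= {lo}")
        else:
            preds.append(f"{column} >= {lo} AND {column} < {lo + stride}")
    return preds

for p in partition_predicates("order_id", 0, 1000, 4):
    print(p)
```

Each predicate is issued as a separate query, so the database sees several smaller scans instead of one large one; choosing a well-indexed, evenly distributed partition column matters for this to pay off.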
Not sure if you are aware of these:
1) edX/Berkeley/Databricks have three Spark-related certifications. Might be a
good start.
2) A fair understanding of Scala and distributed-collection patterns helps you
better appreciate the internals of Spark. Coursera has three Scala courses. I know
there are other
Keep in mind that Spark is a parallel computing engine; it does not change
your data infrastructure or data architecture. These days it's relatively
convenient to read data from a variety of sources (S3, HDFS, Cassandra,
...) and ditto on the output side.
For example, for one of my use-cases, I sto
I was hoping someone would answer this question, as it resonates with many
developers who are new to Spark and are trying to adopt it at work.
Regards
Pradeep
On Dec 3, 2016, at 9:00 AM, Vasu Gourabathina <vgour...@gmail.com> wrote:
Hi,
I know this is a broad question. If this is not the right forum, appreciate
if you can point to other sites/areas that may be helpful.
Before posing this question, I did use our friend Google, but filtering
the search results down to my particular needs hasn't been easy.
Who I am:
- Have done data