Re: Design patterns for Spark implementation

2016-12-10 Thread Mich Talebzadeh
Hi Sachin, The idea of using Spark on RDBMS to do complex queries is interesting and will mature as SQL on Spark gets closer to ANSI. There are a number of challenges here: 1. The application owners prefer to stay on RDBMS 2. The application backend is based on a primary DB and multiple

Re: Design patterns for Spark implementation

2016-12-08 Thread Mich Talebzadeh
Another use case for Spark is to apply its in-memory and parallel processing to RDBMS data. This may sound a bit strange, but you can access your RDBMS table from Spark via JDBC with parallel processing and use the speed of Spark to accelerate the queries. To do this you may need to parallelise
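
The preview above is cut off, so what follows is only a minimal sketch of one way to parallelise a JDBC read, assuming the standard partitioning options of Spark's JDBC data source (partitionColumn, lowerBound, upperBound, numPartitions). The connection URL, table name, column name and helper names are hypothetical and do not come from the original message.

```python
def jdbc_partition_options(url, table, column, lower, upper, num_partitions):
    """Build the option map for a partitioned Spark JDBC read.

    Spark splits the range [lower, upper) of `column` into `num_partitions`
    strides and issues one query per stride. The bounds only decide the
    stride; rows outside them are still read, not filtered out.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower >= upper:
        raise ValueError("lower must be smaller than upper")
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": column,  # numeric, date or timestamp column
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
    }


def read_partitioned(spark, options):
    """Load the table as a DataFrame using an existing SparkSession."""
    return spark.read.format("jdbc").options(**options).load()


# Hypothetical connection details, for illustration only. A real read
# would also need "user", "password" and usually "driver" options.
opts = jdbc_partition_options(
    "jdbc:postgresql://dbhost:5432/sales", "orders", "order_id",
    lower=1, upper=1000000, num_partitions=8,
)
# df = read_partitioned(spark, opts)  # requires a running SparkSession
```

With a session available, read_partitioned(spark, opts) would return a DataFrame backed by up to eight concurrent JDBC queries. A partition column with evenly spread values keeps the partitions balanced.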

Re: Design patterns for Spark implementation

2016-12-08 Thread Sachin Naik
Not sure if you are aware of these: 1) edX/Berkeley/Databricks has three Spark-related certifications. Might be a good start. 2) A fair understanding of Scala/distributed collection patterns helps to better appreciate the internals of Spark. Coursera has three Scala courses. I know there are other

Re: Design patterns for Spark implementation

2016-12-08 Thread Peter Figliozzi
Keep in mind that Spark is a parallel computing engine; it does not change your data infrastructure/data architecture. These days it's relatively convenient to read data from a variety of sources (S3, HDFS, Cassandra, ...) and ditto on the output side. For example, for one of my use-cases, I sto

Re: Design patterns for Spark implementation

2016-12-04 Thread Pradeep Gaddam
I was hoping for someone to answer this question, as it resonates with many developers who are new to Spark and are trying to adopt it at their work. Regards, Pradeep On Dec 3, 2016, at 9:00 AM, Vasu Gourabathina <vgour...@gmail.com> wrote: Hi, I know this is a broad question. If this is no

Design patterns for Spark implementation

2016-12-03 Thread Vasu Gourabathina
Hi, I know this is a broad question. If this is not the right forum, I would appreciate it if you could point me to other sites/areas that may be helpful. Before posing this question, I did use our friend Google, but filtering the search results down to what I need hasn't been easy. Who I am: - Have done data