Hi Brant,

Let me partially answer your concerns: please follow PL/HQL (www.plhql.org), a new open source project that aims to let you reuse existing logic and leverage existing skills to some extent, so you do not need to rewrite everything in Scala/Java at once and can migrate gradually. I hope it helps.

Thanks,
Dmitry

On Sat, May 23, 2015 at 1:22 AM, Brant Seibert <brantseib...@hotmail.com> wrote:

> Hi,
>
> The healthcare industry can do wonderful things with Apache Spark. But
> there is already a very large base of data and applications firmly rooted
> in the relational paradigm, and they are resistant to change - stuck on
> Oracle.
>
> **
> QUESTION 1 - Migrate legacy relational data (plus new transactions) to
> distributed storage?
>
> DISCUSSION 1 - The primary advantage I see is not having to engage in the
> lengthy (1+ years) process of creating a relational data warehouse and
> cubes. Just store the data in a distributed system and "analyze first" in
> memory with Spark.
>
> **
> QUESTION 2 - Will we have to re-write the enormous amount of logic that is
> already built for the old relational system?
>
> DISCUSSION 2 - If we move the data to distributed, can we simply run that
> existing relational logic as SparkSQL queries? [existing SQL --> Spark
> Context --> Cassandra --> process in SparkSQL --> display in existing UI].
> Can we create an RDD that uses existing SQL? Or do we need to rewrite all
> our SQL?
>
> **
> DATA SIZE - We are adding many new data sources to a system that already
> manages health care data for over a million people. The number of rows may
> not be enormous right now compared to the advertising industry, for
> example, but the number of dimensions runs well into the thousands. If we
> add to this IoT data for each health care patient, that creates billions
> of events per day, and the number of rows then grows exponentially. We
> would like to be prepared to handle that huge data scenario.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Migrate-Relational-to-Distributed-tp22999.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.