The honest answer is that it is unclear to me at this point. I guess what I am really wondering is whether there are cases where it would be beneficial to use Spark against one or more relational databases.

On Thu, Feb 26, 2015 at 8:06 PM, Tobias Pfeiffer <t...@preferred.jp> wrote:

> Gary,
>
> On Fri, Feb 27, 2015 at 8:40 AM, Gary Malouf <malouf.g...@gmail.com>
> wrote:
>
>> I'm considering whether or not it is worth introducing Spark at my new
>> company. The data is nowhere near Hadoop size at this point (it sits in
>> an RDS Postgres cluster).
>
> Will it ever become "Hadoop size"? Looking at the overhead of running even
> a simple Hadoop setup (securely and with good performance, given about 1e6
> configuration parameters), I think it makes sense to stay in non-Hadoop
> mode as long as possible. People may disagree ;-)
>
> Tobias
>
> PS. You may also want to have a look at
> http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html