I have seen a similar scenario where we load data from an RDBMS into a NoSQL database… Spark made sense for velocity and parallel processing (and the cost of licenses :) ). A sketch of that kind of partitioned JDBC extract follows the quoted message below.

> On Oct 15, 2017, at 21:29, Saravanan Thirumalai
> <saravanan.thiruma...@gmail.com> wrote:
>
> We are an investment firm and have an MDM platform in Oracle at a vendor
> location; we use Oracle GoldenGate to replicate data to our data center for
> reporting needs.
> Our data is not big data (6 TB total, including 2 TB of archive data).
> Moreover, our data doesn't get updated often: one nightly load (around 50 MB)
> and some correction transactions during the day (<10 MB). We don't have
> external users, so the data doesn't grow in real time the way it would for
> e-commerce.
>
> When we replicate data from source to target, we transfer the data through
> files. So if there are DML operations (corrections) on a source table during
> the day, the corresponding file would have perhaps 100 lines of table data
> that need to be loaded into the target database. Due to the low volume of
> data we designed this in Informatica, and it completes in 2-5 minutes.
> Can Spark be used in this case, or would it be technological overkill?
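For the RDBMS-to-NoSQL scenario in the reply, the parallel processing mostly comes from Spark's partitioned JDBC read. Here is a minimal sketch, assuming a numeric partition column and the DataStax spark-cassandra-connector on the classpath; all hosts, table names, column names, bounds, and credentials are placeholders, not details from the thread:

```scala
import org.apache.spark.sql.SparkSession

object RdbmsToNoSql {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rdbms-to-nosql").getOrCreate()

    // Parallel extract: Spark opens numPartitions JDBC connections, each
    // reading one slice of the SECURITY_ID range. All names are illustrative.
    val source = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//source-host:1521/MDM")
      .option("dbtable", "MDM.SECURITY_MASTER")
      .option("user", "etl_user")
      .option("password", sys.env("SOURCE_DB_PASSWORD"))
      .option("partitionColumn", "SECURITY_ID") // numeric column to split on
      .option("lowerBound", "1")
      .option("upperBound", "10000000")
      .option("numPartitions", "16")
      .load()

    // Write to Cassandra via the spark-cassandra-connector (an assumed
    // dependency); keyspace and table names are hypothetical.
    source.write
      .format("org.apache.spark.sql.cassandra")
      .option("keyspace", "mdm")
      .option("table", "security_master")
      .mode("append")
      .save()

    spark.stop()
  }
}
```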
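For the question itself, the Spark equivalent of the Informatica delta load would be small. A minimal sketch, assuming the daily correction file is CSV with a header row; the file path, JDBC URL, table name, and credentials are all hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object DeltaFileLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("delta-file-load").getOrCreate()

    // Read the day's correction file; header/schema inference are assumptions
    // about the file format.
    val corrections = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/data/replication/corrections/2017-10-15.csv")

    // Append the rows to the target table over JDBC. Requires the Oracle
    // JDBC driver on the classpath; connection details are placeholders.
    corrections.write
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//target-host:1521/REPORTING")
      .option("dbtable", "MDM.SECURITY_MASTER")
      .option("user", "etl_user")
      .option("password", sys.env("TARGET_DB_PASSWORD"))
      .mode("append")
      .save()

    spark.stop()
  }
}
```

Note that Spark's JDBC sink only inserts rows (append) or replaces the table (overwrite); it cannot update rows in place, so applying corrections as true DML would mean landing them in a staging table and running a MERGE on the Oracle side, which is extra machinery a ~100-row file doesn't obviously need.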