I have seen a similar scenario where we load data from an RDBMS into a NoSQL 
database… Spark made sense for velocity and parallel processing (and the cost 
of licenses :) ).
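
For what it's worth, a minimal sketch of that kind of job. The JDBC URL, 
table names, and the Cassandra target are all placeholders here, and the 
write assumes the DataStax spark-cassandra-connector is on the classpath:

import org.apache.spark.sql.SparkSession

object RdbmsToNoSqlLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdbms-to-nosql-load")
      .getOrCreate()

    // Read the source table in parallel, split on a numeric key column.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")  // placeholder
      .option("dbtable", "MDM.ACCOUNTS")                      // placeholder
      .option("user", "etl")
      .option("password", sys.env("DB_PASSWORD"))
      .option("partitionColumn", "ACCOUNT_ID")
      .option("lowerBound", "1")
      .option("upperBound", "10000000")
      .option("numPartitions", "8")
      .load()

    // Write the partitions out concurrently to the NoSQL target
    // (Cassandra here, via the spark-cassandra-connector).
    df.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "mdm", "table" -> "accounts"))
      .mode("append")
      .save()

    spark.stop()
  }
}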
 
> On Oct 15, 2017, at 21:29, Saravanan Thirumalai 
> <saravanan.thiruma...@gmail.com> wrote:
> 
> We are an investment firm and have an MDM platform in Oracle at a vendor 
> location; we use Oracle GoldenGate to replicate data to our data center for 
> reporting needs. 
> Our data is not big data (6 TB total, including 2 TB of archive data). 
> Moreover, our data doesn't get updated often: once nightly (around 50 MB), plus 
> some correction transactions during the day (<10 MB). We don't have external 
> users, so the data doesn't grow in real time the way e-commerce data does.
> 
> When we replicate data from source to target, we transfer the data through 
> files. So if there are DML operations (corrections) on a source table during 
> the day, the corresponding file would have perhaps 100 rows of table data that 
> need to be loaded into the target database. Because of the low data volume, we 
> built this with Informatica, and it finishes in 2-5 minutes. 
> Could Spark be used in this case, or would it be technological overkill?
> 
> 
> 
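
For comparison, the change-file load described in the question would also be 
only a handful of lines in Spark. The path, JDBC URL, and table names below 
are made up, and note that a plain JDBC write only appends to a staging 
table; the actual correction merge would still happen in the database:

// Assumes a SparkSession named `spark` (e.g. from spark-shell).
val changes = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/incoming/mdm/ACCOUNTS_delta.csv")  // hypothetical change file

changes.write
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dwhost:1521/RPT")  // placeholder target
  .option("dbtable", "RPT.ACCOUNTS_STAGE")
  .option("user", "etl")
  .option("password", sys.env("DB_PASSWORD"))
  .mode("append")
  .save()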

