In that case, you may want one Oracle instance for the batch processing and a separate one for serving the data, to avoid SLA surprises. Nevertheless, if data processing becomes more strategic across projects, you may want to consider job management and HDFS, i.e., Hadoop together with Spark.
On Tue, Sep 22, 2015 at 8:02 AM, Sri Eswari Devi Subbiah <[email protected]> wrote:
> Hi,
>
> Thanks for the reply. Let me explain our scenario a little more.
> Currently we have multiple data feeds delivered as files from different
> systems. We run batch jobs to extract the data from the files, normalize it,
> match it against the Oracle database, and finally consolidate the cleansed
> data in Oracle.
>
> Rather than running batch jobs, I am evaluating whether I can run Spark
> Streaming over the data files and write the cleansed data into the Oracle
> database. Once the data is consolidated in Oracle, it serves as the source
> of truth for external users.
>
> Regards,
> Sri Eswari.
>
> On Mon, Sep 21, 2015 at 10:55 PM, Jörn Franke <[email protected]> wrote:
>
>> You do not need Hadoop. However, you should think about using it. If you
>> use Spark to load data directly from Oracle, your database may face
>> unexpected load if a Spark node fails. Additionally, the Oracle Database,
>> if it is not based on local disk, may have a storage bottleneck.
>> Furthermore, Spark standalone has no resource management mechanism for
>> supporting different SLAs; you may need YARN (Hadoop) for that. Finally,
>> using the Oracle Database to store all the data may be an expensive
>> exercise. What I have often seen is that Hadoop is used for storing all
>> the data and managing the resources. Spark can be used for machine
>> learning over this data, and the Oracle Database (or any relational
>> datastore, NoSQL database, or in-memory DB) is used to serve the data to
>> many users. This is also the basic idea behind the lambda architecture.
>>
>> On Tue, Sep 22, 2015 at 7:13 AM, Sri <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> We have a use case where we get data from different systems, and
>>> finally the data will be consolidated into an Oracle database. Is Spark
>>> a good fit for this scenario? Currently we also don't have any big data
>>> component. If we go with Spark to ingest the data, does it require
>>> Hadoop?
