Re: Spark Ingestion into Relational DB

Jörn Franke Mon, 21 Sep 2015 23:40:55 -0700

In these cases you may want to have a separate Oracle instance for the
batch process and another one for serving the data, to avoid SLA surprises.
Nevertheless, if data processing becomes more strategic across projects, you
may want to think about job management and HDFS, using Hadoop with Spark.
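
As a rough illustration of the batch-side flow described in the quoted
message (extract from files, normalize, consolidate into the database), here
is a minimal sketch. It is not Spark code and not the poster's actual job:
sqlite3 stands in for Oracle so the example runs anywhere, and the
`customers` table, the `id,name` feed layout, and the `normalize` and
`consolidate` helpers are all hypothetical. The one point it tries to show is
that a keyed upsert lets the same batch be re-run (for example after a failed
task is retried) without duplicating rows.

```python
import sqlite3


def normalize(raw_line):
    """Turn one 'id,name' feed line into a cleaned (id, name) tuple."""
    cust_id, name = raw_line.split(",", 1)
    # Collapse repeated whitespace and apply a consistent capitalization.
    return int(cust_id.strip()), " ".join(name.split()).title()


def consolidate(conn, feed_lines):
    """Normalize each non-empty line and upsert it, keyed on customer id."""
    rows = [normalize(line) for line in feed_lines if line.strip()]
    with conn:  # one transaction per batch: commit on success, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO customers (id, name) VALUES (?, ?)", rows
        )
    return len(rows)


# In-memory stand-in for the serving database (assumption: not Oracle).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Old Name')")

feed = ["1,  alice   SMITH ", "2,bob jones", ""]
consolidate(conn, feed)
consolidate(conn, feed)  # re-running the same batch must not duplicate rows

result = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
```

In Oracle the keyed upsert would typically be written as a MERGE statement
rather than SQLite's INSERT OR REPLACE; the matching rules themselves would
depend on the real feeds.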


On Tue, 22 Sep 2015 at 8:02, Sri Eswari Devi Subbiah <
[email protected]> wrote:

> Hi,
>
> Thanks for the reply. Let me explain our scenario a little bit more.
> Currently we receive multiple data feeds as files from different systems.
> We run batch jobs to extract the data from the files, normalize it,
> match it against the Oracle database, and finally consolidate the cleaned
> data in Oracle.
>
> I am evaluating whether, rather than running batch jobs, I can run Spark
> Streaming on the data files and write the cleansed data into the Oracle
> database. Once the data is consolidated in Oracle, it serves as the source
> of truth for external users.
>
> Regards,
> Sri Eswari.
>
> On Mon, Sep 21, 2015 at 10:55 PM, Jörn Franke <[email protected]>
> wrote:
>
>> You do not need Hadoop. However, you should think about using it. If you
>> use Spark to load data directly from Oracle, then your database might see
>> unexpected load if a Spark node fails. Additionally, the
>> Oracle Database, if it is not based on local disk, may have a storage
>> bottleneck. Furthermore, Spark standalone has no resource management
>> mechanism for supporting different SLAs; you may need YARN (Hadoop) for
>> that. Finally, using the Oracle Database for storing all the data may be an
>> expensive exercise. What I have often seen is that Hadoop is used for
>> storing all the data and managing the resources. Spark can be used for
>> machine learning over this data, and the Oracle Database (or any relational
>> datastore, NoSQL database, in-memory DB) is used to serve the data to a lot
>> of users. This is also the basic idea behind the lambda architecture.
>>
>> On Tue, 22 Sep 2015 at 7:13, Sri <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> We have a use case where we get data from different systems and the
>>> data is finally consolidated into an Oracle database. Is Spark a valid
>>> fit for this scenario? Currently we also don't have any big data
>>> components. If we go with Spark to ingest data, does it require
>>> Hadoop?
>>>
>
