Sqoop’s incremental data fetch will reduce the amount of data you need to pull from
the source, but by the time that incremental fetch completes, isn’t the data already
out of date again if the velocity of the data is high?
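For reference, an incremental import in Sqoop’s lastmodified mode looks roughly like
the sketch below; the connection string, table, and check column are hypothetical
placeholders.

  # Pull only rows whose updated_at is newer than the last recorded value
  sqoop import \
    --connect jdbc:postgresql://pg-host:5432/mydb \
    --username etl_user -P \
    --table events \
    --incremental lastmodified \
    --check-column updated_at \
    --last-value "2015-07-27 00:00:00" \
    --merge-key id \
    --target-dir /data/events

Each run still only catches changes up to the moment it starts, which is exactly the
staleness concern raised above.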
Maybe you can put a trigger in Postgres to send data to the big data cluster as soon
as it changes.
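One rough way to realize that idea (the table and channel names here are made up) is a
trigger that publishes each changed row over NOTIFY, with a small external listener
forwarding the payloads on to the cluster:

  -- Publish changed rows as JSON on a channel; an external listener
  -- subscribed to 'events_changes' would forward them to the cluster.
  CREATE OR REPLACE FUNCTION notify_change() RETURNS trigger AS $$
  BEGIN
    PERFORM pg_notify('events_changes', row_to_json(NEW)::text);
    RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER events_change_trigger
    AFTER INSERT OR UPDATE ON events
    FOR EACH ROW EXECUTE PROCEDURE notify_change();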
I can't migrate this PostgreSQL data since lots of systems are using it, but I can
take this data into some NoSQL store like HBase and query HBase. The issue here is:
how can I make sure that HBase has up-to-date data?
Is velocity an issue in Postgres, such that your data would become stale as soon as
it reaches the big data cluster?
Why can't you bulk pre-fetch the data to HDFS (for example, using Sqoop) instead of
hitting Postgres multiple times?
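Along the same lines, the bulk pre-fetch can also be done from Spark itself through
the JDBC data source and written to HDFS as Parquet. This is only a sketch, assuming a
spark-shell style sqlContext and hypothetical URL, table, and path:

  // Pull the Postgres table once, in parallel, and persist it on HDFS,
  // so subsequent queries read from HDFS instead of hitting Postgres.
  val events = sqlContext.read
    .format("jdbc")
    .option("url", "jdbc:postgresql://pg-host:5432/mydb")
    .option("dbtable", "events")
    .option("partitionColumn", "id")   // numeric column to split the read on
    .option("lowerBound", "1")
    .option("upperBound", "10000000")
    .option("numPartitions", "8")
    .load()

  events.write.parquet("hdfs:///data/events_snapshot")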
From: ayan guha
Sent: Monday, July 27, 2015 4:41 PM
To: Jeetendra Gangele
Cc: felixcheun...@hotmail.com, user@spark.apache.org
You can call
Ravi
Spark (or, for that matter, Big Data solutions like Hive) is suited for large
analytical loads, where “scaling up” starts to pale in comparison to “scaling out”
with regard to performance, versatility (types of data), and cost.
Without going into the details of MsSQL architecture, there is