Here is the solution; this looks perfect for me.
Thanks for all your help.
http://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka/
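Once Bottled Water is pushing the Postgres WAL into Kafka, the Spark side would look roughly like this (untested sketch only; the broker address and topic name are placeholders, and Bottled Water actually publishes Avro-encoded change events, so the plain string decoding below is a simplification):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object PostgresChangesFromKafka {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("postgres-changes-from-kafka")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // Bottled Water maps each Postgres table to a Kafka topic; "mytable"
    // and the broker address are placeholders.
    val kafkaParams = Map("metadata.broker.list" -> "kafka-broker:9092")
    val topics      = Set("mytable")

    val changes = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Each record is (key, value); the value carries the row change.
    changes.map(_._2).foreachRDD { rdd =>
      rdd.take(10).foreach(println)  // replace with an upsert into HBase or similar
    }

    ssc.start()
    ssc.awaitTermination()
  }
}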
On 28 July 2015 at 23:27, Jörn Franke wrote:
> Can you put some transparent cache in front of the database? Or some jdbc
> proxy?
>
Can you put some transparent cache in front of the database? Or some JDBC
proxy?
On Tue, 28 Jul 2015 at 19:34, Jeetendra Gangele wrote:
> can the source write to Kafka/Flume/Hbase in addition to Postgres? no
> it can't write ,this is due to the fact that there are many applications
> those a
Can the source write to Kafka/Flume/HBase in addition to Postgres? No, it
can't write; this is because there are many applications producing this
PostgreSQL data. I can't really ask all the teams to start writing to some
other source.
The velocity of the application is too high.
Sqoop's incremental data fetch will reduce the data size you need to pull from
the source, but then by the time that incremental fetch is complete, is the
data not already out of date again, if the velocity of the data is high?
Maybe you can put a trigger in Postgres to send data to the big data cluster
as soon
I am trying to do that, but there will always be a data mismatch, since by the
time Sqoop is fetching, the main database will have received many more updates.
There is something called incremental data fetch using Sqoop, but that hits the
database rather than reading the WAL edits.
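For reference, this is roughly what that pull-based incremental fetch looks like from the Spark side, which is why it still queries the table itself rather than the WAL (untested sketch; the updated_at column, timestamp, host, table and credentials are all placeholders):

import java.util.Properties
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("incremental-pull"))
val sqlContext = new SQLContext(sc)

// Only pull rows changed since the last run; this assumes the table actually
// has a modification-timestamp column such as "updated_at".
val lastValue = "2015-07-28 00:00:00"
val incremental = s"(SELECT * FROM mytable WHERE updated_at > '$lastValue') AS incr"

val props = new Properties()
props.setProperty("user", "spark")
props.setProperty("password", "secret")
props.setProperty("driver", "org.postgresql.Driver")

// Even the "incremental" variant is still a query against Postgres,
// not a read of the WAL, so the source keeps taking load.
val delta = sqlContext.read.jdbc("jdbc:postgresql://pg-host:5432/mydb", incremental, props)
println(delta.count())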
On 28 July 2015 at 02:52, wrote:
> Why can
Hi Ayan, thanks for the reply.
It's around 5 GB across 10 tables... this data changes very frequently, with a
few updates every minute.
It's difficult to keep this data in Spark: if any updates happen on the main
tables, how can I refresh the Spark data?
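The only way I see to "refresh" is to unpersist and re-read the whole table, which is exactly what becomes painful when the tables change every minute. A rough sketch, assuming the sqlContext that spark-shell provides; the table name and credentials are placeholders:

import java.util.Properties

val url   = "jdbc:postgresql://pg-host:5432/mydb"
val props = new Properties()
props.setProperty("user", "spark")
props.setProperty("password", "secret")

// Initial cached snapshot of the table.
var snapshot = sqlContext.read.jdbc(url, "mytable", props).cache()
snapshot.registerTempTable("mytable_snapshot")

// "Refreshing" means throwing the cached copy away and re-reading everything;
// queries that run between refreshes still see stale rows.
def refresh(): Unit = {
  snapshot.unpersist()
  snapshot = sqlContext.read.jdbc(url, "mytable", props).cache()
  snapshot.registerTempTable("mytable_snapshot")  // same name replaces the old registration
}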
On 28 July 2015 at 02:11, ayan guha wrote:
> You can call dB
I can't migrate this PostgreSQL data since lots of systems are using it, but I
can take this data to some NoSQL store like HBase and query HBase; the issue
here is how can I make sure that HBase has up-to-date data?
Is velocity an issue in Postgres, such that your data would become stale as
soon as it reaches
Why can't you bulk pre-fetch the data to HDFS (e.g. using Sqoop) instead of
hitting Postgres multiple times?
Sent from Windows Mail
From: ayan guha
Sent: Monday, July 27, 2015 4:41 PM
To: Jeetendra Gangele
Cc: felixcheun...@hotmail.com, user@spark.apache.org
You can call
You can call DB connect once per partition. Please have a look at the design
patterns for the foreach construct in the documentation.
How big is your data in the DB? How often does that data change? You would be
better off if the data is already in Spark.
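Something along these lines, i.e. one connection per partition instead of one per record (rough sketch only; the JDBC URL, credentials, table and query are placeholders):

import java.sql.DriverManager
import org.apache.spark.rdd.RDD

// One JDBC connection per partition instead of one per record.
// The Postgres JDBC driver must be on the executor classpath.
def lookupValues(ids: RDD[Int]): RDD[(Int, Option[String])] = {
  ids.mapPartitions { partition =>
    val conn = DriverManager.getConnection(
      "jdbc:postgresql://pg-host:5432/mydb", "spark", "secret")
    val stmt = conn.prepareStatement("SELECT value FROM mytable WHERE id = ?")

    // Materialise the results before closing the connection, because the
    // iterator returned by map is lazy.
    val results = partition.map { id =>
      stmt.setInt(1, id)
      val rs = stmt.executeQuery()
      val value = if (rs.next()) Some(rs.getString("value")) else None
      rs.close()
      (id, value)
    }.toList

    stmt.close()
    conn.close()
    results.iterator
  }
}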
On 28 Jul 2015 04:48, "Jeetendra Gangele" wrote:
> Thanks for your reply
Thanks for your reply.
In parallel I will be hitting PostgreSQL with around 6000 calls, which is not
good; my database will die.
These calls to the database will keep on increasing.
Handling millions of requests is not an issue with HBase/NoSQL.
Is there any other alternative?
On 27 July 2015 at 23:18, wrote:
>
You can have Spark read from PostgreSQL through the data access API. Do you
have any concern with that approach, since you mention copying that data into
HBase?
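A minimal sketch of that approach with the JDBC data source (host, database, table and credentials below are just placeholders):

import java.util.Properties
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("postgres-read"))
val sqlContext = new SQLContext(sc)

val props = new Properties()
props.setProperty("user", "spark")
props.setProperty("password", "secret")
props.setProperty("driver", "org.postgresql.Driver")

// Load the table as a DataFrame and query it with Spark SQL.
val df = sqlContext.read.jdbc("jdbc:postgresql://pg-host:5432/mydb", "mytable", props)
df.registerTempTable("mytable")
sqlContext.sql("SELECT count(*) FROM mytable").show()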
From: Jeetendra Gangele
Sent: Monday, July 27, 6:00 AM
Subject: Data from PostgreSQL to Spark
To: user
Hi All
I have a us