Thank you all for valuable inputs.

On Wednesday, August 24, 2016, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:
> If this is a one-off then Spark will do OK.
>
> Sybase IQ provides bcp, which creates a flat tab- or comma-separated
> file; you can use that to extract the IQ table, put it on HDFS and
> create an external table over it.
>
> This is of course a one-off exercise.
>
> You can also use SRS (SAP Replication Server) to get the data out the
> first time and then sync the Hive table with the Sybase IQ table in
> real time. You will need SRS SP 204 or above to make this work.
>
> Talk to your DBA about getting that SRS SP from Sybase for this
> purpose. I have done it many times and I think it is stable enough for
> the job.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may
> arise from relying on this email's technical content is explicitly
> disclaimed. The author will in no case be liable for any monetary
> damages arising from such loss, damage or destruction.
>
> On 24 August 2016 at 22:35, Gopal Vijayaraghavan <gop...@apache.org> wrote:
>
>> > val d = HiveContext.read.format("jdbc").options(
>> ...
>> > The sqoop job takes 7 hours to load 15 days of data, even while
>> > setting the direct load option to 6. Hive is using the MR framework.
>>
>> In general, the JDBC implementations tend to react rather badly to
>> large extracts like this - the throttling usually happens on the
>> operational database end rather than being a problem on the MR side.
>>
>> Sqoop is good enough for a one-shot import, but doing it frequently is
>> best done with the database's own dump protocols, which are generally
>> not throttled in the same way.
>>
>> Pinterest recently put out a write-up on how they do this:
>>
>> https://engineering.pinterest.com/blog/tracker-ingesting-mysql-data-scale-part-1
>> +
>> https://engineering.pinterest.com/blog/tracker-ingesting-mysql-data-scale-part-2
>>
>> More interestingly, continuous ingestion can read directly off the
>> replication protocol's write-ahead logs:
>>
>> https://github.com/Flipkart/MySQL-replication-listener/tree/master/examples/mysql2hdfs
>> +
>> https://github.com/flipkart-incubator/storm-mysql
>>
>> But all of these tend to be optimized for a particular database
>> engine, while the JDBC pipe tends to work slowly for all engines.
>>
>> Cheers,
>> Gopal
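For the archive, a minimal sketch of the bcp-to-external-table flow Mich
describes, assuming the tab-separated bcp extract has already been copied
up to HDFS (e.g. with hadoop fs -put); the database, table name, columns
and HDFS path here are all hypothetical:

    // spark-shell, Spark 1.x; sc is provided by the shell
    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)

    // External table: Hive only points at the bcp files on HDFS,
    // it does not move or rewrite them.
    hiveContext.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS staging.iq_sales (
        |  sale_id BIGINT,
        |  sale_ts TIMESTAMP,
        |  amount  DECIMAL(18,2)
        |)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
        |STORED AS TEXTFILE
        |LOCATION '/data/staging/iq/sales'""".stripMargin)

Because the table is external, re-running the bcp extract and replacing
the files under /data/staging/iq/sales refreshes the Hive side with no
further DDL.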
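And for the JDBC path Gopal is commenting on: the read can at least be
parallelised with a partition column, though that only spreads the load
and does nothing about throttling on the database end. A sketch against
the Spark 1.x API used in the quoted snippet; the jConnect URL, driver
class, table and bounds are assumptions, not a tested IQ configuration:

    // spark-shell continuation; reuses hiveContext from the sketch above
    val d = hiveContext.read.format("jdbc").options(Map(
      "url"             -> "jdbc:sybase:Tds:iqhost:2638",  // hypothetical host/port
      "driver"          -> "com.sybase.jdbc4.jdbc.SybDriver",
      "dbtable"         -> "sales",
      "partitionColumn" -> "sale_id",   // must be a numeric column in Spark 1.x
      "lowerBound"      -> "1",
      "upperBound"      -> "100000000",
      "numPartitions"   -> "32"         // 32 concurrent range queries against IQ
    )).load()

    // Land the data once in Hive rather than re-reading it over JDBC.
    d.write.saveAsTable("staging.iq_sales_jdbc")

Each partition becomes its own range query against the source, so
numPartitions is effectively the number of concurrent sessions the
operational database has to absorb - which is exactly where the
throttling Gopal mentions will show up.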