The Hive Metastore is the de facto standard for Hadoop, but in my use case I also have to query other databases (such as MySQL, Oracle and SQL Server). So Presto would be a good choice (apart from the fact that you need to restart it when you add a new catalog), and I'd like to have an easy translation of its catalogs into Flink.

Another concern is that I could have different versions of the same database type (e.g. Oracle or SQL Server), and I'll probably hit an incompatibility when using the latest jar of a connector. From what I see this corner case doesn't have a clear solution, but I have some workarounds in mind that I still need to verify (e.g. shading the jars, or allocating source reader tasks to different Task Managers based on the deployed jar versions).

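To make the "easy translation" idea concrete, here is a minimal sketch in Python of what I have in mind. It is not an existing integration in either project. It assumes a Presto catalog file for a JDBC-style connector (with the `connector.name`, `connection-url`, `connection-user` and `connection-password` properties) and maps it onto the option names used by the newer Flink JDBC SQL connector (`connector`, `url`, `table-name`, `username`, `password`); older Flink versions use different keys. The function names, the `JDBC_CONNECTORS` set and the sample host are hypothetical.

```python
from typing import Dict

# Assumption: Presto connector names that are backed by a plain JDBC driver.
JDBC_CONNECTORS = {"mysql", "postgresql", "sqlserver", "oracle"}

# Assumption: Presto catalog property -> Flink JDBC connector option.
PRESTO_TO_FLINK = {
    "connection-url": "url",
    "connection-user": "username",
    "connection-password": "password",
}


def parse_properties(text: str) -> Dict[str, str]:
    """Parse the contents of a Presto catalog .properties file."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            props[key.strip()] = value.strip()
    return props


def to_flink_options(presto_props: Dict[str, str], table_name: str) -> Dict[str, str]:
    """Translate Presto catalog properties into Flink JDBC table options."""
    connector = presto_props.get("connector.name")
    if connector not in JDBC_CONNECTORS:
        raise ValueError(f"no JDBC translation for connector {connector!r}")
    options = {"connector": "jdbc", "table-name": table_name}
    for presto_key, flink_key in PRESTO_TO_FLINK.items():
        if presto_key in presto_props:
            options[flink_key] = presto_props[presto_key]
    return options


def render_with_clause(options: Dict[str, str]) -> str:
    """Render the options as the WITH (...) clause of a Flink CREATE TABLE."""
    def quote(value: str) -> str:
        return "'" + value.replace("'", "''") + "'"

    body = ",\n".join(f"  {quote(k)} = {quote(v)}" for k, v in options.items())
    return f"WITH (\n{body}\n)"


if __name__ == "__main__":
    sample = """
    # etc/catalog/mysql.properties (sample values)
    connector.name=mysql
    connection-url=jdbc:mysql://example.net:3306
    connection-user=root
    connection-password=secret
    """
    print(render_with_clause(to_flink_options(parse_properties(sample), "orders")))
```

This only covers the connection settings. A Presto catalog points at a whole server, while a Flink JDBC table points at a single table, so the schema, the table list and (usually) the database part of the URL would still have to come from somewhere else, for example the database's own metadata.
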
On Tue, Jan 28, 2020 at 11:05 AM Piotr Nowojski <pi...@ververica.com> wrote:

> Hi,
>
> Yes, Presto (in the presto-hive connector) is just using the Hive Metastore to get
> the table definitions/metadata. If you connect to the same Hive Metastore
> with Flink, both systems should be able to see the same tables.
>
> Piotrek
>
> On 28 Jan 2020, at 04:34, Jingsong Li <jingsongl...@gmail.com> wrote:
>
> Hi Flavio,
>
> Is your requirement to use Blink batch to read the tables in Presto?
> I'm not familiar with Presto's catalog. Is it like the Hive Metastore?
>
> If so, what needs to be done is similar to the Hive connector.
> You need to implement a Presto catalog, which translates the Presto
> table into a Flink table. You may need to deal with partitions, statistics,
> and so on.
>
> Best,
> Jingsong Lee
>
> On Mon, Jan 27, 2020 at 9:58 PM Itamar Syn-Hershko <ita...@bigdataboutique.com> wrote:
>
>> Yes, Flink does batch processing by "reevaluating a stream" so to speak.
>> Presto doesn't have sources and sinks, only catalogs (which always
>> allow reads, and sometimes also writes).
>>
>> Presto catalogs are a configuration - they are managed on the node
>> filesystem as a configuration file and nowhere else. Flink sources/sinks
>> are programmatically configurable and are compiled into your Flink program.
>> So that is not possible at the moment, and all that's possible to do is get
>> that info from the API of both products and visualize that. Definitely not
>> managing them from a single place.
>>
>> On Mon, Jan 27, 2020 at 3:54 PM Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>
>>> Both Presto and Flink make use of a Catalog in order to be able to
>>> read/write data from a source/sink.
>>> I don't agree about "Flink is about processing data streams" because
>>> Flink is also competitive for batch workloads (and this will be further
>>> improved in the next releases).
>>> I'd like to register my data sources/sinks in one single catalog (e.g.
>>> Presto) and then be able to reuse it in Flink as well (with a simple
>>> translation).
>>> My idea of integration here is thus more at the catalog level, since I would
>>> use Presto for exploring data from the UI and Flink to process it once
>>> the configuration part has finished (since I have many Flink jobs that I
>>> don't want to throw away or rewrite).
>>>
>>> On Mon, Jan 27, 2020 at 2:30 PM Itamar Syn-Hershko <ita...@bigdataboutique.com> wrote:
>>>
>>>> Hi Flavio,
>>>>
>>>> Presto contributor and Starburst Partners here.
>>>>
>>>> Presto and Flink are solving completely different challenges. Flink is
>>>> about processing data streams as they come in; Presto is about ad-hoc /
>>>> periodic querying of data sources.
>>>>
>>>> A typical architecture would use Flink to process data streams and
>>>> write data and aggregations to some data stores (Redis, MemSQL, SQLs,
>>>> Elasticsearch, etc) and then use Presto to query those data stores (and
>>>> possibly also others using Query Federation).
>>>>
>>>> What kind of integration will you be looking for?
>>>>
>>>> On Mon, Jan 27, 2020 at 1:44 PM Flavio Pompermaier <pomperma...@okkam.it> wrote:
>>>>
>>>>> Hi all,
>>>>> is there any integration between Presto and Flink? I'd like to use
>>>>> Presto for the UI part (preview and so on) while using Flink for the batch
>>>>> processing. Would you suggest something else instead?
>>>>>
>>>>> Best,
>>>>> Flavio
>>>>>
>>>>
>>>> --
>>>> Itamar Syn-Hershko
>>>> CTO, Founder
>>>> +972-54-2467860
>>>> ita...@bigdataboutique.com
>>>> https://bigdataboutique.com
>>>> <https://www.linkedin.com/in/itamar-syn-hershko-78b25013>
>>>> <https://twitter.com/synhershko>
>>>> <https://www.youtube.com/channel/UCBHr7lM2u6SCWPJvcKug-Yg>
>>>>
>>>
>>
>
> --
> Best, Jingsong Lee