Re: Flink and Presto integration

Flavio Pompermaier Tue, 28 Jan 2020 03:04:25 -0800

Hive Metastore is the de facto standard for Hadoop, but in my use case I
have to query other databases (such as MySQL, Oracle and SQL Server).
So Presto would be a good choice (apart from the fact that you need to
restart it when you add a new catalog), and I'd like to have an easy
way to translate its catalogs for Flink.
Another concern is that I could have different versions of the same
database type (e.g. Oracle or SQL Server), and I'll probably hit an
incompatibility when using the latest jar of a connector.
From what I see, this corner case doesn't have a clear solution, but I have
some workarounds in mind that I need to verify (e.g. shading the jars or
allocating source reader tasks to different Task Managers based on the
deployed jar versions).
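
As a rough illustration of the catalog translation idea, here is a minimal
sketch, not something that exists in Flink or Presto. It assumes the Presto
catalog is a JDBC-style properties file using the usual keys
(connector.name, connection-url, connection-user, connection-password).
The function names and the output dictionary layout are hypothetical; a
real integration would still need a Flink-side catalog implementation.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first '=' only, so values may contain '='.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def to_flink_jdbc_options(catalog_name, props):
    """Hypothetical mapping from Presto catalog keys to a neutral dict.

    The output keys are made up for illustration; they are not an
    official Flink option set.
    """
    return {
        "catalog": catalog_name,
        "connector": props.get("connector.name"),
        "url": props.get("connection-url"),
        "username": props.get("connection-user"),
        "password": props.get("connection-password"),
    }


# Example content of a Presto catalog file such as etc/catalog/mysql_prod.properties
SAMPLE = """\
# hypothetical catalog definition
connector.name=mysql
connection-url=jdbc:mysql://example.net:3306
connection-user=root
connection-password=secret
"""

if __name__ == "__main__":
    options = to_flink_jdbc_options("mysql_prod", parse_properties(SAMPLE))
    for key, value in options.items():
        print(f"{key} = {value}")
```

The catalog name is taken from the file name in Presto, so it is passed in
separately here rather than read from the properties.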


On Tue, Jan 28, 2020 at 11:05 AM Piotr Nowojski <pi...@ververica.com> wrote:

> Hi,
>
> Yes, Presto (in the presto-hive connector) just uses the Hive Metastore to
> get the table definitions/metadata. If you connect Flink to the same Hive
> Metastore, both systems should be able to see the same tables.
>
> Piotrek
>
> On 28 Jan 2020, at 04:34, Jingsong Li <jingsongl...@gmail.com> wrote:
>
> Hi Flavio,
>
> So your requirement is to use Blink batch to read the tables in Presto?
> I'm not familiar with Presto's catalog. Is it like the Hive Metastore?
>
> If so, what needs to be done is similar to the Hive connector.
> You need to implement a Presto catalog, which translates a Presto
> table into a Flink table. You may need to deal with partitions, statistics,
> and so on.
>
> Best,
> Jingsong Lee
>
> On Mon, Jan 27, 2020 at 9:58 PM Itamar Syn-Hershko <
> ita...@bigdataboutique.com> wrote:
>
>> Yes, Flink does batch processing by "reevaluating a stream", so to speak.
>> Presto doesn't have sources and sinks, only catalogs (which always
>> allow reads, and sometimes also writes).
>>
>> Presto catalogs are configuration: they are managed on the node
>> filesystem as configuration files and nowhere else. Flink sources/sinks
>> are programmatically configurable and are compiled into your Flink program.
>> So that is not possible at the moment, and all that's possible to do is get
>> that info from the API of both products and visualize it. Definitely not
>> managing them from a single place.
>>
>> On Mon, Jan 27, 2020 at 3:54 PM Flavio Pompermaier <pomperma...@okkam.it>
>> wrote:
>>
>>> Both Presto and Flink make use of a catalog in order to be able to
>>> read/write data from a source/sink.
>>> I don't agree that "Flink is about processing data streams", because
>>> Flink is also competitive for batch workloads (and this will be further
>>> improved in the next releases).
>>> I'd like to register my data sources/sinks in one single catalog (e.g.
>>> Presto) and then be able to reuse it in Flink as well (with a simple
>>> translation).
>>> My idea of integration here is thus more at the catalog level, since I
>>> would use Presto for exploring data from the UI and Flink to process it
>>> once the configuration part has finished (since I have many Flink jobs
>>> that I don't want to throw away or rewrite).
>>>
>>> On Mon, Jan 27, 2020 at 2:30 PM Itamar Syn-Hershko <
>>> ita...@bigdataboutique.com> wrote:
>>>
>>>> Hi Flavio,
>>>>
>>>> Presto contributor and Starburst Partners here.
>>>>
>>>> Presto and Flink are solving completely different challenges. Flink is
>>>> about processing data streams as they come in; Presto is about ad-hoc /
>>>> periodic querying of data sources.
>>>>
>>>> A typical architecture would use Flink to process data streams and
>>>> write data and aggregations to some data stores (Redis, MemSQL, SQL
>>>> databases, Elasticsearch, etc.) and then use Presto to query those data
>>>> stores (and possibly others too, using query federation).
>>>>
>>>> What kind of integration will you be looking for?
>>>>
>>>> On Mon, Jan 27, 2020 at 1:44 PM Flavio Pompermaier <
>>>> pomperma...@okkam.it> wrote:
>>>>
>>>>> Hi all,
>>>>> is there any integration between Presto and Flink? I'd like to use
>>>>> Presto for the UI part (preview and so on) while using Flink for the batch
>>>>> processing. Or would you suggest something else?
>>>>>
>>>>> Best,
>>>>> Flavio
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Itamar Syn-Hershko
>>>> CTO, Founder
>>>> +972-54-2467860
>>>> ita...@bigdataboutique.com
>>>> https://bigdataboutique.com
>>>> <https://www.linkedin.com/in/itamar-syn-hershko-78b25013>
>>>> <https://twitter.com/synhershko>
>>>> <https://www.youtube.com/channel/UCBHr7lM2u6SCWPJvcKug-Yg>
>>>>
>>>
>>>
>>
>>
>
>
>
>
>
