Re: Dynamic ad hoc query deployment strategy

2020-11-24 Thread lalala
Hi Timo and Dawid, Thank you for the detailed answer; it looks like we need to reconsider our whole job submission flow. What is the best way to compare the new job graph? Can we use the Flink visualizer to check that the new job graph shares the table, since, as you mention, this is not guaranteed? Best regards,

Re: Dynamic ad hoc query deployment strategy

2020-11-24 Thread Timo Walther
I agree with Dawid. Maybe one thing to add is that reusing parts of the pipeline is possible via StatementSets in TableEnvironment. They allow you to add multiple queries that consume from a common part of the pipeline (for example a common source). But all of that is compiled into one big job
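
A minimal PyFlink sketch of the StatementSet approach described above, not taken from the thread. The helper names (`build_insert_statements`, `submit_as_one_job`), the `{source}` placeholder convention, and the table names are hypothetical; the PyFlink calls assume a release that provides `TableEnvironment.create_statement_set()` (1.11 or later), and the exact `EnvironmentSettings` builder may differ between versions.

```python
from typing import Dict, List


def build_insert_statements(source_table: str, queries: Dict[str, str]) -> List[str]:
    """Turn {sink_table: select_sql} into INSERT statements over one shared source."""
    statements = []
    for sink_table, select_sql in sorted(queries.items()):
        # Hypothetical convention: each query refers to the source as {source}.
        statements.append(
            f"INSERT INTO {sink_table} " + select_sql.format(source=source_table)
        )
    return statements


def submit_as_one_job(ddl_statements: List[str], insert_statements: List[str]) -> None:
    """Add every INSERT to one StatementSet so they are compiled into a single job."""
    # Imported lazily so the helper above can be used without PyFlink installed.
    from pyflink.table import EnvironmentSettings, TableEnvironment

    settings = EnvironmentSettings.new_instance().in_streaming_mode().build()
    t_env = TableEnvironment.create(settings)
    for ddl in ddl_statements:
        # CREATE TABLE statements for the common source and for each sink.
        t_env.execute_sql(ddl)
    statement_set = t_env.create_statement_set()
    for statement in insert_statements:
        statement_set.add_insert_sql(statement)
    # All added statements are optimized and submitted together.
    statement_set.execute()
```

Because everything ends up in one job, adding or removing a query means rebuilding and resubmitting the whole statement set, which is the trade-off raised in this thread.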

Re: Dynamic ad hoc query deployment strategy

2020-11-24 Thread Dawid Wysakowicz
Hi, Really sorry for the late reply. To the best of my knowledge, there is no way to "attach" to a source/reader of a different job. Every job would read the source separately. `The GenericInMemoryCatalog is an in-memory implementation of a catalog. All objects will be available only f

Re: Dynamic ad hoc query deployment strategy

2020-11-23 Thread lalala
Hi Till, Thank you for your comment. I am looking forward to hearing from Timo and Dawid as well. Best regards,

Re: Dynamic ad hoc query deployment strategy

2020-11-23 Thread Till Rohrmann
Hi Lalala, I think this approach can work as long as the generated query plan contains the same sub-plan for the previous queries as before. Otherwise Flink won't be able to match the state to the operators of the plan. I think Timo and Dawid should know for certain whether this is possible or not.
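
The "same sub-plan" condition above can be illustrated with a toy check, written here as a sketch under strong assumptions. The plan model (a dict of node name to operator description and input names) and the function names are hypothetical, and the fingerprinting is not Flink's own mechanism; Flink matches restored state to operators by operator ID, so this only shows the idea of verifying that every operator of the old plan reappears, with the same upstream structure, in the new one.

```python
import hashlib
from typing import Dict, List, Set, Tuple

# Hypothetical plan model: node name -> (operator description, input node names).
Plan = Dict[str, Tuple[str, List[str]]]


def fingerprints(plan: Plan) -> Set[str]:
    """Return one structural fingerprint per operator of an acyclic plan."""
    cache: Dict[str, str] = {}

    def visit(node: str) -> str:
        if node not in cache:
            description, inputs = plan[node]
            # Input order is kept, since it matters for operators such as joins.
            upstream = ",".join(visit(i) for i in inputs)
            digest = hashlib.sha256(f"{description}|{upstream}".encode())
            cache[node] = digest.hexdigest()
        return cache[node]

    return {visit(node) for node in plan}


def missing_operators(old: Plan, new: Plan) -> Set[str]:
    """Fingerprints present in the old plan but absent from the new one."""
    return fingerprints(old) - fingerprints(new)
```

An empty result from `missing_operators(old, new)` means that, in this toy model, the new plan still contains every operator of the old plan; any non-empty result points at operators whose state could not be matched.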

Re: Dynamic ad hoc query deployment strategy

2020-11-23 Thread lalala
Hi Kostas, Yes, that would satisfy my use case as the platform is always future-oriented. Any arbitrary query is executed on the latest data. From your comment, I understand that even the session mode does not optimize our readers. I wish Flink could support arbitrary job submission and graph ge

Re: Dynamic ad hoc query deployment strategy

2020-11-23 Thread Kostas Kloudas
Hi Lalala, Even in session mode, the jobgraph is created before the job is executed, so all of the above still holds. Although I am not super familiar with the catalogs, what you want is that two or more jobs share the same readers of a source. This is not done automatically in DataStream or DataSet and I a

Re: Dynamic ad hoc query deployment strategy

2020-11-20 Thread lalala
Hi Kostas, Thank you for your response. Is what you are saying also valid for session mode? If I submit my jobs to an existing Flink session, will they be able to share the sources? We register our Kafka tables in `GenericInMemoryCatalog`, and the documentation says `The GenericInMemoryCatalog i

Re: Dynamic ad hoc query deployment strategy

2020-11-20 Thread Kostas Kloudas
I am also cc'ing Timo to see if he has anything more to add on this. Cheers, Kostas

Re: Dynamic ad hoc query deployment strategy

2020-11-19 Thread Kostas Kloudas
Hi, Thanks for reaching out! First of all, I would like to point out that an interesting alternative to the per-job cluster could be running your jobs in application mode [1]. Given that you want to run arbitrary SQL queries, I do not think you can "share" across queries the part of the job grap

Dynamic ad hoc query deployment strategy

2020-11-15 Thread lalala
Hi all, I would like to consult with you regarding deployment strategies. We have 250+ Kafka topics against which we want users of the platform to submit SQL queries that will run indefinitely. We have a query parser to extract topic names from user queries, and the application locally creates Kafka tabl