[DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

2020-02-04 Thread Kurt Young
Hi all, I'd like to bring up a discussion about removing registration of TableSource and TableSink in TableEnvironment as well as in ConnectTableDescriptor. The affected methods would be: TableEnvironment::registerTableSource, TableEnvironment::fromTableSource, TableEnvironment::registerTableSink, Co…
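The direction behind this discussion is to replace programmatic source/sink registration with catalog-backed DDL. As a hedged sketch of the DDL style (the table name, schema, and connector options below are illustrative, not taken from the thread):

```sql
-- Registering a table via DDL instead of
-- TableEnvironment::registerTableSource / registerTableSink.
-- Connector name and options are examples only.
CREATE TABLE orders (
  order_id BIGINT,
  amount   DOUBLE,
  ts       TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic'     = 'orders',
  'format'    = 'json'
);
```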

Re: Task-manager kubernetes pods take a long time to terminate

2020-02-04 Thread Li Peng
My yml files follow most of the instructions here: http://shzhangji.com/blog/2019/08/24/deploy-flink-job-cluster-on-kubernetes/ What command did you use to delete the deployments? I use: helm --tiller-namespace prod delete --purge my-deployment I noticed that for environments without much data…

Re: Task-manager kubernetes pods take a long time to terminate

2020-02-04 Thread Yang Wang
So do you start your Flink cluster on K8s with the yaml here[1]? I have tested multiple times, and it always works well. If not, could you share your yaml file with me? [1]. https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/kubernetes.html#session-cluster-resource-definitions

Re: Task-manager kubernetes pods take a long time to terminate

2020-02-04 Thread Li Peng
Hey Yang, The jobmanager and taskmanagers are all part of the same deployment; when I delete the deployment, all the pods are told to terminate. The taskmanager's status stays "terminating", and it waits until the taskmanager times out in that error loop before it actually terminates. Thank…
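One standard Kubernetes knob relevant to pods that hang in "terminating" is `terminationGracePeriodSeconds` in the pod spec: after that many seconds Kubernetes force-kills the container even if it has not exited. This is general Kubernetes behavior, not a fix confirmed in this thread; a sketch of the deployment fragment:

```yaml
# Illustrative fragment only: lower the grace period so a TaskManager
# stuck in a shutdown retry loop is force-killed sooner.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 30   # Kubernetes default is 30s
      containers:
        - name: taskmanager
          image: flink:1.9
```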

Re: Best approach for recalculating statistics based on amended or deleted events?

2020-02-04 Thread Stephen Young
Hi Timo, Thanks for replying to me so quickly! We could do it with insert-only rows. When you say flags in the data, do you mean a field with a name like 'retracts' whose value is the id of the event/row we want to retract? How would that be possible with Flink? Thanks! O…
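The "retraction flag" idea discussed here can be modeled outside Flink: each record either inserts a new event or names an earlier event id to retract, and an aggregate is maintained incrementally. The field names `retracts`, `id`, and `value`, and the whole sketch, are hypothetical illustrations, not Flink API:

```python
# Minimal sketch of flag-based retraction: a record either carries a
# value (insert) or a "retracts" field pointing at an earlier record id.
def aggregate(records):
    live = {}     # id -> value of still-valid events
    total = 0.0
    for rec in records:
        if rec.get("retracts") is not None:
            # Remove the previously inserted event, if we saw it.
            old = live.pop(rec["retracts"], None)
            if old is not None:
                total -= old
        else:
            live[rec["id"]] = rec["value"]
            total += rec["value"]
    return total

events = [
    {"id": 1, "value": 10.0},
    {"id": 2, "value": 5.0},
    {"id": 3, "retracts": 1},   # amend/delete event 1
]
print(aggregate(events))  # 5.0
```

In Flink terms this is roughly what a retract stream does internally; the sketch only shows the bookkeeping a flag-based encoding implies.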

Re: Best approach for recalculating statistics based on amended or deleted events?

2020-02-04 Thread Timo Walther
Hi Stephen, the use cases you are describing sound like a perfect fit for Flink. Internally, Flink deals with insertions and deletions that flow through the system and can update chained aggregations and complex queries. The only bigger limitation at the moment is that we only support s…

Best approach for recalculating statistics based on amended or deleted events?

2020-02-04 Thread Stephen Young
I am currently looking into how Flink can support a live data collection platform. We want to collect certain data in real time. This data will be sent to Kafka and we want to use Flink to calculate statistics and derived events from it. An important thing we need to be able to handle is amendm…

Re: DisableGenericTypes is not compatible with Kafka

2020-02-04 Thread Aljoscha Krettek
Unfortunately, the fact that the Kafka sources use Kryo for state serialization is a very early design misstep that we cannot fix for now. We will get rid of it when the new source interface ([1]) lands and we have a new Kafka source based on it. As a workaround, we should chan…

Re: Updating multiple database tables

2020-02-04 Thread Dylan Adams
Jason, I ended up using PostgreSQL’s writable CTEs. The target tables had database-generated SERIAL primary keys, so it seemed like the easiest way to keep the changes atomic. Regards, Dylan On Thu, Jan 30, 2020 at 14:15 Jason Sommer wrote: > Hi Dylan, > > I have a similar use case of saving u…
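For reference, a PostgreSQL writable CTE lets several data-modifying statements execute as one atomic statement, which fits the SERIAL-key case described here. The table and column names below are invented for illustration, not taken from the thread:

```sql
-- Insert a parent row and its child rows in one atomic statement,
-- reusing the SERIAL key generated for the parent.
WITH new_order AS (
  INSERT INTO orders (customer_id, total)
  VALUES (42, 99.50)
  RETURNING order_id
)
INSERT INTO order_items (order_id, sku, qty)
SELECT order_id, 'SKU-1', 2 FROM new_order;
```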

Re: time column used by timer

2020-02-04 Thread Timo Walther
Hi, timestamps and watermarks are attached to every stream record in the runtime. After assignTimestampsAndWatermarks() has extracted them, Flink handles those attributes internally. For example, it doesn't matter which class you emit in a flatMap function; the runtime will set the times…
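Conceptually, the mechanism described here can be mimicked in plain code: each record carries a timestamp, and the watermark trails the maximum timestamp seen so far by an out-of-orderness bound. This mirrors the idea behind Flink's bounded-out-of-orderness extractors but is a plain Python sketch, not Flink API:

```python
# Sketch: the watermark trails the max timestamp seen by a fixed bound.
class BoundedOutOfOrderness:
    def __init__(self, max_out_of_orderness_ms):
        self.bound = max_out_of_orderness_ms
        self.max_ts = float("-inf")

    def on_event(self, timestamp_ms):
        # Track the highest event timestamp observed so far.
        self.max_ts = max(self.max_ts, timestamp_ms)

    def current_watermark(self):
        # Assertion: no event earlier than this will arrive anymore.
        return self.max_ts - self.bound

wm = BoundedOutOfOrderness(max_out_of_orderness_ms=500)
for ts in [1000, 1700, 1400]:   # out-of-order arrival
    wm.on_event(ts)
print(wm.current_watermark())   # 1200 (= 1700 - 500)
```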

Re: Question: Determining Total Recovery Time

2020-02-04 Thread Timo Walther
Hi Morgan, as far as I know this is not possible, mostly because measuring "till the point when the system catches up to the last message" is very pipeline/connector-dependent. Some sources might need to read from the very beginning, some just continue from the latest checkpointed offset. Mea…
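One hypothetical, connector-agnostic way to define "caught up" is: record the head position (latest offset or timestamp) when recovery starts, then measure the time until the processed position passes it. All function names below are invented for illustration; this is not something the thread confirms Flink provides:

```python
import time

def measure_catch_up(get_head_position, get_processed_position,
                     poll_interval_s=0.0):
    """Seconds until the processed position reaches the head position
    that was recorded when recovery started (names are illustrative)."""
    target = get_head_position()
    start = time.monotonic()
    while get_processed_position() < target:
        time.sleep(poll_interval_s)
    return time.monotonic() - start

# Toy usage: simulate a consumer that advances one offset per poll.
state = {"processed": 0}
def processed():
    state["processed"] += 1
    return state["processed"]

elapsed = measure_catch_up(lambda: 5, processed)
print(elapsed >= 0)  # True
```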

Re: Flink Dynamodb as sink

2020-02-04 Thread Timo Walther
Hi Hemant, maybe this thread from last year could also help you: http://mail-archives.apache.org/mod_mbox/flink-user/201903.mbox/%3c2df93e1c-ae46-47ca-9c62-0d26b2b3d...@gmail.com%3E Someone also proposes a skeleton of the code there. Regards, Timo On 04.02.20 08:10, hemant singh wrote: Than…
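The core of most DynamoDB sink skeletons is buffering records and flushing them in batches, since DynamoDB's BatchWriteItem caps a batch at 25 items. A generic, Flink-free sketch of that buffering pattern; `flush_fn` stands in for the real AWS client call, and the class is an invented illustration, not the skeleton from the linked thread:

```python
# Sketch: buffer records and flush in batches of up to 25
# (the BatchWriteItem limit). flush_fn stands in for the AWS call.
class BatchingSink:
    MAX_BATCH = 25

    def __init__(self, flush_fn):
        self.flush_fn = flush_fn
        self.buffer = []

    def invoke(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.MAX_BATCH:
            self.flush()

    def flush(self):
        # Flush any buffered records, e.g. on checkpoint or close.
        if self.buffer:
            self.flush_fn(list(self.buffer))
            self.buffer.clear()

batches = []
sink = BatchingSink(batches.append)
for i in range(30):
    sink.invoke({"id": i})
sink.flush()
print([len(b) for b in batches])  # [25, 5]
```

In a real Flink sink the same logic would live in a RichSinkFunction, with flush() also wired into checkpointing so buffered records are not lost on failure.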