Stopping a Flink application with /jobs/:jobid/savepoints or /jobs/:jobid/stop

2020-06-08 Thread Deshpande, Omkar
Hello, When I try to stop the job with /jobs/:jobid/stop REST endpoint, the state gets drained, even if I pass {"drain":false} in the body of the post request. Is the value of drain flag true b
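For readers hitting the same endpoint, a stop-with-savepoint request looks like the following (a sketch: it assumes the REST endpoint is reachable at localhost:8081, and the job id is a placeholder):

```shell
# Stop the job with a savepoint, explicitly setting drain to false.
# <jobId> is a placeholder; adjust host/port for your cluster.
curl -X POST http://localhost:8081/jobs/<jobId>/stop \
  -H "Content-Type: application/json" \
  -d '{"targetDirectory": "hdfs:///savepoints", "drain": false}'
```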

Re: Stopping a job

2020-06-08 Thread M Singh
Thanks Kostas, Arvid, and Senthil for your help. On Monday, June 8, 2020, 12:47:56 PM EDT, Senthil Kumar wrote:

Re: Beam/Python/Flink unable to deserialize UnboundedSource

2020-06-08 Thread Pradip Thachile
Another data point, folks: I've been able to run my simple test case with Dataflow without issue, but Flink is still problematic. On Sun, Jun 7, 2020 at 18:48, Pradip Thachile wrote: > I've attached below some minimal sample code that reproduces this issue. > This works perfectly with the

Re: UpsertStreamTableSink for Aggregate Only Query

2020-06-08 Thread Jark Wu
Hi Satyam, I guess your destination database table doesn't have a primary key, right? If this is the case, I think maybe the upcoming 1.11 release with new sink interface (FLIP-95) can better resolve this. In the new sink interface: - the primary key is always defined on Flink SQL DDL - the plann
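For context, the FLIP-95 style DDL Jark refers to declares the primary key directly in SQL (a sketch; the table, columns, and connector options are made up):

```sql
-- Hypothetical upsert target: the PRIMARY KEY clause tells the planner
-- which columns identify a row, enabling upsert semantics (Flink 1.11+).
CREATE TABLE target_table (
  user_id BIGINT,
  cnt     BIGINT,
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'connector'  = 'jdbc',
  'url'        = 'jdbc:mysql://localhost:3306/mydb',
  'table-name' = 'target_table'
);
```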

Re: Flink on yarn : yarn-session understanding

2020-06-08 Thread Xintong Song
Hi Anuj, By "standalone job on yarn", I assume you mean running one job per Flink cluster on Yarn, which is also known as job mode, or per-job mode? I'm asking because Flink has another standalone deployment mode [1], aside from the Yarn deployment mode. 1. The major difference between Flink Appl
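For readers following along, the two Yarn deployment styles discussed here are started differently (commands roughly as of Flink 1.10; jar paths are illustrative):

```shell
# Session mode: start a long-running Yarn session once,
# then submit many jobs to it.
./bin/yarn-session.sh -jm 1024m -tm 4096m
./bin/flink run ./examples/streaming/WordCount.jar

# Per-job mode: each submission spins up its own Flink cluster on Yarn.
./bin/flink run -m yarn-cluster ./examples/streaming/WordCount.jar
```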

Re: Timestamp data type mismatch

2020-06-08 Thread Satyam Shekhar
Thanks Dawid for the explanation. Your suggested approach makes things work. Regards, Satyam On Mon, Jun 8, 2020 at 1:18 AM Dawid Wysakowicz wrote: > Hi Satyam, > > The thing is that Types.SQL_TIMESTAMP uses java.sql.Timestamp and > serializes it as long (millis since epoch) and thus have mill

Re: Stopping a job

2020-06-08 Thread Senthil Kumar
I am just stating this for completeness. When a job is cancelled, Flink sends an interrupt signal to the thread running the Source.run method. For some reason (unknown to me), this does not happen when a stop command is issued. We ran into some minor issues because of said behavior. From: Kost

Re: Native K8S not creating TMs

2020-06-08 Thread Bohinski, Kevin
Hi Yang, Thanks again for your help so far. I tried your suggestion, still with no luck. Attached are the logs; please let me know if there are more I should send. Best, Kevin On 2020/06/08 03:02:40, Yang Wang wrote: > Hi Kevin, > It may be because the characters length li

Re: Run command after Batch is finished

2020-06-08 Thread Chesnay Schepler
This goes in the right direction; have a look at org.apache.flink.api.common.io.FinalizeOnMaster, which an OutputFormat can implement to run something on the Master after all subtasks have been closed. On 08/06/2020 17:25, Andrey Zagrebin wrote: Hi Mark, I do not know how you output the resu
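A minimal sketch of what Chesnay describes (the class name and the callback body are illustrative; only the FinalizeOnMaster hook is the point):

```java
import java.io.IOException;
import org.apache.flink.api.common.io.FinalizeOnMaster;
import org.apache.flink.api.common.io.OutputFormat;
import org.apache.flink.configuration.Configuration;

// An OutputFormat that runs extra logic on the master once every
// subtask has finished writing (illustrative skeleton).
public class NotifyingOutputFormat implements OutputFormat<String>, FinalizeOnMaster {

    @Override
    public void configure(Configuration parameters) {}

    @Override
    public void open(int taskNumber, int numTasks) throws IOException {}

    @Override
    public void writeRecord(String record) throws IOException {
        // write the record to the actual destination here
    }

    @Override
    public void close() throws IOException {}

    @Override
    public void finalizeGlobal(int parallelism) throws IOException {
        // Called once on the master after all subtasks are closed;
        // a natural place to trigger a follow-up command.
        System.out.println("Batch finished, parallelism was " + parallelism);
    }
}
```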

Re: Simple stateful polling source

2020-06-08 Thread Chesnay Schepler
Small correction to what I said: Sources have to implement ParallelSourceFunction in order to be run with a higher parallelism. The javadocs for the RichSourceFunction are /somewhat/ incorrect, but in a sense also correct. This is because you can have a RichSourceFunction that also implements
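To illustrate the distinction Chesnay draws, a rich source only runs with parallelism greater than one if it also implements ParallelSourceFunction (a sketch; the class name and emitted values are made up, and RichParallelSourceFunction is the ready-made combination):

```java
import org.apache.flink.streaming.api.functions.source.ParallelSourceFunction;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

// RichSourceFunction gives access to the runtime context;
// implementing ParallelSourceFunction additionally permits parallelism > 1.
public class PollingSource extends RichSourceFunction<Long>
        implements ParallelSourceFunction<Long> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        while (running) {
            // Hold the checkpoint lock while emitting to stay consistent
            // with checkpoints.
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(System.currentTimeMillis());
            }
            Thread.sleep(1000);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}
```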

Re: Run command after Batch is finished

2020-06-08 Thread Flavio Pompermaier
I usually run some code after env.execute(); it's not elegant but it works (only if you run the code from the CLI client, not from the REST one). On Mon, Jun 8, 2020 at 5:25 PM Andrey Zagrebin wrote: > Hi Mark, > > I do not know how you output the results in your pipeline. > If you use DataSet#output(Out
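Flavio's approach in sketch form (the job name and follow-up action are illustrative); it works in attached submissions because execute() blocks until the job finishes:

```java
// Run a side effect after the batch job completes.
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
// ... build the pipeline ...
JobExecutionResult result = env.execute("my-batch-job");
System.out.println("Done in " + result.getNetRuntime() + " ms");
// place your post-batch command here
```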

Re: Run command after Batch is finished

2020-06-08 Thread Andrey Zagrebin
Hi Mark, I do not know how you output the results in your pipeline. If you use DataSet#output(OutputFormat outputFormat), you could try to extend the format with a custom close method which should be called once the task of the sink batch operator is done in the task manager. I also cc Aljoscha, m

Re: Data Quality Library in Flink

2020-06-08 Thread Andrey Zagrebin
Hi Anuj, I am not familiar with data quality measurement methods or deequ in depth. What you describe looks like monitoring some data metrics. Maybe there are other community users aware of a better solution. Meanwhile, I would recommend implementing the checks a

Internal state and external stores conditional usage advice sought (dynamodb asyncIO)

2020-06-08 Thread orionemail
Hi, Following on from an earlier email, my approach has changed but I am still unsure how best to achieve my goal. I have records coming through a Kinesis stream into Flink: { id: var1: ... } 'id' needs to be replaced with a value from a DB store, or if not present in the DB generate in
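One common shape for this "look up or create an id" step is a KeyedProcessFunction that caches the mapping in keyed state and only consults the external store (e.g. DynamoDB) on a cache miss. The sketch below assumes a hypothetical Record type with a mutable id field, and the DB lookup helper is stubbed out:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Replaces the incoming id with a canonical one, caching the mapping in
// keyed state so the external store is hit only on the first record per key.
public class IdResolver extends KeyedProcessFunction<String, Record, Record> {

    private transient ValueState<String> canonicalId;

    @Override
    public void open(Configuration parameters) {
        canonicalId = getRuntimeContext().getState(
            new ValueStateDescriptor<>("canonical-id", String.class));
    }

    @Override
    public void processElement(Record rec, Context ctx, Collector<Record> out)
            throws Exception {
        String id = canonicalId.value();
        if (id == null) {
            // Cache miss: look up (or create) the id in the external store.
            id = externalLookupOrCreate(ctx.getCurrentKey()); // hypothetical helper
            canonicalId.update(id);
        }
        rec.id = id;
        out.collect(rec);
    }

    private String externalLookupOrCreate(String key) {
        // e.g. a DynamoDB conditional put / get; stubbed out here.
        return java.util.UUID.randomUUID().toString();
    }
}
```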

Flink on yarn : yarn-session understanding

2020-06-08 Thread aj
I am running some streaming jobs that are always long-running. I am currently submitting each job as a standalone job on Yarn. 1. I need to understand what the advantage of using yarn-session is and when I should use it. 2. Also, I am not able to access REST API services. Is it because I am running

Re: Stopping a job

2020-06-08 Thread Kostas Kloudas
What Arvid said is correct. The only thing I have to add is that "stop" also allows exactly-once sinks to push out their buffered data to their final destination (e.g. a filesystem). In other words, it takes side effects into account, so it guarantees exactly-once end-to-end, assuming that you are us
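The behavior Kostas describes is what the CLI's stop-with-savepoint exposes (paths and the job id are placeholders):

```shell
# Stop with a savepoint: sources finish cleanly and exactly-once sinks
# flush their buffered data before the job shuts down.
./bin/flink stop --savepointPath hdfs:///savepoints <jobId>

# Plain cancel does not give those side-effect guarantees.
./bin/flink cancel <jobId>
```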

Re: Stopping a job

2020-06-08 Thread Arvid Heise
It was before I joined the dev team, so the following are kind of speculative: The concept of stoppable functions never really took off as it was a bit of a clumsy approach. There is no fundamental difference between stopping and cancelling on (sub)task level. Indeed if you look in the twitter sou

Re: Flink not restoring from checkpoint when job manager fails even with HA

2020-06-08 Thread Yun Tang
Hi Bhaskar, By default, you will get a new job id. There exists a hacky, hidden method to set the job id, but it is not meant to be used by users. Best, Yun Tang From: Vijay Bhaskar Sent: Monday, June 8, 2020 15:03 To: Yun Tang Cc: Kathula, Sandeep ; user@fl

Re: Timestamp data type mismatch

2020-06-08 Thread Dawid Wysakowicz
Hi Satyam, The thing is that Types.SQL_TIMESTAMP uses java.sql.Timestamp and serializes it as a long (millis since epoch) and thus has millisecond precision. The default precision for DataTypes.TIMESTAMP is 6 and the default bridging class is LocalDateTime. It should work if you return DataType
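The precision mismatch Dawid describes can be seen with the JDK alone (no Flink needed): round-tripping a timestamp through millis-since-epoch drops sub-millisecond digits, which is why a precision-6 TIMESTAMP cannot be backed by an epoch-millis long.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Demonstrates why millis-since-epoch storage loses sub-millisecond
// precision: microseconds survive in LocalDateTime but not in an
// epoch-millis round trip.
public class TimestampPrecision {
    public static void main(String[] args) {
        LocalDateTime micros = LocalDateTime.of(2020, 6, 8, 12, 0, 0, 123_456_000);
        long epochMillis = micros.toInstant(ZoneOffset.UTC).toEpochMilli();
        LocalDateTime roundTripped =
            LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), ZoneOffset.UTC);
        System.out.println(micros);        // 2020-06-08T12:00:00.123456
        System.out.println(roundTripped);  // 2020-06-08T12:00:00.123
    }
}
```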

Timestamp data type mismatch

2020-06-08 Thread Satyam Shekhar
Hello, I am running into an issue while trying to create a TableSource with rowtime attribute. I have configured TableSource to return produced type of Row(DataTypes.BIGINT, DataTypes.TIMESTAMP) via DataType TableSource::getProducedDataType(). The returned DataStream has a flatmap operator that im

Re: UpsertStreamTableSink for Aggregate Only Query

2020-06-08 Thread Satyam Shekhar
Hi Jark, I wish to atomically update the destination with remove-insert. To pick that strategy, I need some "hint" from Flink that the output is a global aggregation with no grouping key, and that appends should overwrite the previous value. I am also exploring handling the issue in the upstream

Re: Flink not restoring from checkpoint when job manager fails even with HA

2020-06-08 Thread Vijay Bhaskar
Hi Yun, I'll put my question another way: 1) The first time, I deployed my job and got an ID from Flink, let's say "abcdef" (somehow I remembered the ID given to me by Flink, by storing it in another persistence store). 2) The next time, my job failed. I use my stored job ID ("abcdef") to retrieve the retained che
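For context, resuming from a retained (externalized) checkpoint uses the same -s flag as a savepoint restore; as Yun notes, the resubmitted job gets a new job id. Paths and the jar name below are placeholders:

```shell
# Resubmit the job from a retained checkpoint directory.
# Flink assigns a fresh job id to the new submission.
./bin/flink run -s hdfs:///checkpoints/abcdef/chk-42 my-job.jar
```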