Re: Issues with Flink Batch and Hadoop dependency

2020-08-29 Thread Dan Hill
I was able to get a basic version to work by including a bunch of hadoop and s3 dependencies in the job jar and hacking in some hadoop config values. It's probably not optimal but it looks like I'm unblocked. On Fri, Aug 28, 2020 at 12:11 PM Dan Hill wrote: > I'm assuming I have a simple, commo

Re: Flink not outputting windows before all data is seen

2020-08-29 Thread David Anderson
Teodor, This is happening because of the way that readTextFile works when it is executing in parallel, which is to divide the input file into a bunch of splits, which are consumed in parallel. This is making it so that the watermark isn't able to move forward until much or perhaps all of the file
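
The effect described above can be sketched with a toy model. This is not Flink's API: `simulate` and the round-robin read order are hypothetical simplifications. The one fact it relies on is that a downstream operator's watermark is the minimum of the watermarks of its parallel inputs. It assumes each split's watermark is the largest timestamp read so far, and it ignores the final watermark Flink emits when a bounded input ends.

```python
from typing import List

def simulate(splits: List[List[int]]) -> List[int]:
    """Read one event per split per round (hypothetical round-robin order)
    and record the combined watermark after each round."""
    # Per-split watermark: largest timestamp seen so far (-1 = nothing read yet).
    per_split = [-1] * len(splits)
    history = []
    rounds = max(len(s) for s in splits)
    for step in range(rounds):
        for i, split in enumerate(splits):
            if step < len(split):
                per_split[i] = max(per_split[i], split[step])
        # The combined watermark is the minimum over all parallel inputs,
        # so it follows the split holding the earliest data.
        history.append(min(per_split))
    return history

# One file with timestamps 0..8, read as a single split vs. three splits.
sequential = simulate([list(range(9))])
parallel = simulate([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
print(sequential)  # watermark tracks every event as it is read
print(parallel)    # watermark is held back by the first split
```

In this model the three-split run has consumed the entire file while its combined watermark has only reached the end of the first split, so windows covering the later timestamps cannot fire until the input is exhausted.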

Re: FileSystemHaServices and BlobStore

2020-08-29 Thread Alexey Trenikhun
Did test with streaming job and FileSystemHaService using VoidBlobStore (no HA Blob), looks like job was able to recover from both JM restart and TM restart. Any idea in what use cases HA Blob is needed? Thanks, Alexey From: Alexey Trenikhun Sent: Friday, August

Flink not outputting windows before all data is seen

2020-08-29 Thread Teodor Spæren
Hey! Second time posting to a mailing list, let's hope I'm doing this correctly :) My use case is to take data from the mediawiki dumps and stream it into Flink via the `readTextFile` method. The dumps are TSV files with an event per line, each event has a timestamp and a type. I want to use

Re: PyFlink cluster runtime issue

2020-08-29 Thread Manas Kale
Ok, thank you! On Sat, 29 Aug, 2020, 4:07 pm Xingbo Huang, wrote: > Hi Manas, > > We can't submit a pyflink job through flink web currently. The only way > currently to submit a pyFlink job is through the command line. > > Best, > Xingbo > > On Sat, Aug 29, 2020 at 12:51 PM Manas Kale wrote: > >> Hi Xingbo, >

Re: Flink OnCheckpointRollingPolicy streamingfilesink

2020-08-29 Thread Andrey Zagrebin
Hi Vijay, I would apply the same judgement. It is latency vs throughput vs spent resources vs practical need. The more concurrent checkpoints your system is capable of handling, the better end-to-end result latency you will observe and see computation results more frequently. On the other hand yo

Re: PyFlink cluster runtime issue

2020-08-29 Thread Xingbo Huang
Hi Manas, We can't submit a pyflink job through flink web currently. The only way currently to submit a pyFlink job is through the command line. Best, Xingbo On Sat, Aug 29, 2020 at 12:51 PM Manas Kale wrote: > Hi Xingbo, > Thanks, that worked. Just to make sure, the only way currently to submit a > pyFlink