I was able to get a basic version to work by including a bunch of hadoop
and s3 dependencies in the job jar and hacking in some hadoop config
values. It's probably not optimal but it looks like I'm unblocked.
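[For the archives: one common alternative to shading Hadoop/S3 dependencies into the job jar is one of the flink-s3-fs-* filesystem plugins, configured via flink-conf.yaml. A rough sketch of the shape of that configuration; the values are placeholders, and credentials via IAM roles or environment are preferable where possible:]

```yaml
# flink-conf.yaml — only needed when one of the flink-s3-fs-* plugins is
# used and credentials are not supplied by the environment or an IAM role.
s3.access-key: <your-access-key>
s3.secret-key: <your-secret-key>
s3.endpoint: s3.us-west-2.amazonaws.com   # example region endpoint
```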
On Fri, Aug 28, 2020 at 12:11 PM Dan Hill wrote:
> I'm assuming I have a simple, commo
Teodor,
This is happening because of the way readTextFile works when executing in
parallel: it divides the input file into a number of splits, which are then
consumed in parallel. As a result, the watermark isn't able to move forward
until much or perhaps all of the file
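[A minimal, framework-free sketch of the effect, with made-up split positions: when per-split watermarks are combined by taking the minimum, one lagging split holds the overall watermark back until every split has read past a given timestamp.]

```python
# Illustration only (no Flink involved): with a file read as parallel
# splits, the downstream watermark is the MIN of the per-split watermarks,
# so the least-advanced split holds everything back.

def combined_watermark(per_split_watermarks):
    """The watermark emitted downstream is the minimum across all splits."""
    return min(per_split_watermarks)

# Suppose three splits of the same file have progressed this far
# (event-time millis; numbers are made up).
splits = [10_000, 50_000, 200_000]
print(combined_watermark(splits))  # held back by split 0

# Only once every split passes a timestamp does the watermark pass it too.
splits[0] = 60_000
print(combined_watermark(splits))  # now limited by split 1
```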
I tested a streaming job with FileSystemHaService using VoidBlobStore (no HA
blob storage), and it looks like the job was able to recover from both a JM
restart and a TM restart. Any idea in which use cases the HA blob store is needed?
Thanks,
Alexey
From: Alexey Trenikhun
Sent: Friday, August
Hey!
This is only my second time posting to a mailing list, so let's hope I'm
doing this correctly :)
My use case is to take data from the MediaWiki dumps and stream it into
Flink via the `readTextFile` method. The dumps are TSV files with one
event per line; each event has a timestamp and a type. I want to use
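[A hedged sketch of the per-line parsing step described above; the field order and names are my assumption, not the actual dump schema. In PyFlink this would typically run as a map over the stream returned by the text-file source.]

```python
from typing import NamedTuple

class Event(NamedTuple):
    timestamp: int   # event-time millis (assumed unit)
    event_type: str

def parse_tsv_line(line: str) -> Event:
    # Assumed layout: timestamp<TAB>type<TAB>...rest — adapt to the real dump.
    fields = line.rstrip("\n").split("\t")
    return Event(timestamp=int(fields[0]), event_type=fields[1])

print(parse_tsv_line("1598000000000\tedit\tSomePage"))
```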
Ok, thank you!
On Sat, 29 Aug, 2020, 4:07 pm Xingbo Huang, wrote:
> Hi Manas,
>
> We can't submit a PyFlink job through the Flink web UI currently; the only
> way to submit a PyFlink job right now is through the command line.
>
> Best,
> Xingbo
>
> Manas Kale wrote on Saturday, August 29, 2020 at 12:51 PM:
>
>> Hi Xingbo,
>
Hi Vijay,
I would apply the same judgment here: it is a trade-off between latency,
throughput, resource consumption, and practical need.
The more concurrent checkpoints your system is capable of handling, the
better end-to-end latency you will observe, and the more frequently you
will see computation results.
On the other hand yo
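[For reference, the knob being discussed can be set in flink-conf.yaml (or equivalently on the job's CheckpointConfig); the value 2 below is just an example:]

```yaml
# flink-conf.yaml: allow up to 2 checkpoints in flight at once (example value).
execution.checkpointing.max-concurrent-checkpoints: 2
# Note: a non-zero min-pause effectively caps concurrency at 1,
# so don't combine it with concurrent checkpoints.
execution.checkpointing.min-pause: 0
```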
Hi Manas,
We can't submit a PyFlink job through the Flink web UI currently; the only
way to submit a PyFlink job right now is through the command line.
Best,
Xingbo
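[For anyone searching the archives, the command-line submission looks roughly like this; paths and the JobManager address are illustrative, and the exact flags should be checked against the CLI docs for your Flink version:]

```shell
# Submit a PyFlink job with the Flink CLI (paths are illustrative)
./bin/flink run --python /path/to/my_job.py

# With extra Python files and an explicit JobManager target:
./bin/flink run --jobmanager localhost:8081 \
    --python /path/to/my_job.py \
    --pyFiles /path/to/deps.zip
```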
Manas Kale wrote on Saturday, August 29, 2020 at 12:51 PM:
> Hi Xingbo,
> Thanks, that worked. Just to make sure, the only way currently to submit a
> pyFlink