Working with a 3-node cluster, started via YARN.
If I go to port 8080 I see the Tomcat start screen; port 8088 has the YARN
screen.
Didn't see anything obvious in the bin folder to start the UI.
Flink 1.1.1 is running on AWS / EMR. 3 boxes, 24 cores and 90 GB of RAM in
total.
The job is submitted via YARN.
Topology:
read csv files from SQS -> parse files by line and create object for each
line -> pass through 'KeySelector' to pair entries (by hash) over 60 second
window -> write original and
Take the above topology and send the resultant DataStream to .print().
Assume the Reduce function changes a character to its uppercase equivalent.
Assume this whole stream makes it within the span of a single Window.
Send it a stream like a,b,c,d,e,f,g,h,c,e,g.
What will be output?
a) a,b,C,
Topology snip:
datastream = some_stream
    .keyBy(keySelector)
    .timeWindow(Time.seconds(60))
    .reduce(new some_KeyReduce());
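To reason about what a keyed window reduce emits, here is a rough plain-Java simulation of a single window. It is not Flink code. The class `WindowReduceSketch` and the method `reduceWindow` are made-up names, the key is assumed to be the character itself, and the reduce is assumed to return the uppercased first argument. It relies on one Flink behaviour: a ReduceFunction is only called to combine two elements that share a key, so a key seen once in the window is emitted unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

public class WindowReduceSketch {

    // Simulates keyBy(...).timeWindow(...).reduce(...) for ONE window:
    // elements are grouped by key and the reducer is applied pairwise per key.
    // A key that occurs only once never reaches the reducer.
    public static <T, K> List<T> reduceWindow(List<T> window,
                                              Function<T, K> keyOf,
                                              BinaryOperator<T> reducer) {
        Map<K, T> perKey = new LinkedHashMap<>();
        for (T element : window) {
            K key = keyOf.apply(element);
            T current = perKey.get(key);
            // First element for a key is stored as-is; later ones are reduced.
            perKey.put(key, current == null ? element : reducer.apply(current, element));
        }
        // One result per key, emitted when the window closes.
        return new ArrayList<>(perKey.values());
    }

    public static void main(String[] args) {
        List<String> window =
                Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h", "c", "e", "g");
        // Assumption: key = the character itself, reduce = uppercase the first argument.
        List<String> out = reduceWindow(window, s -> s, (x, y) -> x.toUpperCase());
        System.out.println(String.join(",", out)); // prints a,b,C,d,E,f,G,h
    }
}
```

Only the keys that occur more than once (c, e, g) come out uppercased in this simulation. In a real Flink job the order of results across keys is not guaranteed, so the ordering shown here comes from the `LinkedHashMap` and should not be relied on.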
If I have a KeySelector that's pretty 'loose' (i.e. lots of matches) the
'some_KeyReduce' function gets hit frequently and some set of values is
printed out via 'datastream.print(