All jobs going through the web-submission are run in detached mode for
technical reasons (blocking of threads, and information having to be
transported back to the JobManager for things like collect()).
You unfortunately cannot run non-detached/attached/blocking jobs via the
web submission, which includes the WordCount example because it uses
specific methods (the ones mentioned in the exception; collect, print,
printToErr, count).
In other words, your setup appears to be fine correctly, you are just
trying to do something that is not supported.
On 1/5/2021 4:07 PM, Adam Roberts wrote:
Hey everyone, I've got an awesome looking Flink cluster set up
withweb.submit.enable=true, and plenty of bash for handling jar upload
and then submission to a JobManager - all good so far.
Unfortunately, when I try to submit the classic WordCount example, I
get a massive error with the jist of it being:
/"Job was submitted in detached mode. Results of job execution, such
as accumulators, runtime, etc. are not available. Please make sure
your program doesn't call an eager execution function [collect, print,
printToErr, count]."/
*So, how do I run it *not* in detached mode using curl please?*
I'm intentionally not using the Flink CLI because I am using an nginx
with auth proxy set up - so I'm doing everything with curl, in a bash
script - (so, two requests - one to upload the jar, then I get the ID
from the response, and then submit the job with that ID).
At
https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/rest_api.html if
you ctrl-f for /run, there's nothing obvious that indicates how I can
run in blocking mode - the biggest clue I've got is `programArg`. So
I'm wondering if I can provide that somehow.
For those who prefer code:
/curl ${auth_options} ${self_signed_flag} ${ca_cert_flag} -X POST -H
"Content-Type:
application/json"https://${JOBMANAGER}/jars/${uploaded_jar_string}/run$programArgsToUse
<https://%24%7Bjobmanager%7D/jars/$%7Buploaded_jar_string%7D/run$programArgsToUse>/
Whereby/programArgsToUse/is user args, and I'm cool with them being
query parameters for now - I think.
I'm passing them on the end with:
if [[ ! -z $program_args ]] ; then
programArgsToUse="?programArg=$program_args"
fi
so my eventual curl looks like this. But, I'm really just guessing
what the detached argument is...
curl --cacert /etc/ssl/tester/certs/ca.crt -X POST -H 'Content-Type:
application/json'
'https://tester-minimal-tls-sample-jobmanager:8081/jars/fdc7684f-323d-49fa-a60a-96683d953be8_WordCount.jar/run?programArg=detached=false'
(obviously, what's at the end looks really wrong, but IDK what to use)
The only mention of "detach" I see documented is at
https://ci.apache.org/projects/flink/flink-docs-stable/deployment/config.html,
and that says by default it's not in detached mode for execution. If
there are any better docs or examples, please send them my way - or if
you've spotted me being just plain silly with my bash, that would be
fantastic to point out.
Thanks a lot in advance, cheers, happy to share any more
code/background if need be
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with
number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU