Great, thanks!

Kind regards,
Levan Huyen

On Fri, 21 Oct 2022 at 00:53, Biao Geng <biaoge...@gmail.com> wrote:

> You are right.
> It contains the python package `pyflink` and some dependencies like py4j
> and cloudpickle but does not contain all relevant dependencies(e.g.
> `google.protobuf` as the error log shows, which I also reproduce in my own
> machine).
>
> Best,
> Biao Geng
>
> Levan Huyen <lvhu...@gmail.com> 于2022年10月20日周四 19:53写道:
>
>> Thanks Biao.
>>
>> May I ask one more question: does the binary package on Apache site (e.g:
>> https://archive.apache.org/dist/flink/flink-1.15.2) contain the python
>> package `pyflink` and its dependencies? I guess the answer is no.
>>
>> Thanks and regards,
>> Levan Huyen
>>
>> On Thu, 20 Oct 2022 at 18:13, Biao Geng <biaoge...@gmail.com> wrote:
>>
>>> Hi Levan,
>>> Great to hear that your issue is resolved!
>>> For the follow-up question, I am not quite familiar with AWS EMR's
>>> configuration for flink but due to the error you attached, it looks like
>>> that pyflink may not ship some 'Google' dependencies in the Flink binary
>>> zip file and as a result, it will try to find it in your python
>>> environment. cc @hxbks...@gmail.com
>>> For now, to manage the complex python dependencies, the typical usage of
>>> pyflink in multiple node clusters for production is to create your venv and
>>> use it in your `flink run` command or in the python code. You can refer to
>>> this doc
>>> <https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/python/faq/#preparing-python-virtual-environment>
>>> for details.
>>>
>>> Best,
>>> Biao Geng
>>>
>>> Levan Huyen <lvhu...@gmail.com> 于2022年10月20日周四 14:11写道:
>>>
>>>> Hi Biao,
>>>>
>>>> Thanks for your help. That solved my issue. It turned out that in
>>>> setup1 (in EMR), I got apache-flink installed, but the package (and its
>>>> dependencies) are not in the directory `/usr/lib/python3.7/site-packages`
>>>> (corresponding to the python binary in `/usr/bin/python3`). For some
>>>> reason, the packages are in the current user's location (`~/.local/...)
>>>> which Flink did not look at.
>>>>
>>>> BTW, is there any way to use the pyflink shipped with the Flink binary
>>>> zip file that I downloaded from Apache's site? On EMR, such package is
>>>> included, and I feel it's awkward to have to install another version using
>>>> `pip install`. It will also be confusing about where to add the
>>>> dependencies jars.
>>>>
>>>> Thanks and regards,
>>>> Levan Huyen
>>>>
>>>>
>>>> On Thu, 20 Oct 2022 at 02:25, Biao Geng <biaoge...@gmail.com> wrote:
>>>>
>>>>> Hi Levan,
>>>>>
>>>>> For your setup1 & 2, it looks like the python environment is not
>>>>> ready. Have you tried python -m pip install apache-flink for the
>>>>> first 2 setups?
>>>>> For your setup3, as you are trying to use `flink run ...` command, it
>>>>> will try to connect to a launched flink cluster but I guess you did not
>>>>> launch the flink cluster. You can do `start-cluster.sh` first to launch a
>>>>> standalone flink cluster and then try the `flink run ...` command.
>>>>> For your setup4, the reason why it works well is that it will use the
>>>>> default mini cluster to run the pyflink job. So even you haven't started a
>>>>> standalone cluster, it can work as well.
>>>>>
>>>>> Best,
>>>>> Biao Geng
>>>>>
>>>>> Levan Huyen <lvhu...@gmail.com> 于2022年10月19日周三 17:07写道:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm new to PyFlink, and I couldn't run a basic example that shipped
>>>>>> with Flink.
>>>>>> This is the command I tried:
>>>>>>
>>>>>> ./bin/flink run -py examples/python/datastream/word_count.py
>>>>>>
>>>>>> Here below are the results I got with different setups:
>>>>>>
>>>>>> 1. On AWS EMR 6.8.0 (Flink 1.15.1):
>>>>>> *Error: No module named 'google'*I tried with the Flink shipped with
>>>>>> EMR, or the binary v1.15.1/v1.15.2 downloaded from Flink site. I got that
>>>>>> same error message in all cases.
>>>>>>
>>>>>> Traceback (most recent call last):
>>>>>>
>>>>>>   File "/usr/lib64/python3.7/runpy.py", line 193, in
>>>>>> _run_module_as_main
>>>>>>
>>>>>>     "__main__", mod_spec)
>>>>>>
>>>>>>   File "/usr/lib64/python3.7/runpy.py", line 85, in _run_code
>>>>>>
>>>>>>     exec(code, run_globals)
>>>>>>
>>>>>>   File
>>>>>> "/tmp/pyflink/622f19cc-6616-45ba-b7cb-20a1462c4c3f/a29eb7a4-fff1-4d98-9b38-56b25f97b70a/word_count.py",
>>>>>> line 134, in <module>
>>>>>>
>>>>>>     word_count(known_args.input, known_args.output)
>>>>>>
>>>>>>   File
>>>>>> "/tmp/pyflink/622f19cc-6616-45ba-b7cb-20a1462c4c3f/a29eb7a4-fff1-4d98-9b38-56b25f97b70a/word_count.py",
>>>>>> line 89, in word_count
>>>>>>
>>>>>>     ds = ds.flat_map(split) \
>>>>>>
>>>>>>   File
>>>>>> "/usr/lib/flink/opt/python/pyflink.zip/pyflink/datastream/data_stream.py",
>>>>>> line 333, in flat_map
>>>>>>
>>>>>>   File
>>>>>> "/usr/lib/flink/opt/python/pyflink.zip/pyflink/datastream/data_stream.py",
>>>>>> line 557, in process
>>>>>>
>>>>>>   File
>>>>>> "/usr/lib/flink/opt/python/pyflink.zip/pyflink/fn_execution/flink_fn_execution_pb2.py",
>>>>>> line 23, in <module>
>>>>>>
>>>>>> ModuleNotFoundError: No module named 'google'
>>>>>>
>>>>>> org.apache.flink.client.program.ProgramAbortException:
>>>>>> java.lang.RuntimeException: Python process exits with code: 1
>>>>>>
>>>>>> 2. On my Mac, without a virtual environment (so `*-pyclientexec
>>>>>> python3*` is included in the run command): got the same error as
>>>>>> with EMR, but there's a stdout line from `*print()*` in the Python
>>>>>> script
>>>>>>
>>>>>>   File
>>>>>> "/Users/lvh/dev/tmp/flink-1.15.2/opt/python/pyflink.zip/pyflink/datastream/data_stream.py",
>>>>>> line 557, in process
>>>>>>
>>>>>>   File "<frozen zipimport>", line 259, in load_module
>>>>>>
>>>>>>   File
>>>>>> "/Users/lvh/dev/tmp/flink-1.15.2/opt/python/pyflink.zip/pyflink/fn_execution/flink_fn_execution_pb2.py",
>>>>>> line 23, in <module>
>>>>>>
>>>>>> ModuleNotFoundError: No module named 'google'
>>>>>>
>>>>>> Executing word_count example with default input data set.
>>>>>>
>>>>>> Use --input to specify file input.
>>>>>>
>>>>>> org.apache.flink.client.program.ProgramAbortException:
>>>>>> java.lang.RuntimeException: Python process exits with code: 1
>>>>>>
>>>>>> 3. On my Mac, with a virtual environment and Python package `
>>>>>> *apache-flink`* installed: Flink tried to connect to localhost:8081
>>>>>> (I don't know why), and failed with 'connection refused'.
>>>>>>
>>>>>> Caused by:
>>>>>> org.apache.flink.util.concurrent.FutureUtils$RetryException: Could not
>>>>>> complete the operation. Number of retries has been exhausted.
>>>>>>
>>>>>> at
>>>>>> org.apache.flink.util.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:395)
>>>>>>
>>>>>> ... 21 more
>>>>>>
>>>>>> Caused by: java.util.concurrent.CompletionException:
>>>>>> org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException:
>>>>>> Connection refused: localhost/127.0.0.1:8081
>>>>>>
>>>>>>  4. If I run that same example job using Python: `*python
>>>>>> word_count.py*` then it runs well.
>>>>>>
>>>>>> I tried with both v1.15.2 and 1.15.1, Python 3.7 & 3.8, and got the
>>>>>> same result.
>>>>>>
>>>>>> Could someone please help?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>

Reply via email to