Hi Dian,

Thank you very much, that's very helpful.  I'm seeing a couple of errors when I 
try to run the example though (Python 3.8 on Mac OS).


  1.  I created a fresh Python virtual env: `python -m venv .venv`
  2.  `source .venv/bin/activate`
  3.  When I tried to configure the project by running `python setup.py install`, 
I got errors about Cython not being installed even though it was.  I got past 
this by running `pip install apache-flink==1.14.4` to install the requirements. 
I'm not sure what the underlying issue is.
  4.  $ python test_table_api.py TableTests.test_scalar_function

Using %s as FLINK_HOME... /Users/john/PycharmProjects/pyflink-faq/.venv/lib/python3.8/site-packages/apache_flink-1.14.4-py3.8-macosx-11-x86_64.egg/pyflink
Skipped download /Users/john/PycharmProjects/pyflink-faq/testing/flink-python_2.11-1.14.4-tests.jar since it already exists.
The flink-python jar is not found in the opt folder of the FLINK_HOME: /Users/john/PycharmProjects/pyflink-faq/.venv/lib/python3.8/site-packages/apache_flink-1.14.4-py3.8-macosx-11-x86_64.egg/pyflink
/Users/john/PycharmProjects/pyflink-faq/testing/flink-python_2.11-1.14.4-tests.jar
Error: Could not find or load main class org.apache.flink.client.python.PythonGatewayServer
Caused by: java.lang.ClassNotFoundException: org.apache.flink.client.python.PythonGatewayServer
E/usr/local/Cellar/python@3.8/3.8.13/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py:704: ResourceWarning: unclosed file <_io.BufferedWriter name=4>
  outcome.errors.clear()
ResourceWarning: Enable tracemalloc to get the object allocation traceback

======================================================================
ERROR: test_scalar_function (__main__.TableTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/john/PycharmProjects/pyflink-faq/testing/test_utils.py", line 123, in setUp
    self.t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
  File "/Users/john/PycharmProjects/pyflink-faq/.venv/lib/python3.8/site-packages/apache_flink-1.14.4-py3.8-macosx-11-x86_64.egg/pyflink/table/environment_settings.py", line 267, in in_streaming_mode
    get_gateway().jvm.EnvironmentSettings.inStreamingMode())
  File "/Users/john/PycharmProjects/pyflink-faq/.venv/lib/python3.8/site-packages/apache_flink-1.14.4-py3.8-macosx-11-x86_64.egg/pyflink/java_gateway.py", line 62, in get_gateway
    _gateway = launch_gateway()
  File "/Users/john/PycharmProjects/pyflink-faq/.venv/lib/python3.8/site-packages/apache_flink-1.14.4-py3.8-macosx-11-x86_64.egg/pyflink/java_gateway.py", line 112, in launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number

----------------------------------------------------------------------
Ran 1 test in 0.333s

FAILED (errors=1)


  5.  I then added `("org.apache.flink", "flink-python_2.11", "1.14.4", None)` 
to the testing_jars list so that the regular Flink jar would be downloaded, 
created an `opt` directory in the FLINK_HOME directory, and copied the regular 
Flink jar into it.
  6.  $ python test_table_api.py TableTests.test_scalar_function

Using %s as FLINK_HOME... /Users/john/PycharmProjects/pyflink-faq/.venv/lib/python3.8/site-packages/apache_flink-1.14.4-py3.8-macosx-11-x86_64.egg/pyflink
Skipped download /Users/john/PycharmProjects/pyflink-faq/testing/flink-python_2.11-1.14.4-tests.jar since it already exists.
Skipped download /Users/john/PycharmProjects/pyflink-faq/testing/flink-python_2.11-1.14.4.jar since it already exists.
/Users/john/PycharmProjects/pyflink-faq/testing/flink-python_2.11-1.14.4.jar:/Users/john/PycharmProjects/pyflink-faq/testing/flink-python_2.11-1.14.4-tests.jar
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
    at org.apache.flink.client.python.PythonEnvUtils.<clinit>(PythonEnvUtils.java:77)
    at org.apache.flink.client.python.PythonGatewayServer.main(PythonGatewayServer.java:46)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
    ... 2 more
E/usr/local/Cellar/python@3.8/3.8.13/Frameworks/Python.framework/Versions/3.8/lib/python3.8/unittest/case.py:704: ResourceWarning: unclosed file <_io.BufferedWriter name=4>
  outcome.errors.clear()
ResourceWarning: Enable tracemalloc to get the object allocation traceback

======================================================================
ERROR: test_scalar_function (__main__.TableTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/john/PycharmProjects/pyflink-faq/testing/test_utils.py", line 131, in setUp
    self.t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
  File "/Users/john/PycharmProjects/pyflink-faq/.venv/lib/python3.8/site-packages/apache_flink-1.14.4-py3.8-macosx-11-x86_64.egg/pyflink/table/environment_settings.py", line 267, in in_streaming_mode
    get_gateway().jvm.EnvironmentSettings.inStreamingMode())
  File "/Users/john/PycharmProjects/pyflink-faq/.venv/lib/python3.8/site-packages/apache_flink-1.14.4-py3.8-macosx-11-x86_64.egg/pyflink/java_gateway.py", line 62, in get_gateway
    _gateway = launch_gateway()
  File "/Users/john/PycharmProjects/pyflink-faq/.venv/lib/python3.8/site-packages/apache_flink-1.14.4-py3.8-macosx-11-x86_64.egg/pyflink/java_gateway.py", line 112, in launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number

----------------------------------------------------------------------
Ran 1 test in 0.344s

FAILED (errors=1)

Now it looks as though the code needs all of its transitive dependencies on 
the classpath?

Have you managed to get your example tests to run in a completely clean virtual 
environment? If it's working on your computer, perhaps that is because the Java 
and Python dependencies have already been downloaded into particular locations 
there.
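
A rough way to check what a given FLINK_HOME actually provides, sketched with 
the Python standard library only. The `lib` subfolder, the function names and 
the jar-name prefixes are assumptions for illustration; none of this is part of 
PyFlink or the pyflink-faq repo.

```python
import glob
import os


def find_jars(flink_home, subdirs=("lib", "opt")):
    """Return a sorted list of jar files found directly under the given subfolders.

    The subfolder names are an assumption; only `opt` is mentioned in the log output.
    """
    jars = []
    for subdir in subdirs:
        jars.extend(glob.glob(os.path.join(flink_home, subdir, "*.jar")))
    return sorted(jars)


def build_classpath(jars):
    """Join jar paths with the platform's classpath separator (':' on macOS/Linux)."""
    return os.pathsep.join(jars)


def missing_prefixes(jars, required_prefixes):
    """Return the prefixes for which no jar file name starts with that prefix."""
    names = [os.path.basename(jar) for jar in jars]
    return [
        prefix
        for prefix in required_prefixes
        if not any(name.startswith(prefix) for name in names)
    ]
```

Calling `missing_prefixes(find_jars(flink_home), ["flink-python", "slf4j"])` 
would then list which of those (hypothetical) jar-name prefixes have no match, 
which may help tell a missing jar apart from a jar that is present but not on 
the classpath.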

Many thanks,

John

________________________________
From: Dian Fu <dian0511...@gmail.com>
Sent: 24 April 2022 06:21
To: John Tipper <john_tip...@hotmail.com>
Cc: user@flink.apache.org <user@flink.apache.org>
Subject: Re: Unit testing PyFlink SQL project

Hi John,

I have written an example of how to write unit tests for Flink functionality 
with PyFlink in [1]. I hope it is helpful for you. Feel free to let me know if 
there are any problems.

Regards,
Dian

[1] https://github.com/dianfu/pyflink-faq/tree/main/testing

On Sun, Apr 24, 2022 at 9:25 AM Dian Fu <dian0511...@gmail.com> wrote:
Hi John,

>> I don't know how to fix this. I've tried adding `flink-table-planner` and 
>> `flink-table-planner-blink` dependencies with `<type>test-jar</type>` to my 
>> dummy pom.xml, but it still fails.
What's the failure after doing this? The flink-table-planner*-tests.jar should 
be available in the Maven repository [1].

>> This is starting to feel like a real pain to do something that should be 
>> trivial: basic TDD of a PyFlink project.  Is there a real-world example of a 
>> Python project that shows how to set up a testing environment for unit 
>> testing SQL with PyFlink?
I'm not aware of such a project. However, I agree that this is an important 
aspect that should be improved. I will look into this.

Regards,
Dian

[1] 
https://repo1.maven.org/maven2/org/apache/flink/flink-table-planner_2.11/1.13.6/


On Sun, Apr 24, 2022 at 4:44 AM John Tipper <john_tip...@hotmail.com> wrote:
Hi all,

Is there an example of a self-contained repository showing how to perform SQL 
unit testing of PyFlink (specifically 1.13.x if possible)?  I have cross-posted 
the question to Stack Overflow here: 
https://stackoverflow.com/questions/71983434/is-there-an-example-of-pyflink-sql-unit-testing-in-a-self-contained-repo


There is a related SO question 
(https://stackoverflow.com/questions/69937520/pyflink-sql-local-test), where it 
is suggested to use some of the tests from PyFlink itself.  The issue I'm 
running into is that the PyFlink repo assumes that a bunch of things are on the 
Java classpath and that some Python utility classes are available (they're not 
distributed via the `apache-flink` package on PyPI).

I have done the following:


  1.  Copied `test_case_utils.py` and `source_sink_utils.py` from PyFlink 
(https://github.com/apache/flink/tree/f8172cdbbc27344896d961be4b0b9cdbf000b5cd/flink-python/pyflink/testing)
 into my project.
  2.  Copied an example unit test 
(https://github.com/apache/flink/blob/f8172cdbbc27344896d961be4b0b9cdbf000b5cd/flink-python/pyflink/table/tests/test_sql.py#L39)
 as suggested by the related SO question.

When I try to run the test, I get an error because the test case cannot 
determine what version of Avro jars to download (`download_apache_avro()` 
fails, because pyflink_gateway_server.py tries to evaluate the value of 
`avro.version` by running `mvn help:evaluate -Dexpression=avro.version`)

I then added a dummy `pom.xml` defining a Maven property of `avro.version` 
(with a value of `1.10.0`) and my unit test case is loaded.
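
For illustration, a dummy `pom.xml` of this kind could be generated as in the 
sketch below. The groupId, artifactId and version are placeholder values and 
`write_dummy_pom` is a hypothetical helper; only the `avro.version` property 
and its value of `1.10.0` come from the description above.

```python
from pathlib import Path

# Placeholder coordinates (hypothetical), present only so that Maven accepts the file.
POM_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>pyflink-test-dummy</artifactId>
  <version>0.0.1</version>
  <properties>
    <avro.version>{avro_version}</avro.version>
  </properties>
</project>
"""


def write_dummy_pom(directory, avro_version="1.10.0"):
    """Write a minimal pom.xml defining `avro.version` and return its path."""
    pom_path = Path(directory) / "pom.xml"
    pom_path.write_text(
        POM_TEMPLATE.format(avro_version=avro_version), encoding="utf-8"
    )
    return pom_path
```

With such a file in the directory where Maven is invoked, 
`mvn help:evaluate -Dexpression=avro.version` should have a property to 
resolve.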

I now get a new error and my test is skipped:

    'flink-table-planner*-tests.jar' is not available. Will skip the related tests.

I don't know how to fix this. I've tried adding `flink-table-planner` and 
`flink-table-planner-blink` dependencies with `<type>test-jar</type>` to my 
dummy pom.xml, but it still fails.

This is starting to feel like a real pain to do something that should be 
trivial: basic TDD of a PyFlink project.  Is there a real-world example of a 
Python project that shows how to set up a testing environment for unit testing 
SQL with PyFlink?

Many thanks,

John
