Hi Yik San,

The command line option `-pyarch` is used to specify archive files, such as a
Python virtual environment, an ML model, or a data file.

So for resources.zip, `-pyarch` makes more sense than `-pyfs`.
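
For illustration, here is a minimal, plain-Python sketch of why the path changes. It assumes (as suggested earlier in this thread) that an archive passed via `-pyarch` is extracted into a directory named after the archive file, so the CSV becomes reachable at `resources.zip/resources/crypt.csv` relative to the UDF's working directory. The file names and sample data below are made up for the demonstration; the snippet simulates the extracted layout rather than running Flink itself:

```python
import csv
import os
import tempfile

def load_mapping(path):
    """Load a two-column CSV (key,value) into a dict, mirroring the UDF's lookup table."""
    with open(path, newline="") as f:
        return {row[0]: row[1] for row in csv.reader(f)}

# Simulate the layout Flink would produce after extracting `-pyarch resources.zip`:
# <working dir>/resources.zip/resources/crypt.csv
workdir = tempfile.mkdtemp()
extracted = os.path.join(workdir, "resources.zip", "resources")
os.makedirs(extracted)
with open(os.path.join(extracted, "crypt.csv"), "w", newline="") as f:
    f.write("secret1,hello\nsecret2,world\n")  # sample data, not from the thread

# The UDF would use the relative path 'resources.zip/resources/crypt.csv';
# here we join it to the simulated working directory instead.
d = load_mapping(os.path.join(workdir, "resources.zip", "resources", "crypt.csv"))
print(d.get("secret1", "unknown"))  # hello
print(d.get("missing", "unknown"))  # unknown
```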

Regards,
Dian

> On Apr 27, 2021, at 5:14 PM, Yik San Chan <evan.chanyik...@gmail.com> wrote:
> 
> Hi Dian,
> 
> Thank you! That solves my question. By the way, for my use case, does -pyarch 
> make more sense than -pyfs?
> 
> Best,
> Yik San
> 
> On Tue, Apr 27, 2021 at 4:52 PM Dian Fu <dian0511...@gmail.com> wrote:
> Hi Yik San,
> 
> Could you try `pd.read_csv('resources.zip/resources/crypt.csv', xxx)`?
> 
> Regards,
> Dian
> 
>> On Apr 27, 2021, at 4:39 PM, Yik San Chan <evan.chanyik...@gmail.com> wrote:
>> 
>> Hi,
>> 
>> My UDF depends on a resource file named crypt.csv located in the 
>> resources/ directory.
>> 
>> ```python
>> # udf_use_resource.py
>> @udf(input_types=[DataTypes.STRING()], result_type=DataTypes.STRING())
>> def decrypt(s):
>>     import pandas as pd
>>     d = pd.read_csv('resources/crypt.csv', header=None, index_col=0, squeeze=True).to_dict()
>>     return d.get(s, "unknown")
>> ```
>> 
>> I run the job in local mode (i.e., python udf_use_resource.py) without any 
>> problem. However, when I try to run it with 
>> `~/softwares/flink-1.12.0/bin/flink run -d -pyexec 
>> /usr/local/anaconda3/envs/featflow-ml-env/bin/python -pyarch resources.zip 
>> -py udf_use_resource.py` on my local cluster, it complains:
>> 
>> FileNotFoundError: [Errno 2] File b'resources/crypt.csv' does not exist: 
>> b'resources/crypt.csv'
>> 
>> The resources.zip is zipped from the resources directory. I wonder: where 
>> did I go wrong?
>> 
>> Note: udf_use_resource.py and resources/crypt.csv can be found in 
>> https://github.com/YikSanChan/pyflink-quickstart/tree/36bfab4ff830f57d3f23f285c7c5499a03385b71.
>> 
>> Thanks!
>> 
>> Best,
>> Yik San
> 
