Re: Flink Runner with HDFS

2020-05-29 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
Max, That’s exactly what I did and I am still getting the same error message as before (ValueError: pipeline_options is notset) -Buvana On 5/29/20, 4:25 AM, "Maximilian Michels" wrote: Please remove any instantiation of HadoopFileSystem; it is not meant to be used directly. It will b

Re: Flink Runner with HDFS

2020-05-29 Thread Maximilian Michels
Please remove any instantiation of HadoopFileSystem; it is not meant to be used directly. It will be created automatically when you use the 'hdfs://' schema in your path. It is sufficient to only set the PipelineOptions for HDFS, e.g.: > options = PipelineOptions([ > "--hdfs_host=XX.143",

Re: Flink Runner with HDFS

2020-05-28 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
Hi Max, I incorporated your suggestion and I still get the same error message. -Buvana On 5/28/20, 7:35 AM, "Maximilian Michels" wrote: The configuration looks good but the HDFS file system implementation is not intended to be used directly. Instead of: > lines = p | 'ReadMy

Re: Flink Runner with HDFS

2020-05-28 Thread Maximilian Michels
The configuration looks good but the HDFS file system implementation is not intended to be used directly. Instead of: > lines = p | 'ReadMyFile' >> beam.Create(hdfs_client.open(input_file_hdfs)) Use: > lines = p | 'ReadMyFile' >> beam.io.ReadFromText(input_file_hdfs) Best, Max On 28.05.20 0

Re: Flink Runner with HDFS

2020-05-27 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
Further to that: At the Flink Job/Task Manager end, I configured/setup the following: HADOOP_CONF_DIR HADOOP_USER_NAME hadoop jars copied under $FLINK_HOME/lib And made sure a pyflink script is able to read and write into the hdfs system. Should I setup / configure anything at the Job Server? I