Max,
That’s exactly what I did and I am still getting the same error message as
before (ValueError: pipeline_options is not set)
-Buvana
On 5/29/20, 4:25 AM, "Maximilian Michels" wrote:
Please remove any instantiation of HadoopFileSystem; it is not meant to
be used directly. It will be created automatically when you use the
'hdfs://' scheme in your path.
It is sufficient to only set the PipelineOptions for HDFS, e.g.:
> options = PipelineOptions([
> "--hdfs_host=XX.143",
Hi Max,
I incorporated your suggestion and I still get the same error message.
-Buvana
On 5/28/20, 7:35 AM, "Maximilian Michels" wrote:
The configuration looks good but the HDFS file system implementation is
not intended to be used directly.
Instead of:
> lines = p | 'ReadMyFile' >> beam.Create(hdfs_client.open(input_file_hdfs))
Use:
> lines = p | 'ReadMyFile' >> beam.io.ReadFromText(input_file_hdfs)
Best,
Max
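
For reference, the advice above (set the HDFS options on the pipeline and read through an 'hdfs://' path with ReadFromText) can be sketched roughly as follows. This is a minimal sketch, not the poster's actual script: the helper names hdfs_flags and run are hypothetical, and the host, port, user, and path values a caller passes in are placeholders that must match the real cluster.

```python
def hdfs_flags(host, port, user):
    """Build the HDFS-related pipeline flags (hypothetical helper)."""
    return [
        f"--hdfs_host={host}",
        f"--hdfs_port={port}",
        f"--hdfs_user={user}",
    ]


def run(input_path, flags):
    """Read text lines from input_path (e.g. an 'hdfs://' path) and print them."""
    # Imported here so hdfs_flags() can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(flags)
    # The options are handed to the Pipeline itself; no HadoopFileSystem
    # object is constructed by hand anywhere in this script.
    with beam.Pipeline(options=options) as p:
        lines = p | 'ReadMyFile' >> beam.io.ReadFromText(input_path)
        lines | 'PrintLines' >> beam.Map(print)
```

A call would look like run(input_file_hdfs, hdfs_flags(host, port, user)), with runner-specific flags (for example those for the Flink runner) appended to the same list.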
On 28.05.20 0
Further to that:
On the Flink Job Manager and Task Manager side, I set up the following:
HADOOP_CONF_DIR
HADOOP_USER_NAME
Hadoop jars copied under $FLINK_HOME/lib
I also made sure that a PyFlink script is able to read from and write to HDFS.
Should I set up or configure anything on the Job Server?
I
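
The Flink-side setup listed above (HADOOP_CONF_DIR, HADOOP_USER_NAME, and the Hadoop jars under $FLINK_HOME/lib) could be sanity-checked on each node with a small script along these lines. This is only a sketch: the function name is hypothetical, and it assumes the copied jars have "hadoop" in their file names, which may not hold for every distribution.

```python
import glob
import os


def check_flink_hadoop_setup(env=None):
    """Return a list of problems found in the Flink-side Hadoop setup."""
    env = os.environ if env is None else env
    problems = []

    # Both variables from the list above must be present and non-empty.
    for var in ("HADOOP_CONF_DIR", "HADOOP_USER_NAME"):
        if not env.get(var):
            problems.append(f"{var} is not set")

    conf_dir = env.get("HADOOP_CONF_DIR")
    if conf_dir and not os.path.isdir(conf_dir):
        problems.append(f"HADOOP_CONF_DIR does not exist: {conf_dir}")

    flink_home = env.get("FLINK_HOME")
    if not flink_home:
        problems.append("FLINK_HOME is not set")
    else:
        # Assumption: the copied Hadoop jars contain 'hadoop' in their names.
        jars = glob.glob(os.path.join(flink_home, "lib", "*hadoop*.jar"))
        if not jars:
            problems.append(f"no hadoop jars under {flink_home}/lib")

    return problems
```

An empty list means the three items are in place on that machine; it says nothing about the Job Server or the Python SDK side.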