Re: [ANNOUNCE] Beam 2.21.0 Released

2020-05-29 Thread Maximilian Michels
Thanks Kyle! On 28.05.20 13:16, Kyle Weaver wrote: > The Apache Beam team is pleased to announce the release of version 2.21.0. > > Apache Beam is an open source unified programming model to define and > execute data processing pipelines, including ETL, batch and stream > (continuous) processing.

Re: Flink Runner with HDFS

2020-05-29 Thread Maximilian Michels
Please remove any instantiation of HadoopFileSystem; it is not meant to be used directly. It will be created automatically when you use the 'hdfs://' scheme in your path. It is sufficient to only set the PipelineOptions for HDFS, e.g.: > options = PipelineOptions([ > "--hdfs_host=XX.143",
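A minimal sketch of the advice above, assuming the Python SDK's HDFS flags (--hdfs_host, --hdfs_port, --hdfs_user). The helper name build_hdfs_flags and the host, port and user values are hypothetical placeholders, not values from this thread. The block only builds the flag list, so it runs without Beam installed; the hand-off to PipelineOptions is shown as a comment.

```python
def build_hdfs_flags(host, port, user):
    """Return command-line style flags that configure HDFS access.

    Hypothetical helper for illustration; only the flag names are
    taken from the Beam Python SDK.
    """
    return [
        f"--hdfs_host={host}",
        f"--hdfs_port={port}",
        f"--hdfs_user={user}",
    ]


# Placeholder values (assumptions, not from the thread).
flags = build_hdfs_flags("namenode.example.org", 50070, "beam")

# With Beam installed, the flags would be passed on like this, and any
# 'hdfs://' path in the pipeline would then pick up the file system
# automatically, with no explicit HadoopFileSystem object:
#   from apache_beam.options.pipeline_options import PipelineOptions
#   options = PipelineOptions(flags)
print(flags)
```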

Read a file => process => write to multiple files

2020-05-29 Thread OrielResearch Eila Arich-Landkof
Hi all, I am looking for a way to read a large file and generate the following 3 files: 1. extract header 2. extract column #1 from all lines 3. extract column #2 from all lines. I use a DoFn to extract the values. I am looking for a way to redirect the output to three different files. My thought
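One way to think about the question above is to tag each extracted value with its destination and route by tag. The sketch below shows that idea in plain Python, assuming a comma-separated file whose first line is the header; tag_rows and route are hypothetical names. In the Beam Python SDK the same shape is usually expressed with tagged outputs (beam.pvalue.TaggedOutput inside the DoFn, plus with_outputs on the ParDo), each written by its own sink. Note that in a distributed pipeline "the first line" is not automatically known to a DoFn, so header detection would need its own handling.

```python
import csv
import io


def tag_rows(lines):
    """Yield (tag, value) pairs: the header once, then columns 1 and 2 per row."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return  # empty input, nothing to emit
    yield "header", ",".join(header)
    for row in reader:
        if len(row) >= 2:  # skip rows that lack a second column
            yield "col1", row[0]
            yield "col2", row[1]


def route(pairs):
    """Collect tagged values into one list per destination."""
    outputs = {"header": [], "col1": [], "col2": []}
    for tag, value in pairs:
        outputs[tag].append(value)
    return outputs


# Tiny in-memory stand-in for the large input file (made-up data).
sample = io.StringIO("id,score\na,1\nb,2\n")
outputs = route(tag_rows(sample))
print(outputs)
```

Each list in outputs corresponds to one of the three target files.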

Re: Read a file => process => write to multiple files

2020-05-29 Thread Reuven Lax
Which language are you using? On Fri, May 29, 2020, 6:03 AM OrielResearch Eila Arich-Landkof < e...@orielresearch.org> wrote: > Hi all, > > I am looking for a way to read a large file and generate the following 3 > files: > 1. extract header > 2. extract column #1 from all lines > 3. extract col

Re: Cross-Language pipeline fails with PROCESS SDK Harness

2020-05-29 Thread Alexey Romanenko
Yes, I ran only the Java pipeline with the Portable Runner and there is the same error. Also, I did the same (without the cross-language component) against Beam 2.19 and 2.20. It works fine against Beam 2.19 (as expected, since I had already tested it before) and fails with kind of the same error against Bea

Re: Cross-Language pipeline fails with PROCESS SDK Harness

2020-05-29 Thread Kyle Weaver
Alexey, can you try adding --experiments=beam_fn_api to your pipeline options? We add the option automatically in Python [1] but we don't in Java. I filed BEAM-10151 [2] to document this workflow. Alexey, perhaps you can help with that. [1] https://github.com/apache/beam/blob/a5b2046b10bebc59c5bd

Re: [ANNOUNCE] Beam 2.21.0 Released

2020-05-29 Thread Alexey Romanenko

Re: Cross-Language pipeline fails with PROCESS SDK Harness

2020-05-29 Thread Alexey Romanenko
Many thanks! It helped to avoid the error. I saw this option in the xlang tests before but I didn’t add it since I was confused because of the name =) Also, I think we need to add "--sdk_location=container" for the Expansion Service. Finally, I've managed to only Java and xlang pipeline (with Python

Worker pool question

2020-05-29 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
Hello, I came across the following on the SDK Harness documentation page: https://beam.apache.org/documentation/runtime/sdk-harness-config/ environment_type: EXTERNAL: User code will be dispatched to an external service. For example, one can start an external service for Python workers by running do

Re: Flink Runner with HDFS

2020-05-29 Thread Ramanan, Buvana (Nokia - US/Murray Hill)
Max, That’s exactly what I did and I am still getting the same error message as before (ValueError: pipeline_options is not set) -Buvana On 5/29/20, 4:25 AM, "Maximilian Michels" wrote: Please remove any instantiation of HadoopFileSystem; it is not meant to be used directly. It will b

Re: Cross-Language pipeline fails with PROCESS SDK Harness

2020-05-29 Thread Kyle Weaver
That's probably a problem with your worker. You'll need to get additional logs to debug (see https://jira.apache.org/jira/browse/BEAM-8278) On Fri, May 29, 2020 at 12:48 PM Alexey Romanenko wrote: > Many thanks! It helped to avoid the error. I saw this option in the xlang > tests before but I di

Re: Worker pool question

2020-05-29 Thread Kyle Weaver
> Does this mean, as an end user of Beam, I can start a worker pool and have my Pipeline executed by this pool of workers (or, are these options strictly internal)? What should be the runner value in PipelineOptions in that case? Not exactly. The runner and Beam workers work together to execute y
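For orientation, here is a hedged sketch of how an external worker pool is typically referenced from pipeline options. It assumes a portable job service and a worker pool are already running; the addresses, the choice of PortableRunner and the helper name external_worker_flags are illustrative assumptions and are not taken from the reply above. The flag names (--runner, --job_endpoint, --environment_type, --environment_config) are Beam pipeline options.

```python
def external_worker_flags(job_endpoint, worker_pool_address):
    """Build flags that point a portable pipeline at an external worker pool.

    Hypothetical helper; both addresses are supplied by the caller.
    """
    return [
        "--runner=PortableRunner",
        f"--job_endpoint={job_endpoint}",
        "--environment_type=EXTERNAL",
        f"--environment_config={worker_pool_address}",
    ]


# Placeholder addresses (assumptions): a job service on port 8099 and a
# worker pool listening on port 50000, both on the local machine.
worker_flags = external_worker_flags("localhost:8099", "localhost:50000")
print(worker_flags)
```

The runner still schedules and coordinates the work; the worker pool only hosts the SDK harness that runs user code.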

Re: [ANNOUNCE] Beam 2.21.0 Released

2020-05-29 Thread Aizhamal Nurmamat kyzy
Thank you, Kyle! On Fri, May 29, 2020 at 8:33 AM Alexey Romanenko wrote: > Thanks Kyle! > > On 28 May 2020, at 13:16, Kyle Weaver wrote: > > The Apache Beam team is pleased to announce the release of version 2.21.0. > > Apache Beam is an open source unified programming model to define and > exe

Re: Issue while submitting python beam pipeline on flink - local

2020-05-29 Thread Ashish Raghav
Hello Kyle, reply below. Also, what is the stack to run this as a production setup on gcloud? I can try that setup to see if this works. From: Kyle Weaver Sent: Thursday, May 28, 2020, 10:34 PM To: user@beam.apache

Re: Issue while submitting python beam pipeline on flink cluster - local

2020-05-29 Thread Ashish Raghav
Hi Michels, Can you help me with the setup required on gcloud? Since I am not able to run FlinkRunner locally, I want to try it in the cloud and see if it works there. Do you have a doc for that? FYI, in another email chain, I am discussing issues while running PortableRunner. So, the pytho