Re: [DISCUSS] ReadAll pattern and consistent use in IO connectors

2020-06-24 Thread Boyuan Zhang
Sorry for the typo. I meant that I think we can go with *(3)* and (4): use the data type that is schema-aware as the input of ReadAll. On Wed, Jun 24, 2020 at 7:42 PM Boyuan Zhang wrote: > Thanks for the summary, Cham! > > I think we can go with (2) and (4): use the data type that is schema-aware > as…

Re: [DISCUSS] ReadAll pattern and consistent use in IO connectors

2020-06-24 Thread Boyuan Zhang
Thanks for the summary, Cham! I think we can go with (2) and (4): use the data type that is schema-aware as the input of ReadAll. Converting Read into ReadAll helps us to stick with SDF-like IO. But only having (3) is not enough to solve the problem of using ReadAll in the x-lang case. The key poin…

Re: [DISCUSS] ReadAll pattern and consistent use in IO connectors

2020-06-24 Thread Chamikara Jayalath
I see. So it seems like there are three options discussed so far when it comes to defining source descriptors for ReadAll type transforms: (1) Use Read PTransform as the element type of the input PCollection (2) Use a POJO that describes the source as the data element of the input PCollection (3) P…
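The shape of option (2) can be illustrated with a plain-Java sketch. This is not the actual Beam `KafkaSourceDescriptor`; the class and field names here are hypothetical, and the point is only the design: a serializable value object that describes a source, with no construction-time transform logic attached.

```java
import java.io.Serializable;
import java.util.Objects;

// Sketch of option (2): a plain, serializable POJO that only describes
// what to read. Field names are illustrative, not the real API.
class SourceDescriptor implements Serializable {
    private final String topic;
    private final int partition;
    private final long startOffset;

    SourceDescriptor(String topic, int partition, long startOffset) {
        this.topic = topic;
        this.partition = partition;
        this.startOffset = startOffset;
    }

    String getTopic() { return topic; }
    int getPartition() { return partition; }
    long getStartOffset() { return startOffset; }

    // Value semantics make the descriptor usable as a data element
    // (and friendly to schema inference).
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SourceDescriptor)) return false;
        SourceDescriptor d = (SourceDescriptor) o;
        return topic.equals(d.topic)
                && partition == d.partition
                && startOffset == d.startOffset;
    }

    @Override
    public int hashCode() {
        return Objects.hash(topic, partition, startOffset);
    }
}
```

Because the POJO carries only data, it can cross the portability boundary (option 4, a schema-aware type) in a way a full PTransform cannot.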

Re: Apache Beam ZeroMQ connector

2020-06-24 Thread Luke Cwik
I'm not aware of any ZeroMQ connector implementations that are part of Apache Beam. On Wed, Jun 24, 2020 at 11:44 AM Sherif A. Kozman < sherif.koz...@extremesolution.com> wrote: > Hello, > > We were in the process of planning a deployment of exporting stream data > from Aruba Networks Analytics e…

Apache Beam ZeroMQ connector

2020-06-24 Thread Sherif A. Kozman
Hello, We were in the process of planning a deployment to export stream data from the Aruba Networks Analytics engine through Apache Beam, and it turns out that it utilizes ZeroMQ for messaging. We couldn't find any ZeroMQ connectors and were wondering if one exists or whether it would be compatible with…

Re: [DISCUSS] ReadAll pattern and consistent use in IO connectors

2020-06-24 Thread Luke Cwik
I believe we do require PTransforms to be serializable since anonymous DoFns typically capture the enclosing PTransform. On Wed, Jun 24, 2020 at 10:52 AM Chamikara Jayalath wrote: > Seems like Read in PCollection refers to a transform, at least here: > https://github.com/apache/beam/blob/master/…
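The capture Luke describes is plain Java behavior, not anything Beam-specific, so it can be demonstrated with the JDK alone. In this stdlib-only sketch (hypothetical names, no Beam code), an anonymous inner class stands in for an anonymous DoFn: it keeps an implicit reference to its enclosing instance, so Java serializes the whole enclosing "transform" along with it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// A serializable single-method interface, standing in for a DoFn.
interface SerFn extends Serializable {
    int apply();
}

// Stand-in for a PTransform. If this class were NOT Serializable,
// serializing the anonymous SerFn below would throw, because the
// anonymous class implicitly captures `this`.
class FakeTransform implements Serializable {
    int shift = 10;

    SerFn makeFn() {
        // Anonymous class: captures the enclosing FakeTransform instance.
        return new SerFn() {
            @Override
            public int apply() {
                return shift + 1;
            }
        };
    }

    static byte[] toBytes(Object o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(o);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Removing `implements Serializable` from `FakeTransform` makes the round trip fail with a `NotSerializableException`, which is exactly why the enclosing transform must be serializable.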

Re: [DISCUSS] ReadAll pattern and consistent use in IO connectors

2020-06-24 Thread Chamikara Jayalath
Seems like Read in PCollection refers to a transform, at least here: https://github.com/apache/beam/blob/master/sdks/java/io/hbase/src/main/java/org/apache/beam/sdk/io/hbase/HBaseIO.java#L353 I'm in favour of separating construction-time transforms from execution-time data objects that we store in…

Re: [DISCUSS] ReadAll pattern and consistent use in IO connectors

2020-06-24 Thread Boyuan Zhang
Hi Ismael, I think the ReadAll in the IO connector refers to the IO with an SDF implementation regardless of the input type, where Read refers to UnboundedSource. One major pushback against using KafkaIO.Read as the source description is that not all configurations of KafkaIO.Read are meaningful to populate dur…

Re: [DISCUSS] ReadAll pattern and consistent use in IO connectors

2020-06-24 Thread Luke Cwik
To provide additional context, the KafkaIO ReadAll transform takes a PCollection<KafkaSourceDescriptor>. This KafkaSourceDescriptor is a POJO that contains the configurable parameters for reading from Kafka. This is different from the pattern that Ismael listed, because those take a PCollection as input and the Read is the s…
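The data flow Luke describes can be sketched without any Beam dependency. This plain-Java stand-in (all names hypothetical) models the ReadAll shape: each element of the input collection is a source descriptor, and the "transform" expands every descriptor into the records it describes, analogous to PCollection<KafkaSourceDescriptor> flowing through ReadAll into a PCollection of records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for the ReadAll shape (no Beam code).
class ReadAllSketch {
    // Hypothetical descriptor: a topic name plus how many records to emit.
    static class Descriptor {
        final String topic;
        final int count;

        Descriptor(String topic, int count) {
            this.topic = topic;
            this.count = count;
        }
    }

    static List<String> readAll(List<Descriptor> descriptors) {
        List<String> records = new ArrayList<>();
        for (Descriptor d : descriptors) {
            // In Beam this per-descriptor expansion would be done by a
            // splittable DoFn, which is what makes the pattern composable.
            for (int i = 0; i < d.count; i++) {
                records.add(d.topic + "-" + i);
            }
        }
        return records;
    }
}
```

The key property is that the set of sources is itself data: new descriptors can arrive at runtime, which a construction-time-configured Read cannot express.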

Re: [DISCUSS] ReadAll pattern and consistent use in IO connectors

2020-06-24 Thread Eugene Kirpichov
Hi Ismael, Thanks for taking this on. Have you considered an approach similar (or dual) to FileIO.write(), where in a sense we also have to configure a dynamic number of different IO transforms of the same type (file writes)? E.g. how in this example we configure many aspects of many file writes: t…
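The essence of the FileIO.write() analogy is that the destination is computed per element, so one transform configures many logical writes. The following stdlib sketch (illustrative names, not the actual FileIO API) reduces that idea to its core: routing each element to a destination derived from the element itself, the role played by `by(...)` in FileIO's dynamic writes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Stdlib sketch of the "dynamic destinations" idea behind FileIO.write:
// the destination is computed from each element, so a single transform
// covers a dynamic number of writes of the same type.
class DynamicWriteSketch {
    static <T, D> Map<D, List<T>> routeByDestination(
            List<T> elements, Function<T, D> destinationFn) {
        // Grouping by a per-element key is the core of dynamic routing.
        return elements.stream().collect(Collectors.groupingBy(destinationFn));
    }
}
```

The dual Eugene suggests is to treat reads the same way: derive the source (rather than the destination) from each input element.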

Re: Running Beam pipeline using Spark on YARN

2020-06-24 Thread Kamil Wasilewski
Thanks for the information. So it looks like we can't easily run portable pipelines on a Dataproc cluster at the moment. > you can set --output_executable_path to create a jar that you can then submit to yarn via spark-submit. I tried to create a jar, but I ran into a problem. I left an error message…
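The workflow quoted above can be sketched as the following two commands. Only the `--output_executable_path` flag comes from the thread itself; the script name, the other flags, and the exact spark-submit invocation are illustrative and depend on the Beam and Spark versions in use.

```shell
# Sketch (file names and most flags are illustrative): build a runnable jar
# from a portable pipeline instead of submitting it directly.
python my_pipeline.py \
  --runner=SparkRunner \
  --environment_type=DOCKER \
  --output_executable_path=./pipeline.jar

# Hand the generated jar to the YARN cluster via spark-submit.
spark-submit --master yarn --deploy-mode cluster ./pipeline.jar
```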

Re: JIRA contributor permissions

2020-06-24 Thread Alexey Romanenko
Hi Brian, Done. Welcome to the project! > On 24 Jun 2020, at 01:52, Brian Michalski wrote: > > Greetings! > > I'm wading my way a few small Go SDK tickets. Can I have contributor > permissions on JIRA? My username is bamnet. > > Thanks, > ~Brian M

[DISCUSS] ReadAll pattern and consistent use in IO connectors

2020-06-24 Thread Ismaël Mejía
Hello, (my excuses for the long email, but this requires context.) As part of the move from Source-based IOs to DoFn-based ones, one pattern emerged due to the composable nature of DoFn. The idea is to have a different kind of composable read where we take a PCollection of different sorts of inter…