Thanks for the link, Steve - very helpful!
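
In case it helps anyone else reading this thread later, here's a rough, untested sketch of polling a job's state with that generated Java client. The project, region, job ID, and application name below are placeholders, and it assumes application default credentials are available:

import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.dataflow.Dataflow;
import com.google.api.services.dataflow.model.Job;
import com.google.auth.http.HttpCredentialsAdapter;
import com.google.auth.oauth2.GoogleCredentials;

public class JobStatusPoller {
  public static void main(String[] args) throws Exception {
    // Build the generated Dataflow REST client using application default credentials.
    Dataflow dataflow =
        new Dataflow.Builder(
                GoogleNetHttpTransport.newTrustedTransport(),
                JacksonFactory.getDefaultInstance(),
                new HttpCredentialsAdapter(GoogleCredentials.getApplicationDefault()))
            .setApplicationName("dataflow-job-monitor") // placeholder
            .build();

    // Project, region, and job ID are enough to look up a job.
    Job job =
        dataflow
            .projects()
            .locations()
            .jobs()
            .get("my-project", "us-central1", "some-job-id") // placeholders
            .execute();

    // currentState is a string such as "JOB_STATE_RUNNING" or "JOB_STATE_DONE".
    System.out.println(job.getCurrentState());
  }
}

As Steve said, this route doesn't pull in any Beam dependencies.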

On Mon, Oct 12, 2020 at 11:31 AM Steve Niemitz <sniem...@apache.org> wrote:

> This is what I was referencing:
> https://github.com/googleapis/google-api-java-client-services/tree/master/clients/google-api-services-dataflow/v1b3
>
> On Mon, Oct 12, 2020 at 2:23 PM Peter Littig <plit...@nianticlabs.com> wrote:
>
>> Thanks for the replies, Lukasz and Steve!
>>
>> Steve: do you have a link to the google client api wrappers? (I'm not sure if I know what they are.)
>>
>> Thank you!
>>
>> On Mon, Oct 12, 2020 at 11:04 AM Steve Niemitz <sniem...@apache.org> wrote:
>>
>>> We use the Dataflow API [1] directly, via the google api client wrappers (both Python and Java), pretty extensively. It works well and doesn't require a dependency on Beam.
>>>
>>> [1] https://cloud.google.com/dataflow/docs/reference/rest
>>>
>>> On Mon, Oct 12, 2020 at 1:56 PM Luke Cwik <lc...@google.com> wrote:
>>>
>>>> It is your best way to do this right now, and this hasn't changed in a while (region was added to project and job IDs in the past 6 years).
>>>>
>>>> On Mon, Oct 12, 2020 at 10:53 AM Peter Littig <plit...@nianticlabs.com> wrote:
>>>>
>>>>> Thanks for the reply, Kyle.
>>>>>
>>>>> The DataflowClient::getJob method uses a Dataflow instance that's provided at construction time (via DataflowPipelineOptions::getDataflowClient). If that Dataflow instance can be obtained from a minimal instance of the options (i.e., containing only the project ID and region), then it looks like everything should work.
>>>>>
>>>>> I suppose a secondary question here is whether or not this approach is the recommended way to solve my problem (but I don't know of any alternatives).
>>>>>
>>>>> On Mon, Oct 12, 2020 at 9:55 AM Kyle Weaver <kcwea...@google.com> wrote:
>>>>>
>>>>>> > I think the answer is to use a DataflowClient in the second service, but creating one requires DataflowPipelineOptions. Are these options supposed to be exactly the same as those used by the first service? Or do only some of the fields have to be the same?
>>>>>>
>>>>>> Most options are not necessary for retrieving a job. In general, Dataflow jobs can always be uniquely identified by the project, region, and job ID.
>>>>>> https://github.com/apache/beam/blob/ecedd3e654352f1b51ab2caae0fd4665403bd0eb/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowClient.java#L100
>>>>>>
>>>>>> On Mon, Oct 12, 2020 at 9:31 AM Peter Littig <plit...@nianticlabs.com> wrote:
>>>>>>
>>>>>>> Hello, Beam users!
>>>>>>>
>>>>>>> Suppose I want to build two (Java) services, one that launches (long-running) Dataflow jobs and the other that monitors the status of Dataflow jobs. Within a single service, I could simply track a PipelineResult for each Dataflow run and periodically call getState. How can I monitor job status like this from a second, independent service?
>>>>>>>
>>>>>>> I think the answer is to use a DataflowClient in the second service, but creating one requires DataflowPipelineOptions. Are these options supposed to be exactly the same as those used by the first service? Or do only some of the fields have to be the same?
>>>>>>>
>>>>>>> Or maybe there's a better alternative than DataflowClient?
>>>>>>>
>>>>>>> Thanks in advance!
>>>>>>>
>>>>>>> Peter
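
P.S. For the DataflowClient route discussed above: based on Kyle's and Luke's replies, it sounds like only the project and region need to be set on the options, with credentials and the underlying Dataflow client filled in by the options' defaults. A rough, untested sketch, with placeholder project, region, and job ID:

import com.google.api.services.dataflow.model.Job;
import org.apache.beam.runners.dataflow.DataflowClient;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class MinimalOptionsStatusCheck {
  public static void main(String[] args) throws Exception {
    // Only project and region are set; everything else uses the options' defaults.
    DataflowPipelineOptions options =
        PipelineOptionsFactory.as(DataflowPipelineOptions.class);
    options.setProject("my-project");  // placeholder
    options.setRegion("us-central1");  // placeholder

    DataflowClient client = DataflowClient.create(options);

    // Look up the job that the other service launched, by its job ID.
    Job job = client.getJob("some-job-id"); // placeholder
    System.out.println(job.getCurrentState());
  }
}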