Thanks for the link, Steve - very helpful!
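
In case it helps anyone else reading this thread later, here's a rough, untested sketch of polling a job's state with that generated Java client. The project, region, job ID, and application name below are placeholders, and it assumes application default credentials are available:

import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.dataflow.Dataflow;
import com.google.api.services.dataflow.model.Job;
import com.google.auth.http.HttpCredentialsAdapter;
import com.google.auth.oauth2.GoogleCredentials;

public class JobStatusPoller {
  public static void main(String[] args) throws Exception {
    // Build the generated Dataflow REST client using application default credentials.
    Dataflow dataflow =
        new Dataflow.Builder(
                GoogleNetHttpTransport.newTrustedTransport(),
                JacksonFactory.getDefaultInstance(),
                new HttpCredentialsAdapter(GoogleCredentials.getApplicationDefault()))
            .setApplicationName("dataflow-job-monitor") // placeholder
            .build();

    // Project, region, and job ID are enough to look up a job.
    Job job =
        dataflow
            .projects()
            .locations()
            .jobs()
            .get("my-project", "us-central1", "some-job-id") // placeholders
            .execute();

    // currentState is a string such as "JOB_STATE_RUNNING" or "JOB_STATE_DONE".
    System.out.println(job.getCurrentState());
  }
}

As Steve said, this route doesn't pull in any Beam dependencies.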

On Mon, Oct 12, 2020 at 11:31 AM Steve Niemitz <sniem...@apache.org> wrote:

> This is what I was referencing:
> https://github.com/googleapis/google-api-java-client-services/tree/master/clients/google-api-services-dataflow/v1b3
>
> On Mon, Oct 12, 2020 at 2:23 PM Peter Littig <plit...@nianticlabs.com> wrote:
>
>> Thanks for the replies, Lukasz and Steve!
>>
>> Steve: do you have a link to the google client api wrappers? (I'm not sure if I know what they are.)
>>
>> Thank you!
>>
>> On Mon, Oct 12, 2020 at 11:04 AM Steve Niemitz <sniem...@apache.org> wrote:
>>
>>> We use the Dataflow API [1] directly, via the google api client wrappers (both Python and Java), pretty extensively. It works well and doesn't require a dependency on Beam.
>>>
>>> [1] https://cloud.google.com/dataflow/docs/reference/rest
>>>
>>> On Mon, Oct 12, 2020 at 1:56 PM Luke Cwik <lc...@google.com> wrote:
>>>
>>>> It is your best way to do this right now, and this hasn't changed in a while (region was added to project and job IDs in the past 6 years).
>>>>
>>>> On Mon, Oct 12, 2020 at 10:53 AM Peter Littig <plit...@nianticlabs.com> wrote:
>>>>
>>>>> Thanks for the reply, Kyle.
>>>>>
>>>>> The DataflowClient::getJob method uses a Dataflow instance that's provided at construction time (via DataflowPipelineOptions::getDataflowClient). If that Dataflow instance can be obtained from a minimal instance of the options (i.e., containing only the project ID and region), then it looks like everything should work.
>>>>>
>>>>> I suppose a secondary question here is whether or not this approach is the recommended way to solve my problem (but I don't know of any alternatives).
>>>>>
>>>>> On Mon, Oct 12, 2020 at 9:55 AM Kyle Weaver <kcwea...@google.com> wrote:
>>>>>
>>>>>> > I think the answer is to use a DataflowClient in the second service, but creating one requires DataflowPipelineOptions. Are these options supposed to be exactly the same as those used by the first service? Or do only some of the fields have to be the same?
>>>>>>
>>>>>> Most options are not necessary for retrieving a job. In general, Dataflow jobs can always be uniquely identified by the project, region, and job ID.
>>>>>> https://github.com/apache/beam/blob/ecedd3e654352f1b51ab2caae0fd4665403bd0eb/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowClient.java#L100
>>>>>>
>>>>>> On Mon, Oct 12, 2020 at 9:31 AM Peter Littig <plit...@nianticlabs.com> wrote:
>>>>>>
>>>>>>> Hello, Beam users!
>>>>>>>
>>>>>>> Suppose I want to build two (Java) services, one that launches (long-running) Dataflow jobs and the other that monitors the status of Dataflow jobs. Within a single service, I could simply track a PipelineResult for each Dataflow run and periodically call getState. How can I monitor job status like this from a second, independent service?
>>>>>>>
>>>>>>> I think the answer is to use a DataflowClient in the second service, but creating one requires DataflowPipelineOptions. Are these options supposed to be exactly the same as those used by the first service? Or do only some of the fields have to be the same?
>>>>>>>
>>>>>>> Or maybe there's a better alternative than DataflowClient?
>>>>>>>
>>>>>>> Thanks in advance!
>>>>>>>
>>>>>>> Peter
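
P.S. For the DataflowClient route discussed above: based on Kyle's and Luke's replies, it sounds like only the project and region need to be set on the options, with credentials and the underlying Dataflow client filled in by the options' defaults. A rough, untested sketch, with placeholder project, region, and job ID:

import com.google.api.services.dataflow.model.Job;
import org.apache.beam.runners.dataflow.DataflowClient;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class MinimalOptionsStatusCheck {
  public static void main(String[] args) throws Exception {
    // Only project and region are set; everything else uses the options' defaults.
    DataflowPipelineOptions options =
        PipelineOptionsFactory.as(DataflowPipelineOptions.class);
    options.setProject("my-project");  // placeholder
    options.setRegion("us-central1");  // placeholder

    DataflowClient client = DataflowClient.create(options);

    // Look up the job that the other service launched, by its job ID.
    Job job = client.getJob("some-job-id"); // placeholder
    System.out.println(job.getCurrentState());
  }
}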