Konstantin,
                            I retract my reservations. My initial
reservation was having 2 services (one for the TMs and one for the native
Queryable State client proxy), but having established this setup, it makes
sense: having the native Queryable State client proxy as a sidecar would
deeply couple the query layer with the TMs, inhibiting independent
development of the query layer.

Thanks.



On Fri, Mar 29, 2019 at 9:08 AM Vishal Santoshi <vishal.santo...@gmail.com>
wrote:

> Thanks Konstantin,
>                             That makes sense. To give you some context,
> the reason we are gravitating towards Queryable State is the architectural
> preference of Prometheus for scraping (a pull rather than push model) and
> our intent to expose aggregations. That said, your idea makes sense. The
> worry I had is the IP resolution of TMs that the QueryableStateClient does,
> and our wanting to avoid static IPs. If I understand correctly, you are
> proposing a proxy "external" to the job deployment, as in an external
> service that discovers the job and works off the ingress endpoint that
> exposes the queryable state port of the TMs?
>
> That creates a fragmented architecture that I wanted to avoid, if I
> understood your advice correctly.
>
> Vishal
>
>
>
>
>
>
> On Fri, Mar 29, 2019 at 5:42 AM Konstantin Knauf <konstan...@ververica.com>
> wrote:
>
>> Hi Vishal,
>>
>> My approach would be a single Kubernetes service backed by all
>> TaskManagers of the job. The TaskManagers will proxy the request for a
>> specific key to the correct TaskManager. And yes, the TaskManagers cache
>> the locations of the key groups.
>>
>> In addition to this Kubernetes service, you can of course have a
>> Jetty/Jersey REST based server that sends queries to this service.
>>
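>> To make this concrete, here is a minimal sketch of a client querying
>> through such a service. The service name "flink-tm-query", the job id
>> handling, and the state name "word-count" are assumptions for
>> illustration, not something from your setup; 9069 is the default
>> queryable state proxy port:
>>
>> import java.util.concurrent.CompletableFuture;
>>
>> import org.apache.flink.api.common.JobID;
>> import org.apache.flink.api.common.state.ValueState;
>> import org.apache.flink.api.common.state.ValueStateDescriptor;
>> import org.apache.flink.api.common.typeinfo.Types;
>> import org.apache.flink.queryablestate.client.QueryableStateClient;
>>
>> public class ServiceBackedQuery {
>>
>>     public static void main(String[] args) throws Exception {
>>         // The service DNS name resolves to one of the TaskManagers;
>>         // whichever TM answers proxies the request on to the TM that
>>         // owns the key group for the queried key.
>>         QueryableStateClient client = new QueryableStateClient(
>>                 "flink-tm-query.default.svc.cluster.local", 9069);
>>
>>         ValueStateDescriptor<Long> descriptor =
>>                 new ValueStateDescriptor<>("word-count", Types.LONG);
>>
>>         CompletableFuture<ValueState<Long>> future = client.getKvState(
>>                 JobID.fromHexString(args[0]), // job id passed on the CLI
>>                 "word-count",                 // name used in .asQueryableState(...)
>>                 "some-key", Types.STRING, descriptor);
>>
>>         System.out.println("count = " + future.get().value());
>>         client.shutdownAndWait();
>>     }
>> }
>>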
>> Please let me know if this works for you.
>>
>> Hope this helps and cheers,
>>
>> Konstantin
>>
>>
>> On Thu, Mar 28, 2019 at 12:37 AM Vishal Santoshi <
>> vishal.santo...@gmail.com> wrote:
>>
>>> I think I got a handle on this, for those who might want to do the same.
>>>
>>>
>>> Here are the steps (I could share the Jetty/Jersey REST code too if
>>> required):
>>>
>>>
>>> *1.* Create a sidecar container in each pod that runs a TM. I wrote a
>>> simple Jetty/Jersey REST based server that executes queries against the
>>> local TM query server.
>>>
>>>   .
>>>   .
>>>   - name: queryable-state
>>>     image: _IMAGE_
>>>     args: ["queryable-state"]
>>>     env:
>>>       - name: POD_IP
>>>         valueFrom:
>>>           fieldRef:
>>>             fieldPath: status.podIP
>>>     ports:
>>>       - containerPort: 9999
>>>         name: qstate-client
>>>     resources:
>>>       requests:
>>>         cpu: "0.25"
>>>         memory: "256Mi"
>>>
>>>
>>>    Note that POD_IP is the IP used by the REST server to start the
>>> QueryableStateClient, and the port it connects to is the default port of
>>> the TM query server (9069, I think) of the colocated TM container.
>>>
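>>> For reference, a simplified sketch of the sidecar's resource class (not
>>> the exact code I run; the state name "word-count" and the JAX-RS plumbing
>>> are illustrative):
>>>
>>> import javax.ws.rs.GET;
>>> import javax.ws.rs.Path;
>>> import javax.ws.rs.PathParam;
>>>
>>> import org.apache.flink.api.common.JobID;
>>> import org.apache.flink.api.common.state.ValueStateDescriptor;
>>> import org.apache.flink.api.common.typeinfo.Types;
>>> import org.apache.flink.queryablestate.client.QueryableStateClient;
>>>
>>> @Path("/state")
>>> public class QueryableStateResource {
>>>
>>>     private final QueryableStateClient client;
>>>
>>>     public QueryableStateResource() throws Exception {
>>>         // POD_IP is injected via the k8s downward API in the container
>>>         // spec above; 9069 is the query server port of the colocated TM.
>>>         client = new QueryableStateClient(System.getenv("POD_IP"), 9069);
>>>     }
>>>
>>>     @GET
>>>     @Path("/{jobId}/{key}")
>>>     public String query(@PathParam("jobId") String jobId,
>>>                         @PathParam("key") String key) throws Exception {
>>>         ValueStateDescriptor<Long> descriptor =
>>>                 new ValueStateDescriptor<>("word-count", Types.LONG);
>>>         Long value = client
>>>                 .getKvState(JobID.fromHexString(jobId), "word-count", key,
>>>                         Types.STRING, descriptor)
>>>                 .get()
>>>                 .value();
>>>         return String.valueOf(value);
>>>     }
>>> }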
>>>
>>>
>>> *2.* Expose the port (in this case 9999) at the k8s service layer.
>>>
>>>
>>>
>>> And that did it.
>>>
>>>
>>> I am, though, worried about a few things:
>>>
>>>
>>> *1*.
>>>
>>>  The TM query server will ask the JM for the key group, and hence the
>>> TM, that a key belongs to for every request, and then coordinate the
>>> communication between the client and that TM. Does Flink do any
>>> optimization, such as caching the key ranges and thus the affinity to a
>>> TM, to reduce the stress on the JM? I would imagine that, with a
>>> well-known distribution function based on a well-known hash algorithm, an
>>> incident key could be pinned to a TM without visiting the JM more than
>>> once.
>>>
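>>> Concretely, something like this little sketch is what I have in mind
>>> (KeyGroupRangeAssignment is Flink's own utility in flink-runtime; the
>>> parallelism numbers here are made up):
>>>
>>> import org.apache.flink.runtime.state.KeyGroupRangeAssignment;
>>>
>>> public class KeyGroupPinning {
>>>
>>>     public static void main(String[] args) {
>>>         int maxParallelism = 128; // Flink's default max parallelism
>>>         int parallelism = 4;      // made-up job parallelism
>>>
>>>         String key = "some-key";
>>>         // Deterministic: murmur hash of the key's hashCode, mod maxParallelism.
>>>         int keyGroup =
>>>                 KeyGroupRangeAssignment.assignToKeyGroup(key, maxParallelism);
>>>         // The subtask (and hence the TM) owning that key group is also a
>>>         // pure function of the parallelism, so it could be cached per key group.
>>>         int subtask = KeyGroupRangeAssignment.computeOperatorIndexForKeyGroup(
>>>                 maxParallelism, parallelism, keyGroup);
>>>
>>>         System.out.println(key + " -> key group " + keyGroup
>>>                 + " -> subtask " + subtask);
>>>     }
>>> }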
>>>
>>> *2. *
>>>
>>> We do have use cases where we would want to iterate over all the keys in
>>> a key group (and, by extension, on a TM) for a job. Is that a possibility?
>>>
>>>
>>>
>>> *3. *
>>>
>>> The overhead of having as many client containers as TMs.
>>>
>>>
>>>
>>> Any advice/ideas on these 3 worry points?
>>>
>>>
>>>
>>> Regards
>>>
>>> On Mon, Mar 25, 2019 at 8:57 PM Vishal Santoshi <
>>> vishal.santo...@gmail.com> wrote:
>>>
>>>> I have 2 options:
>>>>
>>>> 1. A REST-based QueryableStateClient (in my case Jetty/Jersey) in a
>>>> sidecar container colocated with the JM (though it could be on all TMs,
>>>> but that looks like overkill).
>>>>
>>>> 2. A REST-based QueryableStateClient (in my case Jetty/Jersey) in a
>>>> sidecar container colocated with the TMs. The query proxies are on the
>>>> TMs, so in essence the communication would stay within the containers of
>>>> the pod, and I could load balance (have to test).
>>>>
>>>> The second alternative seems doable, though it looks like overkill, but
>>>> I am not sure how to point a standalone QueryableStateClient at a TM,
>>>> given that a TM pod's IP is not known until the pod is launched.
>>>>
>>>> Has anyone had a successful Queryable State setup for Flink on k8s?
>>>>
>>>> Regards,
>>>>
>>>
>>
>> --
>>
>> Konstantin Knauf | Solutions Architect
>>
>> +49 160 91394525
>>
>> <https://www.ververica.com/>
>>
>> Follow us @VervericaData
>>
>> --
>>
>> Join Flink Forward <https://flink-forward.org/> - The Apache Flink
>> Conference
>>
>> Stream Processing | Event Driven | Real Time
>>
>> --
>>
>> Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>>
>> --
>> Data Artisans GmbH
>> Registered at Amtsgericht Charlottenburg: HRB 158244 B
>> Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen
>>
>
