BTW. I re-read your comment in the AIP and yeah ... I think I completely
misunderstood it :)
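
A minimal sketch of the "local"/direct mode described further down in this thread, next to a loopback stand-in for the remote mode, just to make the difference concrete. Everything in it is an assumption for illustration: the `HANDLERS` registry, the `handler` decorator, `get_variable`, and both client classes are hypothetical names, not taken from AIP-44 or the Airflow codebase, and JSON stands in for whatever serialization gRPC or Thrift would actually do.

```python
import json
import timeit
from typing import Any, Callable, Dict

# Hypothetical registry of internal-API handlers (illustrative names only,
# nothing here is taken from AIP-44 or the Airflow codebase).
HANDLERS: Dict[str, Callable[..., Any]] = {}


def handler(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as an internal-API handler under its own name."""
    HANDLERS[func.__name__] = func
    return func


@handler
def get_variable(key: str) -> str:
    # Stand-in for a database-backed lookup.
    return {"env": "prod"}.get(key, "")


class LocalClient:
    """'No isolation' mode: call the handler in-process, nothing is serialized."""

    def call(self, method: str, **params: Any) -> Any:
        return HANDLERS[method](**params)


class LoopbackRpcClient:
    """Stand-in for the remote mode: request and response are round-tripped
    through JSON, as they would be on a wire, but no socket is involved."""

    def call(self, method: str, **params: Any) -> Any:
        request = json.loads(json.dumps({"method": method, "params": params}))
        result = HANDLERS[request["method"]](**request["params"])
        return json.loads(json.dumps({"result": result}))["result"]


def time_calls(client: Any, number: int = 10_000) -> float:
    """Total seconds spent on `number` identical calls through `client`."""
    return timeit.timeit(lambda: client.call("get_variable", key="env"), number=number)


if __name__ == "__main__":
    print("direct:  ", time_calls(LocalClient()))
    print("loopback:", time_calls(LoopbackRpcClient()))
```

Both clients expose the same `call` interface, so the calling code would not need to know which mode it is in. The timing helper only measures the serialization round trip, not any transport or separate process, so a real benchmark of gRPC or Thrift would still be needed.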

On Wed, Feb 16, 2022 at 6:08 PM Jarek Potiuk <[email protected]> wrote:

> Just a reminder - meeting in ~ 50 minutes :)
>
> On Wed, Feb 16, 2022 at 2:34 PM Jarek Potiuk <[email protected]> wrote:
>
>> Happy to hear if others have some experience with in-process calls. What I
>> really want is to do some benchmarking to see how much overhead each
>> option involves. I'd say that the "coarseness" of the calls (with the
>> possible exception of Connection/variable retrieval etc.) means that
>> serialization/deserialization will have very little impact on performance
>> (but without actually checking it, it's hard to say for sure). Another
>> option, if inter-process communication turns into a problem: I saw people
>> doing this in C++, where they "ripped" some parts out of Thrift to leave
>> only the serialization/deserialization. But in our case, if we find that
>> either the need for a separate process or the communication involves a lot
>> of overhead, we could come back to the idea of delegating the calls via
>> decorators.
>>
>> On Wed, Feb 16, 2022 at 2:22 PM Jarek Potiuk <[email protected]> wrote:
>>
>>> I looked at that too - and let me leave that as an option to explore in
>>> the first step. I will make a note.
>>>
>>> From what I checked - none of the current "ready-to-use" gRPC solutions
>>> have such an "in-process" option. I believe the "RPC framework re-use" for
>>> serialization/deserialization/transport might save a LOT of headache.
>>>
>>> However - Apache Thrift supports "shared-memory" transport. I still
>>> think it requires a separate process (to be confirmed).
>>> gRPC supports local TCP and Unix sockets only. The in-memory
>>> option is not there (though people asked for it:
>>> https://github.com/grpc/grpc/issues/19959)
>>>
>>> J.
>>>
>>>
>>> On Wed, Feb 16, 2022 at 2:13 PM Ash Berlin-Taylor <[email protected]>
>>> wrote:
>>>
>>>> That wasn't actually quite what I had in mind :)
>>>>
>>>> I was thinking that we _wouldn't_ go cross process at all, but in the
>>>> "local"/direct mode we will as-directly-as-possible call the handler code.
>>>> So for local/no-isolation we would still use the handler for the RPC, but
>>>> there it's just not "remote".
>>>>
>>>> -ash
>>>>
>>>> On Wed, Feb 16 2022 at 13:01:11 +0100, Jarek Potiuk <[email protected]>
>>>> wrote:
>>>>
>>>> Hey Everyone,
>>>>
>>>> Based on the feedback, I updated AIP-44
>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>> - the "implementation notes" with an improved approach.
>>>>
>>>> Ash had a good suggestion (which I really like) that instead of
>>>> inventing our own decorators and a different way of handling internal and
>>>> external communication for the "coarse" functions that require the
>>>> database, we could approach it differently - namely we could always use
>>>> RPC, no matter if we are in DB isolation mode or "no isolation" mode. Of
>>>> course in the "no isolation" mode, the communication should have
>>>> very low overhead (local TCP or Unix sockets, no authorization). I looked
>>>> at existing RPC implementations we could use for that and I narrowed down
>>>> the potential choice of technologies to gRPC and Apache Thrift.
>>>>
>>>> This approach has multiple advantages:
>>>>
>>>> * we can leverage existing RPC implementations (Thrift and gRPC are
>>>> both mature and have integration with HTTPS, various authentication options
>>>> and can be also run using local sockets)
>>>> * the code will be much simpler to maintain - we will use existing
>>>> serialization mechanisms from those protocols
>>>> * no custom code for communication needed - both Thrift and gRPC have
>>>> all that is needed for scalable, robust communication
>>>>
>>>> I think this way we will be able to implement a more robust and
>>>> maintainable solution much faster.
>>>>
>>>> I also reached out to Apache Beam (they have support for both gRPC and
>>>> Thrift and are in the process of transitioning from Thrift to gRPC as
>>>> the primary protocol), and I am sure they have done a lot of analysis
>>>> that can help us make the final decision.
>>>>
>>>> This approach changes only the implementation details of AIP-44 -
>>>> everything else, including the overall approach and the deployment
>>>> options, remains untouched by this change.
>>>>
>>>> If you have any comments on that - feel free to share them. I will also
>>>> discuss it today at the meeting, and if there is general consensus that
>>>> the direction is right, I would love to start voting on AIP-44, ideally
>>>> tomorrow, so that next week we can start implementing it. I am not sure
>>>> if we want to make a final decision about gRPC/Thrift yet (maybe there
>>>> are people who have good experience with both and can share it here?).
>>>>
>>>> I think a more detailed POC and benchmarking might be the first step of
>>>> the AIP - where we make the final choice based on an attempt to implement
>>>> a POC for both - but I am also happy to listen to those who have more
>>>> experience with both (and maybe the Beam experience will help with that).
>>>>
>>>> J.
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Feb 15, 2022 at 1:49 PM Jarek Potiuk <[email protected]> wrote:
>>>>
>>>>> The meeting is tomorrow :) Feel free to join. I will also record it
>>>>> and publish minutes!
>>>>>
>>>>> On Tue, Feb 15, 2022 at 12:31 PM Giorgio Zoppi <
>>>>> [email protected]> wrote:
>>>>> >
>>>>> > Hello Everyone,
>>>>> > is there any follow-up on this meeting? I would like to participate
>>>>> if possible.
>>>>> > Best Regards,
>>>>> > Giorgio
>>>>> >
>>>>> > On Tue, Feb 1, 2022 at 3:29 PM Jarek Potiuk <
>>>>> [email protected]> wrote:
>>>>> >>
>>>>> >> Hello Everyone,
>>>>> >>
>>>>> >> I think it's about time for the next sig-multitenancy meeting:
>>>>> >>
>>>>> >> I created a doodle poll for next week - please mark your
>>>>> availability till Friday the 4th.
>>>>> >>
>>>>> >>
>>>>> https://doodle.com/poll/axvu2gz7zhv8ieye?utm_source=poll&utm_medium=link
>>>>> >>
>>>>> >> I think the rough agenda will be:
>>>>> >>
>>>>> >> * AIP-43 Dag Processor Separation [1] - implementation progress -
>>>>> Mateusz
>>>>> >> * AIP-44 Airflow Internal API [2] - voting progress (hopefully) -
>>>>> Jarek
>>>>> >> * AIP-45 Remove double DAG parsing [3] -  discussion - Ping
>>>>> >> * AIP-46 Docker runtime isolation [4] - discussion - Ping
>>>>> >> * Also there are some ideas (not yet in AIP form) around optimizing
>>>>> DagProcessorLoop that might be good to talk about - also Ping.
>>>>> >>
>>>>> >> If there are any more proposals - feel free to ping me.
>>>>> >> I also encourage everyone to comment on the AIP-45/46 proposals from
>>>>> Ping before the meeting.
>>>>> >>
>>>>> >> [1]
>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-43+DAG+Processor+separation
>>>>> >> [2]
>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>>> >> [3]
>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-45+Remove+double+dag+parsing+in+airflow+run
>>>>> >> [4]
>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-46+Add+support+for+docker+runtime+isolation+for+airflow+tasks+and+dag+parsing
>>>>> >>
>>>>> >> J.
>>>>> >>
>>>>> >>
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Life is a chess game - Anonymous.
>>>>>
>>>>
