Hi all,
For question 1:
A transaction is likely to span multiple tenants. Requesting a `TxnID`
does not bind the transaction to a specific tenant (or namespace), so
it cannot be completely isolated. With a per-tenant TC, the TC in one
tenant would end up storing log entries that register another tenant's
topics for consume or send, which is odd. The transaction coordinator
is an independent component and does not depend on anything else.
`TransactionMetadataStore` is pluggable, which means that users can
implement their own `TransactionMetadataStore` and use a user-defined
authorization provider. Therefore, I think tenant isolation is not
suitable for the transaction coordinator, and the coordinator does not
need it. As for the unload case: with the current logic, when a TC is
unloaded the client retries the operation and does not throw an
exception.
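
As a rough illustration of that retry behaviour (this is not the actual
client code; `CoordinatorUnavailableException`, `TcRetry` and the
backoff values are all made up for the example), the idea is roughly:

```java
import java.util.function.Supplier;

// Hypothetical exception standing in for "the TC is unloaded right now".
class CoordinatorUnavailableException extends RuntimeException {
    CoordinatorUnavailableException(String message) {
        super(message);
    }
}

final class TcRetry {
    private TcRetry() {
    }

    // Calls op, retrying with a doubling backoff while the coordinator is
    // unavailable; the last failure is rethrown once maxAttempts is reached.
    static <T> T callWithRetry(Supplier<T> op, int maxAttempts, long initialBackoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMillis = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.get();
            } catch (CoordinatorUnavailableException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                sleep(backoffMillis);
                backoffMillis *= 2;
            }
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }
}
```

The caller only sees an error if the TC stays unavailable for every
attempt; a short unload is absorbed by the retries.
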
For question 2:
I think the new dedicated endpoint (in the binary protocol) is the
better option. Since the TC can be implemented by users themselves,
`TransactionMetadataStoreProvider` can return the result of the new
command, and transaction permission management can also be implemented
in `TransactionMetadataStoreProvider`.
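
To make the shape of that idea concrete, here is a minimal sketch. None
of it is existing Pulsar API: `TcInfo`, `TcInfoProvider` and
`SimpleTcInfoProvider` are hypothetical names, and the role-based check
is only an assumption about what "rights management" could look like.
The point is that one pluggable component both answers the new command
and decides who may call it.

```java
import java.util.function.Predicate;

// Hypothetical payload of the dedicated "TC info" command.
final class TcInfo {
    final int coordinatorCount;

    TcInfo(int coordinatorCount) {
        this.coordinatorCount = coordinatorCount;
    }
}

// Hypothetical pluggable hook: returns the info a client needs to
// initialize and owns the permission check for that call.
interface TcInfoProvider {
    TcInfo describeCoordinators(String role);
}

final class SimpleTcInfoProvider implements TcInfoProvider {
    private final int coordinatorCount;
    private final Predicate<String> mayUseTransactions;

    SimpleTcInfoProvider(int coordinatorCount, Predicate<String> mayUseTransactions) {
        this.coordinatorCount = coordinatorCount;
        this.mayUseTransactions = mayUseTransactions;
    }

    @Override
    public TcInfo describeCoordinators(String role) {
        // The check is scoped to this one call, so no lookup permission
        // on the system topic is involved.
        if (!mayUseTransactions.test(role)) {
            throw new SecurityException("role '" + role + "' may not use transactions");
        }
        return new TcInfo(coordinatorCount);
    }
}
```

With something like this, a user-supplied implementation could plug in
its own rule, for example
`new SimpleTcInfoProvider(16, role -> role.startsWith("app-"))`.
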

Thanks,
bo

Nicolò Boschi <boschi1...@gmail.com> wrote on Mon, Dec 5, 2022 at 19:41:
>
> Hi folks,
>
> I recently opened an issue about transactions [0]. The specific issue is
> the client requires to be able to lookup the system topic
> pulsar/system/transaction_coordinator_assign to get all the transaction
> coordinators to dial with.
>
> Since multi-tenancy is a core feature in Pulsar, this requirement may lead
> to authorization issues in multi-tenant clusters breaking the tenant
> isolation principle.
>
> In this thread I'd like to discuss, more in general, the approach that has
> been taken while designing transactions.
> The main concern I have is that TCs are global.
> A TC is backed by two system topics: transaction_coordinator_assign and
> __transaction_log_x.
> Both are used in the transaction's hot path with different workloads.
> This leads to potential critical issues:
>
> 1. *if somehow one of these topics is unloaded, ALL the tenants using
> transactions will suffer micro-outages*. (haven't looked at the error
> handling but I suppose the error would be thrown in the client's face). In
> general, availability and performance are not granted anymore
> per-tenant/namespace.
>
> I believe the TC should be per-tenant (maybe per-namespace?).
> *Is there any strong reason why this shouldn't be possible by design?* (and
> I mean, regardless of the current implementation and client-server
> compatibility, we can handle them somehow but it's a detail atm)
>
>
> One thing that I believe should be possible at the moment (but I'm not
> sure) are cross-tenant transactions. This wouldn't be possible anymore with
> per-tenant TC.
>
> 2. *the clients need lookup permission to get all the TCs*.
> (transaction_coordinator_assign partitions). This can be solved in
> different ways, even keeping using the TC as a system entity.
>
> At the moment the java client, when starting, needs to get all the
> available TCs to spread transactions over them. The call it does
> is getPartitionedTopicMetadata to the system topic.
> To fix this there are multiple ways:
> a. Suggest to users to extend their own PulsarAuthorizationProvider to
> always allow lookup to that particular topic. (quick, works with all the
> existing clients and it only requires broker/proxy restarts without token
> invalidations) However it's not builtin so this is not optimal. More
> details here: [1]
> b. Add a new auth action LOOKUP in order to allow cluster admins to give
> this permission to their clients without affecting the produce or consume
> ability. This would require only broker restarts plus operational costs for
> the admin.
> c. Creates a new specific endpoint (in the binary protocol) to give all the
> required info to the TC client to properly initialize. This would be the
> preferred solution because the permission would be granular to this
> protocol call and it wouldn't require any permission changes for the
> current applications. However, only new clients (and brokers) may use this
> solution.
>
> I believe the c. option would be great for the mid-term.
> Anyway, if the per-tenant TC is designable, then this issue would be
> resolved as well.
>
>
> [0] https://github.com/apache/pulsar/issues/18716
> [1] https://github.com/apache/pulsar/pull/18718
>
>
> BR,
> Nicolò Boschi
