Thanks for starting the discussion.

One transaction is allowed to work with multiple topics from multiple
tenants/namespaces. And there are some real cases of ingesting data from
multiple tenants and publishing the calculated result to one or many
topics under exactly-once semantics.

But I agree we should find a solution to replace the centralized
transaction coordinator.
One rough idea is to bind each transaction to a tenant/namespace. It
would look like:

Txn txn = new Txn("public/default")

which means the client will use the TC served by the `public/default`
namespace. Permission to access the TC can then be granted per namespace.
The transaction can still work with multiple topics from multiple
namespaces/tenants. Only the TC service and its transaction log are
isolated.
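
To make the rough idea a bit more concrete, here is a minimal sketch. It
is only an illustration: the `Txn` class, `enlist`, and the accessor
names are hypothetical and do not exist in the Pulsar client today. It
shows a transaction pinned to the TC of one namespace while still
enlisting topics from other tenants.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch only: this is not an existing Pulsar client API.
final class Txn {
    // Namespace whose TC (and transaction log) owns this transaction.
    private final String coordinatorNamespace;
    // Topics taking part in the transaction, in enlistment order.
    private final Set<String> enlistedTopics = new LinkedHashSet<>();

    Txn(String coordinatorNamespace) {
        // Expect the "tenant/namespace" form, e.g. "public/default".
        if (coordinatorNamespace == null
                || !coordinatorNamespace.matches("[^/]+/[^/]+")) {
            throw new IllegalArgumentException(
                    "expected tenant/namespace, got: " + coordinatorNamespace);
        }
        this.coordinatorNamespace = coordinatorNamespace;
    }

    String coordinatorNamespace() {
        return coordinatorNamespace;
    }

    // Topics from any tenant/namespace may join; only the TC is scoped.
    Txn enlist(String topic) {
        enlistedTopics.add(topic);
        return this;
    }

    Set<String> enlistedTopics() {
        return Collections.unmodifiableSet(enlistedTopics);
    }
}
```

With this shape, `new Txn("public/default")` could enlist
`persistent://tenant-a/ns1/input` and `persistent://tenant-b/ns2/output`
(made-up topic names), and only access to the `public/default` TC would
need to be authorized.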

For the short-term solution, I think we can skip the lookup permission
check for the transaction_coordinator_assign topic. We do not write any
data to this topic; we only use the topic lookup to find where the TC
is, get the service URL, and send the transaction commands to the TC
directly. But can we skip the check in the broker rather than in the
Authorization Provider? Otherwise, user-defined Authorization Providers
would also have to apply the change.
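
A sketch of what skipping the check in the broker could look like. This
is not the actual broker code: `LookupGuard`, `canLookup`, and the
`BiPredicate` standing in for the pluggable Authorization Provider are
all hypothetical. The topic name is the one given in the quoted message
below, and the handling of the `-partition-N` suffix is an assumption
about how partition names reach the check.

```java
import java.util.function.BiPredicate;

// Hypothetical broker-side guard; a sketch, not Pulsar's implementation.
final class LookupGuard {
    // System topic used to locate the transaction coordinators.
    static final String TC_ASSIGN_TOPIC =
            "pulsar/system/transaction_coordinator_assign";

    // Stand-in for the pluggable provider: (role, topic) -> allowed.
    private final BiPredicate<String, String> provider;

    LookupGuard(BiPredicate<String, String> provider) {
        this.provider = provider;
    }

    boolean canLookup(String role, String topic) {
        // The broker short-circuits before the provider is consulted,
        // so custom providers need no change.
        if (isTcAssignTopic(topic)) {
            return true;
        }
        return provider.test(role, topic);
    }

    static boolean isTcAssignTopic(String topic) {
        // Drop the optional domain prefix and any partition suffix.
        String name = topic.replaceFirst("^persistent://", "")
                           .replaceFirst("-partition-\\d+$", "");
        return TC_ASSIGN_TOPIC.equals(name);
    }
}
```

Because the bypass sits in front of the provider, a provider that denies
everything would still let clients look up the TC assignment topic,
while lookups on any other topic keep going through the provider.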

For option c, both the broker and the client must upgrade to the new
version. And in essence, the two approaches (new API vs. lookup) are
almost the same, just different APIs.

Thanks,
Penghui

On Mon, Dec 5, 2022 at 7:41 PM Nicolò Boschi <boschi1...@gmail.com> wrote:

> Hi folks,
>
> I recently opened an issue about transactions [0]. The specific issue is
> the client requires to be able to lookup the system topic
> pulsar/system/transaction_coordinator_assign to get all the transaction
> coordinators to dial with.
>
> Since multi-tenancy is a core feature in Pulsar, this requirement may lead
> to authorization issues in multi-tenant clusters breaking the tenant
> isolation principle.
>
> In this thread I'd like to discuss, more in general, the approach that has
> been taken while designing transactions.
> The main concern I have is that TCs are global.
> A TC is backed by two system topics: transaction_coordinator_assign and
> __transaction_log_x.
> Both are used in the transaction's hot path with different workloads.
> This leads to potential critical issues:
>
> 1. *if somehow one of these topics is unloaded, ALL the tenants using
> transactions will suffer micro-outages*. (haven't looked at the error
> handling but I suppose the error would be thrown in the client's face). In
> general, availability and performance are not granted anymore
> per-tenant/namespace.
>
> I believe the TC should be per-tenant (maybe per-namespace?).
> *Is there any strong reason why this shouldn't be possible by design?* (and
> I mean, regardless of the current implementation and client-server
> compatibility, we can handle them somehow but it's a detail atm)
>
>
> One thing that I believe should be possible at the moment (but I'm not
> sure) is cross-tenant transactions. This wouldn't be possible anymore with
> per-tenant TCs.
>
> 2. *the clients need lookup permission to get all the TCs*.
> (transaction_coordinator_assign partitions). This can be solved in
> different ways, even keeping using the TC as a system entity.
>
> At the moment the java client, when starting, needs to get all the
> available TCs to spread transactions over them. The call it does
> is getPartitionedTopicMetadata to the system topic.
> To fix this there are multiple ways:
> a. Suggest to users to extend their own PulsarAuthorizationProvider to
> always allow lookup to that particular topic. (quick, works with all the
> existing clients and it only requires broker/proxy restarts without token
> invalidations) However it's not builtin so this is not optimal. More
> details here: [1]
> b. Add a new auth action LOOKUP in order to allow cluster admins to give
> this permission to their clients without affecting the produce or consume
> ability. This would require only broker restarts plus operational costs for
> the admin.
> c. Creates a new specific endpoint (in the binary protocol) to give all the
> required info to the TC client to properly initialize. This would be the
> preferred solution because the permission would be granular to this
> protocol call and it wouldn't require any permission changes for the
> current applications. However, only new clients (and brokers) may use this
> solution.
>
> I believe the c. option would be great for the mid-term.
> Anyway, if the per-tenant TC is designable, then this issue would be
> resolved as well.
>
>
> [0] https://github.com/apache/pulsar/issues/18716
> [1] https://github.com/apache/pulsar/pull/18718
>
>
> BR,
> Nicolò Boschi
>
