devinbost opened a new issue #84: URL: https://github.com/apache/pulsar-helm-chart/issues/84
Copying from the Apache Pulsar GitHub issue (https://github.com/apache/pulsar/issues/8536):

**Describe the bug**
After configuring TLS Authentication in Pulsar 2.6.1 with this helm chart: https://github.com/devinbost/pulsar-helm-chart/tree/tls-auth the broker gets stuck in a restart loop because the `WorkerService` crashes during the `WorkerService.start(..)` method execution with:

> WARN org.apache.pulsar.client.admin.internal.BaseResource - [http://pulsar-ci-broker-0.pulsar-ci-broker.pulsar.svc.cluster.local:8080/admin/v2/persistent/public/functions/assignments] Failed to perform http put request: javax.ws.rs.NotAuthorizedException: HTTP 401 Unauthorized

With TLS Authentication enabled, the endpoint above should be the TLS endpoint (https://pulsar-ci-broker-0.pulsar-ci-broker.pulsar.svc.cluster.local:8443/admin/v2/persistent/public/functions/assignments), not the non-TLS endpoint. This may be the reason why we're getting a 401 on the PUT for functions/assignments upon broker startup.

**To Reproduce**
Steps to reproduce the behavior:
1. Clone my fork of the Pulsar helm chart and check out the tls-auth branch by running:
```
git clone https://github.com/devinbost/pulsar-helm-chart.git
cd pulsar-helm-chart
git checkout tls-auth
```
2. Start minikube with an appropriate number of CPUs: `minikube start --memory=8192 --cpus=6 --cni=bridge`
3. Run the following commands to set up the Kubernetes environment, tokens, certs, and keys:
```
./scripts/cert-manager/install-cert-manager.sh
./scripts/pulsar/prepare_helm_release.sh -n pulsar -k pulsar-ci -c --pulsar-superusers superadmin,proxy-admin,broker-admin,client-admin
./scripts/pulsar/upload_tls.sh -k pulsar-ci -d ./.ci/tls
```
4. Install the local helm chart with the values file specified: `helm install --values examples/values-minikube-with-tls-and-jwt.yaml pulsar-ci ./charts/pulsar/`
5. After waiting for a time, get logs from the broker: `kubectl -n pulsar logs pulsar-ci-broker-0`

The logs should demonstrate the problem.
**Expected behavior**
The broker should start without entering a restart loop, and `WorkerService.start(..)` should complete its PUT on the function assignments topic without an HTTP 401.

**Environment**
- minikube v1.14.2 on Darwin 10.15.7
- Kubernetes v1.19.2 on Docker 19.03.8 ...
- Enabled addons: storage-provisioner, default-storageclass
- kubectl is configured to use "minikube"

**Additional Context**
Here is the WorkerConfig provided to the WorkerService, as reported in the logs:
```
01:07:20.757 [main] INFO org.apache.pulsar.functions.worker.WorkerService - Worker Configs: { "workerId" : "c-pulsar-ci-fw-pulsar-ci-broker-0.pulsar-ci-broker.pulsar.svc.cluster.local-8080", "workerHostname" : "pulsar-ci-broker-0.pulsar-ci-broker.pulsar.svc.cluster.local", "workerPort" : 8080, "workerPortTls" : 6751, "authenticateMetricsEndpoint" : true, "includeStandardPrometheusMetrics" : false, "jvmGCMetricsLoggerClassName" : null, "numHttpServerThreads" : 8, "configurationStoreServers" : "pulsar-ci-zookeeper:2281", "zooKeeperSessionTimeoutMillis" : 30000, "zooKeeperOperationTimeoutSeconds" : 30, "zooKeeperCacheExpirySeconds" : 300, "connectorsDirectory" : "./connectors", "narExtractionDirectory" : "/tmp", "validateConnectorConfig" : false, "functionsDirectory" : "./functions", "functionMetadataTopicName" : "metadata", "functionWebServiceUrl" : null, "pulsarServiceUrl" : "pulsar://pulsar-ci-broker-0.pulsar-ci-broker.pulsar.svc.cluster.local:6650", "pulsarWebServiceUrl" : "http://pulsar-ci-broker-0.pulsar-ci-broker.pulsar.svc.cluster.local:8080", "clusterCoordinationTopicName" : "coordinate", "pulsarFunctionsNamespace" : "public/functions", "pulsarFunctionsCluster" : "pulsar-ci", "numFunctionPackageReplicas" : 1, "downloadDirectory" : "download/pulsar_functions", "stateStorageServiceUrl" : null, "functionAssignmentTopicName" : "assignments", "schedulerClassName" : "org.apache.pulsar.functions.worker.scheduler.RoundRobinScheduler", "failureCheckFreqMs" : 30000, "rescheduleTimeoutMs" : 60000, "initialBrokerReconnectMaxRetries" : 60, "assignmentWriteMaxRetries" : 60,
"instanceLivenessCheckFreqMs" : 30000, "clientAuthenticationPlugin" : "org.apache.pulsar.client.impl.auth.AuthenticationTls", "clientAuthenticationParameters" : "tlsCertFile:/pulsar/certs/broker/tls.crt,tlsKeyFile:/pulsar/certs/broker/tls.key", "bookkeeperClientAuthenticationPlugin" : null, "bookkeeperClientAuthenticationParametersName" : null, "bookkeeperClientAuthenticationParameters" : null, "topicCompactionFrequencySec" : 1800, "tlsEnabled" : true, "tlsCertificateFilePath" : null, "tlsKeyFilePath" : null, "tlsTrustCertsFilePath" : "/pulsar/certs/ca/ca.crt", "tlsAllowInsecureConnection" : false, "tlsRequireTrustedClientCertOnConnect" : false, "useTls" : false, "tlsHostnameVerificationEnable" : false, "tlsCertRefreshCheckDurationSec" : 300, "authenticationEnabled" : true, "authenticationProviders" : [ "org.apache.pulsar.broker.authentication.AuthenticationProviderToken", "org.apache.pulsar.broker.authentication.AuthenticationProviderTls" ], "authorizationEnabled" : true, "authorizationProvider" : "org.apache.pulsar.broker.authorization.PulsarAuthorizationProvider", "superUserRoles" : [ "broker-admin", "client-admin", "proxy-admin" ], "properties" : { }, "brokerClientTrustCertsFilePath" : null, "functionRuntimeFactoryClassName" : "org.apache.pulsar.functions.runtime.kubernetes.KubernetesRuntimeFactory", "functionRuntimeFactoryConfigs" : { "changeConfigMap" : "pulsar-ci-functions-worker-config", "changeConfigMapNamespace" : "pulsar", "expectedMetricsCollectionInterval" : "30", "extraFunctionDependenciesDir" : null, "installUserCodeDependencies" : "true", "javaInstanceJarLocation" : null, "jobNamespace" : "pulsar", "logDirectory" : "/tmp", "pulsarAdminUrl" : "https://pulsar-ci-broker:8443/", "pulsarDockerImageName" : "apachepulsar/pulsar:2.6.1", "pulsarRootDir" : "/pulsar", "pulsarServiceUrl" : "pulsar+ssl://pulsar-ci-broker:6651/", "pythonInstanceLocation" : null, "submittingInsidePod" : "true" }, "secretsProviderConfiguratorClassName" : null, 
"secretsProviderConfiguratorConfig" : null, "functionInstanceMinResources" : null, "functionAuthProviderClassName" : "org.apache.pulsar.functions.auth.KubernetesSecretsTokenAuthProvider", "runtimeCustomizerClassName" : null, "runtimeCustomizerConfig" : { }, "maxPendingAsyncRequests" : 1000, "threadContainerFactory" : null, "processContainerFactory" : null, "kubernetesContainerFactory" : { "k8Uri" : null, "jobNamespace" : "pulsar", "pulsarDockerImageName" : "apachepulsar/pulsar:2.6.1", "imagePullPolicy" : null, "pulsarRootDir" : "/pulsar", "configAdminCLI" : null, "submittingInsidePod" : true, "pulsarServiceUrl" : "pulsar+ssl://pulsar-ci-broker:6651/", "pulsarAdminUrl" : "https://pulsar-ci-broker:8443/", "installUserCodeDependencies" : true, "pythonDependencyRepository" : null, "pythonExtraDependencyRepository" : null, "extraFunctionDependenciesDir" : null, "customLabels" : null, "expectedMetricsCollectionInterval" : 30, "changeConfigMap" : "pulsar-ci-functions-worker-config", "changeConfigMapNamespace" : "pulsar", "percentMemoryPadding" : 0, "cpuOverCommitRatio" : 1.0, "memoryOverCommitRatio" : 1.0, "grpcPort" : 9093, "metricsPort" : 9094, "narExtractionDirectory" : "/tmp" }, "functionMetadataTopic" : "persistent://public/functions/metadata", "clusterCoordinationTopic" : "persistent://public/functions/coordinate", "functionAssignmentTopic" : "persistent://public/functions/assignments", "tlsTrustChainBytes" : "LS0tLS1C. . . 
=", "workerWebAddress" : "http://pulsar-ci-broker-0.pulsar-ci-broker.pulsar.svc.cluster.local:8080" }
```

**Clues and Possible Solution**
The only admin endpoints in the `WorkerConfig` that are NOT TLS are:
- "workerWebAddress" : "http://pulsar-ci-broker-0.pulsar-ci-broker.pulsar.svc.cluster.local:8080"
- "pulsarWebServiceUrl" : "http://pulsar-ci-broker-0.pulsar-ci-broker.pulsar.svc.cluster.local:8080"
- "functionWebServiceUrl" : null

When we create the `brokerAdmin` client, we use the `pulsarWebServiceUrl`: https://github.com/apache/pulsar/blob/master/pulsar-functions/worker/src/main/java/org/apache/pulsar/functions/worker/WorkerService.java#L146

The first PUT on the function assignment topic uses the `brokerAdmin` client here: https://github.com/apache/pulsar/blob/master/pulsar-functions/worker/src/main/java/org/apache/pulsar/functions/worker/WorkerService.java#L169

Although we check a few TLS-related configurations (tlsTrustCertsFilePath, allowTlsInsecureConnection, enableTlsHostnameVerificationEnable) when we create the PulsarAdmin client, it doesn't appear that we ever switch to a TLS endpoint when TLS Authentication is enabled.

If a TLS endpoint is required to resolve the 401 response issue, we need to add logic that checks whether TLS Authentication is enabled and, when it is, uses a TLS endpoint when creating the `AdminClient` instances, such as `brokerAdmin`.
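The endpoint selection described above could look roughly like the following. This is only a sketch under stated assumptions, not code from Pulsar: the class `AdminUrlResolver`, the method `resolveAdminUrl`, and its parameters are hypothetical names introduced for illustration, and the TLS web port is passed in explicitly (8443 here, taken from the TLS endpoint quoted in the bug description).

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only; this class does not exist in Pulsar.
class AdminUrlResolver {

    /**
     * Returns the URL an admin client should use.
     * When TLS authentication is enabled and the configured URL is plain HTTP,
     * the scheme is switched to https and the port to the given TLS port.
     * Otherwise the configured URL is returned unchanged.
     */
    static String resolveAdminUrl(String webServiceUrl, boolean tlsAuthEnabled, int webServicePortTls) {
        URI uri = URI.create(webServiceUrl);
        if (!tlsAuthEnabled || "https".equalsIgnoreCase(uri.getScheme())) {
            return webServiceUrl;
        }
        try {
            // Rebuild the URI, keeping host, path, query, and fragment as configured.
            return new URI("https", uri.getUserInfo(), uri.getHost(), webServicePortTls,
                    uri.getPath(), uri.getQuery(), uri.getFragment()).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot derive a TLS URL from " + webServiceUrl, e);
        }
    }

    public static void main(String[] args) {
        // The non-TLS pulsarWebServiceUrl reported in the WorkerConfig dump.
        String configured = "http://pulsar-ci-broker-0.pulsar-ci-broker.pulsar.svc.cluster.local:8080";
        // 8443 is an assumption based on the TLS endpoint mentioned in this issue.
        System.out.println(resolveAdminUrl(configured, true, 8443));
    }
}
```

Whether the switch should key off TLS authentication or off a broader "TLS enabled" setting is a design question this sketch does not settle; it only shows that the rewrite itself is a small, isolated step.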
We could easily add the logic to resolve the correct URL to the brokerConfig class (`ServiceConfiguration`) since this class already knows if `brokerClientTlsEnabled` is true/false: https://github.com/apache/pulsar/blob/master/pulsar-broker-common/src/main/java/org/apache/pulsar/broker/ServiceConfiguration.java#L1656

Then, that value could be assigned to a property on `workerConfig` before injecting `workerConfig` into our `WorkerService`: https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/PulsarBrokerStarter.java#L175

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org