[jira] [Commented] (KUDU-2679) In some scenarios, a Spark Kudu application can be devoid of fresh authn tokens

Alexey Serbin (Jira) Mon, 17 Jun 2024 09:32:11 -0700


    [ https://issues.apache.org/jira/browse/KUDU-2679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17855666#comment-17855666 ]


Alexey Serbin commented on KUDU-2679:
-------------------------------------

{quote}
The same happens with Flink streaming applications with a Kudu Sink. After the
--authn_token_validity_seconds period, we have to restart the application.
{quote}

[~ArnaudL],

The crux of this issue lies in the presence of two actors of different types in
Spark: the driver and the executors.  Does Flink have a similar topology when
assigning tasks?  If not, then it's not the same issue.

> In some scenarios, a Spark Kudu application can be devoid of fresh authn tokens
> -------------------------------------------------------------------------------
>
>                 Key: KUDU-2679
>                 URL: https://issues.apache.org/jira/browse/KUDU-2679
>             Project: Kudu
>          Issue Type: Bug
>          Components: client, security, spark
>    Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.5.0, 1.6.0, 1.7.0, 1.8.0, 1.7.1
>            Reporter: Alexey Serbin
>            Priority: Major
>
> When running in {{cluster}} mode, tasks run as part of a Spark Kudu client
> application can fail to obtain new (i.e., non-expired) authentication tokens,
> even if they run for a very short time.  Essentially, if the driver runs
> longer than the authn token expiration interval and has a particular pattern
> of making RPC calls to Kudu masters and tablet servers, all tasks scheduled
> to run after the authn token expiration interval will be supplied with
> expired authn tokens, making every task fail.  The only workarounds are
> restarting the application or dropping the long-established connections from
> the driver to Kudu masters/tservers.
> Below are some details explaining why that can happen.
> Let's assume the following holds true for a Spark Kudu application:
> * The application is running against a secured Kudu cluster.
> * The application is running in the {{cluster}} mode.
> * There are no primary authentication credentials on the executor machines
> for the user under which the Spark executors run (i.e., {{kinit}} hasn't been
> run on those machines for that user, or the Kerberos credentials there have
> already expired).
> * The masters' {{--authn_token_validity_seconds}} flag is set to {{X}}
> seconds (the default is 60 * 60 * 24 * 7 seconds, i.e., 7 days).
> * The {{--rpc_default_keepalive_time_ms}} flag for masters (and for tablet
> servers, if they are involved in communications between the driver process
> and the Kudu backend) is set to {{Y}} milliseconds (the default is 65000 ms).
> * The application runs for longer than {{X}} seconds.
> * The driver process makes requests to Kudu masters at least once every {{Y}}
> milliseconds.
> * The driver either doesn't make requests to Kudu tablet servers or makes
> such requests at least once every {{Y}} milliseconds to each of the involved
> tablet servers.
> * The executors run tasks that leave connections to tablet servers idle for
> longer than {{Y}} milliseconds, or the driver schedules a task on an executor
> more than {{Y}} milliseconds after that executor completed its last task.
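The conditions listed above can be restated as a single predicate. This is a hypothetical model, not part of the Kudu or Spark APIs: `AppProfile`, its fields, and `matches_kudu_2679_scenario` are illustrative names, and the timing comparisons (a gap of at most Y ms keeps a connection alive, a longer gap lets the server close it) are assumptions about how the keepalive interval applies.

```python
from dataclasses import dataclass
from typing import Optional

# Defaults quoted in the issue description.
DEFAULT_AUTHN_TOKEN_VALIDITY_S = 60 * 60 * 24 * 7  # X: 604800 s (7 days)
DEFAULT_KEEPALIVE_MS = 65_000                       # Y: 65000 ms


@dataclass
class AppProfile:
    """Hypothetical description of a Spark Kudu application (not a Kudu API)."""
    secured_cluster: bool
    cluster_mode: bool
    executors_have_kerberos_creds: bool
    app_runtime_s: float
    driver_max_gap_to_masters_ms: float
    # None means the driver never talks to tablet servers at all.
    driver_max_gap_to_tservers_ms: Optional[float]
    # Longest idle period (or gap between tasks) on an executor's connections.
    executor_max_idle_or_spawn_gap_ms: float


def matches_kudu_2679_scenario(
    app: AppProfile,
    token_validity_s: float = DEFAULT_AUTHN_TOKEN_VALIDITY_S,
    keepalive_ms: float = DEFAULT_KEEPALIVE_MS,
) -> bool:
    """Return True if every condition listed in the issue holds for `app`."""
    # Assumption: activity at least once every keepalive_ms keeps a connection open.
    driver_keeps_master_conns = app.driver_max_gap_to_masters_ms <= keepalive_ms
    driver_keeps_tserver_conns = (
        app.driver_max_gap_to_tservers_ms is None
        or app.driver_max_gap_to_tservers_ms <= keepalive_ms
    )
    # Assumption: a longer gap means executors must open new connections.
    executors_open_new_conns = app.executor_max_idle_or_spawn_gap_ms > keepalive_ms
    return (
        app.secured_cluster
        and app.cluster_mode
        and not app.executors_have_kerberos_creds
        and app.app_runtime_s > token_validity_s
        and driver_keeps_master_conns
        and driver_keeps_tserver_conns
        and executors_open_new_conns
    )
```

For example, with the default flags, an application that has run for 8 days, whose driver contacts the masters every 30 seconds, and whose executors sit idle for 2 minutes between tasks satisfies every condition under this model.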
> Essentially, this is about a Spark Kudu application where the driver process
> keeps its once-opened connections active while the executors need to open new
> connections to Kudu tablet servers (and/or masters).  Also, the executor
> machines don't have Kerberos credentials for the OS user under which the
> executor processes run.
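The difference between the driver and the executors can be illustrated by counting how often a client would have to open, and therefore authenticate, a new connection. This is a simplified model assuming that a connection idle for longer than the keepalive interval Y is closed and that every new connection requires authentication; `count_new_connections` is a hypothetical helper, not a Kudu function.

```python
from typing import Iterable, Optional


def count_new_connections(activity_times_ms: Iterable[float],
                          keepalive_ms: float = 65_000) -> int:
    """Count the connections a client opens for a series of requests to one server.

    Model assumption: if the gap since the previous request exceeds
    keepalive_ms, the server has closed the connection, so a new one
    (with a fresh authentication handshake) is needed.
    """
    count = 0
    last: Optional[float] = None
    for t in sorted(activity_times_ms):
        if last is None or t - last > keepalive_ms:
            count += 1  # first request, or the previous connection timed out
        last = t
    return count
```

Under this model, a driver that contacts a master every 30 seconds for 8 days opens a single connection and never authenticates again, while an executor whose tasks start 2 minutes apart opens a new connection for every task.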
> In such scenarios, the application's tasks spawned more than {{X}} seconds
> after the application starts will fail because of expired authentication
> tokens, while the driver process never re-acquires its authn token and keeps
> the expired token in {{KuduContext}} forever.
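The resulting failure can be sketched as a timeline. The model assumes that the driver obtains a single token at application start and never refreshes it (its connections never need re-authentication), that every task presents that same token on a new connection, and that an executor with valid Kerberos credentials could authenticate without the token. `AuthnToken` and `task_outcomes` are hypothetical names, and treating the token as expired exactly at `issued_at_s + validity_s` is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AuthnToken:
    """Hypothetical stand-in for the authn token held in the driver's KuduContext."""
    issued_at_s: float
    validity_s: float

    def is_expired_at(self, t_s: float) -> bool:
        # Assumption: the token is unusable from issued_at + validity onwards.
        return t_s >= self.issued_at_s + self.validity_s


def task_outcomes(task_start_times_s: Iterable[float],
                  driver_token: AuthnToken,
                  executor_has_kerberos_creds: bool = False) -> List[str]:
    """Model the outcome of each task given the driver's never-refreshed token."""
    outcomes = []
    for t in task_start_times_s:
        if executor_has_kerberos_creds or not driver_token.is_expired_at(t):
            outcomes.append("ok")
        else:
            # The task opens a new connection and presents the stale token.
            outcomes.append("expired token")
    return outcomes
```

With the default validity of 604800 seconds, tasks starting at 3600 s and 604799 s succeed in this model, while every task starting at or after 604800 s fails until the application is restarted.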



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
