Takashi Sakai created KAFKA-15413:
-------------------------------------

             Summary: kafka-server-stop fails with COLUMNS environment variable on Ubuntu
                 Key: KAFKA-15413
                 URL: https://issues.apache.org/jira/browse/KAFKA-15413
             Project: Kafka
          Issue Type: Bug
          Components: tools
         Environment: kafka: 3.5.1
Java: openjdk version "20.0.1" 2023-04-18
OS: Ubuntu 22.04.3 LTS on WSL2/Windows 11
            Reporter: Takashi Sakai


The {{kafka-server-stop}} script does not work if the {{COLUMNS}} environment 
variable is set on Ubuntu.

*Steps to reproduce*:
kafka/zookeeper.properties
{noformat}
dataDir=/tmp/kafka-test-20230828-15217-1lop1tk/zookeeper
clientPort=34461
maxClientCnxns=0
admin.enableServer=false
{noformat}
kafka/server.properties
{noformat}
broker.id=0
listeners=PLAINTEXT://:46161
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/tmp/kafka-test-20230828-15217-1lop1tk/kafka-logs
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168
log.retention.check.interval.ms=300000
zookeeper.connect=localhost:34461
zookeeper.connection.timeout.ms=18000
group.initial.rebalance.delay.ms=0
{noformat}
{noformat}
$ zookeeper-server-start kafka/zookeeper.properties >/dev/null 2>&1 &
[1] 18593
$ kafka-server-start kafka/server.properties >/dev/null 2>&1 &
[2] 18982
$ COLUMNS=10 kafka-server-stop # This is unexpected
No kafka server to stop
$ kafka-server-stop
$ zookeeper-server-stop
[2]+  Exit 143                kafka-server-start kafka/server.properties
$ 
[1]+  Exit 143                zookeeper-server-start kafka/zookeeper.properties 
{noformat}
In the third command, I set the {{COLUMNS}} environment variable. This caused 
the {{kafka-server-stop}} script to fail to find the Kafka process.

*Cause*

The {{kafka-server-stop}} script uses {{ps ax}} to find the Kafka process.
{noformat}
OSNAME=$(uname -s)
if [[ "$OSNAME" == "OS/390" ]]; then
    (snip)
elif [[ "$OSNAME" == "OS400" ]]; then
    (snip)
else
    PIDS=$(ps ax | grep ' kafka\.Kafka ' | grep java | grep -v grep | awk '{print $1}')
fi
{noformat}
On Ubuntu, {{ps ax}} truncates its output if the {{COLUMNS}} environment 
variable is set.

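(The [source code of the ps command|https://gitlab.com/procps-ng/procps/-/blob/675246119df143a5f8ced6e3313edac6ccc3e222/src/ps/global.c#L226-L230] 
shows that the {{COLUMNS}} environment variable takes precedence over the 
result of {{isatty}}.)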
{noformat}
$ ps ax | cat
  19912 pts/0    Sl     0:03 /home/linuxbrew/.linuxbrew/opt/openjdk/libexec/bin/java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true -Xlog:gc*:file=/home/linuxbrew/.linuxbrew/Cellar/kafka/3.5.1/libexec/bin/../logs/kafkaServer-gc.log:time,tags:filecount=10,filesize=100M -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/home/linuxbrew/.linuxbrew/Cellar/kafka/3.5.1/libexec/bin/../logs -Dlog4j.configuration=file:/home/linuxbrew/.linuxbrew/Cellar/kafka/3.5.1/libexec/bin/../config/log4j.properties -cp /home/linuxbrew/.linuxbrew/Cellar/kafka/3.5.1/libexec/bin/../libs/activation-1.1.1.jar:(snip):/home/linuxbrew/.linuxbrew/Cellar/kafka/3.5.1/libexec/bin/../libs/zstd-jni-1.5.5-1.jar kafka.Kafka kafka/server.properties
$ COLUMNS=10 ps ax | cat
  19912 pts/0    Sl     0:05 /home/linux
{noformat}
I tested this on WSL2 on Windows with openjdk installed via Homebrew, but it 
should occur in any environment that uses {{procps-ng}}.
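
To illustrate why the truncation defeats the lookup, here is a small sketch 
that runs the same filter chain as the script over a sample line. The helper 
name {{extract_kafka_pids}} and the shortened sample line are made up for this 
illustration, and {{cut}} stands in for the truncation that {{ps}} performs.

```shell
# Hypothetical helper: same filter chain as the stop script, reading
# ps-style lines from stdin instead of calling ps directly.
extract_kafka_pids() {
    grep ' kafka\.Kafka ' | grep java | grep -v grep | awk '{print $1}'
}

# Shortened, made-up sample of a full-width ps line for a broker.
full_line='  19912 pts/0    Sl     0:03 /opt/openjdk/bin/java -Xmx1G -cp libs/example.jar kafka.Kafka kafka/server.properties'

# Simulate a narrow COLUMNS value by keeping only the first 40 characters.
truncated_line=$(printf '%s\n' "$full_line" | cut -c1-40)

# The full line still contains " kafka.Kafka ", so the PID is found.
printf '%s\n' "$full_line" | extract_kafka_pids

# The truncated line has lost the class name, so nothing is printed.
printf '%s\n' "$truncated_line" | extract_kafka_pids
```

With the truncated input, {{PIDS}} ends up empty, which matches the 
"No kafka server to stop" message in the reproduction above.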

*Problem*

This caused a CI failure in the Homebrew project 
(GitHub: Homebrew/homebrew-core#133887).

Homebrew's behavior of passing the {{COLUMNS}} environment variable seems to be 
a bug. However, the {{server-stop}} script should not be affected by such an 
environment variable, so this also seems to be a bug to me.
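
One possible way to make the lookup independent of the caller's environment is 
to run {{ps}} with {{COLUMNS}} removed. This is only a sketch and not a 
proposed patch: the wrapper name {{run_without_columns}} is made up, and it 
relies on the behavior shown above, where {{ps ax}} prints full-width lines 
when it writes to a pipe and {{COLUMNS}} is not set.

```shell
# Hypothetical wrapper: run a command with COLUMNS removed from its
# environment. The subshell leaves the caller's own COLUMNS untouched.
run_without_columns() {
    (
        unset COLUMNS
        "$@"
    )
}

# Possible use inside a stop script, with the same filter as today:
# PIDS=$(run_without_columns ps ax | grep ' kafka\.Kafka ' | grep java | grep -v grep | awk '{print $1}')

# Demonstration that the child process no longer sees the variable.
export COLUMNS=10
run_without_columns sh -c 'echo "COLUMNS in child: ${COLUMNS-unset}"'
```

The subshell is used so that the stop script does not change {{COLUMNS}} for 
anything else it runs afterwards.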

*Related issues*

This problem, KAFKA-4931, and KAFKA-4110 could all be fixed by introducing a 
process ID (PID) file. However, the three problems have different causes and 
can be considered separately.
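
For reference, a PID-file approach could look roughly like the sketch below. 
This is not how the current scripts work; the function names and the file 
location are assumptions made for the example.

```shell
# Hypothetical PID-file helpers. The default file location is an assumption.
PID_FILE="${PID_FILE:-/tmp/kafka-example.pid}"

# Start a command in the background and record its PID.
start_with_pid_file() {
    "$@" >/dev/null 2>&1 &
    echo "$!" > "$PID_FILE"
}

# Stop the recorded process without parsing ps output at all.
stop_from_pid_file() {
    if [ ! -s "$PID_FILE" ]; then
        echo "No PID file found at $PID_FILE" >&2
        return 1
    fi
    stored_pid=$(cat "$PID_FILE")
    # kill -0 sends no signal; it only checks that the process exists.
    if kill -0 "$stored_pid" 2>/dev/null; then
        kill -s TERM "$stored_pid"
    fi
    rm -f "$PID_FILE"
}

# Example usage (commented out so that nothing is started by accident):
# start_with_pid_file kafka-server-start kafka/server.properties
# stop_from_pid_file
```

Because the stop step reads the PID from a file, it does not depend on the 
width or format of {{ps}} output.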


