Reviving KIP-723: Add socket.tcp.no.delay property to Kafka Config

2025-01-29 Thread Francois
# create topics:
for i in `seq 3` ; do sudo podman exec --workdir /opt/kafka/bin/
broker ./kafka-topics.sh --bootstrap-server localhost:9092 --create
--topic test-topic-$i ; done

# producer1:
sudo podman exec -ti --workdir /opt/kafka/bin/ broker bash -c "while
sleep 1; do echo prod1 ; done | ./kafka-console-producer.sh
--bootstrap-server localhost:9092 --topic test-topic-1"
# producer2:
sudo podman exec -ti --workdir /opt/kafka/bin/ broker bash -c "while
sleep 1; do echo prod2 ; done | ./kafka-console-producer.sh
--bootstrap-server localhost:9092 --topic test-topic-2"
# producer3:
sudo podman exec -ti --workdir /opt/kafka/bin/ broker bash -c "while
sleep 1; do echo prod3 ; done | ./kafka-console-producer.sh
--bootstrap-server localhost:9092 --topic test-topic-3"


# consumer, I'm running that as my user
podman run --network host --rm -ti  --workdir /opt/kafka/bin/ --name
client --entrypoint bash apache/kafka:latest
./kafka-console-consumer.sh --bootstrap-server 10.90.0.10:9092
--include test-topic-. --consumer-property fetch.max.wait.ms=1
--consumer-property heartbeat.interval.ms=2 --consumer-property
session.timeout.ms=6 --consumer-property fetch.min.bytes=5
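For context on what the proposed property would change: Kafka's network layer enables TCP_NODELAY on its sockets, and KIP-723 would expose that as a `socket.tcp.no.delay` config. Below is a minimal sketch of the underlying socket option using plain `java.net.Socket` — not Kafka's actual config plumbing, and the helper name is made up:

```java
import java.net.Socket;
import java.net.SocketException;

public class TcpNoDelayDemo {
    // Hypothetical application of a socket.tcp.no.delay setting to one socket.
    // Kafka today effectively sets setTcpNoDelay(true); KIP-723 proposes
    // making the value configurable.
    static void applyTcpNoDelay(Socket socket, boolean enabled) throws SocketException {
        socket.setTcpNoDelay(enabled); // true disables Nagle's algorithm (send small writes immediately)
    }

    public static void main(String[] args) throws Exception {
        Socket s = new Socket(); // the option can be set before connecting
        applyTcpNoDelay(s, false); // false re-enables Nagle batching
        System.out.println("TCP_NODELAY=" + s.getTcpNoDelay());
        s.close();
    }
}
```

With TCP_NODELAY off, many tiny writes (like the one-message-per-second producers above) can be coalesced by the kernel at the cost of latency — exactly the trade-off the KIP wants to make tunable.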



Cheers!
Francois


Re: RemoteStorageManager.fetchLogSegment - how to deal with InterruptedException?

2023-11-17 Thread Francois Visconte
Hi,

Is there a possibility to keep fetching segments instead of cancelling the
underlying request?
The reason I'm asking is that with the current implementation (I'm using
the Aiven S3 plugin) there is a possibility that consumers never make any
progress if fetch.max.wait.ms is set too low: they keep issuing fetch
requests that never succeed.
At least having observability on this with a metric would make debugging
easier.

F.

On Fri, Nov 17, 2023 at 2:17 PM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Hi Ivan,
>
> I've opened a relevant patch to increase the timeout to fetch data from
> remote storage. This will reduce the error occurrence rate:
>
> https://github.com/apache/kafka/pull/14778
>
> I think it's better to distinguish the cases:
>
> 1) Any error that happens while reading the remote data -- this should be
> logged at ERROR level and the failedRemoteReadRequestRate metric should
> reflect it.
> 2) When the task is cancelled due to timeout, then we can log it in the
> debug level and skip updating the metrics.
>
> On Fri, Nov 17, 2023 at 6:11 PM Ivan Yurchenko  wrote:
>
> > Hello!
> >
> > `RemoteStorageManager.fetchLogSegment` is called in a background thread
> by
> > the broker [1]. When a fetch request times out, the associated Future is
> > cancelled [2] and the thread is interrupted. If the InterruptedException
> is
> > propagated from the `RemoteStorageManager`, it pollutes the broker logs
> > with not very useful exceptions and also metrics with false errors [3].
> It
> > would be good to deal with this somehow. I see two options:
> >
> > 1. Add a `catch` to handle `InterruptedException` here [4] in a less
> noisy
> > way. Maybe log on the debug level + not increase the error metrics.
> > 2. Catch `InterruptedException` in the `RemoteStorageManager`
> > implementation and return an empty `InputStream`. It seems safe because
> it
> > goes this way [5]. In this case, this needs to be documented in the
> Javadoc.
> >
> > Personally, the first seems a better way. What does the community think
> > about this? Regardless of the decision, I volunteer to make a PR (and a
> KIP
> > if needed).
> > Thank you!
> >
> > Best,
> > Ivan
> >
> > [1]
> >
> https://github.com/apache/kafka/blob/61fb83e20280ed3ac83de8290dd9815bdb7efcea/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1457-L1459
> > [2]
> >
> https://github.com/apache/kafka/blob/6f197301646135e0bb39a461ca0a07c09c3185fb/core/src/main/scala/kafka/server/DelayedRemoteFetch.scala#L79-L83
> > [3]
> >
> https://github.com/apache/kafka/blob/8aaf7daff393f2b26438fb7fe28016e06e23558c/core/src/main/java/kafka/log/remote/RemoteLogReader.java#L68-L72
> > [4]
> >
> https://github.com/apache/kafka/blob/8aaf7daff393f2b26438fb7fe28016e06e23558c/core/src/main/java/kafka/log/remote/RemoteLogReader.java#L68
> > [5]
> >
> https://github.com/apache/kafka/blob/61fb83e20280ed3ac83de8290dd9815bdb7efcea/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1276-L1278
> >
> >
>
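Ivan's option 1 — treat interruption from a cancelled fetch as benign, log quietly, and reserve the error metric for real failures — could be sketched roughly like this (a standalone illustration with made-up names, not the actual RemoteLogReader code):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class RemoteReadErrorHandling {
    // Stand-in for the failedRemoteReadRequestRate metric.
    static final AtomicInteger failedRemoteReadRequests = new AtomicInteger();

    // Run a remote read; a cancelled/interrupted task is handled quietly,
    // while any other failure bumps the error counter.
    static <T> T readQuietly(Callable<T> remoteRead, T fallback) {
        try {
            return remoteRead.call();
        } catch (InterruptedException e) {
            // Fetch timed out and the Future was cancelled: restore the
            // interrupt flag, log at DEBUG, and do NOT bump the error metric.
            Thread.currentThread().interrupt();
            return fallback;
        } catch (Exception e) {
            failedRemoteReadRequests.incrementAndGet(); // genuine remote-storage error
            return fallback;
        }
    }

    public static void main(String[] args) {
        readQuietly(() -> { throw new InterruptedException("cancelled"); }, null);
        readQuietly(() -> { throw new RuntimeException("remote storage failure"); }, null);
        System.out.println("errors=" + failedRemoteReadRequests.get());
    }
}
```

Restoring the interrupt status via `Thread.currentThread().interrupt()` matters if the executor reuses the thread, so callers further up the stack can still observe the cancellation.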


Re: [PROPOSAL] Add commercial support page on website

2024-01-11 Thread Francois Papon

Hi Justine,

You're right, Kafka is a part of my business (training, consulting,
architecture design, SLAs...) and most of the time, users/customers said
that it was hard for them to find commercial support (in France in my
case) after searching on the Kafka website (Google didn't help them).


As an ASF member and PMC member of several ASF projects, I know that this
kind of page exists, so this is why I made this proposal for the Kafka
project: I really think that it can help users.


As you suggested, I can submit a PR to be added to the "powered by" page.

Thanks,

François

On 11/01/2024 21:00, Justine Olshan wrote:

Hey François,

My point was that the companies on that page use kafka as part of their
business. If you use Kafka as part of your business feel free to submit a
PR to be added.

I second Chris's point that other projects are not enough to require Kafka
having such a support page.

Justine

On Thu, Jan 11, 2024 at 11:57 AM Chris Egerton 
wrote:


Hi François,

Is it an official policy of the ASF that projects provide a listing of
commercial support options for themselves? I understand that other projects
have chosen to provide one, but this doesn't necessarily imply that all
projects should do the same, and I can't say I find this point very
convincing as a rebuttal to some of the good-faith concerns raised by the
PMC and members of the community so far. However, if there's an official
ASF stance on this topic, then I acknowledge that Apache Kafka should align
with it.

Best,

Chris


On Thu, Jan 11, 2024, 14:50 fpapon  wrote:


Hi Justine,

I'm not sure I see the difference between "happy users" and vendors
that advertise their products in some of the company listings on the
"powered by" page.

Btw, the initial purpose of my proposal was to help users find support
for production stuff rather than searching on Google.

I don't think this is a bad thing because this is something that already
exists in many ASF projects, like:

https://hop.apache.org/community/commercial/
https://struts.apache.org/commercial-support.html
https://directory.apache.org/commercial-support.html
https://tomee.apache.org/commercial-support.html
https://plc4x.apache.org/users/commercial-support.html
https://camel.apache.org/community/support/
https://openmeetings.apache.org/commercial-support.html
https://guacamole.apache.org/support/



https://cwiki.apache.org/confluence/display/HADOOP2/Distributions+and+Commercial+Support
https://activemq.apache.org/support
https://karaf.apache.org/community.html

https://netbeans.apache.org/front/main/help/commercial-support/
https://royale.apache.org/royale-commercial-support/

https://karaf.apache.org/community.html

As I understand it for now, the channels for users to find production
support are:

- The mailing lists (u...@kafka.apache.org / dev@kafka.apache.org)

- The official #kafka ASF Slack channel (maybe we can add it to the
website because I didn't find it there => https://kafka.apache.org/contact)

- Searching Google for commercial support

I can update my PR to mention only the 3 points above for the "get
support" page if people think that having a support page makes sense.

regards,

François

On 11/01/2024 19:34, Justine Olshan wrote:

I think there is a difference between the "Powered by" page and a page for
vendors to advertise their products and services.

The idea is that the companies on that page are "powered by" Kafka. They
serve as examples of happy users of Kafka.
I don't think it is meant only as a place just for those companies to
advertise.

I'm a little confused by:

> In this case, I'm ok to say that the commercial support section in the
> "Get support" is no need as we can use this page.

If you plan to submit for this page, please include a description on how
your company uses Kafka.

I'm happy to hear other folks' opinions on this page as well.

Thanks,
Justine



On Thu, Jan 11, 2024 at 8:57 AM fpapon  wrote:


Hi,

About the vendors list and neutrality, what is the policy of the
"Powered by" page?

https://kafka.apache.org/powered-by

We can see companies with logos, some talking about their product
(Agoora), some offering services (Instaclustr, Aiven), and we can
also see some that just put their logo and a link to their website
without any explanation (GoldmanSachs).

So, as I understand after reading the text in the footer of this
page, every company can add themselves by providing a PR, right?

"Want to appear on this page?
Submit a pull request or send a quick description of your organization
and usage to the mailing list and we'll add you."

In this case, I'm ok to say that the commercial support section in the
"Get support" is no need as we can use this page.

regards,

François


On 10/01/2024 19:03, Kenneth Eversole wrote:

I agree with Divji here and, to be more pointed, I worry that if we go down
the path of adding vendors to a list it comes off as supporting their
product, not to mention it could be a huge security risk

Re: [PROPOSAL] Add commercial support page on website

2024-01-15 Thread Francois Papon

Hi Matthias,

Thank you for your feedback, it makes sense. My proposal was to help users
find support but, as you said, maybe this is not the right way to
address it.


regards,

François

On 13/01/2024 00:34, Matthias J. Sax wrote:

François,

thanks for starting this initiative. Personally, I don't think it's 
necessarily harmful for the project to add such a new page, however, I 
share the same concerns others raised already.


I understand your motivation that people had issues finding commercial 
support, but I am not sure we can address this issue that way. I am 
also "worried" (for the lack of a better word) that the page might 
become long and unwieldy. In the end, any freelancer/consultant 
offering Kafka services would be able to get on the page, so we might 
get hundreds of entries, which also makes it impossible for users to 
find what they are looking for. Also, the services of different 
companies might vary drastically; should users read all these 
descriptions? I can also imagine that some companies offer their 
services only in some countries/regions, making it even harder for users 
to find what they are looking for?


Overall, it sounds more like a search optimization problem, and thus 
it seems out of scope for what we can solve. As I said, I am not strictly 
against it, but I just don't see much value either.



-Matthias

On 1/11/24 12:55 PM, Francois Papon wrote:

Hi Justine,

You're right, Kafka is a part of my business (training, consulting, 
architecture design, sla...) and most of the time, users/customers 
said that it was hard for them to find a commercial support (in 
France for my case) after searching on the Kafka website (Google 
didn't help them).


As an ASF member and PMC of several ASF projects, I know that this 
kind of page exist so this is why I made this proposal for the Kafka 
project because I really think that it can help users.


As you suggest, I can submit a PR to be added on the "powered by" page.

Thanks,

François

On 11/01/2024 21:00, Justine Olshan wrote:

Hey François,

My point was that the companies on that page use Kafka as part of their
business. If you use Kafka as part of your business feel free to submit a
PR to be added.

I second Chris's point that other projects are not enough to require Kafka
having such a support page.

Justine

On Thu, Jan 11, 2024 at 11:57 AM Chris Egerton 


wrote:


Hi François,

Is it an official policy of the ASF that projects provide a listing of
commercial support options for themselves? I understand that other
projects have chosen to provide one, but this doesn't necessarily imply
that all projects should do the same, and I can't say I find this point
very convincing as a rebuttal to some of the good-faith concerns raised
by the PMC and members of the community so far. However, if there's an
official ASF stance on this topic, then I acknowledge that Apache Kafka
should align with it.

Best,

Chris


On Thu, Jan 11, 2024, 14:50 fpapon  wrote:


Hi Justine,

I'm not sure I see the difference between "happy users" and vendors
that advertise their products in some of the company listings on the
"powered by" page.

Btw, the initial purpose of my proposal was to help users find support
for production stuff rather than searching on Google.

I don't think this is a bad thing because this is something that already
exists in many ASF projects, like:

https://hop.apache.org/community/commercial/
https://struts.apache.org/commercial-support.html
https://directory.apache.org/commercial-support.html
https://tomee.apache.org/commercial-support.html
https://plc4x.apache.org/users/commercial-support.html
https://camel.apache.org/community/support/
https://openmeetings.apache.org/commercial-support.html
https://guacamole.apache.org/support/


https://cwiki.apache.org/confluence/display/HADOOP2/Distributions+and+Commercial+Support

https://activemq.apache.org/support
https://karaf.apache.org/community.html

https://netbeans.apache.org/front/main/help/commercial-support/
https://royale.apache.org/royale-commercial-support/

https://karaf.apache.org/community.html

As I understand it for now, the channels for users to find production
support are:

- The mailing lists (u...@kafka.apache.org / dev@kafka.apache.org)

- The official #kafka ASF Slack channel (maybe we can add it to the
website because I didn't find it there => https://kafka.apache.org/contact)

- Searching Google for commercial support

I can update my PR to mention only the 3 points above for the "get
support" page if people think that having a support page makes sense.

regards,

François

On 11/01/2024 19:34, Justine Olshan wrote:
I think there is a difference between the "Powered by" page and a 
page

for

vendors to advertise their products and services.

The idea is that the companies on that page are "powered by" Kafka

Re: [PROPOSAL] Add commercial support page on website

2024-01-15 Thread Francois Papon

Hi Tison,

Publishing a dedicated website for that can be a good idea; however, if
the link to that website cannot be mentioned on the official Apache
Kafka website, I'm afraid that it will not be relevant.


BTW, as I understand from all the feedback of the Apache Kafka PMC and
community, my proposal is not a good idea for the project, so I will
close the PR.


Thanks all for the feedback.

regards,

François

On 14/01/2024 12:56, tison wrote:

FWIW - even if it's rejected by the Kafka PMC, you can maintain your
own page for such information and provide your personal comments on
them. If the object is to provide information and help users to make
decisions, it should help. Although you should do the SEO by yourself,
if the information is somehow neutral and valuable, you can ask the
@apachekafka Twitter (X) account to propagate it and provide a blog
for Kafka blogs.

This is the common way third-party "evangelists" produce content
and get it promoted.

Best,
tison.

Matthias J. Sax  于2024年1月13日周六 07:35写道:

François,

thanks for starting this initiative. Personally, I don't think it's
necessarily harmful for the project to add such a new page, however, I
share the same concerns others raised already.

I understand your motivation that people had issues finding commercial
support, but I am not sure we can address this issue that way. I am also
"worried" (for the lack of a better word) that the page might become
long and unwieldy. In the end, any freelancer/consultant offering Kafka
services would be able to get on the page, so we might get hundreds of
entries, which also makes it impossible for users to find what they are
looking for. Also, the services of different companies might vary
drastically; should users read all these descriptions? I can also
imagine that some companies offer their services only in some
countries/regions, making it even harder for users to find what they are
looking for?

Overall, it sounds more like a search optimization problem, and thus it
seems out of scope for what we can solve. As I said, I am not strictly
against it, but I just don't see much value either.


-Matthias

On 1/11/24 12:55 PM, Francois Papon wrote:

Hi Justine,

You're right, Kafka is a part of my business (training, consulting,
architecture design, sla...) and most of the time, users/customers said
that it was hard for them to find a commercial support (in France for my
case) after searching on the Kafka website (Google didn't help them).

As an ASF member and PMC of several ASF projects, I know that this kind
of page exist so this is why I made this proposal for the Kafka project
because I really think that it can help users.

As you suggest, I can submit a PR to be added on the "powered by" page.

Thanks,

François

On 11/01/2024 21:00, Justine Olshan wrote:

Hey François,

My point was that the companies on that page use kafka as part of their
business. If you use Kafka as part of your business feel free to submit a
PR to be added.

I second Chris's point that other projects are not enough to require
Kafka
having such a support page.

Justine

On Thu, Jan 11, 2024 at 11:57 AM Chris Egerton 
wrote:


Hi François,

Is it an official policy of the ASF that projects provide a listing of
commercial support options for themselves? I understand that other
projects
have chosen to provide one, but this doesn't necessarily imply that all
projects should do the same, and I can't say I find this point very
convincing as a rebuttal to some of the good-faith concerns raised by
the
PMC and members of the community so far. However, if there's an official
ASF stance on this topic, then I acknowledge that Apache Kafka should
align
with it.

Best,

Chris


On Thu, Jan 11, 2024, 14:50 fpapon  wrote:


Hi Justine,

I'm not sure to see the difference between "happy users" and vendors
that advertise their products in some of the company list in the
"powered by" page.

Btw, my initial purpose of my proposal was to help user to find support
for production stuff rather than searching in google.

I don't think this is a bad thing because this is something that
already
exist in many ASF projects like:

https://hop.apache.org/community/commercial/
https://struts.apache.org/commercial-support.html
https://directory.apache.org/commercial-support.html
https://tomee.apache.org/commercial-support.html
https://plc4x.apache.org/users/commercial-support.html
https://camel.apache.org/community/support/
https://openmeetings.apache.org/commercial-support.html
https://guacamole.apache.org/support/



https://cwiki.apache.org/confluence/display/HADOOP2/Distributions+and+Commercial+Support
https://activemq.apache.org/support
https://karaf.apache.org/community.html

https://netbeans.apache.org/front/main/help/commercial-support/
https://royale.apache.org/royale-commercial-support/

https://karaf.apache.org/community.html

As I unders

Re: [DISCUSS] KIP-956: Tiered Storage Quotas

2024-02-01 Thread Francois Visconte
Hi,

I see that the ticket has been left untouched for a while now.
Should it be included in tiered storage v1?
We've observed that lacking a way to throttle uploads to tiered storage
has a major impact on producers and consumers when tiered storage access
recovers (starving disk IOps/throughput or CPU).
For this reason, I think this is an important feature, possibly worth
including in v1?

Regards,


On Tue, Dec 5, 2023 at 8:43 PM Jun Rao  wrote:

> Hi, Abhijeet,
>
> Thanks for the KIP. A few comments.
>
> 10. remote.log.manager.write.quota.default:
> 10.1 For other configs, we
> use replica.alter.log.dirs.io.max.bytes.per.second. To be consistent,
> perhaps this can be sth like remote.log.manager.write.max.bytes.per.second.
> 10.2 Could we list the new metrics associated with the new quota.
> 10.3 Is this dynamically configurable? If so, could we document the impact
> to tools like kafka-configs.sh and AdminClient?
>
> Jun
>
> On Tue, Nov 28, 2023 at 2:19 AM Luke Chen  wrote:
>
> > Hi Abhijeet,
> >
> > Thanks for the KIP!
> > This is an important feature for tiered storage.
> >
> > Some comments:
> > 1. Will we introduce new metrics for this tiered storage quotas?
> > This is important because the admin can know the throttling status by
> > checking the metrics while the remote write/read are slow, like the rate
> of
> > uploading/reading byte rate, the throttled time for upload/read... etc.
> >
> > 2. Could you give some examples for the throttling algorithm in the KIP
> to
> > explain it? That will make it much clearer.
> >
> > 3. To solve this problem, we can break down the RLMTask into two smaller
> > tasks - one for segment upload and the other for handling expired
> segments.
> > How do we handle the situation when a segment is still waiting for
> > offloading while this segment is expired and eligible to be deleted?
> > Maybe it'll be easier to not block the RLMTask when quota exceeded, and
> > just check it each time the RLMTask runs?
> >
> > Thank you.
> > Luke
> >
> > On Wed, Nov 22, 2023 at 6:27 PM Abhijeet Kumar <
> abhijeet.cse@gmail.com
> > >
> > wrote:
> >
> > > Hi All,
> > >
> > > I have created KIP-956 for defining read and write quota for tiered
> > > storage.
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-956+Tiered+Storage+Quotas
> > >
> > > Feedback and suggestions are welcome.
> > >
> > > Regards,
> > > Abhijeet.
> > >
> >
>
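To Luke's point 2 about spelling out the throttling algorithm: one common shape for such a write quota is a token bucket that converts an over-quota upload into a wait time. The sketch below only illustrates that idea — class and method names are invented, and it is not Kafka's actual quota implementation or necessarily what the KIP will specify:

```java
public class UploadQuota {
    private final double bytesPerSecond;
    private double tokens;       // available byte budget
    private long lastRefillMs;   // last time the bucket was refilled

    UploadQuota(double bytesPerSecond, long nowMs) {
        this.bytesPerSecond = bytesPerSecond;
        this.tokens = bytesPerSecond; // allow roughly one second of burst
        this.lastRefillMs = nowMs;
    }

    // Record an upload of `bytes` at time `nowMs` and return how many
    // milliseconds the caller should wait before proceeding (0 = no throttle).
    synchronized long throttleTimeMs(long bytes, long nowMs) {
        // Refill proportionally to elapsed time, capped at the burst size.
        tokens = Math.min(bytesPerSecond,
                tokens + (nowMs - lastRefillMs) / 1000.0 * bytesPerSecond);
        lastRefillMs = nowMs;
        tokens -= bytes;
        if (tokens >= 0) {
            return 0;
        }
        // Negative balance: wait until the deficit would be refilled.
        return (long) Math.ceil(-tokens / bytesPerSecond * 1000.0);
    }

    public static void main(String[] args) {
        UploadQuota quota = new UploadQuota(1_000_000, 0); // 1 MB/s
        System.out.println(quota.throttleTimeMs(500_000, 0));   // within budget -> 0
        System.out.println(quota.throttleTimeMs(1_000_000, 0)); // over budget -> 500
    }
}
```

An RLMTask could consult `throttleTimeMs` before each segment upload and either sleep for the returned time or, as Luke suggests, skip the upload and retry on the next scheduled run.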


Problematic new HWM increment behaviour introduced by KIP-207 and KIP-966

2024-09-11 Thread Francois Visconte
We identified a bug/new behaviour that would lead to consumers lagging for a
long time and ListOffsets requests failing during that time frame.

While the ListOffsets request failure is expected and was introduced by
KIP-207, the problematic behavior is more about the inability to
increment the high watermark and the consequence of having lagging
consumers.


Here is the situation:

- We have a topic with min.isr=2
- We have a partition on brokers 16, 17 and 18
- Leader for this partition is broker 17

1. Broker 18 failed. Partition has 2 ISRs.
2. Broker 16 failed. Partition has 1 ISR (17).
3. Broker 17 has LEO higher than HWM:

[Broker id=17] Leader topic-86 with topic id Some(yFhPOnPsRDiYHgfF2bR2aQ)
starts at leader epoch 7 from offset 3067193660 with partition epoch 11,
high watermark 3067191497, ISR [10017], adding replicas [] and removing
replicas [] (under-min-isr). Previous leader Some(10017) and previous
leader epoch was 6.

At this point producers cannot produce to topic-86 partition because there
is only one ISR, which is expected behavior.

But it seems that KIP-207 prevents answering ListOffsets requests here:



// Only consider throwing an error if we get a client request (isolationLevel
// is defined) and the high watermark is lagging behind the start offset
val maybeOffsetsError: Option[ApiException] = leaderEpochStartOffsetOpt
  .filter(epochStart => isolationLevel.isDefined && epochStart > localLog.highWatermark)
  .map(epochStart => Errors.OFFSET_NOT_AVAILABLE.exception(s"Failed to fetch offsets for " +
    s"partition $topicPartition with leader $epochLogString as this partition's " +
    s"high watermark (${localLog.highWatermark}) is lagging behind the " +
    s"start offset from the beginning of this epoch ($epochStart)."))


It seems that the path to get to the HWM being stuck for so long was
introduced in preparation for KIP-966; see the corresponding ticket and PR.

As a result:

- The stuck HWM in the above scenario can also mean that a small part of
  the messages isn't readable by consumers even though it was in the past.
- In case of truncation, the HWM might still go backwards. This is still
  possible even with min.ISR, although it should be rare.



Regards, F.


Re: [VOTE] KIP-1058: Txn consumer exerts pressure on remote storage when reading non-txn topic

2024-09-30 Thread Francois Visconte
Could we vote on this? This is causing a bunch of tiered storage read
issues, as many consumers default to READ_COMMITTED (e.g. librdkafka).

Thanks,
F.

On Mon, Sep 16, 2024 at 7:20 AM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Bumping this thread for vote. PTAL.
>
> On Mon, Sep 9, 2024 at 2:01 PM Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
> > Hi all,
> >
> > I'd like to open voting for KIP-1058. This KIP improves the consumer
> > reading from remote storage when READ_COMMITTED isolation level is
> enabled.
> > PTAL.
> >
> > KIP-1058
> > <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1058%3A+Txn+consumer+exerts+pressure+on+remote+storage+when+collecting+aborted+transactions
> >
> >
> > Thanks,
> > Kamal
> >
>
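As a workaround until the KIP lands, a consumer that does not need transactional semantics can opt out of the aborted-transaction lookups by setting the standard `isolation.level` consumer config explicitly (the thread notes librdkafka defaults to READ_COMMITTED). A minimal sketch; the bootstrap address and group id are placeholders:

```java
import java.util.Properties;

public class ConsumerIsolationConfig {
    // Build consumer properties; on non-transactional topics, read_uncommitted
    // avoids collecting aborted transactions, which with tiered storage can
    // otherwise trigger extra remote reads.
    static Properties consumerProps(String bootstrap) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrap);          // placeholder address
        props.put("group.id", "demo-group");                // placeholder group
        props.put("isolation.level", "read_uncommitted");   // opt out of txn filtering
        return props;
    }

    public static void main(String[] args) {
        Properties props = consumerProps("localhost:9092");
        System.out.println(props.getProperty("isolation.level"));
    }
}
```

The same `isolation.level=read_uncommitted` line applies verbatim to a librdkafka client's configuration.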


[jira] [Resolved] (KAFKA-13872) Partitions are truncated when leader is replaced

2023-08-31 Thread Francois Visconte (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Visconte resolved KAFKA-13872.
---
Resolution: Won't Fix

Transitioning to Won't Fix as this seems to be the expected behaviour.

> Partitions are truncated when leader is replaced
> 
>
> Key: KAFKA-13872
> URL: https://issues.apache.org/jira/browse/KAFKA-13872
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 2.7.2
>Reporter: Francois Visconte
>Priority: Major
> Attachments: extract-2022-05-04T15_50_34.110Z.csv
>
>
> Sample setup:
>  * a topic with one partition and RF=3
>  * a producer using acks=1
>  * min.insync.replicas to 1
>  * 3 brokers 1,2,3
>  * Preferred leader of the partition is brokerId 0
>  
> Steps to reproduce the issue
>  * Producer keeps producing to the partition, leader is brokerId=0
>  * At some point, replicas 1 and 2 are falling behind and removed from the ISR
>  * The leader broker 0 has an hardware failure
>  * Partition transition to offline
>  * This leader is replaced with a new broker with an empty disk and the same 
> broker id 0
>  * Partition transition from offline to online with leader 0, ISR = 0
>  * Followers see the leader offset is 0 and decide to truncate their 
> partitions to 0, ISR=0,1,2
>  * At this point all the topic data has been removed from all replicas and 
> partition size drops to 0 on all replicas
> Attached some of the relevant logs. I can provide more logs if necessary



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15477) Kafka won't shutdown when deleting remote segment

2023-09-19 Thread Francois Visconte (Jira)
Francois Visconte created KAFKA-15477:
-

 Summary: Kafka won't shutdown when deleting remote segment
 Key: KAFKA-15477
 URL: https://issues.apache.org/jira/browse/KAFKA-15477
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 3.6.0
Reporter: Francois Visconte


When brokers are busy deleting a bunch of segments (following a topic removal 
using tiered storage), they won't respond to a SIGTERM signal and cleanly 
shut down. 
Instead, they keep removing remote segments until the deletion is fully 
completed (which can take time for topics with long retention).
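One possible shape of a fix — purely a sketch with invented names, not the actual RemoteLogManager code — is to have the deletion loop poll a shutdown flag between segments so a SIGTERM can complete promptly:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class InterruptibleSegmentDeletion {
    // Would be set from the broker's shutdown hook (SIGTERM handler).
    static final AtomicBoolean shuttingDown = new AtomicBoolean(false);

    // Delete remote segments, but bail out between segments once shutdown
    // starts instead of draining the whole (possibly huge) backlog.
    static int deleteSegments(List<String> segmentIds) {
        int deleted = 0;
        for (String id : segmentIds) {
            if (shuttingDown.get()) {
                break; // leave remaining segments for the next broker start
            }
            // deleteRemoteSegment(id); // hypothetical RemoteStorageManager call
            deleted++;
        }
        return deleted;
    }

    public static void main(String[] args) {
        shuttingDown.set(true);
        System.out.println(deleteSegments(List.of("seg-1", "seg-2"))); // prints 0
    }
}
```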



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15525) Segment uploads stop working following a broker failure

2023-10-02 Thread Francois Visconte (Jira)
Francois Visconte created KAFKA-15525:
-

 Summary: Segment uploads stop working following a broker failure
 Key: KAFKA-15525
 URL: https://issues.apache.org/jira/browse/KAFKA-15525
 Project: Kafka
  Issue Type: Bug
  Components: Tiered-Storage
Affects Versions: 3.6.0
Reporter: Francois Visconte


I have a tiered-storage-enabled cluster on which I continuously produce 
to and consume from a TS-enabled topic.

Here are the topic settings I’m using: 

{code:java}
local.retention.ms=12
remote.storage.enable=true
retention.ms: 1080
segment.bytes: 51200
{code}
Here are my broker settings:
{code:java}
remote.log.storage.system.enable=true
remote.log.storage.manager.class.path=/opt/kafka/tiered-storage-libs/*
remote.log.storage.manager.class.name=io.aiven.kafka.tieredstorage.RemoteStorageManager
remote.log.metadata.manager.class.name=org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager
remote.log.metadata.manager.listener.name=INTERNAL_PLAINTEXT
remote.log.manager.task.interval.ms=5000
remote.log.manager.thread.pool.size=10
remote.log.reader.threads=10
remote.log.reader.max.pending.tasks=100
rlmm.config.remote.log.metadata.topic.replication.factor=1
rlmm.config.remote.log.metadata.topic.num.partitions=50
rlmm.config.remote.log.metadata.topic.retention.ms=-1
rsm.config.chunk.cache.class=io.aiven.kafka.tieredstorage.chunkmanager.cache.DiskBasedChunkCache
rsm.config.chunk.cache.path=/data/tiered-storage-cache
rsm.config.chunk.cache.size=1073741824
rsm.config.metrics.recording.level=DEBUG    
rsm.config.storage.aws.credentials.provider.class=software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider
rsm.config.storage.backend.class.name=io.aiven.kafka.tieredstorage.storage.s3.S3Storage
rsm.config.storage.backend.class=io.aiven.kafka.tieredstorage.storage.s3.S3Storage
rsm.config.storage.s3.region=us-east-1
rsm.config.chunk.size=102400
rsm.config.storage.s3.multipart.upload.part.size=16777216 {code}
When a broker in the cluster gets rotated (replaced or restarted), some brokers 
start throwing this error repeatedly: 
{code:java}
[RemoteLogManager=1 partition=yTypIvtBRY2l3sD4-8M7fA:loadgen-3] Error 
occurred while copying log segments of partition: 
yTypIvtBRY2l3sD4-8M7fA:loadgen-3 

java.util.concurrent.ExecutionException: 
org.apache.kafka.common.KafkaException: java.util.concurrent.TimeoutException: 
Timed out in catching up with the expected offset by consumer.
    at 
java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
    at 
java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
    at 
kafka.log.remote.RemoteLogManager$RLMTask.copyLogSegment(RemoteLogManager.java:728)
    at 
kafka.log.remote.RemoteLogManager$RLMTask.copyLogSegmentsToRemote(RemoteLogManager.java:687)
    at kafka.log.remote.RemoteLogManager$RLMTask.run(RemoteLogManager.java:790)
    at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
    at 
java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
    at 
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.apache.kafka.common.KafkaException: 
java.util.concurrent.TimeoutException: Timed out in catching up with the 
expected offset by consumer.
    at 
org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.lambda$storeRemoteLogMetadata$0(TopicBasedRemoteLogMetadataManager.java:188)
    at 
java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718)
    at 
java.base/java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:483)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
    at 
java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
    at 
java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
    at 
java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
Caused by: java.util.concurrent.TimeoutException: Timed out in catching up with 
the expected offset by consumer.
    at 
org.apache.kafka.server.log.remote.metadata.storage.ConsumerManager.waitTillConsumptionCatchesUp(ConsumerManager.java:121)
    at 
org.apache.kafka.server.log.remote.metadata.storage.ConsumerManager.waitTillConsumptionCatchesUp(ConsumerManager.java:89)
    at

[jira] [Created] (KAFKA-15802) Trying to access uncopied segments metadata on listOffsets

2023-11-09 Thread Francois Visconte (Jira)
Francois Visconte created KAFKA-15802:
-

 Summary: Trying to access uncopied segments metadata on listOffsets
 Key: KAFKA-15802
 URL: https://issues.apache.org/jira/browse/KAFKA-15802
 Project: Kafka
  Issue Type: Bug
  Components: Tiered-Storage
Affects Versions: 3.6.0
Reporter: Francois Visconte


We have a tiered storage cluster running with the Aiven S3 plugin. 

On our cluster, we have a process doing regular listOffsets requests. 

This triggers a tiered storage exception:
{panel}
org.apache.kafka.common.KafkaException: 
org.apache.kafka.server.log.remote.storage.RemoteResourceNotFoundException: 
Requested remote resource was not found
at 
org.apache.kafka.storage.internals.log.RemoteIndexCache.lambda$createCacheEntry$6(RemoteIndexCache.java:355)
at 
org.apache.kafka.storage.internals.log.RemoteIndexCache.loadIndexFile(RemoteIndexCache.java:318)
Nov 09, 2023 1:42:01 PM com.github.benmanes.caffeine.cache.LocalAsyncCache 
lambda$handleCompletion$7
WARNING: Exception thrown during asynchronous load
java.util.concurrent.CompletionException: 
io.aiven.kafka.tieredstorage.storage.KeyNotFoundException: Key 
kafka-evp-ts-988a/tiered_storage_test_normal_48e5-0A_3phS5QWu9eU28KG0Lxg/24/00149691-Rdf4cUR_S4OYAGImco6Lbg.rsm-manifest
 does not exists in storage 
S3Storage\{bucketName='dd-kafka-tiered-storage-staging-us1-staging-dog', 
partSize=16777216}
at 
com.github.benmanes.caffeine.cache.CacheLoader.lambda$asyncLoad$0(CacheLoader.java:107)
at 
java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
at 
java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1760)
at 
java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
at 
java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
at 
java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
at 
java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
at 
java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
Caused by: io.aiven.kafka.tieredstorage.storage.KeyNotFoundException: Key 
kafka-evp-ts-988a/tiered_storage_test_normal_48e5-0A_3phS5QWu9eU28KG0Lxg/24/00149691-Rdf4cUR_S4OYAGImco6Lbg.rsm-manifest
 does not exists in storage 
S3Storage\{bucketName='dd-kafka-tiered-storage-staging-us1-staging-dog', 
partSize=16777216}
at 
io.aiven.kafka.tieredstorage.storage.s3.S3Storage.fetch(S3Storage.java:80)
at 
io.aiven.kafka.tieredstorage.manifest.SegmentManifestProvider.lambda$new$1(SegmentManifestProvider.java:59)
at 
com.github.benmanes.caffeine.cache.CacheLoader.lambda$asyncLoad$0(CacheLoader.java:103)
... 7 more
Caused by: software.amazon.awssdk.services.s3.model.NoSuchKeyException: The 
specified key does not exist. (Service: S3, Status Code: 404, Request ID: 
CFMP27PVC9V2NNEM, Extended Request ID: 
F5qqlV06qQJ5qCuWl91oueBaha0QLMBURJudnOnFDQk+YbgFcAg70JBATcARDxN44DGo+PpfZHAsum+ioYMoOw==)
at 
software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleErrorResponse(CombinedResponseHandler.java:125)
at 
software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleResponse(CombinedResponseHandler.java:82)
at 
software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:60)
at 
software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:41)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30)
at 
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:72)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage

[jira] [Created] (KAFKA-16890) Failing to build aux state on broker failover

2024-06-04 Thread Francois Visconte (Jira)
Francois Visconte created KAFKA-16890:
-

 Summary: Failing to build aux state on broker failover
 Key: KAFKA-16890
 URL: https://issues.apache.org/jira/browse/KAFKA-16890
 Project: Kafka
  Issue Type: Bug
  Components: Tiered-Storage
Affects Versions: 3.7.0, 3.7.1
Reporter: Francois Visconte


We have clusters where we replace machines often, and they fall into a state where we 
keep getting "Error building remote log auxiliary state for loadtest_topic-22" 
and the partition stays under-replicated until the leader is manually 
restarted. 

Looking into a specific case, here is what we observed in __remote_log_metadata 
topic:


{code:java}
 
partition: 29, offset: 183593, value: 
RemoteLogSegmentMetadata{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
 id=GZeRTXLMSNe2BQjRXkg6hQ}, startOffset=10823, endOffset=11536, 
brokerId=10013, maxTimestampMs=1715774588597, eventTimestampMs=1715781657604, 
segmentLeaderEpochs={125=10823, 126=10968, 128=11047, 130=11048, 131=11324, 
133=11442, 134=11443, 135=11445, 136=11521, 137=11533, 139=11535}, 
segmentSizeInBytes=704895, customMetadata=Optional.empty, 
state=COPY_SEGMENT_STARTED}
partition: 29, offset: 183594, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
 id=GZeRTXLMSNe2BQjRXkg6hQ}, customMetadata=Optional.empty, 
state=COPY_SEGMENT_FINISHED, eventTimestampMs=1715781658183, brokerId=10013}
partition: 29, offset: 183669, value: 
RemoteLogSegmentMetadata{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
 id=L1TYzx0lQkagRIF86Kp0QQ}, startOffset=10823, endOffset=11544, 
brokerId=10008, maxTimestampMs=1715781445270, eventTimestampMs=1715782717593, 
segmentLeaderEpochs={125=10823, 126=10968, 128=11047, 130=11048, 131=11324, 
133=11442, 134=11443, 135=11445, 136=11521, 137=11533, 139=11535, 140=11537, 
142=11543}, segmentSizeInBytes=713088, customMetadata=Optional.empty, 
state=COPY_SEGMENT_STARTED}
partition: 29, offset: 183670, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
 id=L1TYzx0lQkagRIF86Kp0QQ}, customMetadata=Optional.empty, 
state=COPY_SEGMENT_FINISHED, eventTimestampMs=1715782718370, brokerId=10008}
partition: 29, offset: 186215, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
 id=L1TYzx0lQkagRIF86Kp0QQ}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_STARTED, eventTimestampMs=1715867874617, brokerId=10008}
partition: 29, offset: 186216, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
 id=L1TYzx0lQkagRIF86Kp0QQ}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_FINISHED, eventTimestampMs=1715867874725, brokerId=10008}
partition: 29, offset: 186217, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
 id=GZeRTXLMSNe2BQjRXkg6hQ}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_STARTED, eventTimestampMs=1715867874729, brokerId=10008}
partition: 29, offset: 186218, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
 id=GZeRTXLMSNe2BQjRXkg6hQ}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_FINISHED, eventTimestampMs=1715867874817, brokerId=10008}
{code}
 

It seems that at the time the leader is restarted (10013), a second copy of the 
same segment is tiered by the new leader (10008). Interestingly the segment 
doesn't have the same end offset, which is concerning. 

Then the follower sees the following error repeatedly until the leader is 
restarted: 



 
{code:java}
[2024-05-17 20:46:42,133] DEBUG [ReplicaFetcher replicaId=10013, 
leaderId=10008, fetcherId=0] Handling errors in processFetchRequest for 
partitions HashSet(loadtest_topic-22) (kafka.server.ReplicaFetcherThread)
[2024-05-17 20:46:43,174] DEBUG [ReplicaFetcher replicaId=10013, 
leaderId=10008, fetcherId=0] Received error OFFSET_MOVED_TO_TIERED_STORAGE, at 
fetch offset: 11537, topic-partition: loadtest_topic-22 
(kafka.server.ReplicaFetcherThread)
[2024-05-17 20:46:43,175] ERROR [ReplicaFetcher replicaId=10013, 
leaderId=10008, fetcherId=0] Error building remote log auxiliary state for 
loadtest_topic-22 (kafka.server.ReplicaFetcherThread)
org.apache.kafka.server.log.remote.storage.RemoteStorageException: Couldn't 
build the state from remote store for partition: loadtest_topic-22, 
currentLeaderEpoch: 153, leaderLocalLogStartOffset: 11545, 
leaderLogStartOffset: 11537, epoch: 142as the previous remote log segment 
metadata was 

[jira] [Created] (KAFKA-16895) RemoteCopyLagSegments metric taking active segment into account

2024-06-05 Thread Francois Visconte (Jira)
Francois Visconte created KAFKA-16895:
-

 Summary: RemoteCopyLagSegments metric taking active segment into 
account
 Key: KAFKA-16895
 URL: https://issues.apache.org/jira/browse/KAFKA-16895
 Project: Kafka
  Issue Type: Bug
  Components: Tiered-Storage
Affects Versions: 3.7.0, 3.7.1
Reporter: Francois Visconte


The RemoteCopyLagSegments metric is off by one because it also takes the active 
segment into account, while RemoteCopyLagBytes does subtract the size of the 
active segment: 

{code:java}
 long bytesLag = log.onlyLocalLogSegmentsSize() - log.activeSegment().size();
String topic = topicIdPartition.topic();
int partition = topicIdPartition.partition();
long segmentsLag = log.onlyLocalLogSegmentsCount();
brokerTopicStats.recordRemoteCopyLagBytes(topic, partition, bytesLag);
brokerTopicStats.recordRemoteCopyLagSegments(topic, partition, segmentsLag);
{code}
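A minimal, self-contained sketch of the proposed fix (using stand-in values rather than the real UnifiedLog API, so the names and numbers below are illustrative assumptions): exclude the active segment from the segment-count lag, mirroring how bytesLag already subtracts the active segment's size.

```java
public class RemoteCopyLagSketch {
    public static void main(String[] args) {
        // Stand-ins for log.onlyLocalLogSegmentsSize()/Count() and
        // log.activeSegment().size(); values are illustrative only.
        long onlyLocalLogSegmentsSize = 400L;  // bytes, includes the active segment
        long onlyLocalLogSegmentsCount = 4L;   // includes the active segment
        long activeSegmentSize = 100L;

        // Current behaviour: the bytes lag excludes the active segment...
        long bytesLag = onlyLocalLogSegmentsSize - activeSegmentSize;
        // ...but the segment-count lag does not, so it is off by one.
        long segmentsLagCurrent = onlyLocalLogSegmentsCount;
        // Proposed: subtract the active segment, which can never be tiered.
        long segmentsLagFixed = onlyLocalLogSegmentsCount - 1;

        System.out.println(bytesLag + " " + segmentsLagCurrent + " " + segmentsLagFixed);
        // prints: 300 4 3
    }
}
```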




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-17184) Remote index cache noisy logging

2024-07-23 Thread Francois Visconte (Jira)
Francois Visconte created KAFKA-17184:
-

 Summary: Remote index cache noisy logging
 Key: KAFKA-17184
 URL: https://issues.apache.org/jira/browse/KAFKA-17184
 Project: Kafka
  Issue Type: Bug
Reporter: Francois Visconte


We have a tiered storage cluster where some consumers are constantly lagging 
behind. 

On this cluster, we get a ton of error logs and failed fetches with the following 
symptom: 


{code:java}
java.lang.IllegalStateException: This entry is marked for cleanup
at 
org.apache.kafka.storage.internals.log.RemoteIndexCache$Entry.lookupOffset(RemoteIndexCache.java:569)
at 
org.apache.kafka.storage.internals.log.RemoteIndexCache.lookupOffset(RemoteIndexCache.java:446)
at 
kafka.log.remote.RemoteLogManager.lookupPositionForOffset(RemoteLogManager.java:1445)
at kafka.log.remote.RemoteLogManager.read(RemoteLogManager.java:1391)
at kafka.log.remote.RemoteLogReader.call(RemoteLogReader.java:62)
at kafka.log.remote.RemoteLogReader.call(RemoteLogReader.java:31)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)
{code}

I believe this should be handled differently:

* The log level should be WARN or INFO.
* We should reload the index when an offset is requested and the entry is 
marked for cleanup. 


We do use the default setting for 
{{remote.log.index.file.cache.total.size.bytes}} (1GiB).
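A hedged sketch of the second suggestion, using hypothetical helper names rather than the actual RemoteIndexCache API: treat an entry that is marked for cleanup as a cache miss and retry the lookup once after the entry is reloaded, instead of failing the whole fetch.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class RetryOnCleanupSketch {
    // Simulates an index-cache entry evicted between lookup and use:
    // the first call fails, the reloaded entry succeeds.
    static final AtomicBoolean markedForCleanup = new AtomicBoolean(true);

    // Stand-in for RemoteIndexCache.lookupOffset; not the real signature.
    static long lookupOffset() {
        if (markedForCleanup.getAndSet(false)) {
            throw new IllegalStateException("This entry is marked for cleanup");
        }
        return 42L; // position found in the reloaded index
    }

    public static void main(String[] args) {
        long position;
        try {
            position = lookupOffset();
        } catch (IllegalStateException e) {
            // Proposed behaviour: reload the index and retry once instead of
            // failing the fetch and logging at ERROR.
            position = lookupOffset();
        }
        System.out.println(position); // prints: 42
    }
}
```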







--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16511) Leaking tiered segments

2024-04-11 Thread Francois Visconte (Jira)
Francois Visconte created KAFKA-16511:
-

 Summary: Leaking tiered segments
 Key: KAFKA-16511
 URL: https://issues.apache.org/jira/browse/KAFKA-16511
 Project: Kafka
  Issue Type: Bug
  Components: Tiered-Storage
Affects Versions: 3.7.0
Reporter: Francois Visconte


I have some topics that have not been written to for a few days (they have 12h 
retention) where some data remains on tiered storage (S3 in our case) and is 
never deleted.

 

Looking at the log history, it appears that we never even tried to delete these 
segments. 

When looking at one of the non-leaking segments, I get the following interesting 
messages: 

```

2024-04-02T10:30:45.265Z kafka 10039 [RemoteLogManager=10039 partition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764] Deleted remote log segment RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764, id=fqGng3UURCG3-v4lETeLKQ} due to leader-epoch-cache truncation. Current earliest-epoch-entry: EpochEntry(epoch=8, startOffset=2980106), segment-end-offset: 2976819 and segment-epochs: [5]

2024-04-02T10:30:45.242Z kafka 10039 Deleting log segment data for completed successfully RemoteLogSegmentMetadata{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764, id=fqGng3UURCG3-v4lETeLKQ}, startOffset=2968418, endOffset=2976819, brokerId=10029, maxTimestampMs=1712009754536, eventTimestampMs=1712013411147, segmentLeaderEpochs={5=2968418}, segmentSizeInBytes=536351075, customMetadata=Optional.empty, state=COPY_SEGMENT_FINISHED}

2024-04-02T10:30:45.144Z kafka 10039 Deleting log segment data for RemoteLogSegmentMetadata{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764, id=fqGng3UURCG3-v4lETeLKQ}, startOffset=2968418, endOffset=2976819, brokerId=10029, maxTimestampMs=1712009754536, eventTimestampMs=1712013411147, segmentLeaderEpochs={5=2968418}, segmentSizeInBytes=536351075, customMetadata=Optional.empty, state=COPY_SEGMENT_FINISHED}

2024-04-01T23:16:51.157Z kafka 10029 [RemoteLogManager=10029 partition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764] Copied 02968418.log to remote storage with segment-id: RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764, id=fqGng3UURCG3-v4lETeLKQ}

2024-04-01T23:16:51.147Z kafka 10029 Copying log segment data completed successfully, metadata: RemoteLogSegmentMetadata{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764, id=fqGng3UURCG3-v4lETeLKQ}, startOffset=2968418, endOffset=2976819, brokerId=10029, maxTimestampMs=1712009754536, eventTimestampMs=1712013397319, segmentLeaderEpochs={5=2968418}, segmentSizeInBytes=536351075, customMetadata=Optional.empty, state=COPY_SEGMENT_STARTED}

2024-04-01T23:16:37.328Z kafka 10029 Copying log segment data, metadata: RemoteLogSegmentMetadata{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-764, id=fqGng3UURCG3-v4lETeLKQ}, startOffset=2968418, endOffset=2976819, brokerId=10029, maxTimestampMs=1712009754536, eventTimestampMs=1712013397319, segmentLeaderEpochs={5=2968418}, segmentSizeInBytes=536351075, customMetadata=Optional.empty, state=COPY_SEGMENT_STARTED}

```

Which looks right, because we can see logs from both the plugin and the remote log 
manager indicating that the remote log segment was removed. 

Now if I look at one of the leaked segments, here is what I see:

 

```

2024-04-02T00:43:33.834Z kafka 10001 [RemoteLogManager=10001 partition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-765] Copied 02971163.log to remote storage with segment-id: RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-765, id=8dP13VDYSaiFlubl9SNBTQ}

2024-04-02T00:43:33.822Z kafka 10001 Copying log segment data completed successfully, metadata: RemoteLogSegmentMetadata{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=5G8Ai8kBSwmQ3Ln4QRY5rA:topic1_3543-765, id=8dP13VDYSaiFlubl9SNBTQ}, startOffset=2971163, endOffset=2978396, brokerId=10001, maxTimestampMs=1712010648756, eventTimestampMs=17

```

[jira] [Created] (KAFKA-10099) Kerberos authentication sets java authorizedId to authenticationId not authorizationId

2020-06-03 Thread Francois Fernando (Jira)
Francois Fernando created KAFKA-10099:
-

 Summary: Kerberos authentication sets java authorizedId to 
authenticationId not authorizationId
 Key: KAFKA-10099
 URL: https://issues.apache.org/jira/browse/KAFKA-10099
 Project: Kafka
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Francois Fernando


The following authentication code in Kafka still puzzles me (lines 67-74: 
https://github.com/apache/kafka/blob/3cdc78e6bb1f83973a14ce1550fe3874f7348b05/clients/src/main/java/org/apache/kafka/common/security/authenticator/SaslServerCallbackHandler.java).


{code:java}
private void handleAuthorizeCallback(AuthorizeCallback ac) {
    String authenticationID = ac.getAuthenticationID();
    String authorizationID = ac.getAuthorizationID();

    LOG.info("Successfully authenticated client: authenticationID={}; authorizationID={}.",
            authenticationID, authorizationID);

    ac.setAuthorized(true);
    ac.setAuthorizedID(authenticationID);
}
{code}

In a Kafka cluster secured with Kerberos, using a Kafka keytab with a principal 
like `sys_read/reader.myorg.c...@myorg.corp` results in:

authenticationID = sys_r...@myorg.corp
authorizationID = sys_read/reader.myorg.c...@myorg.corp

The last line of the above method sets the authorized ID to authenticationID, not 
authorizationID. From my understanding of Java security, the principal becomes 
whatever is set as the authorized ID.

This means ACL definitions can't use the full principal string as the 
principal, because the authorizer will never see it.
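A minimal reproduction of this behaviour against the JDK's javax.security.sasl.AuthorizeCallback; the principal strings below are illustrative, not taken from a real cluster.

```java
import javax.security.sasl.AuthorizeCallback;

public class AuthorizedIdDemo {
    public static void main(String[] args) {
        String authenticationID = "sys_read@MYORG.CORP";
        String authorizationID = "sys_read/reader.myorg.corp@MYORG.CORP";
        AuthorizeCallback ac = new AuthorizeCallback(authenticationID, authorizationID);
        ac.setAuthorized(true);

        // What the handler currently does: the authorized ID, which later
        // becomes the principal seen by the authorizer, is the short
        // authentication ID.
        ac.setAuthorizedID(ac.getAuthenticationID());
        System.out.println(ac.getAuthorizedID()); // sys_read@MYORG.CORP

        // What the reporter expects: keep the full authorization ID instead.
        ac.setAuthorizedID(ac.getAuthorizationID());
        System.out.println(ac.getAuthorizedID()); // sys_read/reader.myorg.corp@MYORG.CORP
    }
}
```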



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13872) Partitions are truncated when leader is replaced

2022-05-04 Thread Francois Visconte (Jira)
Francois Visconte created KAFKA-13872:
-

 Summary: Partitions are truncated when leader is replaced
 Key: KAFKA-13872
 URL: https://issues.apache.org/jira/browse/KAFKA-13872
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.7.2
Reporter: Francois Visconte
 Attachments: extract-2022-05-04T15_50_34.110Z.csv

Sample setup:
 * a topic with one partition and RF=3
 * a producer using acks=1
 * min.insync.replicas set to 1
 * 3 brokers: 1, 2, 3
 * the preferred leader of the partition is brokerId 0

 

Steps to reproduce the issue:
 * The producer keeps producing to the partition; the leader is brokerId=0
 * At some point, replicas 1 and 2 fall behind and are removed from the ISR
 * The leader broker 0 has a hardware failure
 * The partition transitions to offline
 * The leader is replaced with a new broker with an empty disk and the same 
broker id 0
 * The partition transitions from offline to online with leader 0, ISR = 0
 * Followers see that the leader offset is 0 and truncate their partitions 
to 0; ISR=0,1,2
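The last step above can be modelled with a simplified sketch (an assumption for illustration, not Kafka's actual fetcher code): a follower never keeps data beyond the leader's log end offset, and the replacement broker comes back with an empty log.

```java
public class TruncationSketch {
    public static void main(String[] args) {
        long followerLogEndOffset = 123_456L; // followers still hold the data
        long newLeaderLogEndOffset = 0L;      // replaced broker 0: empty disk, same id

        // Followers reconcile against the new leader and truncate to its end
        // offset, discarding everything that was acked with acks=1:
        long truncateTo = Math.min(followerLogEndOffset, newLeaderLogEndOffset);
        System.out.println(truncateTo); // prints: 0
    }
}
```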

I have attached some of the relevant logs; I can provide more if necessary.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (KAFKA-2624) Truncate warn message logged after truncating partitions

2015-10-08 Thread Francois Visconte (JIRA)
Francois Visconte created KAFKA-2624:


 Summary: Truncate warn message logged after truncating partitions
 Key: KAFKA-2624
 URL: https://issues.apache.org/jira/browse/KAFKA-2624
 Project: Kafka
  Issue Type: Bug
  Components: replication
Affects Versions: 0.8.2.1, 0.8.2.0, 0.8.1.1, 0.8.1, 0.8.0, 0.7.2, 0.7.1, 
0.7, 0.6, 0.8.1.2, 0.9.0.0, 0.10.0.0, 0.8.2.2, 0.9.0.1
Reporter: Francois Visconte
Assignee: Neha Narkhede
 Fix For: 0.8.1.2, 0.9.0.0, 0.10.0.0, 0.8.2.2, 0.9.0.1, 0.8.2.1, 
0.8.2.0, 0.8.1.1, 0.8.1, 0.8.0, 0.7.2, 0.7.1, 0.7, 0.6


The warning message about truncation is logged after the log has been truncated. 
Consequently, the logged message is always of the form:

Replica 13 for partition [topic_x,24] reset its fetch offset from 1234 to 
current leader 22's latest offset 1234 (kafka.server.ReplicaFetcherThread)






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2624) Truncate warn message logged after truncating partitions

2015-10-08 Thread Francois Visconte (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Visconte updated KAFKA-2624:
-
Priority: Trivial  (was: Major)

> Truncate warn message logged after truncating partitions
> 
>
> Key: KAFKA-2624
> URL: https://issues.apache.org/jira/browse/KAFKA-2624
> Project: Kafka
>  Issue Type: Bug
>  Components: replication
>Affects Versions: 0.6, 0.7, 0.7.1, 0.7.2, 0.8.0, 0.8.1, 0.8.1.1, 0.8.1.2, 
> 0.8.2.0, 0.8.2.1, 0.9.0.0, 0.10.0.0, 0.8.2.2, 0.9.0.1
>Reporter: Francois Visconte
>Assignee: Neha Narkhede
>Priority: Trivial
> Fix For: 0.6, 0.7, 0.7.1, 0.7.2, 0.8.0, 0.8.1, 0.8.1.1, 0.8.1.2, 
> 0.8.2.0, 0.8.2.1, 0.9.0.0, 0.10.0.0, 0.8.2.2, 0.9.0.1
>
>
> Message warning about truncating is logged after log has been truncated. 
> Consequently, logged message is always of the form
> Replica 13 for partition [topic_x,24] reset its fetch offset from 1234 to 
> current leader 22's latest offset 1234 (kafka.server.ReplicaFetcherThread)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-17678) Problematic new HWM increment behaviour introduced by KIP-207 and KIP-966

2024-10-02 Thread Francois Visconte (Jira)
Francois Visconte created KAFKA-17678:
-

 Summary: Problematic new HWM increment behaviour introduced by 
KIP-207 and KIP-966
 Key: KAFKA-17678
 URL: https://issues.apache.org/jira/browse/KAFKA-17678
 Project: Kafka
  Issue Type: Bug
  Components: replication
Reporter: Francois Visconte


We identified a bug / new behaviour that leads to consumers lagging for a 
long time and ListOffsets requests failing during that time frame.

While the ListOffsets request failures are expected and were introduced by 
KIP-207, the problematic behaviour is the inability to increment the 
high watermark, and the consequence of having lagging consumers.

Here is the situation:
 * We have a topic with min.isr=2
 * We have a partition on brokers 16, 17 and 18
 * The leader for this partition is broker 17

 # Broker 18 failed. The partition has 2 ISRs
 # Broker 16 failed. The partition has 1 ISR (17)
 # Broker 17 has its LEO higher than the HWM:
{{[Broker id=17] Leader topic-86 with topic id Some(yFhPOnPsRDiYHgfF2bR2aQ) starts at leader epoch 7 from offset 3067193660 with partition epoch 11, high watermark 3067191497, ISR [10017], adding replicas [] and removing replicas [] (under-min-isr). Previous leader Some(10017) and previous leader epoch was 6.}}

At this point producers cannot produce to topic-86 partition because there is 
only one ISR, which is expected behavior.

But it seems that KIP-207 prevents answering ListOffsets requests here:
 
{code:java}
// Only consider throwing an error if we get a client request (isolationLevel is defined)
// and the high watermark is lagging behind the start offset
val maybeOffsetsError: Option[ApiException] = leaderEpochStartOffsetOpt
  .filter(epochStart => isolationLevel.isDefined && epochStart > localLog.highWatermark)
  .map(epochStart => Errors.OFFSET_NOT_AVAILABLE.exception(s"Failed to fetch offsets for " +
    s"partition $topicPartition with leader $epochLogString as this partition's " +
    s"high watermark (${localLog.highWatermark}) is lagging behind the " +
    s"start offset from the beginning of this epoch ($epochStart).")){code}
It seems that the code path that leaves the HWM stuck for so long was introduced 
in preparation for KIP-966; see the related ticket and PR.

As a result:
 * The stuck HWM in the above scenario can also mean that a small portion of 
messages isn't readable by consumers, even though it was in the past.
 * In case of truncation, the HWM might still go backwards. This is still 
possible even with min.ISR, although it should be rare.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-1889) Refactor shell wrapper scripts

2019-06-19 Thread Francois Saint-Jacques (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques resolved KAFKA-1889.
---
Resolution: Won't Fix

> Refactor shell wrapper scripts
> --
>
> Key: KAFKA-1889
> URL: https://issues.apache.org/jira/browse/KAFKA-1889
> Project: Kafka
>  Issue Type: Improvement
>  Components: packaging
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Minor
> Attachments: refactor-scripts-v1.patch, refactor-scripts-v2.patch
>
>
> Shell scripts in bin/ need love.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-1889) Refactor shell wrapper scripts

2015-01-21 Thread Francois Saint-Jacques (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques updated KAFKA-1889:
--
Attachment: refactor-scripts-v1.patch

> Refactor shell wrapper scripts
> --
>
> Key: KAFKA-1889
> URL: https://issues.apache.org/jira/browse/KAFKA-1889
> Project: Kafka
>  Issue Type: Improvement
>  Components: packaging
>    Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Minor
> Attachments: refactor-scripts-v1.patch
>
>
> Shell scripts in bin/ need love.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-1889) Refactor shell wrapper scripts

2015-01-21 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created KAFKA-1889:
-

 Summary: Refactor shell wrapper scripts
 Key: KAFKA-1889
 URL: https://issues.apache.org/jira/browse/KAFKA-1889
 Project: Kafka
  Issue Type: Improvement
  Components: packaging
Reporter: Francois Saint-Jacques
Priority: Minor


Shell scripts in bin/ need love.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-1889) Refactor shell wrapper scripts

2015-01-21 Thread Francois Saint-Jacques (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286006#comment-14286006
 ] 

Francois Saint-Jacques edited comment on KAFKA-1889 at 1/21/15 6:21 PM:


I have multiple other comments on the scripts that I didn't address and that might 
be worth discussing.

1. There seem to be many ways to pass options to kafka-run-class.sh, either by 
arguments (-daemon|-loggc|...) or by environment variables 
(KAFKA_JMX_OPTS|KAFKA_OPTS|KAFKA_HEAP_OPTS|...). This is inconsistent and needs 
to be addressed.
2. Scripts shouldn't bother daemonizing; leave this to packagers, and just make 
sure you exec correctly.
3. The defaults are not production-ready for servers:
 - the GC log shouldn't be enabled by default
 - kafka-request.log at TRACE is a silent disk killer on a busy cluster
 - never do this in a non-init script; it should be left to packagers: if [ ! -d 
"${LOG_DIR}" ]; then mkdir -p "${LOG_DIR}"; fi


was (Author: fsaintjacques):
I have multiple other comments on the scripts that I didn't address and might 
be worth talking.

1. There seems to be many way to pass option to kafka-run-class.sh, either by 
arguments (-daemon|-loggc|...) or by environment variables 
(KAFKA_JMX_OPTS|KAFKA_OPTS|KAFKA_HEAP_OPTS|...). This is inconsistent and needs 
to be addressed.
2. Scripts shouldn't bother daemonizing, leave this to packagers, just make 
sure you exec correctly.
3. The defaults are not production ready for servers:
 -gc log shouldn't be enabled by default
 -kafka-request.log to TRACE, this is a silent disk killer on busy cluster
 - never do this in non-init script, should be left to packagers: if [ ! -d 
"${LOG_DIR}" ]; then mkdir -p "${LOG_DIR}"; fi

> Refactor shell wrapper scripts
> --
>
> Key: KAFKA-1889
> URL: https://issues.apache.org/jira/browse/KAFKA-1889
> Project: Kafka
>      Issue Type: Improvement
>      Components: packaging
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Minor
> Attachments: refactor-scripts-v1.patch
>
>
> Shell scripts in bin/ need love.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1889) Refactor shell wrapper scripts

2015-01-21 Thread Francois Saint-Jacques (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286006#comment-14286006
 ] 

Francois Saint-Jacques commented on KAFKA-1889:
---

I have multiple other comments on the scripts that I didn't address and that might 
be worth discussing.

1. There seem to be many ways to pass options to kafka-run-class.sh, either by 
arguments (-daemon|-loggc|...) or by environment variables 
(KAFKA_JMX_OPTS|KAFKA_OPTS|KAFKA_HEAP_OPTS|...). This is inconsistent and needs 
to be addressed.
2. Scripts shouldn't bother daemonizing; leave this to packagers, and just make 
sure you exec correctly.
3. The defaults are not production-ready for servers:
 - the GC log shouldn't be enabled by default
 - kafka-request.log at TRACE is a silent disk killer on a busy cluster
 - never do this in a non-init script; it should be left to packagers: if [ ! -d 
"${LOG_DIR}" ]; then mkdir -p "${LOG_DIR}"; fi

> Refactor shell wrapper scripts
> --
>
> Key: KAFKA-1889
> URL: https://issues.apache.org/jira/browse/KAFKA-1889
> Project: Kafka
>  Issue Type: Improvement
>  Components: packaging
>    Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Minor
> Attachments: refactor-scripts-v1.patch
>
>
> Shell scripts in bin/ need love.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1889) Refactor shell wrapper scripts

2015-01-21 Thread Francois Saint-Jacques (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques updated KAFKA-1889:
--
Assignee: Francois Saint-Jacques
  Status: Patch Available  (was: Open)

> Refactor shell wrapper scripts
> --
>
> Key: KAFKA-1889
> URL: https://issues.apache.org/jira/browse/KAFKA-1889
> Project: Kafka
>  Issue Type: Improvement
>  Components: packaging
>    Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Minor
>
> Shell scripts in bin/ need love.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1889) Refactor shell wrapper scripts

2015-01-21 Thread Francois Saint-Jacques (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques updated KAFKA-1889:
--
Attachment: refactor-scripts-v2.patch

> Refactor shell wrapper scripts
> --
>
> Key: KAFKA-1889
> URL: https://issues.apache.org/jira/browse/KAFKA-1889
> Project: Kafka
>  Issue Type: Improvement
>  Components: packaging
>    Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Minor
> Attachments: refactor-scripts-v1.patch, refactor-scripts-v2.patch
>
>
> Shell scripts in bin/ need love.





[jira] [Commented] (KAFKA-1889) Refactor shell wrapper scripts

2015-01-21 Thread Francois Saint-Jacques (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286097#comment-14286097
 ] 

Francois Saint-Jacques commented on KAFKA-1889:
---

The second patch should give an overview of what a 'clean' kafka-run-class.sh 
should look like.

> Refactor shell wrapper scripts
> --
>
> Key: KAFKA-1889
> URL: https://issues.apache.org/jira/browse/KAFKA-1889
> Project: Kafka
>  Issue Type: Improvement
>  Components: packaging
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Minor
> Attachments: refactor-scripts-v1.patch, refactor-scripts-v2.patch
>
>
> Shell scripts in bin/ need love.





[jira] [Comment Edited] (KAFKA-1889) Refactor shell wrapper scripts

2015-01-21 Thread Francois Saint-Jacques (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286097#comment-14286097
 ] 

Francois Saint-Jacques edited comment on KAFKA-1889 at 1/21/15 7:18 PM:


The second patch should give an overview of what a 'clean' kafka-run-class.sh 
should look like. This will allow packagers to provide easily configurable 
defaults via /etc/default/kafka (on debian-based systems) or 
/etc/sysconfig/kafka (on RHEL-based systems).
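That defaults-file layout could look like the following sketch; the paths, variable names and fallback values here are illustrative, not the contents of the attached patches:

```shell
#!/usr/bin/env bash
# Sketch: the wrapper sources a distro-owned defaults file when present,
# then applies built-in fallbacks. Paths and values are illustrative.
set -u

load_kafka_defaults() {
  # $1: defaults file, e.g. /etc/default/kafka (Debian) or
  # /etc/sysconfig/kafka (RHEL), containing plain assignments like
  #   KAFKA_HEAP_OPTS="-Xmx8G -Xms8G"
  local defaults_file="${1:-/etc/default/kafka}"
  if [ -r "${defaults_file}" ]; then
    # shellcheck disable=SC1090
    . "${defaults_file}"
  fi
  # Fallbacks apply only when the defaults file did not set the variable.
  : "${KAFKA_HEAP_OPTS:=-Xmx1G -Xms1G}"
  : "${LOG_DIR:=/var/log/kafka}"
}

# Demo against a throwaway defaults file:
demo=$(mktemp)
echo 'KAFKA_HEAP_OPTS="-Xmx8G -Xms8G"' > "$demo"
load_kafka_defaults "$demo"
echo "heap=${KAFKA_HEAP_OPTS} logdir=${LOG_DIR}"
rm -f "$demo"
```

The `: "${VAR:=default}"` lines only fire when the sourced file left the variable unset, so the distro-provided file always wins over the built-in defaults.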


was (Author: fsaintjacques):
The second patch should give an overview of what a 'clean' kafka-run-class.sh 
should look like.

> Refactor shell wrapper scripts
> --
>
> Key: KAFKA-1889
> URL: https://issues.apache.org/jira/browse/KAFKA-1889
> Project: Kafka
>  Issue Type: Improvement
>      Components: packaging
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Minor
> Attachments: refactor-scripts-v1.patch, refactor-scripts-v2.patch
>
>
> Shell scripts in bin/ need love.





[jira] [Updated] (KAFKA-1889) Refactor shell wrapper scripts

2015-01-22 Thread Francois Saint-Jacques (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques updated KAFKA-1889:
--
Status: Open  (was: Patch Available)

> Refactor shell wrapper scripts
> --
>
> Key: KAFKA-1889
> URL: https://issues.apache.org/jira/browse/KAFKA-1889
> Project: Kafka
>  Issue Type: Improvement
>  Components: packaging
>    Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Minor
> Attachments: refactor-scripts-v1.patch, refactor-scripts-v2.patch
>
>
> Shell scripts in bin/ need love.





[jira] [Commented] (KAFKA-1889) Refactor shell wrapper scripts

2015-01-22 Thread Francois Saint-Jacques (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288120#comment-14288120
 ] 

Francois Saint-Jacques commented on KAFKA-1889:
---

My goal is to fix the Kafka startup scripts to make them friendlier to package 
as rpm or deb. I believe that is a concern disjoint from the issue you mentioned.

> Refactor shell wrapper scripts
> --
>
> Key: KAFKA-1889
> URL: https://issues.apache.org/jira/browse/KAFKA-1889
> Project: Kafka
>  Issue Type: Improvement
>  Components: packaging
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Minor
> Attachments: refactor-scripts-v1.patch, refactor-scripts-v2.patch
>
>
> Shell scripts in bin/ need love.





[jira] [Commented] (KAFKA-1889) Refactor shell wrapper scripts

2015-01-22 Thread Francois Saint-Jacques (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288193#comment-14288193
 ] 

Francois Saint-Jacques commented on KAFKA-1889:
---

It's not about working well with rpm and debian; it's about having sane 
defaults and behaviour so that packagers won't have to diff/patch said scripts.

My patch is not final at all; I just wanted to start the discussion on cleaning 
up the scripts.

> Refactor shell wrapper scripts
> --
>
> Key: KAFKA-1889
> URL: https://issues.apache.org/jira/browse/KAFKA-1889
> Project: Kafka
>  Issue Type: Improvement
>  Components: packaging
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Minor
> Attachments: refactor-scripts-v1.patch, refactor-scripts-v2.patch
>
>
> Shell scripts in bin/ need love.





[jira] [Commented] (KAFKA-1889) Refactor shell wrapper scripts

2015-01-22 Thread Francois Saint-Jacques (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288205#comment-14288205
 ] 

Francois Saint-Jacques commented on KAFKA-1889:
---

Also note that this patch keeps the existing behaviour (to some epsilon: 
variable renames plus a default JMX port, which will break running multiple 
processes). It is mostly a cleanup of "kafka-run-class.sh", which has acquired 
technical debt over time.

> Refactor shell wrapper scripts
> --
>
> Key: KAFKA-1889
> URL: https://issues.apache.org/jira/browse/KAFKA-1889
> Project: Kafka
>  Issue Type: Improvement
>  Components: packaging
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Minor
> Attachments: refactor-scripts-v1.patch, refactor-scripts-v2.patch
>
>
> Shell scripts in bin/ need love.





[jira] [Commented] (KAFKA-1889) Refactor shell wrapper scripts

2015-01-26 Thread Francois Saint-Jacques (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291967#comment-14291967
 ] 

Francois Saint-Jacques commented on KAFKA-1889:
---

Do you think it could make it to 0.8.2?

> Refactor shell wrapper scripts
> --
>
> Key: KAFKA-1889
> URL: https://issues.apache.org/jira/browse/KAFKA-1889
> Project: Kafka
>  Issue Type: Improvement
>  Components: packaging
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Minor
> Attachments: refactor-scripts-v1.patch, refactor-scripts-v2.patch
>
>
> Shell scripts in bin/ need love.





[jira] [Commented] (KAFKA-1367) Broker topic metadata not kept in sync with ZooKeeper

2015-02-06 Thread Francois Saint-Jacques (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309966#comment-14309966
 ] 

Francois Saint-Jacques commented on KAFKA-1367:
---

Instead of silently removing the field, could the controller force a cache 
refresh on a metadata request?

> Broker topic metadata not kept in sync with ZooKeeper
> -
>
> Key: KAFKA-1367
> URL: https://issues.apache.org/jira/browse/KAFKA-1367
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.0, 0.8.1
>Reporter: Ryan Berdeen
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1367.txt
>
>
> When a broker is restarted, the topic metadata responses from the brokers 
> will be incorrect (different from ZooKeeper) until a preferred replica leader 
> election.
> In the metadata, it looks like leaders are correctly removed from the ISR 
> when a broker disappears, but followers are not. Then, when a broker 
> reappears, the ISR is never updated.
> I used a variation of the Vagrant setup created by Joe Stein to reproduce 
> this with latest from the 0.8.1 branch: 
> https://github.com/also/kafka/commit/dba36a503a5e22ea039df0f9852560b4fb1e067c





[jira] [Created] (KAFKA-1781) Readme should specify that Gradle 2.0 is required for initial bootstrap

2014-11-17 Thread Jean-Francois Im (JIRA)
Jean-Francois Im created KAFKA-1781:
---

 Summary: Readme should specify that Gradle 2.0 is required for 
initial bootstrap
 Key: KAFKA-1781
 URL: https://issues.apache.org/jira/browse/KAFKA-1781
 Project: Kafka
  Issue Type: Bug
  Components: build
Affects Versions: 0.8.2
Reporter: Jean-Francois Im
Priority: Trivial


Current README.md says "You need to have gradle installed."

As the bootstrap procedure doesn't work with gradle 1.12, this needs to say 
that 2.0 or greater is needed.





[jira] [Updated] (KAFKA-1781) Readme should specify that Gradle 2.0 is required for initial bootstrap

2014-11-17 Thread Jean-Francois Im (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Francois Im updated KAFKA-1781:

Attachment: gradle-2.0-readme.patch

Documentation patch that changes README.md to say "You need to have gradle 2.0 
or greater installed."

> Readme should specify that Gradle 2.0 is required for initial bootstrap
> ---
>
> Key: KAFKA-1781
> URL: https://issues.apache.org/jira/browse/KAFKA-1781
> Project: Kafka
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.8.2
>Reporter: Jean-Francois Im
>Priority: Trivial
> Fix For: 0.8.2
>
> Attachments: gradle-2.0-readme.patch
>
>
> Current README.md says "You need to have gradle installed."
> As the bootstrap procedure doesn't work with gradle 1.12, this needs to say 
> that 2.0 or greater is needed.





[jira] [Commented] (KAFKA-1781) Readme should specify that Gradle 2.0 is required for initial bootstrap

2014-11-17 Thread Jean-Francois Im (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214969#comment-14214969
 ] 

Jean-Francois Im commented on KAFKA-1781:
-

It doesn't seem to work with 1.8.

{quote}
$ rm -rf gradle/wrapper/
$ gradle -version


Gradle 1.8


Build time:   2013-09-24 07:32:33 UTC
Build number: none
Revision: 7970ec3503b4f5767ee1c1c69f8b4186c4763e3d

[snip]
$ gradle
[snip]
Building project 'core' with Scala version 2.10.1

FAILURE: Build failed with an exception.

* Where:
Build file '/home/jfim/projects/kafka/build.gradle' line: 199

* What went wrong:
A problem occurred evaluating root project 'kafka'.
> Could not create task of type 'ScalaDoc'.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

BUILD FAILED

Total time: 15.229 secs
$ ./gradlew
Error: Could not find or load main class org.gradle.wrapper.GradleWrapperMain
{quote}

This is what happens in 2.0. I also tested with 1.12, it does the same as 1.8.

{quote}
$ rm -rf gradle/wrapper/
$ gradle -version


Gradle 2.0


Build time:   2014-07-01 07:45:34 UTC
Build number: none
Revision: b6ead6fa452dfdadec484059191eb641d817226c

[snip]

$ gradle
Building project 'core' with Scala version 2.10.1
:downloadWrapper

BUILD SUCCESSFUL

Total time: 7.239 secs
$ ./gradlew
Building project 'core' with Scala version 2.10.1
:downloadWrapper UP-TO-DATE

BUILD SUCCESSFUL

Total time: 6.937 secs
{quote}

> Readme should specify that Gradle 2.0 is required for initial bootstrap
> ---
>
> Key: KAFKA-1781
> URL: https://issues.apache.org/jira/browse/KAFKA-1781
> Project: Kafka
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.8.2
>Reporter: Jean-Francois Im
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: gradle-2.0-readme.patch
>
>
> Current README.md says "You need to have gradle installed."
> As the bootstrap procedure doesn't work with gradle 1.12, this needs to say 
> that 2.0 or greater is needed.





[jira] [Commented] (KAFKA-1781) Readme should specify that Gradle 2.0 is required for initial bootstrap

2014-11-17 Thread Jean-Francois Im (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215013#comment-14215013
 ] 

Jean-Francois Im commented on KAFKA-1781:
-

It is weird! This is what I get:

{quote}
$ git clone https://git-wip-us.apache.org/repos/asf/kafka.git KAFKA-1781
Cloning into 'KAFKA-1781'...
remote: Counting objects: 21794, done.
remote: Compressing objects: 100% (7216/7216), done.
remote: Total 21794 (delta 12925), reused 19667 (delta 11330)
Receiving objects: 100% (21794/21794), 15.17 MiB | 2.57 MiB/s, done.
Resolving deltas: 100% (12925/12925), done.
$ cd KAFKA-1781
$ git checkout -b 0.8.2 origin/0.8.2
Branch 0.8.2 set up to track remote branch 0.8.2 from origin.
Switched to a new branch '0.8.2'
$ gradle --version


Gradle 1.8


Build time:   2013-09-24 07:32:33 UTC
Build number: none
Revision: 7970ec3503b4f5767ee1c1c69f8b4186c4763e3d

Groovy:   1.8.6
Ant:  Apache Ant(TM) version 1.9.2 compiled on July 8 2013
Ivy:  2.2.0
JVM:  1.8.0_05 (Oracle Corporation 25.5-b02)
OS:   Linux 2.6.32-358.6.2.el6.x86_64 amd64

$ gradle
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
http://gradle.org/docs/1.8/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.1

FAILURE: Build failed with an exception.

* Where:
Build file '/home/jfim/projects/KAFKA-1781/build.gradle' line: 199

* What went wrong:
A problem occurred evaluating root project 'KAFKA-1781'.
> Could not create task of type 'ScalaDoc'.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

BUILD FAILED

Total time: 10.612 secs
{quote}

The differences between our setups seem to be JDK version (1.7.0_25 for you, 
1.8.0_05 on my end) and OS (Mac OS X vs Linux). 2.0 seems to work fine with the 
commands you use.

{quote}
$ rm -rf KAFKA-1781/
$ git clone https://git-wip-us.apache.org/repos/asf/kafka.git KAFKA-1781
Cloning into 'KAFKA-1781'...
remote: Counting objects: 21794, done.
remote: Compressing objects: 100% (7216/7216), done.
remote: Total 21794 (delta 12924), reused 19668 (delta 11330)
Receiving objects: 100% (21794/21794), 15.17 MiB | 2.74 MiB/s, done.
Resolving deltas: 100% (12924/12924), done.
$ cd KAFKA-1781
$ git checkout -b 0.8.2 origin/0.8.2
Branch 0.8.2 set up to track remote branch 0.8.2 from origin.
Switched to a new branch '0.8.2'
$ gradle --version


Gradle 2.0


Build time:   2014-07-01 07:45:34 UTC
Build number: none
Revision: b6ead6fa452dfdadec484059191eb641d817226c

Groovy:   2.3.3
Ant:  Apache Ant(TM) version 1.9.3 compiled on December 23 2013
JVM:  1.8.0_05 (Oracle Corporation 25.5-b02)
OS:   Linux 2.6.32-358.6.2.el6.x86_64 amd64

$ gradle
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
http://gradle.org/docs/2.0/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.1
:downloadWrapper

BUILD SUCCESSFUL

Total time: 11.341 secs
$ ./gradlew
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
http://gradle.org/docs/2.0/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.1
:downloadWrapper UP-TO-DATE

BUILD SUCCESSFUL

Total time: 7.159 secs
{quote}

I don't mind amending the patch to reflect a lower version, but 2.0 and 2.2 
both appear to work on my end, while 1.8 and 1.12 don't.

> Readme should specify that Gradle 2.0 is required for initial bootstrap
> ---
>
> Key: KAFKA-1781
> URL: https://issues.apache.org/jira/browse/KAFKA-1781
>     Project: Kafka
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.8.2
>Reporter: Jean-Francois Im
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: gradle-2.0-readme.patch
>
>
> Current README.md says "You need to have gradle installed."
> As the bootstrap procedure doesn't work with gradle 1.12, this needs to say 
> that 2.0 or greater is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1781) Readme should specify that Gradle 2.0 is required for initial bootstrap

2014-11-17 Thread Jean-Francois Im (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215036#comment-14215036
 ] 

Jean-Francois Im commented on KAFKA-1781:
-

It seems to work with 1.8 and 1.12 if I switch the JDK to 1.7.0_51. Is JDK 8 
supported by Kafka?

> Readme should specify that Gradle 2.0 is required for initial bootstrap
> ---
>
> Key: KAFKA-1781
> URL: https://issues.apache.org/jira/browse/KAFKA-1781
> Project: Kafka
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.8.2
>    Reporter: Jean-Francois Im
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: gradle-2.0-readme.patch
>
>
> Current README.md says "You need to have gradle installed."
> As the bootstrap procedure doesn't work with gradle 1.12, this needs to say 
> that 2.0 or greater is needed.





[jira] [Commented] (KAFKA-1781) Readme should specify that Gradle 2.0 is required for initial bootstrap

2014-11-17 Thread Jean-Francois Im (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215074#comment-14215074
 ] 

Jean-Francois Im commented on KAFKA-1781:
-

I think the two are both related to JDK 8, but are distinct issues. For 
example, running with JDK 8, gradle 1.8 and scala version 2.11:

{quote}
$ gradle -version


Gradle 1.8


Build time:   2013-09-24 07:32:33 UTC
Build number: none
Revision: 7970ec3503b4f5767ee1c1c69f8b4186c4763e3d

Groovy:   1.8.6
Ant:  Apache Ant(TM) version 1.9.2 compiled on July 8 2013
Ivy:  2.2.0
JVM:  1.8.0_05 (Oracle Corporation 25.5-b02)
OS:   Linux 2.6.32-358.6.2.el6.x86_64 amd64

$ gradle -PscalaVersion=2.11
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
http://gradle.org/docs/1.8/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.11

FAILURE: Build failed with an exception.

* Where:
Build file '/home/jfim/projects/KAFKA-1781/build.gradle' line: 199

* What went wrong:
A problem occurred evaluating root project 'KAFKA-1781'.
> Could not create task of type 'ScalaDoc'.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

BUILD FAILED

Total time: 7.725 secs
{quote}

I think the takeaway for the wrapper download is that 2.0 is required when 
running on JDK 8, and that 1.8 (and potentially lower) works on JDK 7. The rest 
of the build is still broken with scala 2.10.1 on JDK 8, even on gradle 
2.0.

I'm not sure what the proper resolution to this issue would be. Requiring 2.0 
seems rather safe but is only required on JDK 8 and does not fix KAFKA-1624. 
Perhaps adding a note that JDK 8 is not supported at this point in time is the 
proper resolution?

> Readme should specify that Gradle 2.0 is required for initial bootstrap
> ---
>
> Key: KAFKA-1781
> URL: https://issues.apache.org/jira/browse/KAFKA-1781
> Project: Kafka
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.8.2
>Reporter: Jean-Francois Im
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: gradle-2.0-readme.patch
>
>
> Current README.md says "You need to have gradle installed."
> As the bootstrap procedure doesn't work with gradle 1.12, this needs to say 
> that 2.0 or greater is needed.





[jira] [Commented] (KAFKA-1781) Readme should specify that Gradle 2.0 is required for initial bootstrap

2014-11-17 Thread Jean-Francois Im (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215079#comment-14215079
 ] 

Jean-Francois Im commented on KAFKA-1781:
-

Also, see https://issues.gradle.org/browse/GRADLE-3094 which has been fixed in 
Gradle 2.0.

> Readme should specify that Gradle 2.0 is required for initial bootstrap
> ---
>
> Key: KAFKA-1781
> URL: https://issues.apache.org/jira/browse/KAFKA-1781
> Project: Kafka
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.8.2
>    Reporter: Jean-Francois Im
>Priority: Blocker
> Fix For: 0.8.2
>
> Attachments: gradle-2.0-readme.patch
>
>
> Current README.md says "You need to have gradle installed."
> As the bootstrap procedure doesn't work with gradle 1.12, this needs to say 
> that 2.0 or greater is needed.





[jira] [Updated] (KAFKA-1783) Missing slash in documentation for the Zookeeper paths in ZookeeperConsumerConnector

2014-11-18 Thread Jean-Francois Im (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Francois Im updated KAFKA-1783:

Attachment: kafka-missing-doc-slash.patch

Patch that adds the missing slash for the consumer id registry path.

> Missing slash in documentation for the Zookeeper paths in 
> ZookeeperConsumerConnector
> 
>
> Key: KAFKA-1783
> URL: https://issues.apache.org/jira/browse/KAFKA-1783
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>    Reporter: Jean-Francois Im
>Assignee: Neha Narkhede
>Priority: Trivial
> Attachments: kafka-missing-doc-slash.patch
>
>
> The documentation for the ZookeeperConsumerConnector refers to the consumer 
> id registry location as /consumers/[group_id]/ids[consumer_id]; it should be 
> /consumers/[group_id]/ids/[consumer_id], as evidenced by 
> registerConsumerInZK() and TopicCount.scala line 61.
> A patch is provided that adds the missing forward slash.





[jira] [Created] (KAFKA-1783) Missing slash in documentation for the Zookeeper paths in ZookeeperConsumerConnector

2014-11-18 Thread Jean-Francois Im (JIRA)
Jean-Francois Im created KAFKA-1783:
---

 Summary: Missing slash in documentation for the Zookeeper paths in 
ZookeeperConsumerConnector
 Key: KAFKA-1783
 URL: https://issues.apache.org/jira/browse/KAFKA-1783
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Reporter: Jean-Francois Im
Assignee: Neha Narkhede
Priority: Trivial
 Attachments: kafka-missing-doc-slash.patch

The documentation for the ZookeeperConsumerConnector refers to the consumer id 
registry location as /consumers/[group_id]/ids[consumer_id]; it should be 
/consumers/[group_id]/ids/[consumer_id], as evidenced by registerConsumerInZK() 
and TopicCount.scala line 61.

A patch is provided that adds the missing forward slash.





[jira] [Created] (KAFKA-1081) kafka-run-class.sh is broken

2013-10-10 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created KAFKA-1081:
-

 Summary: kafka-run-class.sh is broken
 Key: KAFKA-1081
 URL: https://issues.apache.org/jira/browse/KAFKA-1081
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8
Reporter: Francois Saint-Jacques


Please apply this patch; this is why log4j exists. Rerunning a 
non-deterministic command twice to catch the error message is extremely dangerous.

diff --git a/bin/kafka-run-class.sh b/bin/kafka-run-class.sh
index eb6ff1b..2f2d8b5 100755
--- a/bin/kafka-run-class.sh
+++ b/bin/kafka-run-class.sh
@@ -102,19 +102,3 @@ if [ "$1" = "daemon" ] && [ -z "$KAFKA_GC_LOG_OPTS"] ; then
 fi

 $JAVA $KAFKA_HEAP_OPTS $KAFKA_JVM_PERFORMANCE_OPTS $KAFKA_GC_LOG_OPTS 
$KAFKA_JMX_OPTS $KAFKA_LOG4J_OPTS -cp $CLASSPATH $KAFKA_OPTS "$@"
-
-exitval=$?
-
-if [ $exitval -eq "1" ] ; then
-   $JAVA $KAFKA_HEAP_OPTS $KAFKA_JVM_PERFORMANCE_OPTS $KAFKA_GC_LOG_OPTS 
$KAFKA_JMX_OPTS $KAFKA_LOG4J_OPTS -cp $CLASSPATH $KAFKA_OPTS "$@" >& 
exception.txt
-   exception=`cat exception.txt`
-   noBuildMessage='Please build the project using sbt. Documentation is 
available at http://kafka.apache.org/'
-   pattern="(Could not find or load main 
class)|(java\.lang\.NoClassDefFoundError)"
-   match=`echo $exception | grep -E "$pattern"`
-   if [[ -n "$match" ]]; then
-   echo $noBuildMessage
-   fi
-   rm exception.txt
-fi
-
-



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (KAFKA-1081) kafka-run-class.sh is broken

2013-10-10 Thread Francois Saint-Jacques (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791819#comment-13791819
 ] 

Francois Saint-Jacques commented on KAFKA-1081:
---

Look at the last 3 lines, not the usage error.

$ cd /
$ /opt/kafka/bin/kafka-list-topic.sh
Missing required argument "[zookeeper]"
Option  Description
--  ---
--topic  REQUIRED: The topic to be listed.
  Defaults to all existing topics.
  (default: )
--unavailable-partitionsif set, only show partitions whose
  leader is not available
--under-replicated-partitions   if set, only show under replicated
  partitions
--zookeeper   REQUIRED: The connection string for
  the zookeeper connection in the form
  host:port. Multiple URLS can be
  given to allow fail-over.
/opt/kafka/bin/kafka-run-class.sh: line 72: exception.txt: Permission denied
cat: exception.txt: No such file or directory
rm: cannot remove 'exception.txt': No such file or directory

> kafka-run-class.sh is broken
> 
>
> Key: KAFKA-1081
> URL: https://issues.apache.org/jira/browse/KAFKA-1081
> Project: Kafka
>  Issue Type: Bug
>    Affects Versions: 0.8
>Reporter: Francois Saint-Jacques
>
> Please apply this patch; this is why log4j exists. Rerunning a 
> non-deterministic command twice to catch the error message is extremely dangerous.
> diff --git a/bin/kafka-run-class.sh b/bin/kafka-run-class.sh
> index eb6ff1b..2f2d8b5 100755
> --- a/bin/kafka-run-class.sh
> +++ b/bin/kafka-run-class.sh
> @@ -102,19 +102,3 @@ if [ "$1" = "daemon" ] && [ -z "$KAFKA_GC_LOG_OPTS"] ; 
> then
>  fi
>  $JAVA $KAFKA_HEAP_OPTS $KAFKA_JVM_PERFORMANCE_OPTS $KAFKA_GC_LOG_OPTS 
> $KAFKA_JMX_OPTS $KAFKA_LOG4J_OPTS -cp $CLASSPATH $KAFKA_OPTS "$@"
> -
> -exitval=$?
> -
> -if [ $exitval -eq "1" ] ; then
> -   $JAVA $KAFKA_HEAP_OPTS $KAFKA_JVM_PERFORMANCE_OPTS $KAFKA_GC_LOG_OPTS 
> $KAFKA_JMX_OPTS $KAFKA_LOG4J_OPTS -cp $CLASSPATH $KAFKA_OPTS "$@" >& 
> exception.txt
> -   exception=`cat exception.txt`
> -   noBuildMessage='Please build the project using sbt. Documentation is 
> available at http://kafka.apache.org/'
> -   pattern="(Could not find or load main 
> class)|(java\.lang\.NoClassDefFoundError)"
> -   match=`echo $exception | grep -E "$pattern"`
> -   if [[ -n "$match" ]]; then
> -   echo $noBuildMessage
> -   fi
> -   rm exception.txt
> -fi
> -
> -





[jira] [Commented] (KAFKA-1081) kafka-run-class.sh is broken

2013-10-10 Thread Francois Saint-Jacques (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791870#comment-13791870
 ] 

Francois Saint-Jacques commented on KAFKA-1081:
---

Look, this is an ugly hack. The real problem here is not the directory 
permission or using a temp file, but RERUNNING the first java command.

I'm not even trying to build it; I'm trying to correctly package kafka on a 
production server. Whenever I run any command (bin/*) that doesn't return 
properly, it borks if kafka-run-class.sh is called from a directory where the 
user doesn't have write permission.

I understand that these lines are there to help new users who check out the 
project and forget to build before running any command, but we're talking about 
deploying quality code in production.
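For what it's worth, the build hint could be kept without the rerun by capturing stderr of the single invocation, along these lines (the helper name and the message wording are invented for illustration; this is not the proposed patch):

```shell
#!/usr/bin/env bash
# Sketch: detect the "project not built" failure without a second run by
# keeping a copy of stderr from the one-and-only invocation in a mktemp
# file (never a cwd-relative path), inspecting it, then cleaning up.
set -u

run_class() {
  local errfile rc=0
  errfile=$(mktemp) || return 1
  "$@" 2>"$errfile" || rc=$?          # run exactly once
  cat "$errfile" >&2                  # the user still sees the stderr
  if grep -qE 'Could not find or load main class|NoClassDefFoundError' "$errfile"; then
    echo 'Please build the project first. Documentation: http://kafka.apache.org/' >&2
  fi
  rm -f "$errfile"
  return "$rc"
}

# A wrapper would then end with something like:
#   run_class java ${KAFKA_HEAP_OPTS} -cp "${CLASSPATH}" "$@"
```

mktemp avoids the cwd-relative exception.txt that caused the "Permission denied" above, and the command runs exactly once with its exit status preserved.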

> kafka-run-class.sh is broken
> 
>
> Key: KAFKA-1081
> URL: https://issues.apache.org/jira/browse/KAFKA-1081
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8
>Reporter: Francois Saint-Jacques
>
> Please apply this patch; this is why log4j exists. Rerunning a 
> non-deterministic command twice to catch the error message is extremely dangerous.
> diff --git a/bin/kafka-run-class.sh b/bin/kafka-run-class.sh
> index eb6ff1b..2f2d8b5 100755
> --- a/bin/kafka-run-class.sh
> +++ b/bin/kafka-run-class.sh
> @@ -102,19 +102,3 @@ if [ "$1" = "daemon" ] && [ -z "$KAFKA_GC_LOG_OPTS"] ; 
> then
>  fi
>  $JAVA $KAFKA_HEAP_OPTS $KAFKA_JVM_PERFORMANCE_OPTS $KAFKA_GC_LOG_OPTS 
> $KAFKA_JMX_OPTS $KAFKA_LOG4J_OPTS -cp $CLASSPATH $KAFKA_OPTS "$@"
> -
> -exitval=$?
> -
> -if [ $exitval -eq "1" ] ; then
> -   $JAVA $KAFKA_HEAP_OPTS $KAFKA_JVM_PERFORMANCE_OPTS $KAFKA_GC_LOG_OPTS 
> $KAFKA_JMX_OPTS $KAFKA_LOG4J_OPTS -cp $CLASSPATH $KAFKA_OPTS "$@" >& 
> exception.txt
> -   exception=`cat exception.txt`
> -   noBuildMessage='Please build the project using sbt. Documentation is 
> available at http://kafka.apache.org/'
> -   pattern="(Could not find or load main 
> class)|(java\.lang\.NoClassDefFoundError)"
> -   match=`echo $exception | grep -E "$pattern"`
> -   if [[ -n "$match" ]]; then
> -   echo $noBuildMessage
> -   fi
> -   rm exception.txt
> -fi
> -
> -





[jira] [Commented] (KAFKA-1081) kafka-run-class.sh is broken

2013-10-10 Thread Francois Saint-Jacques (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791901#comment-13791901
 ] 

Francois Saint-Jacques commented on KAFKA-1081:
---

I know you guys have a binary release; this is what I'm deploying. My point is, 
this specific snippet should never go into a production release.

> kafka-run-class.sh is broken
> 
>
> Key: KAFKA-1081
> URL: https://issues.apache.org/jira/browse/KAFKA-1081
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8
>Reporter: Francois Saint-Jacques
>
> Please apply this patch; this is why log4j exists. Rerunning a 
> non-deterministic command twice to catch the error message is extremely dangerous.
> diff --git a/bin/kafka-run-class.sh b/bin/kafka-run-class.sh
> index eb6ff1b..2f2d8b5 100755
> --- a/bin/kafka-run-class.sh
> +++ b/bin/kafka-run-class.sh
> @@ -102,19 +102,3 @@ if [ "$1" = "daemon" ] && [ -z "$KAFKA_GC_LOG_OPTS"] ; 
> then
>  fi
>  $JAVA $KAFKA_HEAP_OPTS $KAFKA_JVM_PERFORMANCE_OPTS $KAFKA_GC_LOG_OPTS 
> $KAFKA_JMX_OPTS $KAFKA_LOG4J_OPTS -cp $CLASSPATH $KAFKA_OPTS "$@"
> -
> -exitval=$?
> -
> -if [ $exitval -eq "1" ] ; then
> -   $JAVA $KAFKA_HEAP_OPTS $KAFKA_JVM_PERFORMANCE_OPTS $KAFKA_GC_LOG_OPTS 
> $KAFKA_JMX_OPTS $KAFKA_LOG4J_OPTS -cp $CLASSPATH $KAFKA_OPTS "$@" >& 
> exception.txt
> -   exception=`cat exception.txt`
> -   noBuildMessage='Please build the project using sbt. Documentation is 
> available at http://kafka.apache.org/'
> -   pattern="(Could not find or load main 
> class)|(java\.lang\.NoClassDefFoundError)"
> -   match=`echo $exception | grep -E "$pattern"`
> -   if [[ -n "$match" ]]; then
> -   echo $noBuildMessage
> -   fi
> -   rm exception.txt
> -fi
> -
> -





[jira] [Updated] (KAFKA-1084) Validate properties for custom serializers

2013-10-11 Thread Francois Saint-Jacques (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques updated KAFKA-1084:
--

Attachment: validate-external-properties.patch

> Validate properties for custom serializers
> --
>
> Key: KAFKA-1084
> URL: https://issues.apache.org/jira/browse/KAFKA-1084
> Project: Kafka
>  Issue Type: Improvement
>    Reporter: Francois Saint-Jacques
>Priority: Minor
> Attachments: validate-external-properties.patch
>
>
> We use specific encoders/decoders for our producers/consumers; they get 
> correctly initialized by the Producer/Consumer. The only downside is that the 
> validate() function of VerifiableProperties pollutes our log stream.
> This patch allows custom serializer keys to validate correctly if they begin 
> with the "external" prefix; for example:
> external.my.encoder.param=true
> will not raise a WARN.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (KAFKA-1084) Validate properties for custom serializers

2013-10-11 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created KAFKA-1084:
-

 Summary: Validate properties for custom serializers
 Key: KAFKA-1084
 URL: https://issues.apache.org/jira/browse/KAFKA-1084
 Project: Kafka
  Issue Type: Improvement
Reporter: Francois Saint-Jacques
Priority: Minor
 Attachments: validate-external-properties.patch

We use specific encoders/decoders for our producers/consumers; they get 
correctly initialized by the Producer/Consumer. The only downside is that the 
validate() function of VerifiableProperties pollutes our log stream.

This patch allows custom serializer keys to validate correctly if they begin 
with the "external" prefix; for example:

external.my.encoder.param=true

will not raise a WARN.
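The attached patch itself is not included in this thread, so here is a minimal 
sketch of the idea it describes: property keys beginning with "external." are 
treated as user-owned and skipped during validation, so only genuinely unknown 
keys trigger a warning. All names here are illustrative and do not match 
Kafka's actual VerifiableProperties implementation.

```java
import java.util.Properties;
import java.util.Set;

public class PrefixValidation {
    private static final String EXTERNAL_PREFIX = "external.";

    // Returns true when the key should bypass the unknown-property warning.
    static boolean isExternal(String key) {
        return key.startsWith(EXTERNAL_PREFIX);
    }

    // Warn only about keys that are neither known nor externally prefixed.
    static void validate(Properties props, Set<String> knownKeys) {
        for (String key : props.stringPropertyNames()) {
            if (!knownKeys.contains(key) && !isExternal(key)) {
                System.out.println("WARN: unknown property " + key);
            }
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("external.my.encoder.param", "true"); // skipped
        props.setProperty("mispeled.option", "1");              // warns
        validate(props, Set.of("serializer.class"));
    }
}
```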





[jira] [Commented] (KAFKA-1084) Validate properties for custom serializers

2013-10-15 Thread Francois Saint-Jacques (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795202#comment-13795202
 ] 

Francois Saint-Jacques commented on KAFKA-1084:
---

I believe a warning will still get logged for unknown string properties since 
the patch in KAFKA-1049 doesn't touch the validate function.

> Validate properties for custom serializers
> --
>
> Key: KAFKA-1084
> URL: https://issues.apache.org/jira/browse/KAFKA-1084
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Francois Saint-Jacques
>Priority: Minor
> Attachments: validate-external-properties.patch
>
>
> We use specific encoders/decoders for our producers/consumers; they get 
> correctly initialized by the Producer/Consumer. The only downside is that the 
> validate() function of VerifiableProperties pollutes our log stream.
> This patch allows custom serializer keys to validate correctly if they begin 
> with the "external" prefix; for example:
> external.my.encoder.param=true
> will not raise a WARN.





[jira] [Created] (KAFKA-1115) producer performance affected by trace/debug calls

2013-11-01 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created KAFKA-1115:
-

 Summary: producer performance affected by trace/debug calls
 Key: KAFKA-1115
 URL: https://issues.apache.org/jira/browse/KAFKA-1115
 Project: Kafka
  Issue Type: Improvement
  Components: producer 
Affects Versions: 0.8
Reporter: Francois Saint-Jacques
Assignee: Francois Saint-Jacques


After investigating high CPU usage on some producers in production, we found 
that a lot of time was spent constructing strings for DEBUG- and TRACE-level 
logging.

This patch removes most of the logging calls; on our systems it cuts CPU usage 
to half of what it used to be.

Note that this is a significant performance boost for environments with 
thousands of msg/s.





[jira] [Updated] (KAFKA-1115) producer performance affected by trace/debug calls

2013-11-01 Thread Francois Saint-Jacques (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques updated KAFKA-1115:
--

Status: Patch Available  (was: Open)

> producer performance affected by trace/debug calls
> --
>
> Key: KAFKA-1115
> URL: https://issues.apache.org/jira/browse/KAFKA-1115
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Affects Versions: 0.8
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>
> After investigating high CPU usage on some producers in production, we found 
> that a lot of time was spent constructing strings for DEBUG- and TRACE-level 
> logging.
> This patch removes most of the logging calls; on our systems it cuts CPU 
> usage to half of what it used to be.
> Note that this is a significant performance boost for environments with 
> thousands of msg/s.





[jira] [Updated] (KAFKA-1115) producer performance affected by trace/debug calls

2013-11-01 Thread Francois Saint-Jacques (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques updated KAFKA-1115:
--

Attachment: producer-performance-fix.patch

> producer performance affected by trace/debug calls
> --
>
> Key: KAFKA-1115
> URL: https://issues.apache.org/jira/browse/KAFKA-1115
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Affects Versions: 0.8
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
> Attachments: producer-performance-fix.patch
>
>
> After investigating high CPU usage on some producers in production, we found 
> that a lot of time was spent constructing strings for DEBUG- and TRACE-level 
> logging.
> This patch removes most of the logging calls; on our systems it cuts CPU 
> usage to half of what it used to be.
> Note that this is a significant performance boost for environments with 
> thousands of msg/s.





[jira] [Updated] (KAFKA-1115) producer performance affected by trace/debug calls

2013-11-01 Thread Francois Saint-Jacques (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques updated KAFKA-1115:
--

Status: Open  (was: Patch Available)

> producer performance affected by trace/debug calls
> --
>
> Key: KAFKA-1115
> URL: https://issues.apache.org/jira/browse/KAFKA-1115
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Affects Versions: 0.8
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
> Attachments: producer-performance-fix.patch
>
>
> After investigating high CPU usage on some producers in production, we found 
> that a lot of time was spent constructing strings for DEBUG- and TRACE-level 
> logging.
> This patch removes most of the logging calls; on our systems it cuts CPU 
> usage to half of what it used to be.
> Note that this is a significant performance boost for environments with 
> thousands of msg/s.





[jira] [Commented] (KAFKA-1115) producer performance affected by trace/debug calls

2013-11-01 Thread Francois Saint-Jacques (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811759#comment-13811759
 ] 

Francois Saint-Jacques commented on KAFKA-1115:
---

This is not the problem. Even if I change the log level, the construction of 
the string passed to the logging subsystem is the root cause, i.e. all calls of 
the form "Message %s is... ".format(...). Scala is not a lazy-evaluation 
language, so the only ways to fix this problem are:

1. Wrap all the trace/debug calls in if (logging.debugEnabled()) ... so that 
the message doesn't get evaluated. This is the lazy way.

or

2. Remove the debug/trace calls from this critical code path. They are clearly 
remnants of `print foo' debugging and shouldn't have been committed to the 
trunk branch.

> producer performance affected by trace/debug calls
> --
>
> Key: KAFKA-1115
> URL: https://issues.apache.org/jira/browse/KAFKA-1115
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>    Affects Versions: 0.8
>    Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
> Attachments: producer-performance-fix.patch
>
>
> After investigating high CPU usage on some producers in production, we found 
> that a lot of time was spent constructing strings for DEBUG- and TRACE-level 
> logging.
> This patch removes most of the logging calls; on our systems it cuts CPU 
> usage to half of what it used to be.
> Note that this is a significant performance boost for environments with 
> thousands of msg/s.





[jira] [Commented] (KAFKA-1115) producer performance affected by trace/debug calls

2013-11-03 Thread Francois Saint-Jacques (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812550#comment-13812550
 ] 

Francois Saint-Jacques commented on KAFKA-1115:
---

You are right; looking at the code, the trace/debug logging functions are 
defined with lazy (by-name) arguments.

Debugging a bit further, I did not have any log4j.properties in the producer 
classpath. The library does give me a WARN about the missing log4j.properties 
file; after that warning, the program does not output any more logs. 
Internally, I'd assume the log level was still set to TRACE, just without any 
appender to stdout/stderr.

I'll take WARNs more seriously next time. Maybe put a note in the producer 
documentation that not providing a log4j configuration will have a serious 
impact on performance.

On another note, I believe trace/debug messages shouldn't be committed to 
production code, but this is only a personal hunch.
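For reference, a minimal log4j.properties along the lines discussed above, 
placed on the producer's classpath so logging does not silently fall back to 
an expensive default. This is a generic log4j 1.x sketch, not a 
Kafka-recommended configuration; adjust levels and category names to your 
deployment.

```properties
# Root logger at WARN, writing to the console.
log4j.rootLogger=WARN, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p %c - %m%n

# Keep Kafka client internals quiet unless actively investigating an issue.
log4j.logger.kafka=WARN
```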

> producer performance affected by trace/debug calls
> --
>
> Key: KAFKA-1115
> URL: https://issues.apache.org/jira/browse/KAFKA-1115
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Affects Versions: 0.8
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
> Attachments: producer-performance-fix.patch
>
>
> After investigating high CPU usage on some producers in production, we found 
> out that a lot of time was passed in string construction for logging of DEBUG 
> and TRACE level.
> This patch removes most of the logging calls, on our systems it cuts CPU 
> usage down to half of what it used to be.
> Note that this is a significant boost in performance for environment where 
> there are thousands of msg/s.


