Enrico,
I would suggest you apply my fixes and then debug from there. That
way, you will have a better sense of where the first corruption comes from.
Sijie
On Fri, Mar 9, 2018 at 11:48 AM Enrico Olivelli wrote:
> On Fri, Mar 9, 2018, 19:30 Enrico Olivelli wrote:
>
> > Thank you Ivan!
> > I
On Fri, Mar 9, 2018, 19:30 Enrico Olivelli wrote:
Thank you Ivan!
I hope I did not mess up the dump and added ZK ports. We are not using
standard ports, and those 3 machines also host the 3-node ZK
ensemble that supports BK and all the other parts of the application.
So one explanation would be that something is connecting to the boo
On Fri, Mar 9, 2018 at 3:20 PM, Enrico Olivelli wrote:
Bookies
10.168.10.117:1822 -> bad bookie with 4.1.21
10.168.10.116:1822 -> bookie with 4.1.12
10.168.10.118:1281 -> bookie with 4.1.12
10.168.10.117 client machine on which I have 4.1.21 client (different
process than the bookie one)
Thanks
Enrico
2018-03-09 15:16 GMT+01:00 Ivan Kelly :
> On
Also, do you have the logs of the error occurring on the server side?
-Ivan
On Fri, Mar 9, 2018 at 3:13 PM, Enrico Olivelli wrote:
New dump,
sequence (simpler)
1) system is running, reader is reading without errors with netty 4.1.21
2) 3 bookies, one is with 4.1.21 and the other ones with 4.1.12
3) kill one bookie with 4.1.12, the reader starts reading from the bookie
with 4.1.21
4) client messes up, unrecoverably
Enrico
2
I've asked Enrico to run it again, as this dump doesn't span the time
when the issue started occurring.
What I'm looking for is to be able to inspect the first packet which
triggers the version downgrade of the decoders.
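A toy sketch of why that first packet matters, assuming (as described elsewhere in this thread) that a failed V3 decode makes the connection fall back to the V2 decoder and stay there. This is not BookKeeper code: the class name, the one-byte "version" prefix and the frame contents are all invented for illustration.

```python
class StickyFallbackDecoder:
    """Toy model only. A 'V3' frame starts with b'3', a 'V2' frame with b'2'
    (invented formats, not the real wire protocol)."""

    def __init__(self):
        self.use_v2 = False
        self.first_bad_frame = None  # the frame a packet capture would need to show

    def _decode_v3(self, frame):
        if not frame.startswith(b"3"):
            raise ValueError("not a V3 frame")
        return ("v3", frame[1:])

    def _decode_v2(self, frame):
        if not frame.startswith(b"2"):
            raise ValueError("not a V2 frame")
        return ("v2", frame[1:])

    def decode(self, frame):
        if not self.use_v2:
            try:
                return self._decode_v3(frame)
            except ValueError:
                # A single bad frame flips this connection to V2 for good.
                self.use_v2 = True
                self.first_bad_frame = frame
        return self._decode_v2(frame)


def run(frames):
    """Feed frames through one decoder, recording results and failures."""
    decoder = StickyFallbackDecoder()
    results = []
    for frame in frames:
        try:
            results.append(decoder.decode(frame))
        except ValueError:
            results.append("error")
    return results, decoder.first_bad_frame


# A valid frame, one corrupt frame, then another valid frame.
results, first_bad = run([b"3ok", b"\x00junk", b"3ok"])
print(results, first_bad)
```

In this model every frame after the corrupt one fails, even valid V3 frames, so later errors say nothing about the cause. Only the frame recorded in `first_bad_frame` does, which is why the capture has to cover the moment the issue starts.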
On Fri, Mar 9, 2018 at 3:04 PM, Enrico Olivelli wrote:
> This is the dump
>
>
On Fri, Mar 9, 2018, 14:12 Ivan Kelly wrote:
> > Any suggestion on the tcpdump config ? (command line example)
>
> sudo tcpdump -s 200 -w blah.pcap 'tcp port 3181'
>
> Where are you going to change the netty? client or server or both?
>
Both, as the application is packaged as a single bundle.
> Any suggestion on the tcpdump config ? (command line example)
sudo tcpdump -s 200 -w blah.pcap 'tcp port 3181'
Where are you going to change the netty? client or server or both?
-Ivan
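Since the bookies in this thread are reported as listening on non-standard ports (1822 and 1281 in Enrico's list), the capture filter probably needs adjusting. A small sketch that builds the same kind of tcpdump command for several ports; the helper name `tcpdump_command` is hypothetical, and only the `-s` (snapshot length) and `-w` (output file) flags from the command above are used.

```python
import shlex


def tcpdump_command(ports, snaplen=200, outfile="blah.pcap"):
    """Build a tcpdump argv that keeps only the first `snaplen` bytes of each packet."""
    if not ports:
        raise ValueError("need at least one port")
    # pcap filter syntax: alternatives are joined with "or"
    capture_filter = " or ".join(f"tcp port {p}" for p in ports)
    return ["sudo", "tcpdump", "-s", str(snaplen), "-w", outfile, capture_filter]


# Ports taken from the bookie list in this thread; adjust to the real setup.
cmd = tcpdump_command([1822, 1281])
print(shlex.join(cmd))  # shell-quoted form, ready to paste
```

The ZK ports are not stated in the thread, so they are left out here; add them to the list if ZK traffic should be captured as well.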
2018-03-09 13:48 GMT+01:00 Ivan Kelly :
Great analysis Sijie.
Enrico, are these high-traffic machines? Would it be feasible to run
tcpdump on them? You could even truncate each message to 100 bytes or
so, to avoid storing payloads. It'd be very useful to see what the
corrupt traffic actually looks like.
-Ivan
On Fri, Mar 9, 2018 at 10
> The "predicate" approach is problematic: it can potentially cause some
> ledgers to never be replicated. Ideally, this is something that should be done
> by the auditor, because the auditor
> knows the ledgers, the alive bookies and the network topology; the auditor
> should be able to compute a replication plan
Reverted to Netty 4.1.12. The system is "more" stable, but after "some" restarts
we still see errors on the client side on tailing readers; rebooting the JVM
temporarily "resolved" the problem.
I have no more errors on the Bookie side.
My idea:
- client is reading from 2 bookies, there is some bug in this a
2018-03-09 8:59 GMT+01:00 Sijie Guo :
> Sent out a PR for the issues that I observed:
>
> https://github.com/apache/bookkeeper/pull/1240
>
Other findings:
- my problem is not related to jdk9, it happens with jdk8 too
- the "tailing reader" is able to make progress and follow the WAL, so not
all
On Thu, Mar 8, 2018 at 11:46 AM, Venkateswara Rao Jujjuri wrote:
> On Thu, Mar 8, 2018 at 11:33 AM, Sijie Guo wrote:
>
> > On Thu, Mar 8, 2018 at 8:07 AM, Venkateswara Rao Jujjuri <
> > jujj...@gmail.com>
> > wrote:
> >
> > > On Thu, Mar 8, 2018 at 2:38 AM, Ivan Kelly wrote:
> > >
> > > > > Giv
Sent out a PR for the issues that I observed:
https://github.com/apache/bookkeeper/pull/1240
On Thu, Mar 8, 2018 at 10:47 PM, Sijie Guo wrote:
> So the problem here is:
>
> - a corrupted request failed the V3 request decoder, so the bookie switched to
> using the V2 request decoder. Once the switch happe