2020-02-26 09:13:03 UTC - Eugen: Another way to think about this would be to 
consider it a synchronous topic compaction with only one key ( 
<https://github.com/streamnative/pulsar/issues/650> )
----
2020-02-26 09:15:07 UTC - Eugen: But as it's only for one key, it could (I hope) be done without the overhead of a compaction topic.
----
2020-02-26 16:13:10 UTC - Alexandre DUVAL: Hi all,

About <https://github.com/apache/pulsar/pull/6428>: in AuthorizationProvider, some methods return CompletableFuture<Void> and others CompletableFuture<Boolean>; I personally prefer the latter (it makes more sense).
For example:
```public CompletableFuture<Void> grantPermissionAsync(TopicName topicName, Set<AuthAction> actions, String role, String authDataJson)```
can't be joined with:
```public CompletableFuture<Boolean> canManageTopic(TopicName topicName, TopicOperation operation, String role, AuthenticationDataSource authData)
// with TopicOperation.grantPermissions```
The main problem is breaking changes, so must I have both a CompletableFuture<Boolean> and a CompletableFuture<Void> impl?
WDYT?
----
2020-02-26 16:21:33 UTC - Alexandre DUVAL: I would like to use canManageTopic & canManageNamespace in the already existing methods in AuthorizationProvider, but maybe I should just keep both the old grantPermissionAsync and the new canManageTopic, use the new one where grantPermissionAsync is used, and deprecate the old one?
----
2020-02-26 16:43:00 UTC - Addison Higham: :disappointed: :disappointed: another case of one of my ZK nodes not being consistent with the other 2... just missing a few keys that the other 2 have
----
2020-02-26 16:55:35 UTC - chris: If you provide a default implementation in the 
interface is it still a breaking change? Also, the naming of function 
authorization method makes more sense to me. These names would make more sense 
to me allowTopicOperation allowNamespaceOperation. To start using it could have 
a default implementation of allowOperation map to the old method in the 
interface. Then you should be able to replaces instances of the old methods 
with the new ones.
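As an illustration of that suggestion (a hypothetical sketch, not the actual Pulsar interface: the method name allowTopicOperation, the operation-to-method mapping, and the stand-in TopicOperation enum are all assumptions):
```java
import java.util.concurrent.CompletableFuture;

import org.apache.pulsar.broker.authentication.AuthenticationDataSource;
import org.apache.pulsar.common.naming.TopicName;

// Fragment of a hypothetical operation-based AuthorizationProvider: the new
// check gets a default implementation that delegates to the existing per-action
// methods, so current provider implementations keep compiling unchanged.
public interface AuthorizationProvider {

    // stand-in for the TopicOperation enum proposed in the PR
    enum TopicOperation { PRODUCE, CONSUME, GRANT_PERMISSIONS }

    // existing methods (unchanged)
    CompletableFuture<Boolean> canProduceAsync(TopicName topicName, String role,
                                               AuthenticationDataSource authData);

    CompletableFuture<Boolean> canConsumeAsync(TopicName topicName, String role,
                                               AuthenticationDataSource authData,
                                               String subscription);

    // hypothetical new method with a default bridge to the old ones
    default CompletableFuture<Boolean> allowTopicOperation(TopicName topicName,
                                                           TopicOperation operation,
                                                           String role,
                                                           AuthenticationDataSource authData) {
        switch (operation) {
            case PRODUCE:
                return canProduceAsync(topicName, role, authData);
            case CONSUME:
                return canConsumeAsync(topicName, role, authData, null);
            // ... map the remaining operations onto the corresponding old methods
            default:
                return CompletableFuture.completedFuture(false);
        }
    }
}
```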
----
2020-02-26 16:56:13 UTC - chris: and this could keep working as before
----
2020-02-26 17:00:47 UTC - Alexandre DUVAL: good ideas!
----
2020-02-26 17:26:59 UTC - John Duffie: @John Duffie has joined the channel
----
2020-02-26 19:11:45 UTC - Joe Francis: The general solution to this would be a db with change notifications. You can simulate one. Use a state topic and an update topic. Run a function that keeps updating the state topic from the update topic. Any time the state gets updated, the function acks the previous message in the state topic, so the last message will be retained until the next state is published. And after updating the state, the function acks the update message. So the last message in the state topic plus the unacked messages in the update topic is the current state.
+1 : Devin G. Bost
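As a rough illustration of that pattern (not from the original message), a minimal sketch using the plain Java client; the service URL, topic names, schema, and subscription names are assumptions, and a Pulsar Function could play the same role as this loop:
```java
import org.apache.pulsar.client.api.*;

public class StateUpdater {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")    // assumed broker URL
                .build();

        // Consumer on the update topic: unacked messages here are "not yet folded into state".
        Consumer<String> updates = client.newConsumer(Schema.STRING)
                .topic("update-topic")                     // illustrative name
                .subscriptionName("state-updater")
                .subscribe();

        // Producer for the state topic.
        Producer<String> state = client.newProducer(Schema.STRING)
                .topic("state-topic")                      // illustrative name
                .create();

        // Subscription on the state topic, used only to ack (and thus release) old state messages.
        Consumer<String> stateSub = client.newConsumer(Schema.STRING)
                .topic("state-topic")
                .subscriptionName("state-retention")
                .subscribe();

        MessageId previousState = null;
        while (true) {
            Message<String> update = updates.receive();
            // 1. publish the new state derived from the update
            MessageId newState = state.send(applyUpdate(update.getValue()));
            // 2. ack the previous state message so only the latest one stays unacked/retained
            if (previousState != null) {
                stateSub.acknowledge(previousState);
            }
            previousState = newState;
            // 3. finally ack the update message
            updates.acknowledge(update);
        }
    }

    private static String applyUpdate(String update) {
        // placeholder for whatever fold/merge logic produces the new state
        return update;
    }
}
```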
----
2020-02-26 20:05:34 UTC - Eugen: Can't this be handled in the broker, rather than requiring the user to create another topic and a function? After all, we already have
1. `MessageId.earliest`: getting all existing messages + all new messages
2. `MessageId.latest`: getting only new messages
So what I'm trying to do is add this case:
3. `MessageId.latestInclusive`: getting only the latest existing message + all new messages
So why can't the broker do what the consumer can, namely do a `getLatestMessageId()` and have the reader receive all messages from that id? In other words, if 1. works by passing msgId.earliest, and there is a way to get the latest message id, why would it be so hard to make msgId.latestInclusive work in the same way?
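For what it's worth, a rough client-side approximation of that behaviour (a sketch, not the proposed broker feature; the service URL, topic, and throwaway subscription name are assumptions, and it relies on `Consumer.getLastMessageId()` plus the reader's `startMessageIdInclusive()` option):
```java
import org.apache.pulsar.client.api.*;

public class LatestInclusiveReader {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")          // assumed broker URL
                .build();

        String topic = "persistent://public/default/my-topic";  // illustrative topic

        // Look up the id of the last published message via a short-lived consumer.
        Consumer<byte[]> probe = client.newConsumer()
                .topic(topic)
                .subscriptionName("latest-id-probe")             // illustrative, throwaway subscription
                .subscribe();
        MessageId lastId = probe.getLastMessageId();
        probe.unsubscribe();                                     // don't leave a backlog behind

        // Start a reader at that id, inclusively, so the latest existing message
        // is delivered first and all newer messages follow.
        Reader<byte[]> reader = client.newReader()
                .topic(topic)
                .startMessageId(lastId)
                .startMessageIdInclusive()
                .create();

        while (true) {
            Message<byte[]> msg = reader.readNext();
            System.out.println(new String(msg.getData()));       // process msg
        }
    }
}
```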
----
2020-02-26 20:28:00 UTC - Ravi Shah: How do I pass the Pulsar backlog message metric to the HPA to scale Kubernetes pods?
----
2020-02-26 20:29:09 UTC - Ravi Shah: Is there any Pulsar Prometheus adapter for custom metrics which I can pass to the HPA?
----
2020-02-26 21:09:18 UTC - Luke Lu: @Luke Lu has joined the channel
----
2020-02-26 22:17:35 UTC - Joe Francis: That is a good question, and I can provide my opinion, which is a bit philosophical. It comes down to how systems ought to be designed. Databases and message queues are entirely different systems, designed to do very different work efficiently. The tradeoffs made when implementing such systems, on more or less the same physical resources, are totally different. One is designed for random writes/reads and search, the other is designed for sequential reads and writes, and WORM.

I run Pulsar at a very large scale (millions of topics). I also run a very large distributed db (petabytes). Day in, day out, I have users who want to use my db as a Q and the Q (Pulsar) as a db. Obviously both are possible. But both cannot be done efficiently at scale. Pulsar is the way it is because when it was designed, it was decided that it would not attempt to do certain things, like transactions, scheduled delivery, nack, compaction etc., which are all flow killers. And that was not without reason, because all the people involved in building Pulsar had attempted to build something similar with AMQ as the base, and understood the problems introduced by attempting to be everything to everyone. Pulsar scales and is super efficient at what it does, because of what it chose not to do.

Functions are a nice abstraction, similar to the dispatch vs storage abstraction. Although that abstraction is more observed in the breach now.
----
2020-02-26 22:27:55 UTC - Eugen: I see your general point about db vs queue. But you seem to be saying that what I'm requesting cannot be made efficient or scalable; would `latestInclusive` be any less efficient or scalable than the existing `earliest`?
----
2020-02-26 22:59:40 UTC - Joe Francis: That assumes that I am in favor of earliest :slightly_smiling_face: But assuming I am, earliest is free; latestInclusive requires work in the dispatch path.
----
2020-02-26 23:11:11 UTC - Eugen: Thanks joef for your opinion, I appreciate it! It's true, we are trying to do something that is different from the core functionality of Pulsar. From a user's perspective, our problem is imo much more a streaming (but not queue) one than a database one (databases return the latest value for something, but in general do not stream real-time events). One of these days I hope to find the time to look into this and see how much of a problem it is in terms of implementation and efficiency. But regardless of anyone being in favor of features like earliest, they are very handy and useful to some, and everyone else can just ignore them and won't be impacted...
----
2020-02-26 23:35:56 UTC - Eugen: Fwiw, one of Pulsar's core features is I/O separation, so that historical data can be read without impacting the throughput / latency of real-time streams
----
2020-02-27 00:56:30 UTC - Alexandre DUVAL: Why does the subscription argument exist on canConsume in authz?
----
2020-02-27 01:02:01 UTC - Eric Simon: Can someone explain to me why changes are being released into 2.5.0 after the release? This is incredibly frustrating.
----
2020-02-27 01:48:06 UTC - Devin G. Bost: FYI @Penghui Li
----
2020-02-27 04:20:22 UTC - Sijie Guo: @Addison Higham: it is probably that the zookeeper is lagging behind.
----
2020-02-27 04:21:02 UTC - Sijie Guo: 2.5.0 is a tag.
----
2020-02-27 04:21:15 UTC - Sijie Guo: I don’t think there are new changes released to 2.5.0
----
2020-02-27 04:21:25 UTC - Sijie Guo: there are changes released to branch-2.5
----
2020-02-27 04:21:37 UTC - Addison Higham: this was just the config store that 
only has a few thousand nodes :confused:
----
2020-02-27 04:21:38 UTC - Sijie Guo: the changes are used for cutting 2.5.1
----
2020-02-27 04:21:47 UTC - Addison Higham: but I have it on my list to go back through metrics
----
2020-02-27 04:21:54 UTC - Sijie Guo: Can you explain what problems you are seeing?
+1 : Devin G. Bost
----
2020-02-27 04:22:29 UTC - Sijie Guo: a few thousand nodes or a few thousand znodes?
----
2020-02-27 04:24:05 UTC - Addison Higham: znodes :slightly_smiling_face:
----
2020-02-27 04:26:28 UTC - Addison Higham: ~6k znodes, a whole snapshot 
(including ephemeral data), the data dir of the master shows it is only about 
~4mb of data (it is just the config store so relatively low update rate as 
well...). After I nuked the storage and did a resync, it took like 10 seconds
----
2020-02-27 04:26:33 UTC - Addison Higham: to resync
----
2020-02-27 04:27:35 UTC - Addison Higham: and the state persisted for a few hours. My best guess is that since I run this on k8s, there is perhaps something about IP addresses changing rapidly that can cause issues
----
2020-02-27 04:29:38 UTC - Addison Higham: sadly I didn't have time to 
investigate more :confused:
----
2020-02-27 04:31:24 UTC - Sijie Guo: @Addison Higham: there is no real guarantee about when an update is propagated to all participants / followers. So I would suggest not using the global configuration store; instead, just write a task that does a multi-write to create the namespace in the clusters that this namespace is assigned to.
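A rough sketch of that kind of multi-write task with the Java admin client (the admin URLs, tenant/namespace name, and cluster names are illustrative assumptions; error handling for already-existing namespaces is omitted):
```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.pulsar.client.admin.PulsarAdmin;

public class NamespaceMultiWrite {
    public static void main(String[] args) throws Exception {
        String namespace = "my-tenant/my-namespace";                       // illustrative
        Set<String> assignedClusters = new HashSet<>(
                Arrays.asList("us-east", "us-west"));                      // illustrative cluster names
        List<String> adminUrls = Arrays.asList(                            // illustrative per-cluster admin endpoints
                "http://pulsar-us-east:8080",
                "http://pulsar-us-west:8080");

        // Write the namespace metadata directly into each cluster instead of
        // relying on the global configuration store to propagate it.
        for (String url : adminUrls) {
            try (PulsarAdmin admin = PulsarAdmin.builder().serviceHttpUrl(url).build()) {
                admin.namespaces().createNamespace(namespace);
                admin.namespaces().setNamespaceReplicationClusters(namespace, assignedClusters);
            }
        }
    }
}
```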
----
2020-02-27 04:31:36 UTC - Sijie Guo: but it requires additional work.
----
2020-02-27 04:32:38 UTC - Addison Higham: in this case, the ZKs are actually 
all in the same region (we had run a proper global ZK in our beta env and for 
now just decided to have observers in all other regions)
----
2020-02-27 04:33:52 UTC - Sijie Guo: observers are usually the problem :slightly_smiling_face:
----
2020-02-27 04:34:08 UTC - Sijie Guo: because they are easily falling behind.
----
2020-02-27 04:34:30 UTC - Addison Higham: the region where we had the issue was 
the region where we have the 3 participating members
----
2020-02-27 04:34:59 UTC - Addison Higham: (ATM we have a fairly large 
imbalance, with 90% of traffic in one region, so we put the real quorum members 
there)
----
2020-02-27 04:36:29 UTC - Addison Higham: anyways, yes, it seems unlikely that 
it is solely a ZK bug, I am assuming it is likely something partially about how 
we are running things, but I am just surprised by it
----
2020-02-27 04:38:29 UTC - Addison Higham: I mostly just need to dig back through logs and metrics to figure it out; the interesting bit is that the metrics report a znode count that matches, but when I actually went to inspect, znodes were missing
----
2020-02-27 04:40:16 UTC - Sijie Guo: are your metrics collected on the same 
zookeeper node?
----
2020-02-27 04:40:39 UTC - Addison Higham: we collect from all 3 members, and actually I might be wrong, I had my query screwed up...
----
2020-02-27 04:43:01 UTC - Sijie Guo: I guess you are using the headless service?
----
2020-02-27 04:43:51 UTC - Addison Higham: not quite sure what you mean, we are just using the pulsar images that attach the prometheus exporter, pulling the metrics that way, and exporting to datadog
----
2020-02-27 04:45:04 UTC - Addison Higham: znode drop off
----
2020-02-27 04:46:11 UTC - Addison Higham: that is the count of znodes; the drop-off correlates with when I restarted all three zk nodes, but they all got the storage re-attached
----
2020-02-27 04:48:46 UTC - Addison Higham: maybe I am just dumb and really 
shouldn't be restarting all 3 nodes at once?
----
2020-02-27 04:52:13 UTC - Sijie Guo: Ah, I see. So the znode count drops at all 3 nodes?
----
2020-02-27 05:05:49 UTC - Addison Higham: no, just the one node
----
2020-02-27 05:07:04 UTC - Addison Higham: the line above the drop is the other 2 members, and it stays consistent with the count before the restart
----
2020-02-27 05:09:41 UTC - Joe Francis: what is the status of the low-znode ZK? 
Is it in the quorum?
----
2020-02-27 05:10:15 UTC - Addison Higham: yes and reported itself as a follower
----
2020-02-27 05:13:35 UTC - Joe Francis: That seems strange. Unless somehow its 
on-disk snapshot was mucked up. There has been a recent issue with ZK not 
shutting down when running out of disk space ...
----
2020-02-27 05:13:54 UTC - Addison Higham: 10GB disks but it is only about 4 MB 
of data :confused:
----
2020-02-27 05:17:06 UTC - Devin G. Bost: @Joe Francis I personally experienced the issue with ZK running out of disk space. That was a big mess, but the bigger problem was that it didn't recover correctly after the crash: <https://issues.apache.org/jira/plugins/servlet/mobile#issue/ZOOKEEPER-1621|https://issues.apache.org/jira/plugins/servlet/mobile#issue/ZOOKEEPER-1621>
There's another open ZK issue that I came across that I think is related, but I don't remember how I found it.
----
2020-02-27 05:17:07 UTC - Addison Higham: so, my best guess at the moment:
• we run on k8s, with the AWS CNI, which means the pod gets a unique IP on each boot
• we are using DNS -> IPs for global ZK, so there is lag on reboots to form quorum as the DNS gets swapped around
• perhaps somehow during this time, as it tries to re-form quorum against invalid IPs, it can reach just one node but not the other, and we end up with some sort of strange "network partition" where A->B and B->C but A can't connect to C
----
2020-02-27 05:17:15 UTC - Joe Francis: 
<https://issues.apache.org/jira/browse/ZOOKEEPER-3701|ZOOKEEPER-3701>
+1 : Devin G. Bost
----
2020-02-27 05:20:14 UTC - Devin G. Bost: @Addison Higham when the ZK disk 
filled up and crashed, it happened really fast, so we needed to watch it 
carefully to spot it. Good metrics would probably have made it easier to detect 
for us.
----
2020-02-27 05:20:53 UTC - Devin G. Bost: 4 MB of data makes me wonder if it 
crashed and restarted already.
----
2020-02-27 05:24:29 UTC - Addison Higham: this was during a lull in traffic and 
happened as a result of maintenance where we manually restarted the global zk 
nodes
----
2020-02-27 05:26:30 UTC - Sijie Guo: @Addison Higham did you restart the node with the small znode count?
----
2020-02-27 05:30:34 UTC - Devin G. Bost: @Addison Higham I think we also ran 
into odd problems in the past when we've restarted all ZK nodes simultaneously.
----
2020-02-27 05:31:30 UTC - Addison Higham: need to check against some logs to 
correlate the exact timing, one sec
----
2020-02-27 05:32:51 UTC - Joe Francis: A ZK cluster will survive a full reboot, 
so shutting down all should not matter. I would consult the log of the lower 
count ZK and the leader and see what sort of sync happened when it joined the 
quorum
----
2020-02-27 05:40:41 UTC - Devin G. Bost: @Joe Francis One time, we had a ZK 
cluster where some of our broker information was missing, but I don't think I 
checked all the ZK nodes for it.
If I remember correctly, that may have happened after restarting all the ZK 
nodes, but I don't remember for certain.
----
2020-02-27 05:45:42 UTC - Devin G. Bost: I thought there might have been an 
association with what @Penghui Li's recent PIP was about.
----
2020-02-27 05:55:27 UTC - Joe Francis: My cluster is on BM, and normally we operate at about 10M znodes. Our main issues are with the global ZK, but nothing wild
----
2020-02-27 05:58:37 UTC - Joe Francis: Having reliable, fast storage for ZK 
disks will make life easier.
----
2020-02-27 06:02:47 UTC - Devin G. Bost: We definitely experienced far fewer 
issues after we upgraded our ZK disks to fast SSD SAN.
----
2020-02-27 06:05:02 UTC - Devin G. Bost: How do I get the znode count?
----
2020-02-27 06:09:25 UTC - Joe Francis: mntr should show it
----
