Hi Pulsar Community,

Here are the meeting notes from today's community meeting. Thanks to all who participated! We had 3 detailed conversations, as the length of this email demonstrates.
Disclaimer: If something is misattributed or misrepresented, please send a correction to this list.

Source google doc: https://docs.google.com/document/d/19dXkVXeU2q_nHmkG8zURjKnYlvD96TbKf5KjYyASsOE

Thanks,
Michael

2022/06/09 (8:30 AM PST)

- Attendees:
  - Matteo Merli
  - Michael Marshall
  - Mattison Chao
  - Heesung Sohn
  - Lari Hotari
  - Aaron Williams
  - Enrico Olivelli
  - Christophe Bornet
  - Andrey Yegorov
  - Saumitra Srivastav
- Discussions
  - Matteo: The load balancer email from Heesung is not a proposal yet. It is a discussion; there is no design doc. We just want to start the conversation early. It will touch a lot of Pulsar, so consensus is essential. There are many things we could change. The biggest challenge is that no one really understands how the current load balancer behaves in practice. It works most of the time, but when it doesn't, it's not clear why. Let's document what is there from the code. Many things are involved, including keeping track of the historical rates of bundles and all the metadata. Then we can start with a fresh implementation so that we don't break the current one. There isn't an answer yet, and we don't necessarily even have all of the questions.
  - Lari: There was a proposal on the mailing list related to the load balancer; it seemed valuable.
  - Matteo: We need more drastic changes. There isn't any one problem.
  - Lari: I agree. That proposal included some problem statements that could be valuable.
  - Matteo: Some of the assumptions embedded in the current model: it is always reactive (there is no broker-to-broker coordination; the interaction with the client is the only driver), and it tries to find a perfect placement (should the model instead be more robust at rearranging bundles across brokers?).
  - Michael: The bundle logic is something interesting to consider. What happens when there are many topics and unloading takes longer?
  - Matteo: The primary issue with moving bundles is that the transfer is not broker to broker. The bundle just gets unloaded, and then the manager places it somewhere. The broker should know the target broker for the bundle.
  - Michael: That target could be sent to the client to decrease latency during bundle moves.
  - Matteo: A tool to measure the time to transfer a bundle could be valuable. Another detail: there is no way to do bulk lookups.
  - Michael: That doesn't even include redirects.
  - Matteo: We should get rid of redirects completely; it should be handled on the broker side. Another one: when we added bundles, they were meant to be transparent to clients to preserve design flexibility. The design hasn't actually changed in almost 10 years, so we could expose the bundles to the client and speed up the lookups.
- PIPs
  - Christophe: PIP-173: Create a built-in Function implementing the most common basic transformations.
  - Matteo: I am not a fan of SMT (Kafka's Single Message Transforms). The way you express the syntax is convoluted. It does not feel natural.
  - Christophe: Is the configuration the problem, or the feature itself?
  - Matteo: There are multiple ways of doing that, not a single one.
  - Christophe: What we see from users is that there always seem to be transformations like renaming a field or removing a field, and there is no easy way to do that at the moment. The idea is to find something that is available, easy to configure, and does not require writing code. It's language agnostic.
  - Matteo: I don't disagree with that. I disagree with the way to expose it. Yes, you don't write code, but you're trying to express logic in a configuration file. That is probably not the best way to express that logic.
  - Enrico: Two points. Most users need a very simple transformation, like dropping a field. A line with "drop field" in a config file is straightforward.
  - Matteo: It depends on the case. A more general approach gets more complicated. Maybe you want to filter, route, or apply more complex logic; there are many things you can do. The problem is that expressing those in a config file is a nightmare. If you look at an example of SMT, it is complicated.
  - Matteo: More complex scenarios make it harder.
  - Christophe: Did you see the proposed configuration design? I feel it is better than what SMT offers. We could do something with bash-like logic that could be compatible with the JSON configuration. How would you do this?
  - Matteo: A SQL interface is easier for users to grasp. This is how you do aggregation, or call a function like upper-case; all of it comes naturally.
  - Enrico: Christophe may start a standalone project, but the main point is to give Pulsar users the functionality.
  - Matteo: The main point is that a new API has to be supported. I'm not saying we shouldn't do this; my point is that we should find a good way to do it. You could do it outside of Pulsar.
  - Enrico: These are functions that we want to give the community, for users to have access to out of the box. Being in Pulsar gives confidence that breaking changes will be caught and/or prevented.
  - Matteo: If you put it there, SMT becomes an API. The question is: is it the right API? Is it the right abstraction? To me, it doesn't sound like a good one. The problem is how it is accessible to a user and how users will be excited to write the actual logic.
  - Lari: There are many use cases where users face high barriers to coding. This lowers the barrier for those users. It's not a stream processing solution; it's an extension.
  - Christophe: It's just meant for simple transformations. Schemas have a steep learning curve.
  - Matteo: I understand the no-code thing. However, a config file is not the place to express logic. A DSL could capture this. A custom one would be tough to adopt, but one like SQL could be easy. (There was some discussion about complexity and how SQL gives more selection; see the recording for more info.)
  - Christophe: How would you do routing?
  - Matteo: That is something you can do if you look at Flink SQL, ksqlDB, or Snowflake SQL. They are all slightly different, but the basics are the same. My point is that SQL gives you filtering, routing, dropping fields, etc.
  - Lari: In this case, the model is more like multiple steps, each doing a transformation. There can be components that do these transformations, and then you add the steps to a list (the config), where each step gets its own configuration. It's not meant for all use cases. A good example is a topic with messages in Avro format that you want on another topic in JSON format. Adding a function that does that is helpful.
  - Matteo: Chaining multiple things still adds complexity. Now you have 5 different config files to inspect to see how they apply to each other.
  - Lari: These models do have limitations, but if it is defined and documented as being for simple message transformations, that makes a lot of sense to me. I would find adding SQL problematic, because adding tooling support would be much harder. I see this integrating with a web UI; in that case, it's better that there isn't an arbitrary language that needs to be parsed.
  - Matteo: A SQL dialect is a very specific language to parse. Transformations are one small part; routing, grouping, and others are features that would be valuable too. A web UI may be helpful, but the moment you want something serious, you need the underlying code model. We saw this at Splunk, where there was a DSL and then a UI to compose pipelines that got translated into Flink jobs. Ultimately, only demos were done in the UI; anything serious needed the underlying DSL.
  - Lari: I agree that typically happens. That's why it's not meant as a generic stream processing solution.
  - Matteo: It's not black and white. We say Pulsar Functions are not a stream processing framework, but we want to give users the ability to process data easily without managing their own consumers. There are multiple shades in between, and there are concepts that overlap. Flink, for example, is very powerful and complicated.
  - Matteo: The whole point is that we don't need to re-invent a stream processing framework, but we can expose many of the constructs that will work with the Pulsar Function model.
  - Lari: In most cases, you do have to actually write code anyway.
  - Matteo: My only worry is: if this becomes the official API, is this the preferred model we tell users to use to define basic functions? I would be resistant to that.
  - Lari: A similar critique could be made of the Pulsar Function API, too. There isn't a completely optimal solution. This would solve the very specific entry-level case, but it wouldn't be for all functions.
  - Matteo: My point is that we could have a more general solution that does the transformations and also handles more complicated logic.
  - Lari: The risk is that it doesn't do any one thing very well.
  - Matteo: I disagree; that is implementation dependent. With a SQL-like DSL, we can express all of this logic and implement it in the function runtime. It won't be a full stream processor, which would need the right runtime (which functions are not). It would let you do most of the things you could do in functions without writing code. Even if we call it no-code, it's still code; it's logic.
  - Lari: I agree that something like that would be useful. How realistic is it that we would have it in the short term? Christophe already has it implemented this way.
  - Matteo: Let's talk about that in a couple of weeks or less.
  - Michael: On the release plan (PIP-175), we need to define what will force a new RC.
  - Matteo: I agree, and we need to set a date and hit our dates. We should cut a branch 3 weeks before the release. We have different kinds of bugs. The first is a new regression; those must be fixed. The other is a bug in a new, shiny feature. Should we fix it or not? In order to keep the date, we have 3 weeks to discover and fix the bug. If we can't fix it in that time, the feature is broken, and that should be communicated to users. If we find it at the last moment, we ship with the bug (and still tell users).
  - Michael: That could simplify our patch releases too, which can take a while to get out.
  - Matteo: We should formalize when we do patch releases as well. As Dave mentioned, we have a cherry-picking process that needs documenting. If you merge something and tag it to be cherry-picked, you need to cherry-pick it when it is merged (immediately). Otherwise, the release manager will skip that change, regardless of whether it fixed a bug. The release manager has a lot of work there. Also, when do we want to do patch releases? How many bugs do we wait for? Is there a time limit? We need guidelines. I don't know of any good model, but we should have something there.
  - Michael: What about performance regressions? Can those be release blockers?
  - Matteo: There are cases. A simple bug that is easy to fix can definitely trigger a rebuild of the RC. Another case is a regression from a large refactor, but fixing that might take more than 3 weeks. Or it could come from a new feature that we want to keep. We should run this performance testing at code-freeze time. If 3 weeks is not enough, we should move forward with the release. Matteo's rough cut-off is a 30% regression, but he doesn't want a fixed percentage; different features have different requirements.
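As an aside, the "cherry-pick immediately when merged" rule Matteo describes can be sketched with plain git. This is a hypothetical illustration (the repo, branch name `branch-2.10`, and commit message are stand-ins, not details from the meeting):

```shell
# Throwaway repo standing in for apache/pulsar; branch-2.10 stands in
# for whatever release branch is currently active.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git checkout -qb master
git config user.email "dev@example.com"
git config user.name "Dev"
echo base > file.txt
git add file.txt
git commit -qm "base"
git branch branch-2.10            # release branch cut from master

# A bug fix is merged to master...
echo fix >> file.txt
git commit -qam "Fix consumer leak"
fix_sha=$(git rev-parse HEAD)

# ...and is cherry-picked to the release branch right away. The -x flag
# records the original SHA, so the release manager can trace the change
# instead of hunting for un-picked commits later.
git checkout -q branch-2.10
git cherry-pick -x "$fix_sha" >/dev/null
git log -1 --pretty=%s            # the fix is now on the release branch
```

The key habit is the immediacy: the author picks the commit onto the release branch in the same sitting as the merge, rather than leaving it to the release manager.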