Re: [Twisted-Python] Is there pb documentation somewhere?
On Aug 4, 2014, at 9:47 PM, Daniel Sank wrote:

> glyph,
>
> >> 2. Is there a specification for the pb dialect of banana?
> >>
> > Beyond the code, no.
>
> Ok.
>
> > I would be happy to answer questions, but obviously I'm not super
> > responsive :).
> > Let me know what you need.
>
> For two personal projects, I would like to have a reasonable remote objects
> library in python. I need something which can announce state changes to
> clients, and receive state change requests from clients. My solution:
>
> 1. Make server side class which can spawn Cacheables when it wants to tell
> clients of its existence.
> 2. Give RemoteCaches to clients and use observe_* methods as appropriate.
> 3. Stuff a Viewable into the RemoteCaches so that clients can request state
> changes.
>
> Question #1: Is this a reasonable use of pb?

Yes.

> This all worked great until I ran into a bug. In trying to fix the bug, I
> found that
>
> 1. pb code is really hard to understand

Sorry about that.

> 2. exarkun thinks pb is bad and that I should implement what I need in AMP.

I really wish we would stop calling things "bad" and "good". This isn't a helpful classification. PB is adequate for a particular set of requirements. Those requirements are somewhat unusual, and AMP is better for a lot of use-cases. It sounds to me like you are a lot more interested in

> 3. exarkun thinks banana and jelly are reasonable.

Again, what does "reasonable" mean in this context? Let me explain my own opinion about this.

Banana is a perfectly serviceable low-level marshaling format. It's pretty efficient compared to something like JSON, and it has compression mechanisms which can make it even more efficient (the "dialect" support you referred to). The only thing about it that isn't very general is that its implementation (although not the protocol specification) hard-codes the PB abbreviated-string dialect.

Jelly is higher level, but more language-specific.
Its specification implicitly encodes numerous Python implementation details, like the distinction between "tuple" and "list". It also couples very tightly to your program's structure. This can be a real benefit for getting a protocol up and running quickly, but it also allows you to create protocols whose wire format you don't really know, and in which you develop hidden dependencies. In more complex protocols (where the "ease of getting up and running quickly" thing really starts to shine) this attribute of Jelly can cause real difficulty in any kind of cross-system communication: communicating with a peer written in a different language, or even with a Python peer that lacks the protocol class definitions from the original system, is hard because it requires reverse-engineering. This is where it becomes "bad".

Still, security- and maintenance-wise it isn't as big a disaster as Pickle. The information you need is recorded in the code; it's just spread out. You don't need to work backwards from protocol dumps.

If I were going to spend some time maintaining PB, this is where I'd focus: if the schemas were a bit more explicit, could be collected into one place more easily, and were all validated in advance (before passing deserialized objects to the application code, or serializing them across the wire), then these problems could be addressed without changing the API too much.

PB basically just inherits all of the benefits and caveats of Jelly. It's a trivial serialization of remote references to objects.

> Question #2: Would you recommend implementing a simplified replacement for pb
> on top of banana/jelly, or starting over from AMP? I favor the banana/jelly
> route because the protocol seems intrinsically flexible, but I read your blog
> explaining why protocols like banana are bad, so I'm confused about what I
> "should" do.

First of all, don't take my development advice as gospel.
When I write an article and publish it, I'm just trying to make people aware of issues they may not have considered; make your own decisions about how to write your own code. (Unless your decision is to write it yourself in PHP, of course, in which case you are a danger to yourself and others and should be remanded to compulsory treatment.)

It seems like PB fits your style, and the problems with it are all tractable and fixable. I am sad that you're not getting the development support you need to maintain it (most of all, I'm sad you're not getting it from me!), but let's see if we can fix that. I'll start by replying to your other email.

One thing that might speed things along is if you can help out with some code reviews. We've got a _really_ long queue right now, and that's making it hard for me to spend any focused effort in one particular area. I'm happy to trade 2-for-1: if you do two code reviews, I will regard it as an immediate obligation for me to review a ticket you direct me to ;).

It might also help to write m
Re: [Twisted-Python] Is there pb documentation somewhere?
On Aug 4, 2014, at 10:07 PM, Daniel Sank wrote:

> glyph
>
> > I would be happy to answer questions, but obviously I'm not super
> > responsive :).
> > Let me know what you need.
>
> I am trying to understand jelly's serialization strategy:
>
> 1. In t.s.jelly._Jellier, what is the meaning of persistentStore?

From the perspective of PB, you can ignore this completely. It's effectively an unused feature. There are two entry-point call sites for jelly in PB: Broker.serialize and Broker.unserialize. Both explicitly pass None for the "persistent" argument ("persistentStore" and "persistentLoad", respectively).

Reaching back into my dim and distant memory of the ancient past, I believe that the purpose of these callables was to allow you to use Jelly (and perhaps PB) to refer to objects in some kind of pluggable long-term storage. The reason they're called "persistent" was that "ephemeral" storage was local to the connection, and therefore short-lived enough that we could trust an in-memory Python dictionary to be both large enough and long-lived enough to serve it. But if your objects live in a database, you might want a different backend: an application-provided callable for loading objects by ID.

Again, this was never really used, so you can probably ignore it. (I think there might have been a 4X massively multiplayer video game which used it in 2002 or so, but nothing since then that I'm aware of, especially since PB doesn't even have a way to pass in your own without subclassing and overriding 'serialize'.)

> 2. In t.s.jelly._Jellier, what is the meaning of cooked? The comment here
> doesn't make sense to me yet.

I just read the comment in _cook, and I hate my younger self right now. Seriously. Screw that guy.

When you make a jelly, you have to cook the fruit first. So part of the metaphor here is that you are "cooking" the objects as you serialize them.
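The bookkeeping described in the rest of this message can be modeled in a few dozen lines. What follows is a toy sketch, not Twisted's implementation: the class `ToyJellier` and its methods are hypothetical, it handles only ints and lists, and it skips everything else the real _Jellier does (security checks, other types, persistentStore). Its three dictionaries are named after _Jellier's preserved, cooked and cooker so the explanation can be followed against it.

```python
class ToyJellier:
    """Hypothetical toy model (NOT Twisted's code) of _Jellier's
    back-reference bookkeeping. Handles only ints and lists."""

    def __init__(self):
        self.preserved = {}  # id(obj) -> output expression for obj (a placeholder at first)
        self.cooked = {}     # id(obj) -> ['dereference', refid], once obj is seen twice
        self.cooker = {}     # id(obj) -> obj, keeps obj alive so its id() can't be reused
        self._ref_id = 1     # next reference ID to hand out

    def jelly(self, obj):
        if isinstance(obj, int):
            return obj  # immutable: never needs a back-reference
        if not isinstance(obj, list):
            raise TypeError("ToyJellier only handles int and list")
        key = id(obj)
        if key in self.cooked:
            # Third or later sighting: reuse the existing back-reference.
            return self.cooked[key]
        if key in self.preserved:
            # Second sighting: assign a reference ID, then back-reference it.
            self._cook(obj)
            return self.cooked[key]
        # First sighting: register a placeholder *before* recursing, so that a
        # reference to obj found inside its own contents is recognized.
        self.preserved[key] = []
        self.cooker[key] = obj
        sexp = ['list'] + [self.jelly(item) for item in obj]
        return self._preserve(obj, sexp)

    def _cook(self, obj):
        # Rewrite, in place, whatever expression was preserved for obj into
        # ['reference', refid, <old contents>]. In-place matters: if obj was
        # already fully serialized, that list is already part of the output.
        expr = self.preserved[id(obj)]
        refid = self._ref_id
        self._ref_id += 1
        expr[:] = ['reference', refid, list(expr)]
        self.cooked[id(obj)] = ['dereference', refid]

    def _preserve(self, obj, sexp):
        key = id(obj)
        if key in self.cooked:
            # obj was referenced while it was being serialized: its placeholder
            # is now ['reference', refid, []], so put the real body in slot 2.
            expr = self.preserved[key]
            expr[2] = sexp
            return expr
        self.preserved[key] = sexp
        return sexp


if __name__ == "__main__":
    circular = [1, 2]
    circular.append(circular)
    print(ToyJellier().jelly(circular))  # ['reference', 1, ['list', 1, 2, ['dereference', 1]]]
    print(ToyJellier().jelly([1, 2]))    # ['list', 1, 2]
    shared = [1]
    print(ToyJellier().jelly([shared, shared]))
    # ['list', ['reference', 1, ['list', 1]], ['dereference', 1]]
```

For the circular and acyclic lists this toy produces the same shapes as the jelly session in this message. The third case shows the in-place rewrite: a list that is merely shared (not circular) was already emitted in full when it is seen again, so _cook wraps that existing output in ['reference', 1, ...] and the second occurrence becomes ['dereference', 1].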
The "cooked" attribute maps object IDs (integers representing pointers, at least in CPython) to "dereference" jelly expressions. It is said to be "cooked" at that point because you no longer need to put in the energy (I guess heat, in this metaphor?) to serialize the internal state. A "dereference" expression is one that points at an object within the same Jelly, so this is not like something pointing at a remote reference.

It uses object IDs for keys and not the objects themselves because these objects are (since they can participate in circular references) implicitly mutable, and mutable objects often don't have a working __hash__ implementation, so we can't rely on that.

This happens in a weird order because an object may circularly refer to itself, so we prepare it and put it in the "preserved" map before actually beginning to serialize its internal state. We also don't want to pollute the jelly output with reference IDs for every single object that _might_ be referenced more than once; we only want to add the ['reference'] expression if we actually refer to it twice. Look at this example:

    >>> from twisted.spread.jelly import jelly
    >>> circular = [1, 2]
    >>> circular.append(circular)
    >>> jelly(circular)
    ['reference', 1, ['list', 1, 2, ['dereference', 1]]]
    >>> acyclic = [1, 2]
    >>> jelly(acyclic)
    ['list', 1, 2]

You can see that reference ID '1' is allocated for the circular list. The output list there is the thing that went into the _Jellier's "preserved" map, keyed by the 'id' of the serialized list; then ['reference', 1] was inserted at its beginning and its body appended.

So the steps are:

1. Here's a mutable object. Let me remember that I've seen it, just in case I see it again.
2. Now I'm going to recursively serialize it.
3. Oh, here it is again! I know it's the same object because it has the same ID.
   Instead of serializing it, I'll change the ['list'] into a ['reference', 1] and stick in a ['dereference', 1] here.

If we never get to step 3, we never see the ['reference'] at all, and it's as if this functionality didn't exist.

> 3. In t.s.jelly._Jellier, what is the meaning of cooker?

The "cooker" attribute is a hack related to the use of "id" for the unique IDs. If we used the object itself as the key (which we shouldn't do, for the reasons I mentioned above), then we could just rely on it sticking around until the end of the 'jelly' call. But instead we use its 'id', which is its pointer address, so we need to make sure that the object lives until the end of the _Jellier's lifetime; we just stick it into the "cooker" map as the value. You'll notice that there's no store of the object itself anywhere else: in "cooked" the key is the ID, and the value is the serialized output value that Jelly is going to write out. If we didn't make sure the object stuck around, a different object might get the same ID, and that would produce spurious back-references (like, we might get a ['derefere