On Fri, 5 Aug 2022 at 20:31, <ljwob...@gmail.com> wrote:

Hey LJ,
> Disclaimer: I work for Cisco on a bunch of silicon. I'm not intimately
> familiar with any of these devices, but I'm familiar with the high level
> tradeoffs. There are also exceptions to almost EVERYTHING I'm about to
> say, especially once you get into the second- and third-order
> implementation details. Your mileage will vary... ;-)

I expect it may come to this; my question may be too specific to be
answered without violating some NDA.

> If you have a model where one core/block does ALL of the processing, you
> generally benefit from lower latency, simpler programming, etc. A major
> downside is that to do this, all of these cores have to have access to
> all of the different memories used to forward said packet. Conversely,
> if you break up the processing into stages, you can only connect the FIB
> lookup memory to the cores that are going to be doing the FIB lookup,
> and only connect the encap memories to the cores/blocks that are doing
> the encapsulation work. Those interconnects take up silicon space, which
> equates to higher cost and power.

While this is an interesting answer (that is, the cost of giving every
core access to every memory versus the cost of a harder-to-program
pipeline of cores is a balanced tradeoff), I don't think it applies to my
specific question, even though it may apply to the generic one. We can
roughly think of FP as having a similar number of lines (pipelines of
cores) as Trio has PPEs, so a similar number of cores needs memory
access, and possibly a higher number, as more than one core in each line
will need it. So the question is more: why many less performant cores,
where performance is achieved by pipelining, rather than fewer more
performant cores, each working on a packet to completion, when the former
has about as many core lines as the latter has cores? (A toy sketch of
the two models is at the end of this mail.)

> Packaging two cores on a single device is beneficial in that you only
> have one physical chip to work with instead of two. This often
> simplifies the board designers' job, and is often lower power than two
> separate chips. This starts to break down as you get to exceptionally
> large chips as you bump into the various physical/reticle limitations of
> how large a chip you can actually build. With newer packaging technology
> (2.5D chips, HBM and similar memories, chiplets down the road, etc) this
> becomes even more complicated, but the answer to "why would you put two
> XYZs on a package?" is that it's just cheaper and lower power from a
> system standpoint (and often also from a pure silicon standpoint...)

Thank you for this; it does confirm that the benefits perhaps aren't as
revolutionary as the presentation in this thread proposed. The
presentation divided Trio's evolution into three phases, and multiple
Trios on a package was presented as one of those big evolutions; perhaps
some other division of generations would have been more communicative.

> Lots and lots of Smart People Time has gone into different memory
> designs that attempt to optimize this problem, and it's a major part of
> the intellectual property of various chip designs.

I choose to read this as 'where a lot of innovation happens, a lot of
mistakes happen'. Hopefully we'll figure out a good answer here soon, as
the answers vendors are ending up with are becoming increasingly visible
compromises in the field. I suspect a large part of this is that cloudy
shops represent, if not disproportionate revenue, then disproportionate
focus, and their networks tend to be a lot more static in config and
traffic than access/SP networks. And when you have that quality, you can
make increasingly broad assumptions, assumptions which don't play as well
in SP networks.
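To make the contrast concrete, here is a toy C sketch of the two models
as I understand them. This is entirely my own illustration; the names
(fib[], encap_tbl[], the stage functions) are hypothetical stand-ins and
have nothing to do with actual Trio or FP microcode. The point is only
which code needs a path to which memory.

  /* Toy sketch contrasting the two NPU models discussed above.
   * Entirely hypothetical: fib[], encap_tbl[] and the stage split are
   * my own stand-ins, not any vendor's design. */

  #include <stdio.h>

  struct pkt {
      unsigned dst;    /* destination address (stand-in for a header) */
      unsigned nh;     /* next hop chosen by the FIB lookup */
      unsigned encap;  /* rewrite chosen by the encap step */
  };

  /* Forwarding state. In model A every core needs a path to BOTH
   * tables; in model B only stage 1 touches fib[] and only stage 2
   * touches encap_tbl[], which is the interconnect saving described
   * above. */
  static unsigned fib[16];
  static unsigned encap_tbl[16];

  /* Model A: run to completion. One performant core does everything,
   * so it must be wired to every memory. */
  static void run_to_completion(struct pkt *p)
  {
      p->nh    = fib[p->dst % 16];       /* FIB lookup memory */
      p->encap = encap_tbl[p->nh % 16];  /* encap memory */
  }

  /* Model B: a line of weaker cores. Each stage only touches its own
   * memory; the packet is handed from stage to stage. */
  static void stage_lookup(struct pkt *p) { p->nh = fib[p->dst % 16]; }
  static void stage_encap(struct pkt *p)  { p->encap = encap_tbl[p->nh % 16]; }

  int main(void)
  {
      for (unsigned i = 0; i < 16; i++) {
          fib[i] = i + 100;        /* fake next hops */
          encap_tbl[i] = i + 200;  /* fake rewrites */
      }

      struct pkt a = { .dst = 3 }, b = { .dst = 3 };

      run_to_completion(&a);  /* model A: one core, all memories */

      stage_lookup(&b);       /* model B, stage 1: FIB memory only */
      stage_encap(&b);        /* model B, stage 2: encap memory only */

      printf("A: nh=%u encap=%u\n", a.nh, a.encap);
      printf("B: nh=%u encap=%u\n", b.nh, b.encap);
      return 0;
  }

Both models do the same per-packet work and produce the same result; what
differs is that in model A every core needs wiring to both tables, while
in model B each stage needs only its own, at the price of handing the
packet and any intermediate state from stage to stage.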
-- 
++ytti