Taylor Simpson <tsimp...@quicinc.com> writes:
> I had discussions with several people at the KVM Forum, and I’ve been
> thinking about how to divide up the code for community review. Here is my
> proposal for the steps.
>
> 1. linux-user changes + linux-user/hexagon + skeleton of target/hexagon
>    This is the minimum amount to build and run a very simple program. I
>    have an assembly program that prints “Hello” and exits. It is
>    constructed to use very few instructions that can be added brute
>    force in the Hexagon back end.

I'm hoping most of the linux-user changes are in the hexagon runloop?
There has been quite a bit of work splitting up and cleaning up the
#ifdef mess in linux-user over the last few years.

> 2. Add the code that is imported from the Hexagon simulator and the qemu
>    helper generator
>    This will allow the scalar ISA to be executed. This will grow the set
>    of programs that could execute, but there will still be limitations.
>    In particular, there can be no packets which means the C library won’t
>    work. We have to build with -nostdlib

You could run -nostdlib system TCG tests (hello and memory) but that
would require modelling some sort of hardware and assumes you have a
simple serial port or semihosting solution. That said, a bunch of the
MIPS tests are linux-user and -nostdlib, so that isn't a major problem
in getting some of the tests running.

When you say code imported from the hexagon simulator, I was under the
impression you were generating code from the instruction description.
Otherwise you'll need to be very clear about your licensing grants.

> 3. Add support for packet semantics
>    At this point, we will be able to execute full programs linked with
>    the C library. This will include the check-tcg tests.

I think the interesting question is whether the roll-back semantics of
Hexagon are something we might need for other emulated architectures or
a solution particular to Hexagon (I'm guessing the latter).

> 4. Add support for the wide vector extensions
> 5. Add the helper overrides for performance optimization
>    Some of these will be written by hand, and we’ll work with rev.ng to
>    integrate their flex/bison generator.

One thing to nail down is whether we include the generated code in the
source tree with a tool to regenerate it (much like we do for
linux-headers) or add the dependency and regenerate each time from
scratch. I don't see including flex/bison as a dependency being a major
issue (in fact we have it in our docker images so I guess something uses
it). However, depending on libclang, which was also being discussed,
might be trickier.

>
> I would love some feedback on this proposal. Hopefully, that is enough
> detail so that people can comment. If anything isn’t clear, please ask
> questions.
>
>
> Thanks,
> Taylor
>
>
> From: Qemu-devel <qemu-devel-bounces+tsimpson=quicinc....@nongnu.org> On
> Behalf Of Taylor Simpson
> Sent: Tuesday, November 5, 2019 10:33 AM
> To: Aleksandar Markovic <aleksandar.m.m...@gmail.com>
> Cc: Alessandro Di Federico <a...@rev.ng>; ni...@rev.ng;
> qemu-devel@nongnu.org; Niccolò Izzo <izzonicc...@gmail.com>
> Subject: RE: QEMU for Qualcomm Hexagon - KVM Forum talk and code available
>
> Hi Aleksandar,
>
> Thank you – We’re glad you enjoyed the talk.
>
> One point of clarification on SIMD in Hexagon. What we refer to as the
> “scalar” core does have some SIMD operations. Register pairs are 8 bytes,
> and there are several SIMD instructions. The example we showed in the talk
> included a VADDH instruction. It treats the register pair as 4 half-words
> and does a vector add. Then there are the Hexagon Vector eXtensions (HVX)
> instructions that operate on 128-byte vectors. There is a wide variety of
> instructions in this set. As you mentioned, some of them are pure SIMD and
> others are very complex.
>
> For the helper generator, the vast majority of these are implemented with
> helpers.
> There are only 2 vector instructions in the scalar core that have a
> TCG override, and all of the HVX instructions are implemented with helpers.
> If you are interested in a deeper dive, see below.
>
> Alessandro and Niccolo can comment on the flex/bison implementation.
>
> Thanks,
> Taylor
>
>
> Now for the deeper dive in case anyone is interested. Look at the genptr.c
> file in target/hexagon.
>
> The first vector instruction with an override is A6_vminub_RdP. It
> does a byte-wise comparison of two register pairs and sets a predicate
> register indicating whether the byte in the left or right operand is greater.
> Here is the TCG code.
> #define fWRAP_A6_vminub_RdP(GENHLPR, SHORTCODE) \
> { \
>     TCGv BYTE = tcg_temp_new(); \
>     TCGv left = tcg_temp_new(); \
>     TCGv right = tcg_temp_new(); \
>     TCGv tmp = tcg_temp_new(); \
>     int i; \
>     tcg_gen_movi_tl(PeV, 0); \
>     tcg_gen_movi_i64(RddV, 0); \
>     for (i = 0; i < 8; i++) { \
>         fGETUBYTE(i, RttV); \
>         tcg_gen_mov_tl(left, BYTE); \
>         fGETUBYTE(i, RssV); \
>         tcg_gen_mov_tl(right, BYTE); \
>         tcg_gen_setcond_tl(TCG_COND_GT, tmp, left, right); \
>         fSETBIT(i, PeV, tmp); \
>         fMIN(tmp, left, right); \
>         fSETBYTE(i, RddV, tmp); \
>     } \
>     tcg_temp_free(BYTE); \
>     tcg_temp_free(left); \
>     tcg_temp_free(right); \
>     tcg_temp_free(tmp); \
> }
>
> The second instruction is S2_vsplatrb. It takes the byte from the operand
> and replicates it 4 times into the destination register. Here is the TCG
> code.
> #define fWRAP_S2_vsplatrb(GENHLPR, SHORTCODE) \
> { \
>     TCGv tmp = tcg_temp_new(); \
>     int i; \
>     tcg_gen_movi_tl(RdV, 0); \
>     tcg_gen_andi_tl(tmp, RsV, 0xff); \
>     for (i = 0; i < 4; i++) { \
>         tcg_gen_shli_tl(RdV, RdV, 8); \
>         tcg_gen_or_tl(RdV, RdV, tmp); \
>     } \
>     tcg_temp_free(tmp); \
> }
>
>
> From: Aleksandar Markovic <aleksandar.m.m...@gmail.com>
> Sent: Monday, November 4, 2019 6:05 PM
> To: Taylor Simpson <tsimp...@quicinc.com>
> Cc: qemu-devel@nongnu.org; Alessandro Di Federico <a...@rev.ng>;
> ni...@rev.ng; Niccolò Izzo <izzonicc...@gmail.com>
> Subject: Re: QEMU for Qualcomm Hexagon - KVM Forum talk and code available
>
>
> On Friday, October 25, 2019, Taylor Simpson <tsimp...@quicinc.com> wrote:
> We would like to inform you that we will be giving a talk at the KVM Forum
> next week on QEMU for Qualcomm Hexagon. Alessandro Di Federico, Niccolo
> Izzo, and I have been working independently on implementations of the Hexagon
> target. We plan to merge the implementations, have a community review, and
> ultimately have Hexagon be an official target in QEMU. Our code is available
> at the links below.
> https://github.com/revng/qemu-hexagon
> https://github.com/quic/qemu
> If anyone has any feedback on the code as it stands today or guidance on how
> best to prepare it for review, please let us know.
>
>
> Hi, Taylor, Niccolo (and Alessandro too).
>
> I didn't have a chance to take a look at either the code or the docs, but I
> did attend your presentation at KVM Forum, and I found it superb and
> attractive, one of the best at the conference, if not the very best.
>
> I just have a couple of general questions:
>
> - Regarding the code you plan to upstream, are all SIMD instructions
> implemented via the TCG API, or do some of them remain implemented
> using helpers?
>
> - Most SIMD instructions can be viewed simply as several parallel
> elementary operations. However, for a given SIMD instruction set, usually not
> all of them fit into this pattern. For example, "horizontal add" (adding data
> elements from the same SIMD register), various "pack/unpack/interleave/merge"
> operations, and more general "shuffle/permute" operations as well (here I am
> not sure which of these are included in the Hexagon SIMD set, but there must
> be some). How did you deal with them?
>
> - What were the most challenging Hexagon SIMD instructions you came across
> while developing your solution?
>
> Sincerely,
> Aleksandar
>
>
>
>
> Thanks,
> Taylor

--
Alex Bennée
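
For reference, the two TCG overrides quoted in the deeper dive can be
modelled in plain C. This is only a sketch of what the quoted macros
appear to compute, not code from either tree: the names ref_vsplatrb
and ref_vminub are hypothetical, and the operand order (Rtt as "left",
Rss as "right") is taken from the quoted fWRAP_A6_vminub_RdP macro.

```c
#include <stdint.h>

/* Hypothetical reference model of the quoted S2_vsplatrb override:
 * replicate the low byte of rs into all 4 bytes of the result. */
static uint32_t ref_vsplatrb(uint32_t rs)
{
    uint32_t byte = rs & 0xff;
    uint32_t rd = 0;

    for (int i = 0; i < 4; i++) {
        rd = (rd << 8) | byte;
    }
    return rd;
}

/* Hypothetical reference model of the quoted A6_vminub_RdP override:
 * for each of the 8 bytes, store the unsigned minimum in the result
 * and set predicate bit i when the byte from rtt is greater than the
 * byte from rss. */
static uint64_t ref_vminub(uint64_t rtt, uint64_t rss, uint8_t *pe)
{
    uint64_t rdd = 0;
    uint8_t pred = 0;

    for (int i = 0; i < 8; i++) {
        uint8_t left = (uint8_t)(rtt >> (8 * i));
        uint8_t right = (uint8_t)(rss >> (8 * i));
        uint8_t min = left < right ? left : right;

        if (left > right) {
            pred |= (uint8_t)(1u << i);
        }
        rdd |= (uint64_t)min << (8 * i);
    }
    *pe = pred;
    return rdd;
}
```

A model like this could serve as the expected-value side of a
linux-user test for the two overrides, assuming the semantics above
match the instruction description.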