On 10/20/2017 06:46 AM, Bastian Koppelmann wrote:
> The RISC-V data in YAML consists of:
>
>   1) prefixes
>   2) instruction formats
>   3) instruction specification using the data of 1) and 2)
>
> 1) prefixes have:
>      - name
>      - bitmask that identifies instructions belonging to it
>      - length that specifies the length in bits for all instructions
>        with this prefix
>
>    Example prefix description for 16 bit instructions:
>
>    16Bit: {
>        bitmask: ~0b11,
>        len: 16
>    }
>
> 2) instruction formats have:
>      - name
>      - list of immediate fields
>      - list of 'normal' fields (e.g. register identifiers)
>
>    The immediate fields are specified separately in the same file with:
>      - name
>      - data: in-order list of the individual fields + possible sign
>        extension
>      - fill: how the immediate is filled from the LSB (for now only
>        '0' fills are allowed)
>
>    Example for B-type immediate:
>
>    B: {data: [31s, 7, 30..25, 11..8], fill: '0'}
>
>    Note here that bit 31 is used for sign extension as indicated by
>    the 's'.
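[Editor's note: concretely, the B-type description above would expand to an extractor along these lines. This is only a sketch of what the generator might emit; the name `get_b_imm` and the coding style are illustrative, not taken from the patch.]

```c
#include <stdint.h>

/* Sketch: assemble the B-type immediate from
 * B: {data: [31s, 7, 30..25, 11..8], fill: '0'}.
 * Fields are concatenated MSB-first, one '0' bit is appended
 * (the fill), and bit 31 supplies the sign. */
static inline int32_t get_b_imm(uint32_t insn)
{
    uint32_t imm = ((insn >> 31) & 0x1)  << 12   /* 31s    -> imm[12] */
                 | ((insn >> 7)  & 0x1)  << 11   /* 7      -> imm[11] */
                 | ((insn >> 25) & 0x3f) << 5    /* 30..25 -> imm[10..5] */
                 | ((insn >> 8)  & 0xf)  << 1;   /* 11..8  -> imm[4..1] */
    /* imm[0] is the '0' fill; sign-extend from bit 12. */
    return (int32_t)((imm ^ 0x1000) - 0x1000);
}
```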
I think "append" might be a clearer name than "fill".  Reading this for
the first time, I wondered how you were specifying one bit of fill.
That only became clear when I saw

  U: {data: [31..20, 19..12], fill: '000000000000'}

As for the generated GET_X_IMM, please use inline functions instead of
macros.  There doesn't appear to be any need for the preprocessor.

> 3) instruction specifications have:
>      - name
>      - instruction format from 2)
>      - prefix from 1)
>      - keymap of opcode fields and the values they take for this
>        instruction
>      - func for binary translation

There's some redundant information here -- OP1_32 overlaps prefix.
You should either drop the redundancy (infer the prefix from the
opcode) or validate it (assert that the opcode matches the prefix).

> The python script then converts these instructions into a tree of
> switch-case statements depending on the specified opcodes and
> generates at each case for one instruction two function calls:
>
> 1) decode_format_* depending on the instruction format used by this
>    instruction.
>    The decode_format_* functions are also generated by the script and
>    take care of decoding the 'normal' fields and the immediate fields.
>    This data is stored in a data structure 'DisasFormats' which needs
>    to be passed to any translation function used by instructions.
>
> 2) A call to the translation function defined by 'func' in 3).
>    These need to be defined by the developer in translate.c.  For
>    example, for LUI this looks like:
>
>    static void gen_lui(CPURISCVState *env, DisasContext *ctx)
>    {
>        if (ctx->fmt.field_rd == 0) {
>            return; /* NOP */
>        }
>        tcg_gen_movi_tl(cpu_gpr[ctx->fmt.field_rd], ctx->fmt.immediate_U);
>    }

(As an aside, it's bad form to pass around ENV unless the function is
going to have to read more code from guest memory.  Which isn't true
in your case.)

Unless you have a real use-case in mind, I'm not keen on all of the
fields being accessible all of the time.
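[Editor's note: as an illustration of the inline-function suggestion above, the U-type immediate is the simplest case -- the twenty data bits stay in place and the twelve-bit fill just clears the low bits. A sketch only; the name `get_u_imm` is illustrative, not from the patch.]

```c
#include <stdint.h>

/* Sketch: U: {data: [31..20, 19..12], fill: '000000000000'} keeps
 * bits 31..12 in place and appends twelve '0' bits, which reduces
 * to a simple mask. */
static inline int32_t get_u_imm(uint32_t insn)
{
    return (int32_t)(insn & 0xfffff000u);
}
```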
You have all of the information in hand to do all of the proper
checking with the compiler.  My suggestion is to have not one
DisasFormats struct with all possible fields included, but one
DisasFormatX struct per format containing only the fields present.
Then pass that separately to the function.  E.g.

    static void gen_lui(DisasContext *ctx, DisasFormatU *fmt)
    {
        if (fmt->field_rd == 0) {
            return; /* NOP */
        }
        tcg_gen_movi_tl(cpu_gpr[fmt->field_rd], fmt->immediate_U);
    }

Within the decode_32bit function, you could have

    union {
        DisasFormatR R;
        DisasFormatI I;
        ...
    } fmts;

    ...

    case 0: /* ADDI */
        decode_format_I(ctx, &fmts.I);
        gen_addi(ctx, &fmts.I);
        break;

etc.  You now have isolation on exactly the fields present in the
format, and you have type checking on the function prototype that
there is agreement between the decoder and the generation functions
about the format.

Not especially important, but how much effort would it be to determine
that all valid insns below a certain point in the parse tree are the
same format?  I was just looking at e.g.

    case 35:
        /* Decode FUNC3_32 */
        switch (extract32(ctx->opcode, 12, 3)) {
        case 0: /* SB */
            decode_format_S(ctx);
            gen_sb(env, ctx);
            break;
        case 1: /* SH */
            decode_format_S(ctx);
            gen_sh(env, ctx);
            break;
        case 2: /* SW */
            decode_format_S(ctx);
            gen_sw(env, ctx);
            break;
        default:
            kill_unknown(ctx, RISCV_EXCP_ILLEGAL_INST);
            break;
        }
        break;

and thinking that the decode_format_S could be hoisted above the
switch.  The compiler won't do it because it's not used on the
kill_unknown path.  Major OP1 == 115 is another example with multiple
sub-switches, but all insns are format I.

All that said, this looks very readable.  Thank you.

r~