On Fri, Dec 6, 2013 at 7:36 PM, Benjamin Morris <[email protected]> wrote: > I've gathered a few hints regarding H264 video decoding on our hardware. > Hopefully some of them will be useful.
Very useful! > > First off, regarding naming in general. Our internal names for our video > engines differ from the names you've been using. Below is a translation map > between the names on http://nouveau.freedesktop.org/wiki/VideoAcceleration/ > and our internal names. This is more of an FYI than anything else, to help > translation; I don't expect it to help with this particular H264 hang. > > VP2 (same) > VP3 -> MSDEC > VP4.0 -> MSDEC2 > VP4.2 -> MSDEC3 > VP5 -> MSDEC4 > > Looking at your code, it seems that you're instantiating all 3 engines (VLD, > PDEC, PPP) on the same channel. This probably isn't causing the hang, but > it's bad practice in general, as it prevents the engines from running in > parallel. It's also impossible to use multiple engines on the same channel > like this on MSDEC4 (VP5) GPUs, so the same separate channel usage that you > need to have for MSDEC4 should also be used for everything down to G84. First off, thank you so much for diving into our code! Hope it wasn't _too_ dirty :) Yeah, for Kepler, we put things on separate channels. When I did the VP2 implementation, I also put them on separate channels since it seemed like that was what was being done from the traces (and I knew much less about all these things back then). But when I was doing the initial pass to get MSDEC[12] working based on existing MSDEC[34] code, I just left it alone, and it had them on the same channel for pre-kepler. There's some limitation in nouveau that prevents multiple channels from being active anyways (or something like that, it was explained to me, but I don't quite remember right now), so it won't matter either way for now. But in the future the plan is definitely to move to separate channels. > > Regarding "non-obvious" values for H264 decoding, looking at > nouveau_vp3_video_vp.c, it looks like there are several unknown values in the > H264 picture parameter structure, especially for the DPB reference table. > This seems like a potential cause for your MSDEC[12]-specific hangs; > incorrect DPB state can be difficult to figure out from picture parameter > dumps, and the PDEC response to incorrect DPB state is generally to simply > hang. There are no significant differences between MSDEC[12] (your > VP3/VP4.0) and MSDEC3 (your VP4.2) regarding DPB state, but improvements in > error resilience/concealment may simply be masking the problem on MSDEC3. > Below I've filled in our names for unnamed fields in that structure. > Hopefully this allows you to make some quick progress; you can apply the same > logic you already have for G84 to your G98 code path. Great. There are few acronyms in there that I'm not familiar with, but I suspect sufficient documentation-reading will fix the problem. One additional question is whether you have any comments on our inter-engine buffer sizing/usage (inter_data). Specifically the 0x720 method for VP (PDEC?) and 0x414 method on BSP (VLD?). Right now we set that to the same address/size, but I noticed that you offset them by 0x2100 (iirc). However when I did that, it just caused the engine to hang faster. (But then I noticed that on 331.20, which is what I used for my latest traces, the "kernel" fuc code had been updated from the one that we're extracting with my fw-cutter script, so perhaps the ABI has changed. Or perhaps the fw-cutter has an insufficiently-precise signature... I'll check it out later.) I will look at your comments on our picparm data structure and will adjust our code. Hopefully that's all that's needed. Thanks again! -ilia > > Thanks, > Ben > > struct h264_picparm_vp { // 700..a00 > uint16_t width, height; > uint32_t stride1, stride2; // 04 08 > uint32_t ofs[6]; // 0c..24 in-image offset > > uint32_t u24; // nfi ac8 ? -> ColocBufferSize > uint32_t bucket_size; // 28 bucket size > uint32_t inter_ring_data_size; // 2c > > unsigned f0 : 1; // 0 0x01: into 640 shifted by 3, 540 shifted by 5, > half size something? -> MbaffFrameFlag > unsigned f1 : 1; // 1 0x02: into vuc ofs 56 -> > direct_8x8_inference_flag > unsigned weighted_pred_flag : 1; // 2 0x04 > unsigned f3 : 1; // 3 0x08: into vuc ofs 68 -> > constrained_intra_pred_flag > unsigned is_reference : 1; // 4 > unsigned interlace : 1; // 5 field_pic_flag > unsigned bottom_field_flag : 1; // 6 > unsigned f7 : 1; // 7 0x80: nfi yet -> second_field (second field of > complementary reference field) > > signed log2_max_frame_num_minus4 : 4; // 31 0..3 > unsigned u31_45 : 2; // 31 4..5 -> chroma_format_idc > unsigned pic_order_cnt_type : 2; // 31 6..7 > signed pic_init_qp_minus26 : 6; // 32 0..5 > signed chroma_qp_index_offset : 5; // 32 6..10 > signed second_chroma_qp_index_offset : 5; // 32 11..15 > > unsigned weighted_bipred_idc : 2; // 34 0..1 > unsigned fifo_dec_index : 7; // 34 2..8 > unsigned tmp_idx : 5; // 34 9..13 -> CurrColIdx (index of associated > co-located motion data buffer) > unsigned frame_number : 16; // 34 14..29 > unsigned u34_3030 : 1; // 34 30..30 pp.u34[30:30] > unsigned u34_3131 : 1; // 34 31..31 pad? > > uint32_t field_order_cnt[2]; // 38, 3c > > struct { // 40 > // 0x00223102 > // nfi (needs: top_is_reference, bottom_is_reference, > is_long_term, maybe some other state that was saved.. > unsigned fifo_idx : 7; // 00 0..6 -> buffer id > // tmp_idx is the index of the associated co-located motion > data buffer > // for simplest management, ensure that this is always equal > to the buffer id > unsigned tmp_idx : 5; // 00 7..11 > unsigned unk12 : 1; // 00 12 not seen yet, but set, maybe > top_is_reference -> top_is_reference > unsigned unk13 : 1; // 00 13 not seen yet, but set, maybe > bottom_is_reference? -> bottom_is_reference > unsigned unk14 : 1; // 00 14 skipped? -> is_long_term > unsigned notseenyet : 1; // 00 15 pad? -> not_existing > unsigned unk16 : 1; // 00 16 -> is_field_pair > unsigned unk17 : 4; // 00 17..20 -> top_field_marking > (top_is_reference ? 1+is_long_term : 0) > unsigned unk21 : 4; // 00 21..24 -> bottom_field_marking > unsigned pad : 7; // 00 d25..31 > > uint32_t field_order_cnt[2]; // 04,08 > uint32_t frame_idx; // 0c > } refs[0x10]; > > uint8_t m4x4[6][16]; // 140 > uint8_t m8x8[2][64]; // 1a0 > // most of the remaining is MVC or SVC setup info, filled zero if not > MVC or SVC > uint32_t u220; // 220 number of extra reorder_list to append? > uint8_t u224[0x20]; // 224..244 reorder_list append ? > uint8_t nfi244[0xb0]; // add some pad to make sure nulls are read > }; _______________________________________________ Nouveau mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/nouveau
