Hi,

On Mon, Aug 4, 2014 at 11:59 AM, Clément Bœsch <u...@pkh.me> wrote:

> On Mon, Aug 04, 2014 at 11:44:43AM -0400, Ronald S. Bultje wrote:
> > Hi,
> >
> > On Mon, Aug 4, 2014 at 11:42 AM, Clément Bœsch <u...@pkh.me> wrote:
> >
> > > On Mon, Aug 04, 2014 at 11:17:29AM -0400, Ronald S. Bultje wrote:
> > > > Hi,
> > > >
> > > > On Sun, Aug 3, 2014 at 4:27 PM, Clément Bœsch <u...@pkh.me> wrote:
> > > >
> > > > > This removes the avcodec dependency and make the code almost
> > > > > twice as fast. More to come.
> > > > >
> > > > > The DCT factorization is based on "Fast and numerically stable
> > > > > algorithms for discrete cosine transforms" from Gerlind Plonkaa &
> > > > > Manfred Tasche (DOI: 10.1016/j.laa.2004.07.015).
> > > > >
> > > >
> > > > I have no comments on the patch itself, but can you explain why
> > > > we're re-implementing a custom f/idct rather than using the one
> > > > provided in lavcodec? It seems to me that going from
> > > > fixedpoint/simd'ed to float/c would be slower, not faster, so
> > > > there must be more to this patch than what I'm getting from it...
> > > >
> > >
> > > OK so as said in private, I didn't find an accurate (not wrongly
> > > "JPEG" like I originally said) 16x16 DCT in libavcodec.
> > >
> > > You suggested to use the HEVC or VP9 DCT. That's indeed one
> > > solution, but we currently have only IDCT for those (AFAIK), and I
> > > needed a float implementation.
> >
> >
> > You mean forward. idct is inverse, fdct is forward, such that
> > idct(fdct(data[][])) =~ data[][].
>
> Yeah sure I meant we have only the IDCT and I also needed the FDCT. The
> "float implementation" was another point.
>
> > You can use the forward transforms
> > provided in libvpx (for vp9) or x265 (hevc), they're quite precise, and
> > already optimized.
>
> Yeah so basically I would have to maintain that port instead of my
> implementation, which doesn't look ideal either (the point of using an
> existing code in FFmpeg is that its maintenance would have been shared).

Right, but it means one half (idct) is fully shared, with optimizations
etc. - and only one half is unshared-but-forked-with-optimizations.
Whereas right now, it's fully unshared and unoptimized...

I agree the 3x3 makes it a little more tricky, so do whatever you feel is
right; we don't want to have to convert datatypes 4x just so we can fit an
integer fdct/idct pair between an otherwise full float chain. If the
overall ultimate goal is for everything to be int, it makes sense, but I
don't know what the plan is.

Ronald
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel