On Thu, Jun 10, 2021 at 05:49:48PM +0200, Anton Khirnov wrote:
> Quoting Michael Niedermayer (2021-06-01 15:02:27)
> > On Mon, May 31, 2021 at 09:55:11AM +0200, Anton Khirnov wrote:
> > > Currently existing sws_scale() accepts as input a user-determined slice
> > > of input data and produces an indeterminate number of output lines.
> > 
> > swscale() should return the number of lines output
> > it does "return dstY - lastDstY;"
> 
> But you do not know the number of lines beforehand.
> I suppose one could assume that the line counts will always be the same
> for any run with the same parameters (strictly speaking this is not
> guaranteed) and store them after the first frame, but then the first
> scale call is not parallel. And it would be quite ugly.
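For context, the caller-side pattern with the current sws_scale() looks
roughly like this (context and buffer setup omitted, variable names are
illustrative only, not from the patch):

    /* feed the input to sws_scale() slice by slice; the number of output
     * lines produced for each slice is only known from the return value,
     * i.e. after the call, so the output split cannot be planned ahead */
    int total_out = 0;
    for (int y = 0; y < src_h; y += slice_h) {
        int h = (src_h - y < slice_h) ? (src_h - y) : slice_h;
        total_out += sws_scale(ctx, src_data, src_linesize, y, h,
                               dst_data, dst_linesize);
    }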
> > > 
> > > Since the calling code does not know the amount of output, it cannot
> > > easily parallelize scaling by calling sws_scale() simultaneously on
> > > different parts of the frame.
> > > 
> > > Add a new function - sws_scale_dst_slice() - that accepts as input the
> > > entire input frame and produces a specified slice of the output. This
> > > function can be called simultaneously on different slices of the output
> > > frame (using different sws contexts) to implement slice threading.
> > 
> > an API that would allow starting before the whole frame is available
> > would have reduced latency and better cache locality. Maybe that can
> > be added later too, but I wanted to mention it because the documentation
> > explicitly says "entire input"
> 
> That would require some way of querying how much input is required for
> each line. I do not feel sufficiently familiar with the sws architecture to
> see an obvious way of implementing this. And then making use of this
> information would require a significantly more sophisticated way of
> dispatching work to threads.

Hmm, isn't the filter calculated by initFilter() (for the vertical stuff)
basically listing the input/output relation?
(with some special cases like cascaded_context maybe)
It's been a while since I worked on swscale, so maybe I am forgetting
something. Maybe that can be (easily) used?

> Or are you proposing some specific alternative way of implementing this?
> 
> > Also there are a few tables between the multiple SwsContexts which are
> > identical; it would be ideal if they could be shared between threads.
> > I guess such sharing would need to be implemented before the API is
> > stable, otherwise adding it later would require applications to be changed.
> 
> In my tests, the differences are rather small. E.g. scaling
> 2500x3000->3000x3000 with 32 threads uses only ~15% more memory than
> with 1 thread.
> 
> And I do not see an obvious way to implement this that would be worth
> the extra complexity. Do you?

Well, don't we, for every case of threading in the codebase, cleanly split
the context into one thread-local part and one shared part?
I certainly will not dispute that it is work to do that. But we did it in
every case because it is the "right thing" to do for a clean implementation.
So I think we should aim toward that here too.

But maybe I am missing something?

Thanks

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope