On Wed, Feb 8, 2012 at 12:47 AM, Simon A. Eugster <simon.eu at gmail.com> wrote:
> On 02/08/2012 09:08 AM, Alexandre Prokoudine wrote:
>> On Wed, Feb 8, 2012 at 11:32 AM, Simon A. Eugster wrote:
>>
>>>> http://jeff.ecchi.ca/blog/2011/07/25/automated-multicamera-clip-syncing/
>>>
>>> I have no idea how you manage to have a link to a solution for nearly
>>> every problem. Thanks for the link!
>>
>> YW :)
>>
>>> How accurate can we position audio streams? Just by full frames, or is
>>> it possible to have a finer granularity? When I synced audio/video I
>>> often had the problem that the audio was too early and after moving it
>>> by one frame it was too late.

Then your sense is more acute than most humans'.

>>
>> Admittedly, I haven't had a chance to test it myself yet. However
>> http://bemasc.net/wordpress/2011/07/26/an-auto-aligner-for-pitivi/
>> states:
>
> Already read! :)
> I rather meant kdenlive/MLT here. Can we move an audio clip by just a
> few samples or only by full frames?

At the framework level, only by frame, but something more precise can
be achieved with an audio filter, e.g. sox.delay.

>
> Simon
>
>> "The algorithm I settled on resembles the method a human uses when
>> looking at the waveform view. First, it breaks each input audio stream
>> into 40 ms blocks and computes the mean absolute value of each block.

40 ms is the duration of one frame at 25 fps, so this value can simply
be computed per mlt_frame.

>> The resulting 25 Hz signal is the "volume envelope". The code
>> subtracts the mean volume from each track's envelope, then performs a
>> cross-correlation between tracks and looks for the peak, which
>> identifies the relative shift.

This can be implemented as a passive transition that computes the shift
and reports it through a property. Then, an application can make a
frame-level adjustment at the edit level and apply sox.delay or
frei0r.delay0r filters for sub-frame accuracy. Or, the transition can
be dual pass, and perform the sub-frame adjustments itself on the
second pass. Alternatively, kdenlive is already getting all of the
audio in a consumer-frame-show event, so it could just do all of this
analysis in its own code and use existing filters for sub-frame
accuracy.

>> To avoid performing N^2
>> cross-correlations, one clip is selected as the fixed reference, and
>> all others are compared to it. The peak position is quantized to the
>> block duration (creating an error of +/- 20ms), so to improve accuracy
>> a parabolic fit is used to interpolate the true maximum. I don't know
>> the exact residual error, but I expect it's typically less than 5 ms,
>> which should be plenty good enough, seeing as sound travels about 1
>> foot per ms."
>>
>> Alexandre Prokoudine
>> http://libregraphicsworld.org
>>
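
For reference, here is a rough sketch of the quoted algorithm (block
envelope, mean removal, cross-correlation against one reference clip,
parabolic fit around the peak), followed by the split into whole frames
plus a sub-frame remainder discussed above. It is plain Python for
illustration only, not MLT, kdenlive, or PiTiVi code. All function
names are made up for this example, and it assumes mono audio as a list
of numbers at a known sample rate.

```python
import math

def envelope(samples, rate, block_ms=40):
    """Mean absolute value of each block_ms block (the 'volume envelope').
    Assumes the input is at least one block long."""
    n = int(rate * block_ms / 1000)
    blocks = [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]
    return [sum(abs(s) for s in b) / n for b in blocks]

def remove_mean(env):
    """Subtract the mean volume from an envelope."""
    m = sum(env) / len(env)
    return [e - m for e in env]

def cross_correlation(ref, other, max_lag):
    """corr[lag] = sum of ref[i] * other[i + lag], for lag in -max_lag..max_lag."""
    corr = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                total += r * other[j]
        corr[lag] = total
    return corr

def estimate_delay_ms(ref, other, rate, block_ms=40, max_lag=50):
    """Estimate how many ms 'other' lags behind 'ref' (negative: it leads)."""
    a = remove_mean(envelope(ref, rate, block_ms))
    b = remove_mean(envelope(other, rate, block_ms))
    corr = cross_correlation(a, b, max_lag)
    peak = max(corr, key=corr.get)  # quantized to whole blocks
    offset = 0.0
    if -max_lag < peak < max_lag:
        # Parabola through the peak and its two neighbours; its vertex
        # gives a sub-block correction in the range -0.5..0.5 blocks.
        y0, y1, y2 = corr[peak - 1], corr[peak], corr[peak + 1]
        denom = y0 - 2 * y1 + y2
        if denom != 0:
            offset = 0.5 * (y0 - y2) / denom
    return (peak + offset) * block_ms

def split_delay(delay_ms, fps=25):
    """Split a delay into whole frames plus a non-negative sub-frame rest."""
    frame_ms = 1000.0 / fps
    frames = math.floor(delay_ms / frame_ms)
    return frames, delay_ms - frames * frame_ms
```

A positive result from estimate_delay_ms() means the second clip lags
the reference. With split_delay(), the whole-frame part would be
applied by moving the clip at the edit level, and the remainder (always
between 0 and one frame) would go to an audio delay filter such as
sox.delay, as suggested above.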