ControlNet does have the potential to produce coherent videos, so many people, myself included, have tried using it to transform videos. The results are usually unsatisfactory, though: even when two input images are nearly identical, differing in only a handful of pixels, the output images can be very different, and the differences are not confined to the pixels that changed in the input. Still, building an animation extension that uses optical flow estimation to try to keep the animation stable is a good effort. @Laura, thanks for writing about SD-CN-Animation. I'll try it and maybe find ways to improve the algorithm.