Various ways to make videos with Stable Diffusion

When you are experimenting with Stable Diffusion, you may want to try generating video as well.
Here is a summary of tools for producing video, including extensions for the Stable Diffusion web UI.

mov2mov

https://github.com/Scholar01/sd-webui-mov2mov

It is well known for a conversion of footage of a woman dancing on TikTok.
It appears to split the video into still images frame by frame, process each one with img2img, and then merge the results back into a video.
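
As a rough illustration of that split, process, merge flow, here is a minimal Python sketch. It is not mov2mov's actual code. The function names, the frame naming pattern, and the `stylize` callback (a stand-in for an img2img call) are all assumptions made for this example; only the ffmpeg options are standard ones.

```python
from pathlib import Path
from typing import Callable, List


def split_command(video: str, frames_dir: str) -> List[str]:
    """Build an ffmpeg command that writes each frame as a numbered PNG."""
    return ["ffmpeg", "-i", video, f"{frames_dir}/%05d.png"]


def merge_command(frames_dir: str, fps: int, output: str) -> List[str]:
    """Build an ffmpeg command that joins numbered PNGs into an H.264 video."""
    return [
        "ffmpeg", "-framerate", str(fps),
        "-i", f"{frames_dir}/%05d.png",
        "-c:v", "libx264", "-pix_fmt", "yuv420p", output,
    ]


def convert_frames(src_dir: str, dst_dir: str,
                   stylize: Callable[[bytes], bytes]) -> int:
    """Apply `stylize` (hypothetical img2img stand-in) to every frame in order.

    Returns the number of frames written to `dst_dir`.
    """
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for frame in sorted(Path(src_dir).glob("*.png")):
        (out / frame.name).write_bytes(stylize(frame.read_bytes()))
        count += 1
    return count
```

The two command builders only return argument lists; pass them to `subprocess.run(..., check=True)` to actually run ffmpeg. Because every frame is converted independently here, nothing keeps neighboring frames consistent with each other.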

I also tried mov2mov myself.
I think you will get better results if you use more than two ControlNets. I also think an anime-style model works best.

  • Install as an extension to the web UI (AUTOMATIC1111)
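
One common way to install a web UI extension is to clone its repository into the web UI's `extensions` folder (the web UI can also install from a URL in its Extensions tab). The sketch below only builds the git command. The helper name and the `stable-diffusion-webui` directory are assumptions for illustration, so adjust the path to your own installation.

```python
from pathlib import Path
from typing import List


def clone_extension_command(repo_url: str, webui_dir: str) -> List[str]:
    """Build a `git clone` command that places the repo under <webui_dir>/extensions.

    `webui_dir` is a hypothetical path to your AUTOMATIC1111 web UI checkout.
    """
    name = repo_url.rstrip("/").split("/")[-1]
    if name.endswith(".git"):
        name = name[:-4]
    target = Path(webui_dir) / "extensions" / name
    return ["git", "clone", repo_url, target.as_posix()]


if __name__ == "__main__":
    cmd = clone_extension_command(
        "https://github.com/Scholar01/sd-webui-mov2mov",
        "stable-diffusion-webui",  # assumed install location
    )
    print(" ".join(cmd))
```

Restart the web UI after cloning so that it picks up the new extension.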

TemporalKit

https://github.com/CiaraStrawberry/TemporalKit

Compared to mov2mov, it seems to suppress frame-to-frame flickering and similar artifacts.
You may find its settings UI difficult to configure.

  • Install as an extension to the web UI (AUTOMATIC1111)
  • ffmpeg is used to composite the video, so it must be installed beforehand.
  • Can work with Ebsynth
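
Since ffmpeg has to be present before you start, a quick check can save some confusion. This is a minimal sketch using Python's standard `shutil.which`; the function name and messages are my own and are not part of TemporalKit.

```python
import shutil


def ffmpeg_status(executable: str = "ffmpeg") -> str:
    """Report whether `executable` can be found on the PATH."""
    path = shutil.which(executable)
    if path is None:
        return f"{executable} not found on PATH; install it before using TemporalKit"
    return f"{executable} found at {path}"


if __name__ == "__main__":
    print(ffmpeg_status())
```

If it reports that ffmpeg is missing, install ffmpeg and make sure its folder is on your PATH before running the extension.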

Ebsynth

A tool that converts live-action video into a painted style. Currently in beta.

https://ebsynth.com

sd-webui-text2video

https://github.com/deforum-art/sd-webui-text2video

An Auto1111 extension that implements various text2video models, such as ModelScope and VideoCrafter, using only Auto1111 webui dependencies and downloadable models (so no logins are required anywhere).

  • Install as an extension to the web UI (AUTOMATIC1111)
  • Requires a large amount of video memory (VRAM)
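
To see how much video memory you have before trying it, you can query `nvidia-smi` on NVIDIA GPUs. This is a minimal sketch that assumes `nvidia-smi` is installed; the function names are hypothetical, and no minimum amount is hard-coded here because the requirement depends on the model you load.

```python
import subprocess
from typing import List, Tuple

# Standard nvidia-smi query: one CSV line per GPU, values in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.total,memory.free",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text: str) -> List[Tuple[int, int]]:
    """Parse 'total, free' lines into (total_mib, free_mib) tuples, one per GPU."""
    result = []
    for line in csv_text.strip().splitlines():
        total, free = (int(value) for value in line.split(","))
        result.append((total, free))
    return result


def gpu_memory_mib() -> List[Tuple[int, int]]:
    """Run nvidia-smi and return (total_mib, free_mib) for each GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)
```

Compare the values from `gpu_memory_mib()` against whatever the model you plan to use recommends.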

The tools from here on are not based on Stable Diffusion, but they are revolutionary AI-based video creation tools.

Runway

https://runwayml.com

Gen-2: The Next Step Forward for Generative AI.
A multi-modal AI system that can generate novel videos with text, images, or video clips.
In my opinion, it is like an online version of After Effects.

Wonder Studio

https://wonderdynamics.com

An AI tool that automatically animates, lights, and composites CG characters into live-action scenes.

NVIDIA vid2vid

https://github.com/NVIDIA/vid2vid

Video-to-Video Translation

  • Label-to-Streetview Results
  • Edge-to-Face Results
  • Pose-to-Body Results
  • Frame Prediction Results