If you’re looking to harness the power of AI to generate high-quality videos from text or images, then the Wan 2.1 model is an excellent solution. Developed by Alibaba Cloud, Wan 2.1 offers remarkable video generation capabilities with support for both text-to-video and image-to-video workflows. Best of all, you can install and use it locally for free.

In this guide, we’ll walk you through each step required to set up Wan 2.1 on your Windows machine using tools like Git, the .NET 8 SDK, SwarmUI, and Hugging Face model files. Whether you’re a beginner or an experienced user, follow this guide to generate AI-powered videos with ease.
What Is Wan 2.1?
Wan 2.1 is an open-source video generation model created by Alibaba Cloud. It supports two model sizes:
- 1.3B (requires only 8GB VRAM — ideal for most local setups)
- 14B (for more powerful machines)
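As a rough rule of thumb, the choice between the two sizes comes down to available VRAM. A minimal Python sketch of that decision (note: the guide only specifies the 8GB floor for the 1.3B model; the 24GB cutoff for the 14B model is an assumption, not an official requirement):

```python
def pick_wan_model(vram_gb: float) -> str:
    """Suggest a Wan 2.1 variant based on available GPU VRAM.

    The 8 GB minimum for the 1.3B model comes from this guide; the
    24 GB threshold for the 14B model is an assumed ballpark figure.
    """
    if vram_gb >= 24:
        return "Wan 2.1 14B"
    if vram_gb >= 8:
        return "Wan 2.1 1.3B"
    return "insufficient VRAM for local generation"
```
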
Wan 2.1 excels in:
- Semantic understanding
- Physical realism
- Complex motion representation
Its performance makes it one of the top open-source video generation models available today.
Step 1: Install Git and .NET 8 SDK
Before anything else, install the required developer tools:
- Git: Download and install from the official site
🔗 https://git-scm.com/downloads/win
- .NET 8 SDK: Required to run SwarmUI
🔗 https://dotnet.microsoft.com/en-us/download/dotnet/8.0
Download the Windows versions and install them by following the on-screen instructions.
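If you prefer the command line, both tools can also be installed with winget on Windows 10/11. The package IDs below come from the winget community repository; if either fails, confirm the current ID with `winget search`:

```shell
# Install Git and the .NET 8 SDK via winget (Windows 10/11).
# Package IDs are from the winget community repo; verify with
# `winget search git` / `winget search dotnet` if they fail.
winget install --id Git.Git -e
winget install --id Microsoft.DotNet.SDK.8 -e

# Verify the installs (open a new terminal first so PATH updates apply):
git --version
dotnet --version
```
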
Step 2: Install SwarmUI
SwarmUI acts as a user interface to run and manage video generation workflows.
- Go to the SwarmUI GitHub page:
🔗 https://github.com/mcmonkeyprojects/SwarmUI?tab=readme-ov-file
- Download the Windows installer from the page (avoid installing on the C: drive, since the model files require significant storage).
- Run the downloaded .exe file. It will open a command prompt and begin the installation.
- Once installed, SwarmUI starts a local server and opens a setup page in your browser. Follow all seven setup steps until you reach the SwarmUI interface.
Tip: You can reopen SwarmUI later by running launch_windows.bat from your installation folder. Creating a desktop shortcut is also recommended.
Step 3: Download Wan 2.1 Model Files
Visit Hugging Face to download all required model files:
- Main Model File (choose 1.3B or 14B based on your VRAM)
🔗 https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/diffusion_models
Save it in: ...\SwarmUI\Models\diffusion_models
Additional required files:
- Clip Vision File
🔗 https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/clip_vision
- Clip Text Encoder File
🔗 https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/text_encoders
- VAE File
🔗 https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/vae
Download and place each file in the correct directory under ...\SwarmUI\Models.
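To keep the placement straight, here is a small Python sketch mapping each file type to its destination folder. Only `diffusion_models` is named explicitly in this guide; the other subfolder names are assumed to mirror the Hugging Face `split_files` layout, and the example filename is a placeholder:

```python
from pathlib import Path

# Destination subfolder under ...\SwarmUI\Models for each Wan 2.1 file type.
# "diffusion_models" is stated in this guide; the rest are assumed to
# mirror the split_files folder names in the Hugging Face repo.
MODEL_SUBDIRS = {
    "main_model": "diffusion_models",
    "clip_vision": "clip_vision",
    "text_encoder": "text_encoders",
    "vae": "vae",
}

def destination(models_root: str, file_type: str, filename: str) -> Path:
    """Return the path where a downloaded model file should be placed."""
    return Path(models_root) / MODEL_SUBDIRS[file_type] / filename

# Example with a hypothetical filename:
# destination(r"D:\SwarmUI\Models", "vae", "wan_2.1_vae.safetensors")
```
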
Step 4: Download Wan 2.1 Workflow Files
To use the model, you need predefined workflow templates:
Available workflows:
- Text to Video
- Image to Video 480p
- Image to Video 720p
Drag and drop the workflow file(s) into your open SwarmUI browser window. They will automatically load into the interface.
How to Use Wan 2.1 to Generate Videos
Once everything is installed and loaded, follow these steps:
- Open SwarmUI – The interface will launch in your browser.
- Click on ‘Generate’ at the top.
- Load Your Model – You should see the Wan 2.1 models you downloaded. If not, click the Refresh button or double-check file paths.
- Select a Workflow – Choose from text-to-video or image-to-video.
- Configure Your Settings:
- Adjust frame rate
- Choose output format (e.g., WebP, GIF, MP4)
- Set image/video resolution
- Add Prompts:
- Use the green box for positive prompts
- Use the red box for negative prompts
- Click ‘Q’ – This starts the video generation process; generation time varies with your system’s performance.
- Save Your Video – Once complete, right-click and save the file locally.
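The settings and prompts above boil down to a handful of parameters. A small helper that bundles them into one place can make batch experiments easier; note that every field name here is an illustrative placeholder, not SwarmUI's or Wan 2.1's actual parameter name:

```python
def build_generation_settings(prompt, negative_prompt="", fps=16,
                              width=480, height=480, out_format="mp4"):
    """Bundle the generation settings described above into one dict.

    All field names are illustrative placeholders, not the actual
    parameter names used by SwarmUI or Wan 2.1.
    """
    allowed_formats = {"webp", "gif", "mp4"}
    if out_format not in allowed_formats:
        raise ValueError(f"unsupported output format: {out_format}")
    return {
        "prompt": prompt,                    # green box: what you want to see
        "negative_prompt": negative_prompt,  # red box: what to avoid
        "fps": fps,
        "width": width,
        "height": height,
        "format": out_format,
    }
```
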
Image to Video Tip: If using an image-to-video workflow, you’ll see an image upload node—just drag your image into it and let the model handle the rest.
Online Option for Wan 2.1
If you prefer not to install locally, Wan 2.1 is also available as an online tool:
🔗 https://wan.video/
Final Thoughts
Wan 2.1 proves to be a robust and versatile video generation model with powerful local deployment options. Its ability to generate visually impressive, high-resolution videos from text and images — even on mid-range systems — makes it an exciting tool for creators, researchers, and hobbyists alike.
With the help of SwarmUI and this guide, you’re now ready to start creating AI videos right on your computer without relying on cloud services or paid software.
Disclaimer:
AI-generated content is for creative and educational purposes only. When using models like Wan 2.1, please ensure that any content generated complies with copyright laws, and refrain from using it for misinformation, impersonation, or unethical use cases.
Tags:
AI video generation, Wan 2.1, SwarmUI, Hugging Face models, Alibaba Cloud AI, install AI locally, text to video, image to video, Git, DotNET SDK, AI model installation, local AI tools, video AI workflow, open source video AI
Hashtags:
#AIVideo #Wan2_1 #SwarmUI #TextToVideo #ImageToVideo #AIModels #OpenSourceAI #VideoGeneration #LocalAI #HuggingFace #AlibabaCloud #AIWorkflow #VideoEditingAI