If you’re looking to harness the power of AI to generate high-quality videos from text or images, then the Wan 2.1 model is an excellent solution. Developed by Alibaba Cloud, Wan 2.1 offers remarkable video generation capabilities with support for both text-to-video and image-to-video workflows. Best of all, you can install and use it locally for free.

In this guide, we’ll walk you through each step required to set up Wan 2.1 on your Windows machine using tools like Git, the .NET 8 SDK, SwarmUI, and Hugging Face model files. Whether you’re a beginner or an experienced user, follow this guide to generate AI-powered videos with ease.
What Is Wan 2.1?
Wan 2.1 is an open-source video generation model created by Alibaba Cloud. It supports two model sizes:
- 1.3B (requires only 8GB VRAM — ideal for most local setups)
- 14B (for more powerful machines)
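As a rough rule of thumb, the choice between the two sizes comes down to available VRAM. A minimal Python sketch of that decision (note: the guide only specifies the 8GB floor for the 1.3B model; the 24GB cutoff for the 14B model is an assumption, not an official requirement):

```python
def pick_wan_model(vram_gb: float) -> str:
    """Suggest a Wan 2.1 variant based on available GPU VRAM.

    The 8 GB minimum for the 1.3B model comes from this guide; the
    24 GB threshold for the 14B model is an assumed ballpark figure.
    """
    if vram_gb >= 24:
        return "Wan 2.1 14B"
    if vram_gb >= 8:
        return "Wan 2.1 1.3B"
    return "insufficient VRAM for local generation"
```
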
Wan 2.1 excels in:
- Semantic understanding
- Physical realism
- Complex motion representation
Its performance makes it one of the top open-source video generation models available today.
Step 1: Install Git and .NET 8 SDK
Before anything else, install the required developer tools:
- Git: Download and install from the official site
🔗 https://git-scm.com/downloads/win
- .NET 8 SDK: Required to run SwarmUI
🔗 https://dotnet.microsoft.com/en-us/download/dotnet/8.0
Download the Windows versions and install them by following the on-screen instructions.
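If you prefer the command line, both tools can also be installed with winget on Windows 10/11. The package IDs below come from the winget community repository; if either fails, confirm the current ID with `winget search`:

```shell
# Install Git and the .NET 8 SDK via winget (Windows 10/11).
# Package IDs are from the winget community repo; verify with
# `winget search git` / `winget search dotnet` if they fail.
winget install --id Git.Git -e
winget install --id Microsoft.DotNet.SDK.8 -e

# Verify the installs (open a new terminal first so PATH updates apply):
git --version
dotnet --version
```
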
Step 2: Install SwarmUI
SwarmUI acts as a user interface to run and manage video generation workflows.
- Go to the SwarmUI GitHub page:
🔗 https://github.com/mcmonkeyprojects/SwarmUI?tab=readme-ov-file
- Download the Windows installer from the page (avoid installing on the C: drive, since the model files require significant storage).
- Run the downloaded .exe file. It will open a command prompt and begin the installation.
- Once installed, SwarmUI starts a local server and opens a setup page in your browser. Follow all seven setup steps until you reach the SwarmUI interface.
Tip: You can reopen SwarmUI later by running launch_windows.bat from your installation folder. Creating a desktop shortcut is also recommended.
Step 3: Download Wan 2.1 Model Files
Visit Hugging Face to download all required model files:
- Main Model File (choose 1.3B or 14B based on your VRAM)
🔗 https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/diffusion_models
Save it in: ...\SwarmUI\Models\diffusion_models
Additional required files:
- Clip Vision File
🔗 https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/clip_vision
- Clip Text Encoder File
🔗 https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/text_encoders
- VAE File
🔗 https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/tree/main/split_files/vae
Download and place each file in the correct directory under ...\SwarmUI\Models.
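To keep the placement straight, here is a small Python sketch mapping each file type to its destination folder. Only `diffusion_models` is named explicitly in this guide; the other subfolder names are assumed to mirror the Hugging Face `split_files` layout, and the example filename is a placeholder:

```python
from pathlib import Path

# Destination subfolder under ...\SwarmUI\Models for each Wan 2.1 file type.
# "diffusion_models" is stated in this guide; the rest are assumed to
# mirror the split_files folder names in the Hugging Face repo.
MODEL_SUBDIRS = {
    "main_model": "diffusion_models",
    "clip_vision": "clip_vision",
    "text_encoder": "text_encoders",
    "vae": "vae",
}

def destination(models_root: str, file_type: str, filename: str) -> Path:
    """Return the path where a downloaded model file should be placed."""
    return Path(models_root) / MODEL_SUBDIRS[file_type] / filename

# Example with a hypothetical filename:
# destination(r"D:\SwarmUI\Models", "vae", "wan_2.1_vae.safetensors")
```
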
Step 4: Download Wan 2.1 Workflow Files
To use the model, you need predefined workflow templates:
Available workflows:
- Text to Video
- Image to Video 480p
- Image to Video 720p
Drag and drop the workflow file(s) into your open SwarmUI browser window. They will automatically load into the interface.
How to Use Wan 2.1 to Generate Videos
Once everything is installed and loaded, follow these steps:
- Open SwarmUI – The interface will launch in your browser.
- Click on ‘Generate’ at the top.
- Load Your Model – You should see the Wan 2.1 models you downloaded. If not, click the Refresh button or double-check file paths.
- Select a Workflow – Choose from text-to-video or image-to-video.
- Configure Your Settings:
- Adjust frame rate
- Choose output format (e.g., WebP, GIF, MP4)
- Set image/video resolution
- Add Prompts:
- Use the green box for positive prompts
- Use the red box for negative prompts
- Click ‘Q’ – This starts the video generation process; generation time varies with your system’s performance.
- Save Your Video – Once complete, right-click and save the file locally.
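The settings and prompts above boil down to a handful of parameters. A small helper that bundles them into one place can make batch experiments easier; note that every field name here is an illustrative placeholder, not SwarmUI's or Wan 2.1's actual parameter name:

```python
def build_generation_settings(prompt, negative_prompt="", fps=16,
                              width=480, height=480, out_format="mp4"):
    """Bundle the generation settings described above into one dict.

    All field names are illustrative placeholders, not the actual
    parameter names used by SwarmUI or Wan 2.1.
    """
    allowed_formats = {"webp", "gif", "mp4"}
    if out_format not in allowed_formats:
        raise ValueError(f"unsupported output format: {out_format}")
    return {
        "prompt": prompt,                    # green box: what you want to see
        "negative_prompt": negative_prompt,  # red box: what to avoid
        "fps": fps,
        "width": width,
        "height": height,
        "format": out_format,
    }
```
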
Image to Video Tip: If using an image-to-video workflow, you’ll see an image upload node—just drag your image into it and let the model handle the rest.
Online Option for Wan 2.1
If you prefer not to install locally, Wan 2.1 is also available as an online tool:
🔗 https://wan.video/
Final Thoughts
Wan 2.1 proves to be a robust and versatile video generation model with powerful local deployment options. Its ability to generate visually impressive, high-resolution videos from text and images — even on mid-range systems — makes it an exciting tool for creators, researchers, and hobbyists alike.
With the help of SwarmUI and this guide, you’re now ready to start creating AI videos right on your computer without relying on cloud services or paid software.
Disclaimer:
AI-generated content is for creative and educational purposes only. When using models like Wan 2.1, please ensure that any content generated complies with copyright laws, and refrain from using it for misinformation, impersonation, or unethical use cases.
Tags:
AI video generation, Wan 2.1, SwarmUI, Hugging Face models, Alibaba Cloud AI, install AI locally, text to video, image to video, Git, DotNET SDK, AI model installation, local AI tools, video AI workflow, open source video AI
Hashtags:
#AIVideo #Wan2_1 #SwarmUI #TextToVideo #ImageToVideo #AIModels #OpenSourceAI #VideoGeneration #LocalAI #HuggingFace #AlibabaCloud #AIWorkflow #VideoEditingAI