WAN 2.6 on ComfyUI – Installation Guide, New Nodes and Multi‑Modal Features

WAN 2.6 is Alibaba’s latest multimodal video model and, as of early 2026, one of the most advanced generative‑AI systems available through ComfyUI. It accepts text, images and reference videos, and produces high‑fidelity clips with native audio and smooth motion. The model can be run via API nodes in ComfyUI or through services such as Floyo, Artlist and Griptape. This article explains how to set up WAN 2.6 in ComfyUI, discusses recent node changes, and summarises the new video, image and audio features and performance improvements introduced in WAN 2.6.

1 What makes WAN 2.6 special?

WAN 2.6 represents a leap over earlier WAN releases. The model’s core advancements are designed to tackle problems users complained about in WAN 2.5 and earlier: short video length, inconsistent motion, and poor audio synchronisation. Key improvements include:

  • Multi‑modal input – the model takes text prompts, single images or multiple reference images and even short reference videos. This allows you to anchor the output to a specific character, environment or shot style.
  • Longer, coherent videos – WAN 2.6 generates clips up to 15 seconds long, maintaining consistent characters and scenes across multiple shots. This is a significant upgrade compared with the 5–10 second clips typical of earlier versions.
  • Native audio‑visual synchronisation – audio is generated alongside the video. Dialogue, sound effects and music are synchronised with motion during generation, greatly reducing manual lip‑sync or timing work. The model even supports multiple languages and multi‑voice tracks.
  • Multi‑shot narrative engine – WAN 2.6 can assemble several shots into a cohesive sequence with logical camera movements and transitions. The model keeps characters, lighting and style consistent from shot to shot, enabling cinematic storytelling.
  • High‑resolution output – according to Artlist’s review of the model, WAN 2.6 produces 1080p video at 24 fps, delivering sharp motion and professional‑quality visuals.
  • Flexible aspect ratios – you can output landscape (16:9), portrait (9:16) or square (1:1) videos to suit different platforms.

These innovations, along with improved motion logic and image anchoring, make WAN 2.6 far more production‑ready than earlier releases. It’s suited to cinematic storytelling, marketing videos, character animation and pre‑visualisation.

2 Installing WAN 2.6 in ComfyUI

Prerequisites – hardware and software: A modern GPU with at least 12 GB VRAM is recommended. On an RTX 4090, WAN 2.6 can render 15‑second 1080p videos; on an RTX 4070 with 12 GB VRAM, expect 576–720p output with fewer frames. You’ll also need Python 3.10+, CUDA 12.1 and a fresh virtual environment.
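
Before installing anything, it is worth verifying the environment. The snippet below is a minimal sketch that assumes PyTorch is already installed with CUDA support; it is not part of the official setup.

```python
# check_env.py – sanity-check Python, CUDA and VRAM before installing
# (a sketch; assumes PyTorch is already installed with CUDA support).
import sys
import torch

assert sys.version_info >= (3, 10), "Python 3.10+ is required"
if not torch.cuda.is_available():
    raise SystemExit("No CUDA device visible – WAN 2.6 needs a GPU")

free, total = torch.cuda.mem_get_info(0)  # bytes of free/total device memory
print(f"GPU:  {torch.cuda.get_device_name(0)}")
print(f"VRAM: {free / 1024**3:.1f} GB free of {total / 1024**3:.1f} GB")
if total < 12 * 1024**3:
    print("Warning: under 12 GB VRAM – expect lower resolution and frame counts")
```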

Step 1 – Set up ComfyUI

  1. Clone ComfyUI from the official GitHub repository.
  2. Create a virtual environment and install the requirements with pip install -r requirements.txt.
  3. (Optional but recommended) Install ComfyUI‑Manager – this plugin adds a UI for discovering and installing custom nodes.
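
For reference, the three steps can also be scripted. This is a minimal sketch assuming git is on PATH and a Unix-like system; adjust the interpreter path on Windows.

```python
# setup_comfyui.py – scripted version of Step 1 (a sketch; assumes git on PATH
# and a Unix-like system – use venv\Scripts\python.exe on Windows).
import subprocess
import venv
from pathlib import Path

subprocess.run(
    ["git", "clone", "https://github.com/comfyanonymous/ComfyUI.git"], check=True
)
venv.create("ComfyUI/venv", with_pip=True)        # fresh virtual environment
vpython = str(Path("ComfyUI") / "venv" / "bin" / "python")
subprocess.run(
    [vpython, "-m", "pip", "install", "-r", "ComfyUI/requirements.txt"], check=True
)
```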

Step 2 – Install required node packs

WAN 2.6 does not run with the base ComfyUI alone. You must install a few custom node packs:

  • ComfyUI‑VideoHelperSuite – provides nodes for loading video/image sequences, combining frames into videos and managing audio, including audio integration and real‑time previewing. Install via ComfyUI‑Manager or clone the repository into custom_nodes.
  • KJNodes – contains glue nodes, samplers and mathematical utilities often required by diffusion pipelines. Install via ComfyUI‑Manager or clone into custom_nodes.
  • WAN‑specific nodes – usually named “ComfyUI‑Wan” or bundled in motion/video node packs; these nodes wrap API calls to the WAN 2.6 model and manage its parameters. Search the registry via the Manager or clone from the repository, and follow the node README for exact model paths.
  • VideoAddAudio and audio tools – optional but useful for adding background music or narration; the VideoAddAudio node merges MP3/WAV audio with video using ffmpeg, and additional audio nodes allow mixing and resampling. Install via the Manager (search for “ComfyUI_Lam” or “AudioTools”) and restart ComfyUI.
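
If you prefer cloning over the Manager, a sketch like the following fetches the packs into custom_nodes. The VideoHelperSuite and KJNodes URLs are their public repositories; the WAN pack location varies by provider, so that entry is left as a placeholder.

```python
# install_nodes.py – clone custom node packs into custom_nodes (a sketch;
# assumes ComfyUI was cloned as in Step 1 and git is on PATH).
import subprocess
from pathlib import Path

custom_nodes = Path("ComfyUI/custom_nodes")
repos = [
    "https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite",
    "https://github.com/kijai/ComfyUI-KJNodes",
    # WAN-specific pack: take the exact URL from your pack's README, e.g.
    # "https://github.com/<vendor>/ComfyUI-Wan",
]
for url in repos:
    subprocess.run(["git", "clone", url], cwd=custom_nodes, check=True)
# Restart ComfyUI afterwards so the new nodes are registered.
```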

Step 3 – Download model weights

  1. Acquire WAN 2.6 checkpoints – the base weights plus any motion/motion_prior files. These may be provided by your API subscription (Artlist, Floyo, Griptape etc.).
  2. Place model files in the correct directories. Most custom nodes expect weights under ComfyUI/models/checkpoints, ComfyUI/models/clip, ComfyUI/models/vae and ComfyUI/models/transformers. Paths are case‑sensitive on Linux.
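
A quick path check before the first run can catch misplaced weights. In the sketch below the folder names follow the layout above, while the filenames are hypothetical examples; substitute your actual checkpoint names.

```python
# verify_models.py – confirm weights sit where the nodes expect them (a sketch;
# the filenames are hypothetical – substitute your actual checkpoint names).
from pathlib import Path

MODELS = Path("ComfyUI/models")
expected = {
    "checkpoints": ["wan2.6_base.safetensors"],   # hypothetical filename
    "vae":         ["wan2.6_vae.safetensors"],    # hypothetical filename
}
for folder, names in expected.items():
    for name in names:
        path = MODELS / folder / name             # case-sensitive on Linux
        print(f"{'OK     ' if path.exists() else 'MISSING'} {path}")
```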

Step 4 – Build a workflow

A basic image‑to‑video pipeline might include:

  1. Load image and optionally resize it.
  2. Add prompt conditioning (if supported by your API) – e.g., “cinematic, gentle parallax”.
  3. Connect the Wan 2.6 image‑to‑video node. Adjust parameters like frames (16–24 for first pass), frame rate (12–24 fps) and motion strength (0.6–0.8).
  4. Assemble frames with VideoHelperSuite’s FramesToVideo node and set codec (H.264, HEVC, VP9) and fps.
  5. Save the video to disk.
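
The same graph can be queued programmatically through ComfyUI’s HTTP API (a POST to the /prompt endpoint of a running instance). In this sketch the WAN node’s class name and inputs are illustrative assumptions – check the names your installed pack actually registers; VHS_VideoCombine is VideoHelperSuite’s combine node.

```python
# queue_workflow.py – submit a minimal image-to-video graph to a running
# ComfyUI instance via its HTTP API. The WAN node's class name and inputs are
# illustrative assumptions – check what your installed pack registers.
import json
import urllib.request

workflow = {
    "1": {"class_type": "LoadImage",
          "inputs": {"image": "portrait.png"}},   # file in ComfyUI's input folder
    "2": {"class_type": "WanImageToVideo",        # illustrative WAN node name
          "inputs": {"image": ["1", 0],
                     "prompt": "cinematic, gentle parallax",
                     "frames": 24, "fps": 24, "motion_strength": 0.7}},
    "3": {"class_type": "VHS_VideoCombine",       # VideoHelperSuite combine node
          "inputs": {"images": ["2", 0], "frame_rate": 24,
                     "format": "video/h264-mp4", "filename_prefix": "wan26"}},
}
req = urllib.request.Request("http://127.0.0.1:8188/prompt",
                             data=json.dumps({"prompt": workflow}).encode(),
                             headers={"Content-Type": "application/json"})
print(urllib.request.urlopen(req).read().decode())
```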

For advanced use, add a Batch or Iterator node to process a folder of images and fix seeds for consistent style across outputs. Ensure your graph includes a VAE node if required.
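
A minimal sketch of that batch pattern, reusing the /prompt endpoint: the seed input assumes your WAN node exposes one, and the images must already sit in ComfyUI’s input folder.

```python
# batch_i2v.py – queue every image in a folder with a fixed seed for a
# consistent look (a sketch; the seed input assumes your WAN node exposes one,
# and the images must already be in ComfyUI's input folder).
import json
import urllib.request
from pathlib import Path

SEED = 123456789                                  # fixed seed -> consistent style

def queue(workflow: dict) -> None:
    req = urllib.request.Request("http://127.0.0.1:8188/prompt",
                                 data=json.dumps({"prompt": workflow}).encode(),
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)

for image in sorted(Path("ComfyUI/input").glob("*.png")):
    queue({
        "1": {"class_type": "LoadImage", "inputs": {"image": image.name}},
        "2": {"class_type": "WanImageToVideo",    # illustrative – match your pack
              "inputs": {"image": ["1", 0], "seed": SEED,
                         "prompt": "cinematic, gentle parallax", "frames": 24}},
    })
```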

3 New ComfyUI nodes and updates (late 2025 – early 2026)

ComfyUI’s rapid development around late 2025 and early 2026 introduced numerous new nodes and performance improvements. Some highlights relevant to WAN 2.6 workflows include:

  • API Node expansion – by May 2025 ComfyUI had added 62 new API nodes covering paid models such as Flux Ultra and MiniMax, and Wan models (including Wan 2.6) joined the roster in later releases. Users run these models by purchasing credits and selecting templates via the ComfyUI interface.
  • ManualSigmas node – introduced in v0.7.0 (Dec 31 2025) to give finer control over sampling schedules.
  • Kling Motion Control node – allows precise motion transfer from reference videos to generated scenes; useful when combining WAN with other models.
  • ResizeByLongerSide for video – simplifies matching aspect ratios across different video models.
  • WAN2.6 ReferenceToVideo API node – added in v0.8.0 (Jan 7 2026) to call the WAN 2.6 reference‑to‑video model directly. This node accepts a 5‑second reference video and a prompt and returns a new sequence with your subject placed in a new context (see the sketch after this list).
  • LTX‑2 audio‑video model – ComfyUI now natively supports LTX‑2, an open‑source model that generates motion, dialogue, background noise and music together in a single pass. This provides an alternative to WAN for local users who need free, synchronous audio‑video generation.
  • VideoAddAudio and Audio Tools – new nodes such as VideoAddAudio simplify merging audio into videos and support MP3/WAV formats. Additional audio nodes (batch processing, mixing, resampling) help with complex sound design.
  • Performance optimisations – recent updates introduced asynchronous memory offloading for AMD GPUs, improved VRAM management and major VRAM reductions for video models. These changes make WAN workflows more stable on consumer‑grade GPUs.
  • Front‑end and UI enhancements – ComfyUI’s Nodes 2.0 interface, introduced in late 2025, improved workflow progress tracking, added a new selection toolbox and enabled subgraphs for re‑usable node groups. The desktop version also auto‑updates to incorporate the latest features.
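
As an illustration of the reference‑to‑video node mentioned above, the sketch below builds an API‑format graph around it. The class name WanReferenceToVideoApi and its inputs are assumptions for illustration – consult the actual node’s documentation; VHS_LoadVideo is VideoHelperSuite’s video loader.

```python
# r2v_workflow.py – graph for the reference-to-video API node (a sketch; the
# class name "WanReferenceToVideoApi" and its inputs are assumptions – check
# the node's actual spec).
import json
import urllib.request

workflow = {
    "1": {"class_type": "VHS_LoadVideo",          # VideoHelperSuite video loader
          "inputs": {"video": "subject_5s.mp4"}},
    "2": {"class_type": "WanReferenceToVideoApi", # hypothetical node name
          "inputs": {"reference": ["1", 0],
                     "prompt": "the same character walking down a rainy neon street",
                     "duration": 10, "resolution": "1080p"}},
}
req = urllib.request.Request("http://127.0.0.1:8188/prompt",
                             data=json.dumps({"prompt": workflow}).encode(),
                             headers={"Content-Type": "application/json"})
urllib.request.urlopen(req)
```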

4 WAN 2.6 features: video, image and audio

The following summary covers the most notable features of WAN 2.6 across modalities, based on reviews and official guides:

  • Video generation – generates 15‑second scenes with multiple shots, intelligent shot scheduling and multi‑camera storytelling; produces 1080p output at 24 fps with improved temporal consistency and simulated physics such as gravity and fluid dynamics; supports landscape, portrait and square aspect ratios. Floyo’s overview emphasises the 15‑second scene length and multi‑shot scheduling, Artlist confirms 1080p at 24 fps and multiple aspect ratios, and Higgsfield highlights simulated world dynamics with accurate fluid and gravity effects.
  • Image‑to‑video – uses a single image or multiple reference images as anchors, keeping characters and style consistent while adding motion; supports multi‑image input and reference control for fusing different angles into a coherent subject. Floyo’s documentation notes that you can upload several reference images and fuse them into a consistent subject with lip‑synced dialogue and sound effects.
  • Text‑to‑video – accepts descriptive prompts specifying camera movements, actions and emotions, with improved interpretation of complex instructions; the multi‑modal engine translates nuanced language into coherent sequences. Eachlabs notes that WAN 2.6 better interprets nuanced prompts and supports combining text with images or reference videos.
  • Reference‑to‑video (R2V) – takes a 5‑second video of a subject and generates a new scene while preserving the subject’s motion and appearance; supports multi‑shot narratives and 720p/1080p resolution via API nodes. Griptape’s announcement notes that Wan 2.6 R2V supports 720p output and durations of 5, 10 or 15 seconds with audio generation.
  • Audio – generates dialogue, sound effects and music alongside the visuals, delivering accurate lip‑sync and voice reproduction; supports multi‑language and multi‑voice tracks, and additional nodes allow merging external audio into output videos. Eachlabs explains that audio is woven into the generation process, leading to precise lip‑sync and reduced post‑production work; A2E notes that Wan 2.6 provides built‑in music tools for creating custom tracks with multiple voices and languages; RunComfy’s VideoAddAudio node description details combining audio and video via ffmpeg.
  • Performance & realism – improved motion logic, continuity across shots and advanced camera control reduce jerky transitions; simulated physics (gravity, fluid dynamics) yields realistic actions and environmental interactions; VRAM optimisations and asynchronous offloading in ComfyUI v0.4.0 improve stability on consumer GPUs. Eachlabs highlights the multi‑shot narrative engine and consistency improvements, Higgsfield emphasises simulated world dynamics, and the ComfyUI changelog explains the VRAM reductions and asynchronous offloading.

5 Workflow tips and common pitfalls

Write specific prompts. Include camera angles, character actions, lighting and mood – for example, “low‑angle tracking shot, a chef plating dessert under warm kitchen light, calm and focused mood”. The model responds better to detailed instructions.

Use audio intentionally. Upload narration or music to guide timing and emotional tone. When using the API node, ensure your audio matches the length of the requested video; if shorter, it may loop.
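
If you need to merge audio outside the graph, the VideoAddAudio node’s underlying operation can be reproduced with ffmpeg directly; a minimal sketch, assuming ffmpeg is on PATH.

```python
# add_audio.py – mux an external track into a generated clip, mirroring what
# the VideoAddAudio node does (a sketch; assumes ffmpeg is on PATH).
import subprocess

subprocess.run([
    "ffmpeg", "-y",
    "-i", "wan26_output.mp4",     # generated video
    "-i", "narration.mp3",        # external narration or music
    "-c:v", "copy",               # keep the video stream untouched
    "-c:a", "aac",                # encode the audio to AAC
    "-shortest",                  # trim to the shorter of the two streams
    "wan26_with_audio.mp4",
], check=True)
```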

Match aspect ratios. Resize images so their aspect ratio matches your target video; mis‑matched ratios can produce distorted outputs. The ResizeByLongerSide node helps here.
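
The equivalent resize can also be done ahead of time; a small sketch using Pillow that mirrors the resize‑by‑longer‑side behaviour.

```python
# resize_longer_side.py – scale an image so its longer side hits a target size,
# preserving aspect ratio (a sketch using Pillow).
from PIL import Image

def resize_by_longer_side(path: str, target: int = 1080) -> Image.Image:
    img = Image.open(path)
    w, h = img.size
    scale = target / max(w, h)
    return img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)

resize_by_longer_side("portrait.png").save("portrait_resized.png")
```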

Check model paths. Errors often arise because the node cannot locate the WAN model file. Verify the exact filename and folder specified in the node’s README.

Monitor VRAM usage. If you experience out‑of‑memory errors, reduce frame count, resolution or motion strength. Major VRAM optimisations in ComfyUI v0.4.0 reduce memory requirements for video models, but high resolutions and long durations still require significant GPU memory.
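
A simple way to watch headroom while a job runs is to poll free VRAM from a separate terminal; a sketch assuming PyTorch with CUDA (stop it with Ctrl‑C).

```python
# vram_watch.py – poll free VRAM from a separate terminal while a workflow runs
# (a sketch; assumes PyTorch with CUDA – stop with Ctrl-C).
import time
import torch

while True:
    free, total = torch.cuda.mem_get_info(0)
    print(f"VRAM free: {free / 1024**3:.1f} / {total / 1024**3:.1f} GB", end="\r")
    time.sleep(1.0)
```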

6 Conclusion

WAN 2.6 ushers in a new era of AI video generation by combining longer, multi‑shot videos with native audio and flexible inputs. When integrated into ComfyUI, the model can run locally (for open‑source enthusiasts with powerful GPUs) or via API nodes (for cloud‑based workflows). Setting up WAN 2.6 requires installing the ComfyUI‑VideoHelperSuite, KJNodes and dedicated Wan nodes, and placing checkpoint files in the correct model directories. Recent ComfyUI updates introduced new nodes such as ManualSigmas and VideoAddAudio, expanded API support for paid models and delivered performance optimisations that make WAN workflows more reliable. With proper installation and careful workflow design, creators can harness WAN 2.6 to produce cinematic, audio‑synchronised videos from text, images or reference footage, all within the flexible, visual environment of ComfyUI.