ComfyUI: Hunyuan Text-to-Video
Create videos of up to 5 seconds in length from a text prompt.
Hunyuan Video is a cutting-edge framework for large-scale, high-fidelity video generation—and it’s now fully integrated into the Yanoya Playground (based on ComfyUI).
Want to create stunning AI videos with just a text prompt? Here’s how to get started.
Click here to see the full sample video: Example Video
Quick Start Guide
🛒 Get Your License
Grab a Yanoya Playground license and paste your key into the activation form.
Select "ComfyUI Hunyuan Video" as your playground type.
⏳ Wait for the Cloud Magic
Your AI-powered cloud instance will spin up in ~10 minutes.
🎬 Load the Workflow
Go to Workflow → Browse Templates → Yanoya → HunyuanVideo.
📥 Download the Giant Models
Before your first generation, grab these two essential models using the Model Manager (for details on how to use it, see my previous post):
hunyuan_video_t2v_720p_bf16.safetensors (26 GB)
llava_llama3_fp8_scaled.safetensors (9 GB)
Wait for the "Stop" button to turn into "Refresh" before downloading the next model.
Enable the ComfyUI logs to track download progress.
After both models are downloaded, hit "Refresh". If ComfyUI still complains about a missing model (red border on a node), manually select the model from the dropdown within the respective node.
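If you want to double-check that both files actually landed where ComfyUI expects them, here is a minimal sketch. The folder names are illustrative (based on a typical ComfyUI layout for diffusion models and text encoders); adjust the paths to match your instance.

```python
from pathlib import Path

# Quick check that both checkpoints exist and have roughly the expected size.
# Paths are illustrative; adjust them to your ComfyUI install directory.
models = {
    "models/diffusion_models/hunyuan_video_t2v_720p_bf16.safetensors": 26,
    "models/text_encoders/llava_llama3_fp8_scaled.safetensors": 9,
}

for rel_path, expected_gb in models.items():
    p = Path("ComfyUI") / rel_path
    if p.exists():
        size_gb = p.stat().st_size / 1024**3
        print(f"{p.name}: {size_gb:.1f} GB (expected ~{expected_gb} GB)")
    else:
        print(f"MISSING: {p}")
```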
Ready to Go!
Now you are set to create high-quality AI videos!
The model calls for a GPU with roughly 80 GB of VRAM, so the cloud instance is usually powered by an NVIDIA H100 (price tag ~$30,000). Although technically possible, running this model on your local PC is challenging, unless you’ve got a data center in your basement.
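Curious how far your own machine is from that 80 GB mark? Here is a quick sketch using PyTorch (assuming CUDA and PyTorch are installed; not part of the Yanoya setup itself):

```python
import torch

# Rough sanity check of local VRAM before attempting a run.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"{props.name}: {vram_gb:.0f} GB VRAM")
    if vram_gb < 80:
        print("Below the ~80 GB the full bf16 model expects; the cloud instance is the safer bet.")
else:
    print("No CUDA GPU detected.")
```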
Understanding the KSampler in Diffusion Models
In previous posts, I covered CLIP (text encoding) and VAE (latent space/image decoding). Today, let’s dive into another key component of the workflow: the KSampler.
What does the KSampler do?
A sampler is the algorithm that guides the step-by-step noise removal in diffusion models, transforming random noise into a coherent image. It operates in the latent space, the same compressed representation used by the VAE to generate the final output.
Since the latent space has multiple dimensions, the sampler’s job is to synchronize noise removal across them. Think of it like a race: a dimension such as the face might sharpen early (noise removed quickly), while the background stays blurry until the final steps (computation finishes before its noise is fully removed). Removing noise too aggressively can lead to undesired side effects such as distorted details, artifacts, or poor prompt adherence. If it’s too slow, generation becomes inefficient.
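To make the "step-by-step noise removal" concrete, here is a schematic Euler-style sampling loop. This is not ComfyUI’s actual implementation: the denoise_fn, the sigma schedule, and the toy latent shape are placeholders, just enough to show how a sampler walks a latent from high noise to low noise.

```python
import torch

def euler_sample(denoise_fn, latent, sigmas):
    """Schematic Euler sampler: walk the latent from high noise to low noise.

    denoise_fn(x, sigma) is assumed to return the model's estimate of the
    clean latent at noise level sigma (names are illustrative, not ComfyUI's API).
    """
    x = latent * sigmas[0]                      # start from noise scaled to the first sigma
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoise_fn(x, sigma)         # model's guess of the noise-free latent
        d = (x - denoised) / sigma              # direction away from the clean estimate
        x = x + d * (sigma_next - sigma)        # Euler step toward the next (lower) noise level
    return x

# Dummy denoiser and a toy "video" latent (frames x channels x height x width)
dummy_denoise = lambda x, sigma: x * 0.5
sigmas = torch.linspace(14.0, 0.01, steps=21)   # 20 steps from high noise to low noise
result = euler_sample(dummy_denoise, torch.randn(16, 4, 32, 32), sigmas)
print(result.shape)
```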
Choosing the Right Sampler
Selecting a sampler is similar to adjusting the heat while cooking: you need the right heat for each ingredient (dimension); too much heat burns everything, while too little takes forever. A good sampler algorithm balances speed against quality and can reduce the number of iterations (steps) required.
As you can see in the selection list, a variety of strategies is available in the KSampler node. A simple and fast one is euler. A specific model often comes with a recommended sampler algorithm. For Hunyuan Video, I recommend experimenting with dpmpp_2m, which provides a good balance between speed and quality.
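If you prefer to switch samplers programmatically rather than clicking through the node UI, here is a sketch that edits an API-format export of the workflow. The node id ("3") and the exact field names are assumptions based on a typical KSampler export; check your own JSON before relying on them.

```python
import json

# Assumes you exported the HunyuanVideo template via "Save (API Format)";
# node id and field names are illustrative and may differ in your export.
with open("hunyuan_t2v_api.json") as f:
    workflow = json.load(f)

ksampler = workflow["3"]["inputs"]      # the KSampler node of the exported workflow
ksampler["sampler_name"] = "dpmpp_2m"   # swap euler for DPM++ 2M
ksampler["steps"] = 20                  # fewer steps often suffice with a better sampler
ksampler["seed"] = 42                   # a fixed seed makes sampler comparisons fair

with open("hunyuan_t2v_dpmpp2m.json", "w") as f:
    json.dump(workflow, f, indent=2)
```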
Your Turn!
Now it's your turn! Fire up ComfyUI, experiment with different samplers in Hunyuan Video, or push A1111 to its limits using the Yanoya AI Playground.
Got questions about workflows? Noticed something interesting in your trials? Drop a comment in the chat. I'd love to hear about your experiments and help troubleshoot any challenges. Stay tuned for our next deep dive!