53 curated media & design tools for Python developers, ranked by quality score.
Portable file server with accelerated resumable uploads, dedup, WebDAV, SFTP, FTP, TFTP, zeroconf, media indexer, thu...
A fast, feature-rich GPU-accelerated terminal emulator
SoTA open-source TTS
A modern self-hosted media management system for your media library
Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery
Swing Music is a beautiful, self-hosted music player for your local audio files. Like a cooler Spotify ... but bring ...
Conversational voice AI agents
LongLive: Real-time Interactive Long Video Generation
HeartMuLa Official Repo: The Most Powerful Open-Source Music Generation Model of 2026
On-device TTS model by Neuphonic
Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.
HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expr...
Official repository for LTX-Video
rCM: SOTA Diffusion Distillation & Few-Step Video Generation based on sCM/MeanFlow
Fast and Universal 3D reconstruction model for versatile tasks
Code for "FlashWorld: High-quality 3D Scene Generation within Seconds"
Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.
High-Quality Text-to-Video Generation with Alpha Channel
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
Official repo for paper "Video-As-Prompt: Unified Semantic Control for Video Generation"
We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThi...
PromptEnhancer is a prompt-rewriting tool, refining prompts into clearer, structured versions for better image genera...
A high-quality rapid TTS voice cloning model that reaches speeds of 150x realtime.
Official implementation of "VideoMaMa: Mask-Guided Video Matting via Generative Prior", CVPR 2026
Voyager is an interactive RGBD video generation model conditioned on camera input, and supports real-time 3D reconstr...
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
Official code for "FastGS: Training 3D Gaussian Splatting in 100 Seconds"
ViPE: Video Pose Engine for Geometric 3D Perception
[ArXiv 25] Stable Video Infinity: Infinite-Length Video Generation with Error Recycling
A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model that excels at editing emotion, speaking style,...
A free, local, self-hosted video compressor web UI designed for performance and ease of use. Inspired by 8mb.video
Bridge the gap between photo and video color grading. Accurately apply any creative LUT to your RAW files with this t...
FIBO is a SOTA, first open-source, JSON-native text-to-image model built for controllable, predictable, and legally s...
Let's make video diffusion practical!
Unlimited-length talking video generation that supports image-to-video and video-to-video generation
Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multi...
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Official code for StoryMem: Multi-shot Long Video Storytelling with Memory
SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation
MoCha: End-to-End Video Character Replacement without Structural Guidance
A highly compressive and high-quality neural audio codec for speech models.
Video generation via code
Automatic Video Generation from Scientific Papers
[Preprint 2025] Ditto: Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset
Trace Anything: Representing Any Video in 4D via Trajectory Fields
Quick illustration of how one can easily read books together with LLMs. It's great and I highly recommend it.
Spark-TTS Inference Code
Official Implementations for Paper - HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives
TTS model capable of streaming conversational audio in realtime.
StreamingVLM: Real-Time Understanding for Infinite Video Streams
Official Repository for "Glyph: Scaling Context Windows via Visual-Text Compression"
[arXiv 2025] VisualMimic: Visual Humanoid Loco-Manipulation via Motion Tracking and Generation