Text, Image, Audio, and Video Inputs
A Gemini Omni video API workflow should give each input a distinct job: text sets the direction, images anchor identity, video conveys motion and pacing, and audio or voice references guide performance timing where the active rollout supports them. The how-to workflow explains how to structure these signals before you automate requests.
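A minimal Python sketch of that role assignment, assuming you collect inputs into a local plan before making any API call. `Role`, `InputSignal`, `RequestPlan`, and the `audio_supported` flag are hypothetical names for illustration, not part of any Gemini SDK. The sketch checks that each input's modality matches its intended role, requires a text direction prompt (an assumption of this sketch), and drops audio references when the rollout does not support them.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    # Hypothetical role labels mirroring the guidance above.
    DIRECTION = "direction"  # text prompt
    IDENTITY = "identity"    # reference image
    MOTION = "motion"        # reference video (motion and pacing)
    TIMING = "timing"        # audio or voice reference


# Modality each role expects (an assumption for this sketch).
EXPECTED_MODALITY = {
    Role.DIRECTION: "text",
    Role.IDENTITY: "image",
    Role.MOTION: "video",
    Role.TIMING: "audio",
}


@dataclass
class InputSignal:
    role: Role
    modality: str  # "text", "image", "video", or "audio"
    value: str     # prompt text or a file path / URI


@dataclass
class RequestPlan:
    signals: list[InputSignal] = field(default_factory=list)

    def add(self, signal: InputSignal) -> None:
        # Reject inputs whose modality does not fit the role they are meant to play.
        expected = EXPECTED_MODALITY[signal.role]
        if signal.modality != expected:
            raise ValueError(
                f"{signal.role.value} expects {expected}, got {signal.modality}"
            )
        self.signals.append(signal)

    def validate(self, audio_supported: bool) -> list[InputSignal]:
        """Return the signals to send, dropping audio if the rollout lacks it."""
        # Assumption: every request carries a text direction prompt.
        if not any(s.role is Role.DIRECTION for s in self.signals):
            raise ValueError("a text direction prompt is required")
        return [
            s for s in self.signals
            if s.role is not Role.TIMING or audio_supported
        ]


if __name__ == "__main__":
    plan = RequestPlan()
    plan.add(InputSignal(Role.DIRECTION, "text", "A slow dolly shot of a chef plating dessert"))
    plan.add(InputSignal(Role.IDENTITY, "image", "chef_reference.png"))
    plan.add(InputSignal(Role.MOTION, "video", "dolly_reference.mp4"))
    plan.add(InputSignal(Role.TIMING, "audio", "voice_line.wav"))

    to_send = plan.validate(audio_supported=False)
    print([s.role.value for s in to_send])  # ['direction', 'identity', 'motion']
```

Keeping validation separate from the request call means the same plan can be reused once audio references become available in your rollout: only the `audio_supported` flag changes.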
