Unlocking the full potential of Sora 2
Creating cinematic videos with Sora 2 isn’t just about typing a few words—it’s about designing prompts that guide the AI with precision and creativity. This guide explores reusable templates, explains why their structure works, and shows real-world examples that deliver consistent results. Whether you’re a content creator, marketer, or storyteller, mastering prompt engineering saves time, reduces frustration, and elevates your projects.
Why Prompt Engineering Matters
- Prompts are instructions: They tell the AI exactly what to generate.
- Well-structured prompts = reliable outputs: Avoid wasted credits and inconsistent results.
- Professional advantage: Strong prompt engineering sets your work apart and opens creative opportunities.
A Reusable Prompt Template
Instead of relying on prose, use a YAML or JSON-style template to keep details organized and persistent across shots.
How to use this template:
– Pick your cult classic or genre homage
– Fill in each section with iconic cues (costume, props, lighting, audio)
– Add your anticlimactic punchlines in the dialogue section
– Keep persistence notes so the AI doesn’t drift details between shots.
– The purpose of each property is explained in the Template Legend further below.
1. Subject / Scene Settings
• Audience: {locale="XX"}
• Narrative tone: [genre + mood, e.g. “horror–surreal, intimate”]
• Subject type: person / object / creature
• Key features: [main visual anchors, e.g. outfit, props, surreal effects]
• Motion: [how things move, e.g. float, orbit, stumble, glitch]
• Age / Vibe: [character identity + emotional tone]
• Outfit / Props: [costume details, iconic items]
• Location / Time / Weather: [setting + atmosphere]
• Foreground / Midground / Background: [layered elements for depth]
2. Lighting & Grade
• Lighting plan: [key, rim, fill, practicals, haze, gobo]
• Color palette: [specific hues, e.g. “sickly green, tungsten amber”]
• Post effects: [grain, vignette, halation, chromatic aberration, flares]
3. Camera & Lens
• Shot types: [ES/WS/MS/CU/WIDE-AERIAL, etc.]
• Composition: [center, thirds, occluders, parallax]
• Lens/Focus: [mm feel, DOF, rack focus]
• Coverage: [master + inserts, match-on-action, eyeline]
• Persistence: [elements that must remain consistent across shots]
4. Dialogue / Punchline
• Timing: [0–2s, 2–4s, etc.]
• Line(s): [short, anticlimactic gag or dramatic beat]
5. Audio
• BGM: [style, tempo, instruments]
• SFX: [ambient textures, props, glitches]
• Cues: [timed stingers, fades, drops]
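One way to keep the template reusable is to hold it as structured data and render it to text for each shot. The sketch below is a minimal illustration in Python, assuming you assemble prompts outside the Sora app and paste the result in. The `render_prompt` helper, the `PERSIST` dictionary, and the dictionary layout are hypothetical names chosen for this example; nothing here is part of a Sora API.

```python
# Hypothetical helper: renders a nested dict into the sectioned text layout
# used by the template above. It only builds a string; it does not call Sora.

def render_prompt(title, sections):
    """Return prompt text with '# title', '## section' and 'key: value' lines."""
    lines = [f"# {title}"]
    for section, fields in sections.items():
        lines.append(f"## {section}")
        for key, value in fields.items():
            if isinstance(value, (list, tuple)):
                # Lists become a key line followed by '- item' bullets.
                lines.append(f"{key}:")
                lines.extend(f"- {item}" for item in value)
            else:
                lines.append(f"{key}: {value}")
    return "\n".join(lines)


# Values that must not drift between shots live in one place and are reused.
PERSIST = {"Persist": "goat chewing loop; officer gesture consistent"}

shot = render_prompt(
    "Dispatch Devoured",
    {
        "Subject / Scene Settings": {
            "Audience": '{locale="EN"}',
            "Narrative tone": "historical-absurd, dry comedy",
            "Motion": ["slow walk to dispatch table", "goat enters frame"],
        },
        "Camera": {"Lens/Focus": "40mm; shallow DOF", **PERSIST},
    },
)
print(shot)
```

Reusing `PERSIST` in every shot is the point of the design: the persistence note is typed once, so it cannot quietly change between scenes.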
Template Legend
How to use this Legend:
– Reference each property when building your own prompt
– Adjust values to fit your scene, character, or desired output
– Combine properties for richer, more controlled results
– Experiment with tone, lighting, and camera to discover new creative possibilities.
1. Audience
Purpose: Specifies the intended viewer or language locale.
Why Important: Ensures dialogue, cultural references, and tone match the target audience.
2. Narrative Tone
Purpose: Sets the emotional and stylistic mood of the scene.
Why Important: Guides Sora’s output toward the desired atmosphere, affecting visuals, dialogue, and pacing.
3. Subject Type / Alias / Continuity
Purpose: Defines the main character(s), their role, and ensures consistent references throughout the scene.
Why Important: Prevents confusion and maintains character identity, especially in multi-scene projects. This may include a named subject, or a Cameo.
Additional Note about aliases and cameos: In my experience this does not work 100% of the time, so I often avoid nouns for my cameos and explicitly call out my cameo username @pixelsyndicate instead of the character name. Another gotcha is the possessive apostrophe. For example, "@pixelsyndicate's hand" may not be interpreted as a cameo, so I change it to "the hand of @pixelsyndicate" to avoid discovering the problem only after generating.
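The possessive rewrite described in this note can be automated before pasting a prompt. The following is a minimal sketch, assuming handles consist of letters, digits, underscores, or dots; the `unpossess` function is a hypothetical name, and it only handles a single word after the possessive (for "@user's left hand" you would still need to edit by hand).

```python
import re

# Matches "@handle's noun" with a straight or curly apostrophe.
# Assumption: handles are letters, digits, underscores, or dots.
_POSSESSIVE = re.compile(r"(@[\w.]+)['’]s\s+(\w+)")


def unpossess(text):
    """Rewrite "@user's hand" as "the hand of @user" (one-word nouns only)."""
    return _POSSESSIVE.sub(r"the \2 of \1", text)


print(unpossess("CU on @pixelsyndicate's hand gripping the rope"))
# prints: CU on the hand of @pixelsyndicate gripping the rope
```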
4. Key Features
Purpose: Describes the character’s appearance, costume, and distinguishing traits.
Why Important: Directly influences visual rendering and character recognition.
5. Scale
Purpose: Sets the physical scope of the scene (human-scale, wide landscape, etc.).
Why Important: Determines camera framing and environmental detail.
6. Motion
Purpose: Describes character and object movement within the scene.
Why Important: Adds realism and dynamism, guiding animation and timing.
7. Vibe
Purpose: Summarizes the emotional arc or feeling of the scene.
Why Important: Helps Sora set the right mood and pacing.
Additional Note: An arrow between two values, such as "Close-up → Wide-Shot", tells the generative model to transition from one to the other rather than make a direct cut.
8. Location / Time / Weather / Light
Purpose: Specifies setting, time of day, and lighting conditions.
Why Important: Crucial for visual accuracy, atmosphere, and continuity.
9. Composition & Environment
Purpose: Details foreground (FG), midground (MG), and background (BG) elements.
Why Important: Guides scene layout, depth, and storytelling focus.
10. Lighting / Grade / Texture
Purpose: Specifies lighting sources, color grading, and texture effects.
Why Important: Sets visual style, mood, and realism.
11. Camera / Lens / Focus / Coverage
Purpose: Directs camera angles, lens choices, focus techniques, and shot coverage.
Why Important: Controls storytelling perspective, visual interest, and continuity.
Additional Note: A progression like "thirds → center progression" describes a framing that starts with the main subject off to one side, usually along one of the 'thirds' lines (rule of thirds); as the scene unfolds, the camera or the subject moves so that the focus shifts toward the center of the frame.
12. Dialogue
Purpose: Provides character lines, tone, and timing.
Why Important: Drives narrative, character development, and comedic or dramatic beats.
13. Audio (BGM & SFX) / Cues
Purpose: Specifies background music, sound effects, and their timing.
Why Important: Enhances immersion, emotional impact, and comedic timing.
Real-World Examples
Breaking skits into modular scenes ensures clarity. For instance, in Dispatch Devoured, the goat gag works because:
- Cameo username repetition prevents character confusion.
- Persistence notes keep props and gestures consistent.
- Action timing blocks fix lip sync and camera drift.
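Because cameo username repetition only helps when every mention is spelled identically, a quick check of a draft prompt can catch a stray handle before credits are spent. This is a minimal sketch, assuming handles are an `@` followed by word characters; `handle_counts` and `unexpected_handles` are hypothetical helper names, and the misspelled handle in the sample draft is deliberate.

```python
import re
from collections import Counter

HANDLE = re.compile(r"@\w+")  # assumption: handles are word characters only


def handle_counts(prompt):
    """Count how often each @handle appears in the prompt text."""
    return Counter(HANDLE.findall(prompt))


def unexpected_handles(prompt, expected):
    """Return handles found in the prompt that are not in the expected set."""
    return sorted(set(handle_counts(prompt)) - set(expected))


draft = (
    "MG: @pixelsyndicate standing, gesturing toward dispatch\n"
    "camera: CU on @pixelsyndcate face\n"  # deliberate misspelling for the demo
    "action: @pixelsyndicate reacts with escalating horror\n"
)
print(handle_counts(draft)["@pixelsyndicate"])          # prints: 2
print(unexpected_handles(draft, {"@pixelsyndicate"}))   # prints: ['@pixelsyndcate']
```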
Scene 1 – Dispatch Devoured

# Shufflebottom (@pixelsyndicate) - Dispatch Devoured
## Subject / Scene Settings
Audience: {locale="EN"}
Narrative tone: historical–absurd, dry comedy
Subject type: person
Key features: @pixelsyndicate appears as a British redcoat officer from 1775, dressed in full regalia of the 4th Dragoons. Crimson frock coat with Colonel’s epaulettes, brass gorget on green ribbon, crimson silk sash with fringe, powdered wig (no hat), buff (white) baldric supporting a cavalry sword with gold ribbon knot.
Scale: human-scale camp clearing
Motion:
- slow walk to dispatch table
- goat enters frame
- parchment snatch timed after heirloom line
- goat chewing continues through end of scene
- @pixelsyndicate reacts with escalating horror
Vibe: prideful → flustered
Location: military camp clearing near dusk
Time: late afternoon
Weather/Light: golden hour sun; soft haze
## Composition & Environment
Key elements (FG/MG/BG):
- FG: dispatch table with parchment, ink well, quill; goat head entering frame
- MG: @pixelsyndicate standing, gesturing toward dispatch; goat chewing
- BG: canvas tents, stacked crates, idle soldiers watching
Lighting:
- Key: warm soft key 3/4 L from low sun
- Rim: subtle back rim from sun edge
- Fill: ambient bounce from canvas tents
- Atmosphere: haze light; dust motes; goat shadow cast across table
Grade:
- Palette: parchment beige, crimson, brass, buff-white, goat brown, canvas tan
- Curve: soft contrast; vignette mild; grain medium; halation warm
- Texture: analog softness; candlelight bloom; slight CA
Visual taste: historical realism × dry absurdity
Background/Location: camp clearing with dispatch table, tents, and goat-accessible chaos
## Camera
Camera: MS/CU; thirds→center progression; occluders/parallax via goat, dispatch table
Lens/Focus: 40mm; shallow DOF; rack from parchment to goat’s mouth
Coverage: master + inserts; match-on-action (goat snatch, flinch by @pixelsyndicate); screen direction steady.
Subject continuity: all references to “officer” refer to @pixelsyndicate as defined in Key Features
Persist: goat chewing loop; officer gesture consistent
## Dialogue / Delivery / Style (note to self - measured, aristocratic tone)
- [0–2s] @pixelsyndicate (proudly, measured aristocratic tone): "I have a personal commendation from His Majesty, King George."
- [2–4s] @pixelsyndicate (turning, measured aristocratic tone): "This dispatch will be a family heirloom—"
- [4–5s] Goat snatches parchment from table (timed to interrupt heirloom line)
- [5–6s] @pixelsyndicate (flustered, panic tone): "No—no! That’s not for you, you ruminant menace!"
- [6–7s] Goat chewing continues; soldiers chuckle
- [7–8s] @pixelsyndicate (under breath, panic tone escalating): "By the crown, I’ll have that goat rug in my quarters!"
- Goat chewing persists through [9s]; @pixelsyndicate remains horrified
## Audio (BGM & SFX)
BGM: light chamber strings; comedic oboe flutter; ambient camp sounds
SFX: parchment rustle; goat bleat; chewing; officer gasp; distant soldier chuckle
Cues:
- [0.0s]: parchment tap
- [4.5s]: goat snatch
- [5.0s]: chewing loop
- [7.5s]: officer mutter
## IF YOU FIND LIP SYNC AND CAMERA MOVES ARE STILL OFF, TRY THIS ADDED DETAIL
action_timing:
  - time: 0.0–2.0s
    description: Officer introduces commendation
    camera: MS on @pixelsyndicate approaching dispatch table
    action: proud gesture toward parchment
    audio: parchment tap; ambient hush
  - time: 2.0–4.0s
    description: Heirloom line
    camera: CU on @pixelsyndicate face, gesturing toward parchment
    action: officer speaks line; goat head begins to enter frame
    audio: ambient camp sounds; faint rope jingle
  - time: 4.0–5.0s
    description: Goat snatch interrupts
    camera: insert CU of goat head grabbing parchment from table
    action: parchment pulled away mid-line
    audio: parchment rustle; goat bleat
    note: parchment is not in goat’s mouth before this moment
  - time: 5.0–7.0s
    description: Panic reaction
    camera: CU on flinching @pixelsyndicate, lip sync with panic tone
    action: officer protests; goat chewing loop begins
    audio: chewing loop; officer gasp; soldier chuckle
  - time: 7.0–9.0s
    description: Final beat
    camera: MS showing goat chewing continuously; soldiers in BG chuckling
    action: officer mutters "By the crown…" in panic tone
    audio: goat chewing persists; distant cough; ambient camp sounds
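Timing blocks like the ones in this scene only help if they tile the clip without gaps or overlaps. The sketch below is one possible sanity check, assuming you keep the windows as (start, end) pairs in seconds; `check_timeline` is a hypothetical helper, and the sample values are the five windows from the Dispatch Devoured action timing.

```python
def check_timeline(blocks, total):
    """Return a list of problems; an empty list means the blocks tile 0..total."""
    problems = []
    cursor = 0.0  # end of the last block seen so far
    for start, end in blocks:
        if end <= start:
            problems.append(f"empty or reversed block {start}-{end}")
        if start > cursor:
            problems.append(f"gap {cursor}-{start}")
        elif start < cursor:
            problems.append(f"overlap at {start}-{cursor}")
        cursor = max(cursor, end)
    if cursor != total:
        problems.append(f"timeline ends at {cursor}, expected {total}")
    return problems


# Windows copied from the Scene 1 action timing (seconds).
scene_1 = [(0.0, 2.0), (2.0, 4.0), (4.0, 5.0), (5.0, 7.0), (7.0, 9.0)]
print(check_timeline(scene_1, total=9.0))  # prints: []
```

An empty list means every second of the 9-second clip is assigned to exactly one block, which is the condition the timing blocks rely on.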
Scene 4 – Rejected by the Crown
Another example, Rejected by the Crown, uses:
- Progressive shot framing (WS → CU → POV → WS) for emotional impact.
- Atmospheric cues (dust poofs, golden hour glow) for realism.
- Dialogue pacing to balance comedy with melancholy.

# Shufflebottom - Rejected by the Crown
## Subject / Scene Settings
Audience: {locale="EN"}
Narrative tone: historical–melancholic, dry comedy
Subject type: person ( @pixelsyndicate )
Key features: @pixelsyndicate appears as a British redcoat officer from 1775, dressed in full regalia of the 4th Dragoons. Crimson frock coat with Colonel’s epaulettes, brass gorget on green ribbon, crimson silk sash with fringe, powdered wig (no hat), buff-white baldric supporting cavalry sword with gold ribbon knot. Holding a braided leather rope tethering a camp goat.
Scale: human-scale road descending toward coastal town
Motion: slow walk; slumped shoulders; dragging steps; goat trotting beside; dust poofs; boom rise
Vibe: dejected, theatrical
Location: hard-pack dirt road curving toward a 1700s British oceanside town
Time: golden hour
Weather/Light: warm low sun; glowing ocean reflection
## Composition & Environment
Key elements (FG/MG/BG):
- FG: officer’s dragging boots; goat’s hooves; dust poofs
- MG: @pixelsyndicate walking with slumped posture; braided rope taut
- BG: oceanside town with port; large naval and trading ships; glowing ocean horizon
Lighting:
- Key: warm golden hour key from rear 3/4 R
- Rim: soft back rim from low sun
- Fill: ambient bounce from dirt road and ocean glow
- Atmosphere: light haze; dust motes in poofs
Grade:
- Palette: crimson, brass, buff-white, dirt brown, ocean gold, goat brown
- Curve: soft contrast; vignette mild; grain medium; halation warm
- Texture: analog softness; candlelight bloom; slight CA
Visual taste: historical realism × dry pathos
Background/Location: descending dirt road to port town with ships and glowing ocean
## Camera
Camera: WS → CU → POV → WS rising boom
Lens/Focus: 40mm; shallow DOF; rack from boots to goat hooves; face to goat reaction
Coverage: master + inserts; match-on-action (step, dust poof); screen direction steady
Persist: dust poof loop; goat reaction timed
Action & Timing (9s Total)
[0.0–2.5s] — Establishing Walk of Shame
- Camera: WS following behind officer and goat
- Action: slow walk; slumped posture; dragging steps
- Atmosphere: dust poofs with each footstep
- Lighting: golden hour glow; ocean shimmer visible ahead
[2.5–4.0s] — Footstep Detail
- Camera: CU tracking boots and goat hooves
- Action: dust poofs timed with steps
- Audio: soft dirt crunch; rope jingle
[4.0–6.0s] — Mumbling & Goat Reaction
- Camera: CU POV on officer’s pouting face
- Dialogue (mumbled, aristocratic tone):
- “Roasted… braised… stewed… beaten with a stick…”
- Camera: POV to goat’s head — ears perk, eyes widen
- Audio: goat snort; rope tension
[6.0–9.0s] — Reveal of Destination
- Camera: WS rising boom from behind officer and goat
- Motion: slow rise to reveal oceanside town, port, and ships
- Atmosphere: warm ocean glow; silhouettes of masts
- Audio: distant gulls; harbor bells faint
## Audio (BGM & SFX)
BGM: slow chamber strings; ambient wind; distant harbor sounds
SFX: dirt crunch; rope jingle; goat snort; gulls; harbor bell
Cues:
- [0.0s]: ambient wind + dirt crunch
- [2.5s]: rope jingle
- [4.0s]: mumble begins
- [5.5s]: goat snort
- [6.5s]: gulls faint
- [8.5s]: harbor bell chime
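Audio cues are easier to trust when each one lands inside the action block it is meant to support. The following is a minimal sketch that maps the cue times from this scene onto its four timing blocks using a binary search over block start times; `block_for` and the constant names are hypothetical, and the values are copied from the Rejected by the Crown prompt.

```python
from bisect import bisect_right

# (start time in seconds, block name), copied from the Action & Timing section.
BLOCKS = [
    (0.0, "Establishing Walk of Shame"),
    (2.5, "Footstep Detail"),
    (4.0, "Mumbling & Goat Reaction"),
    (6.0, "Reveal of Destination"),
]
TOTAL = 9.0  # clip length in seconds

# (cue time in seconds, cue label), copied from the Cues list.
CUES = [
    (0.0, "ambient wind + dirt crunch"),
    (2.5, "rope jingle"),
    (4.0, "mumble begins"),
    (5.5, "goat snort"),
    (6.5, "gulls faint"),
    (8.5, "harbor bell chime"),
]


def block_for(t):
    """Return the name of the block containing time t, or None if out of range."""
    if not 0.0 <= t < TOTAL:
        return None
    starts = [start for start, _ in BLOCKS]
    return BLOCKS[bisect_right(starts, t) - 1][1]


for t, label in CUES:
    print(f"[{t}s] {label} -> {block_for(t)}")
```

A cue that returns `None`, or lands in a block whose action it does not match, is a candidate for retiming before generating.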
All 4 scenes combined into a Final Video
Learning From Other Creators

Often, learning to work with a new tool is easiest when you can see what others have made and read the prompts they used to create it. In the Sora app by OpenAI (sora.chatgpt.com), some creators leave their prompt source available when they publish their videos, provided it is not too long.
One particular creator I’ve learned much from is keigo_matsumaru, who creates some stunning abstract music video clips with amazing camera angles, glitch effects, and elements that keep continuity. Two examples I’ve enjoyed are Invisible Chorus and Glitch//Slap; both are, in my opinion, extraordinary examples of prompts that mix film direction, post effects, and audio engineering.
Special Elements Used by keigo_matsumaru
- Lighting and Atmosphere that stick to the classic three-point pattern but skew cold, plus Grading: a post-pipeline baked into the prompt that tells Sora to emulate advanced grading/VFX.
- An Inference layer that tells the AI model to treat the whole sequence as a stylized audiovisual performance, not as documentary realism.
- A Locale of “EN” but flagged for JP (Japanese) viewers, with no VO (voice-over), suggesting text-driven or purely visual communication that leans into Japanese cyber-noir aesthetics.
- A Narrative Tone that sets the emotional register: kinetic, sensual, and surreal in a “cyber-noir, ecstatic, body-driven” way.
Keep Exploring, Keep Creating
Prompt engineering isn’t a destination — it’s a journey. Every experiment, every tweak, every unexpected output teaches us something new about how stories can be shaped with Sora 2.
If this guide gave you a few tools or ideas, don’t stop here. Keep refining your own templates, remixing genres, and studying how others push the boundaries. The more you practice, the more fluent you’ll become in this new cinematic language.
Stay curious. Share your experiments. Learn from the community.
That’s how we all grow — one prompt at a time.
You can see the progress of my own learning experience @ https://sora.chatgpt.com/profile/pixelsyndicate