
Running an Image Generation API for Unlimited Visuals (Beginner Guide)
Stand up one GPU VM, run an image model behind a tiny API, and batch thumbnails/diagrams on demand.
Build your own “image factory” for thumbnails, title cards, diagrams, and filler visuals—fast, private, and consistent.
What you’ll build
- One GPU VM in a nearby region (e.g., asia-southeast1) for low latency.
- A tiny web API on that VM with two endpoints:
  - POST /generate → make a new image from a prompt
  - POST /edit → (optional) modify an existing image (variations/inpaints)
- A model runner (e.g., Qwen-Image / SDXL / Flux) inside Docker, fronted by FastAPI.
- Cloud storage bucket for outputs (PNG/WEBP plus a JSON sidecar with prompt/seed).
- Simple security: secret bearer token, service account for the bucket, HTTPS only.
- Single “front door”: an HTTPS load balancer on port 443 only (no open VM ports).
- Logs and monitoring, plus a snapshot so you can clone new nodes quickly.
You’ll keep model files (checkpoint, LoRA, VAE) on the VM’s local NVMe for speed, and send finished images to cloud storage where your app (or editor) can fetch them.
Why run your own API?
- Consistency: reuse seeds & presets so thumbnails feel on brand.
- Speed: batch-create assets while you script or edit.
- Privacy: prompts & references stay on your infrastructure.
- Cost control: fixed hourly GPU can beat per-image pricing at volume.
When not to: if you only need a few images a week, a web tool is simpler. This guide is for regular publishing or teams who want a house style.

The big picture (plain English)
- Your app sends a prompt to https://api.yourdomain.com/generate (e.g., “flat illustration of the 50/30/20 rule, three boxes, big labels, brand colors”).
- The VM’s model turns that prompt into an image.
- The API saves the image to your cloud bucket and returns a signed link (a temporary URL) so your app can download it.
- The API also saves a tiny metadata JSON (prompt, seed, steps, guidance, style) next to the image for reproducibility.
That’s the workflow. Everything else (Docker, drivers, tokens) is plumbing to make it safe and reliable.
Before you start
- Cloud account with billing enabled
- A project just for this (keeps costs & permissions tidy)
- A bucket for outputs (e.g., reactivid-images-prod)
- A domain for a friendly API URL (optional but recommended)
- A note of: project ID, bucket, VM name, region, and your secret token
Step 1: Create the GPU VM
Choose a nearby region/zone for lower latency. A balanced starter on many clouds is an L4-based instance (e.g., g2-standard-4 on GCP; on AWS, a recent g5 or g6 class).
- CPU/RAM: 4–8 vCPU / 16–32 GB RAM is fine to start.
- Disk: 100–200 GB boot + optional local NVMe (fast) for model files.
- Network: no public ports yet (we’ll add HTTPS via a load balancer).
- Name it something obvious like img-api-01.
Tip: If you need to keep costs low, stop the VM off-hours.
Step 2: Install the basics on the VM
SSH in and install GPU drivers, Docker, and Git. After install, verify:
nvidia-smi # should list your GPU
docker --version # should print a version
git --version
Optionally test Docker GPU access:
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi
Follow your cloud’s official docs for driver + container runtime. The goal: GPU visible inside containers.
Step 3: Lay out folders & environment
On the VM:
sudo mkdir -p /opt/img-api/{models,outputs,server,keys,presets}
sudo chown -R $USER:$USER /opt/img-api
Create /opt/img-api/.env:
MODEL_NAME=Qwen-Image
BUCKET_BACKEND=gcs # or s3
BUCKET_NAME=reactivid-images-prod
REGION=asia-southeast1
AUTH_BEARER=YOUR_SUPER_SECRET_TOKEN
SERVICE_ACCOUNT_JSON=/opt/img-api/keys/sa.json # (GCS) path on VM
S3_BUCKET_URL= # (S3) e.g., s3://reactivid-images-prod
S3_REGION= # (S3) e.g., ap-southeast-1
S3_PROFILE= # (S3) or explicit keys via env/instance role
BIND_ADDR=127.0.0.1
BIND_PORT=8000
Keep models on fast disk; outputs is scratch (finals go to bucket).
Step 4: The FastAPI app (minimal but real)
/opt/img-api/server/app.py
import os, io, json, time
from datetime import datetime
from typing import Optional, Dict, Any

from fastapi import FastAPI, HTTPException, Header
from pydantic import BaseModel
from PIL import Image


# Storage backend (stub; wire up google-cloud-storage or boto3 here)
class StorageClient:
    def __init__(self):
        self.backend = os.getenv("BUCKET_BACKEND", "gcs")
        self.bucket = os.getenv("BUCKET_NAME")
        self.region = os.getenv("REGION")
        self.sa_json = os.getenv("SERVICE_ACCOUNT_JSON", "")
        # Initialize real clients here (google-cloud-storage or boto3)

    def _key(self, style, seed):
        now = datetime.utcnow()
        return f"{now:%Y/%m/%d}/{style}/img_{seed}_{int(time.time())}.png"

    def upload_and_sign(self, img_bytes: bytes, meta: Dict[str, Any], style: str, seed: int) -> Dict[str, str]:
        key = self._key(style, seed)
        meta_key = key.replace(".png", ".json")
        # TODO: upload img_bytes and meta to the bucket, then return
        # signed URLs; placeholders are returned here for clarity.
        return {
            "image_url": f"https://storage.example.com/{key}?signed=1",
            "meta_url": f"https://storage.example.com/{meta_key}?signed=1",
        }


class Request(BaseModel):
    prompt: str
    style: Optional[str] = "clean_infographic"
    seed: Optional[int] = 42
    size: Optional[str] = "1024x1024"
    preset: Optional[str] = None


class EditRequest(BaseModel):
    image_url: str
    prompt: str
    seed: Optional[int] = 42


AUTH_TOKEN = os.getenv("AUTH_BEARER", "")
app = FastAPI()
store = StorageClient()


def require_auth(authorization: Optional[str]):
    if not AUTH_TOKEN:
        return
    if not authorization or not authorization.startswith("Bearer "):
        raise HTTPException(status_code=401, detail="Missing bearer token")
    token = authorization.split(" ", 1)[1].strip()
    if token != AUTH_TOKEN:
        raise HTTPException(status_code=403, detail="Invalid token")


def _png_bytes(pil_img: Image.Image) -> bytes:
    buf = io.BytesIO()
    pil_img.save(buf, format="PNG")
    return buf.getvalue()


def dummy_generate(prompt: str, size: str, seed: int) -> bytes:
    # Placeholder: replace with your real model call (Diffusers, etc.).
    # Here we just return a gray canvas of the requested size.
    w, h = map(int, size.lower().split("x"))
    img = Image.new("RGB", (w, h), (240, 240, 240))
    return _png_bytes(img)


@app.post("/generate")
def generate(req: Request, authorization: Optional[str] = Header(None)):
    require_auth(authorization)
    # If a preset name is provided, load its JSON (style, cfg, steps, etc.)
    preset = {}
    if req.preset:
        pth = f"/opt/img-api/presets/{req.preset}.json"
        if os.path.exists(pth):
            with open(pth, "r", encoding="utf-8") as f:
                preset = json.load(f)
    seed = int(req.seed or 42)
    size = req.size or "1024x1024"
    img_bytes = dummy_generate(req.prompt, size, seed)  # replace with real model call
    meta = {
        "prompt": req.prompt,
        "style": req.style,
        "seed": seed,
        "size": size,
        "model": os.getenv("MODEL_NAME"),
        "preset": req.preset,
        "created_utc": datetime.utcnow().isoformat() + "Z",
    }
    urls = store.upload_and_sign(img_bytes, meta, req.style or "default", seed)
    return {"status": "ok", "seed": seed, "style": req.style, **urls}


@app.post("/edit")
def edit(req: EditRequest, authorization: Optional[str] = Header(None)):
    require_auth(authorization)
    # TODO: fetch the existing image, apply variation/inpaint via model call
    seed = int(req.seed or 42)
    img_bytes = dummy_generate(req.prompt + " (edit)", "1024x1024", seed)
    meta = {
        "source": req.image_url, "prompt": req.prompt, "seed": seed,
        "model": os.getenv("MODEL_NAME"), "created_utc": datetime.utcnow().isoformat() + "Z",
    }
    urls = store.upload_and_sign(img_bytes, meta, "edit", seed)
    return {"status": "ok", "seed": seed, **urls}
A tiny uvicorn entrypoint (/opt/img-api/server/main.py):
import os
import uvicorn

if __name__ == "__main__":
    host = os.getenv("BIND_ADDR", "127.0.0.1")
    port = int(os.getenv("BIND_PORT", "8000"))
    uvicorn.run("app:app", host=host, port=port, reload=False)
Why localhost? We’ll put HTTPS in front and keep the app private.
Step 5: Dockerize the service
/opt/img-api/server/requirements.txt
fastapi==0.115.0
uvicorn==0.30.6
pydantic==2.9.0
pillow==10.4.0
# add google-cloud-storage or boto3 when wiring real storage
/opt/img-api/server/Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py main.py /app/
# For GPU access inside the container, run with --gpus all (host must have drivers)
ENV PYTHONUNBUFFERED=1
CMD ["python", "main.py"]
/opt/img-api/docker-compose.yml
services:
  api:
    build:
      context: ./server
    env_file:
      - .env
    environment:
      # Inside the container, bind all interfaces; the host-side
      # 127.0.0.1 port mapping below is what keeps the API private.
      # (Binding 127.0.0.1 inside the container would make the
      # mapped port unreachable from the host.)
      - BIND_ADDR=0.0.0.0
    volumes:
      - ./models:/models
      - ./outputs:/outputs
      - ./keys:/opt/img-api/keys:ro
      - ./presets:/opt/img-api/presets:ro
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: ["gpu"]
    # Bind to localhost only; the LB will front this
    ports:
      - "127.0.0.1:8000:8000"
    command: ["python", "main.py"]
Start it:
cd /opt/img-api
docker compose up -d --build
Health check (from the VM):
curl -s http://127.0.0.1:8000/docs | head -n 5
Step 6: Connect storage
- Create your bucket (e.g., reactivid-images-prod).
- Give the VM a service account with write access.
- Implement the real upload/sign logic in StorageClient using either:
  - GCS (google-cloud-storage) with V4 signed URLs
  - S3 (boto3) with presigned URLs

Return two links per job:
- image_url → the PNG/WEBP
- meta_url → the JSON sidecar (prompt, seed, steps, guidance, style, model, created_utc)

Step 7: Add simple security
- Bearer token: require Authorization: Bearer <token> on every request.
- Service account: VM identity or a key file to write to the bucket.
- Secrets manager: store your token/keys safely; the container reads env vars at start.
- Firewall: keep the VM private; only the HTTPS LB is public.
This is enough for a small team. Later you can add per-user keys or OAuth if needed.
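One cheap hardening step: require_auth above compares tokens with ==, and the standard library’s hmac.compare_digest does the same job in constant time. A minimal sketch (token_matches is a hypothetical helper name):

```python
import hmac


def token_matches(supplied: str, expected: str) -> bool:
    # compare_digest runs in time independent of where the strings
    # first differ, which blunts timing attacks on the token check.
    return hmac.compare_digest(supplied.encode("utf-8"),
                               expected.encode("utf-8"))
```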
Step 8: Put HTTPS in front
Create a small HTTPS load balancer (managed certificate) for api.yourdomain.com:
- Listener on 443
- Backend: your VM’s internal service at 127.0.0.1:8000 (via an internal proxy/NEG, depending on cloud)
- Close 80 (or redirect it to 443)
- Keep VM ports closed to the world
Now your API lives at https://api.yourdomain.com.
Step 9: Test the endpoints
Generate
POST https://api.yourdomain.com/generate
Headers:
Authorization: Bearer YOUR_SUPER_SECRET_TOKEN
Body (JSON):
{
  "prompt": "Flat illustration of the 50/30/20 budgeting rule, three labeled boxes and arrows, brand colors, white background",
  "style": "clean_infographic",
  "seed": 42,
  "size": "1024x1024"
}
Expected response:
{
  "status": "ok",
  "image_url": "https://signed.link/yourimage.png",
  "meta_url": "https://signed.link/yourimage.json",
  "seed": 42,
  "style": "clean_infographic"
}
Edit (optional)
POST https://api.yourdomain.com/edit
Headers:
Authorization: Bearer YOUR_SUPER_SECRET_TOKEN
Body:
{
  "image_url": "https://...existing.png",
  "prompt": "same layout, switch palette to warm oranges, thicker labels",
  "seed": 42
}
If you get a valid image back, congrats—you have a working factory.
Make it feel like a product
Presets
Create JSON presets under /opt/img-api/presets/ (the API can accept "preset": "clean_infographic").
/opt/img-api/presets/clean_infographic.json
{
  "style": "clean_infographic",
  "palette": ["#0A3", "#111", "#FFD94D"],
  "guidance": 7.5,
  "steps": 30,
  "negative": "no watermark, no extra text, no logos"
}
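The server loads the preset JSON but leaves open how it combines with the request. One reasonable rule, letting request fields override preset defaults, could look like this sketch (merge_preset is an illustrative helper, not part of the server code):

```python
def merge_preset(preset: dict, request: dict) -> dict:
    # Start from the preset's defaults, then let any non-None request
    # field win. The precedence rule is a design choice.
    merged = dict(preset)
    merged.update({k: v for k, v in request.items() if v is not None})
    return merged


params = merge_preset(
    {"guidance": 7.5, "steps": 30, "style": "clean_infographic"},
    {"prompt": "three labeled boxes", "steps": None, "seed": 42},
)
```

Here steps stays at the preset’s 30 because the request left it unset, while prompt and seed come from the request.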
Style library
Keep a small card per style (one-liner, colors, seed range). Use consistent seeds for series to keep a recognizable look across playlists.
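Consistent seeds are easier when nobody has to remember them. One approach, sketched below, derives the seed from the series name with a stable hash (the series_seed name is illustrative):

```python
import hashlib


def series_seed(series_name: str) -> int:
    # SHA-256 of the name, truncated to 32 bits: stable across runs
    # and machines, unlike Python's built-in hash().
    digest = hashlib.sha256(series_name.encode("utf-8")).hexdigest()
    return int(digest[:8], 16)
```

Every thumbnail in the “budget-basics” series then shares one seed automatically, and a new series gets a different one.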
House prompts (copy/paste)
- Thumbnail (bold & readable):
Close-up of [subject], centered, dramatic lighting, simple background, clean space for large title, high contrast, vivid complementary colors, editorial photo style, no clutter, 4k
- Title card / diagram (clear info):
Flat illustration of [concept] with 3 labeled boxes and arrows, minimal color palette (brand colors), large legible typography, white background, infographic style, no extra text
- Cutaway filler (scene-setter):
Abstract background with soft bokeh in [brand colors], gentle gradient, slight depth of field, loop-friendly, clean and non-distracting
Add a negative prompt: no watermark, no extra text, no logos.

Logs, monitoring, and cost sanity
- Request logs: prompt, style, seed, size, request ID, render time.
- Metrics: jobs/min, average render time, GPU memory usage (nvidia-smi), failures.
- Budget: stop the VM when idle; on heavy days, start a second node.
If traffic grows, add a lightweight job queue (Redis / managed queue) and one worker per GPU.
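The queue-plus-workers shape can be sketched in-process with the standard library. This is only an illustration of the pattern; a real deployment would swap queue.Queue for Redis or a managed queue and run one worker process per GPU:

```python
import queue
import threading


def worker(jobs: queue.Queue, results: list) -> None:
    # One worker per GPU: pull a job, render it, repeat until the
    # None sentinel arrives.
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        # In the real service this is the model call + bucket upload.
        results.append({"prompt": job["prompt"], "status": "done"})
        jobs.task_done()


jobs = queue.Queue()
results = []
threading.Thread(target=worker, args=(jobs, results), daemon=True).start()
for prompt in ["thumbnail A", "diagram B", "cutaway C"]:
    jobs.put({"prompt": prompt})
jobs.put(None)   # sentinel: no more work
jobs.join()      # block until every job is marked done
```

The point of the queue is back-pressure: bursty prompt batches pile up safely instead of overloading the GPU, and adding capacity means adding workers, not changing the API.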
Scaling up gently
- Image/template the VM once it works.
- Managed group / auto scale: add a second VM with one click when needed.
- Snapshots: of boot + model cache; if a node dies, you can recreate quickly.
Even if you never scale, these steps make disaster recovery painless.
Daily runbook (what your team actually does)
- Morning: power the VM on (or leave it running on busy weeks).
- Batch: send your prompt list (titles → thumbnail prompts → diagrams).
- Review: pick winners fast—don’t nitpick.
- Tag: favorite images get _final.png and a “used-in-vid” note in metadata.
- Off-hours: shut the VM down to save cost (unless your team is global).
Troubleshooting (quick fixes)
- “Model not using GPU” → driver/runtime mismatch. Check nvidia-smi inside the container; start with --gpus all.
- “OOM / out of memory” → reduce image size/steps; try a lighter model; restart the worker.
- “Unauthorized” → missing or incorrect Authorization: Bearer ... header. Rotate the token if leaked.
- “Slow responses” → the first request is a cold start; keep the service warm with a small ping; avoid mid-day restarts.
- “Images vary too much” → reuse seeds and presets; keep prompts concise & consistent.
- “Storage links don’t work” → signed URLs expired; increase TTL or fetch immediately; ensure VM clock is correct.

Safety and rights
- Don’t create look‑alike people or logos you don’t own.
- Follow platform rules & local laws.
- Keep a lightweight reuse log (prompt, seed, where used).
- If you use third‑party references, make sure you have permission.
What you can do next
- Add a simple web form: paste title → get thumbnail options.
- Auto‑generate alt text or captions for accessibility.
- Add /upscale or /caption endpoints if your workflow needs them.
- Run a second VM in another region for redundancy.
Appendix A — Example client calls
cURL (generate)
curl -s -X POST "https://api.yourdomain.com/generate" -H "Authorization: Bearer YOUR_SUPER_SECRET_TOKEN" -H "Content-Type: application/json" -d '{
"prompt": "Flat illustration of the 50/30/20 budgeting rule, three labeled boxes and arrows, brand colors, white background",
"style": "clean_infographic",
"seed": 1234,
"size": "1024x1024",
"preset": "clean_infographic"
}'
Node snippet
const res = await fetch("https://api.yourdomain.com/generate", {
  method: "POST",
  headers: {
    "Authorization": "Bearer " + process.env.IMG_API_TOKEN,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    prompt: "Cinematic close-up of a vintage compass on dark map, dramatic rim light, room for title",
    style: "cinematic_thumb",
    seed: 777,
    size: "1280x720",
    preset: "cinematic_thumb"
  })
});
const data = await res.json();
console.log(data.image_url);
Appendix B — .env template
MODEL_NAME=Qwen-Image
BUCKET_BACKEND=gcs
BUCKET_NAME=reactivid-images-prod
REGION=asia-southeast1
AUTH_BEARER=REPLACE_ME
SERVICE_ACCOUNT_JSON=/opt/img-api/keys/sa.json
BIND_ADDR=127.0.0.1
BIND_PORT=8000
.gitignore (keep secrets out of git):
/keys/*
.env
outputs/*
Summary
You stood up a single GPU VM, ran a tiny FastAPI service with an image model, saved results to a bucket, and locked it behind HTTPS with a secret token. From your app (or a small form), you can now ask for visuals on demand—thumbnails, diagrams, and cutaways—without paying per image or juggling web UIs. Keep presets tight, reuse seeds, and batch your work. That’s how you get unlimited visuals that look like your brand, on your schedule.