
Running an Image Generation API for Unlimited Visuals (Beginner Guide)
Stand up one GPU VM, run an image model behind a tiny API, and batch thumbnails/diagrams on demand.
Build your own “image factory” for thumbnails, title cards, diagrams, and filler visuals—fast, private, and consistent.
What you’ll build
- One GPU VM in a nearby region (e.g., asia-southeast1) for low latency.
- A tiny web API on that VM with two endpoints:
  - POST /generate → make a new image from a prompt
  - POST /edit → (optional) modify an existing image (variations/inpaints)
- A model runner (e.g., Qwen-Image / SDXL / Flux) inside Docker, fronted by FastAPI.
- Cloud storage bucket for outputs (PNG/WEBP plus a JSON sidecar with prompt/seed).
- Simple security: secret bearer token, service account for the bucket, HTTPS only.
- Single “front door”: an HTTPS load balancer on port 443 only (no open VM ports).
- Logs and monitoring, plus a snapshot so you can clone new nodes quickly.
You’ll keep model files (checkpoint, LoRA, VAE) on the VM’s local NVMe for speed, and send finished images to cloud storage where your app (or editor) can fetch them.
Why run your own API?
- Consistency: reuse seeds & presets so thumbnails feel on brand.
- Speed: batch-create assets while you script or edit.
- Privacy: prompts & references stay on your infrastructure.
- Cost control: fixed hourly GPU can beat per-image pricing at volume.
When not to: if you only need a few images a week, a web tool is simpler. This guide is for regular publishing or teams who want a house style.

The big picture (plain English)
- Your app sends a prompt to https://api.yourdomain.com/generate (e.g., “flat illustration of the 50/30/20 rule, three boxes, big labels, brand colors”).
- The VM’s model turns that prompt into an image.
- The API saves the image to your cloud bucket and returns a signed link (a temporary URL) so your app can download it.
- The API also saves a tiny metadata JSON (prompt, seed, steps, guidance, style) next to the image for reproducibility.
That’s the workflow. Everything else (Docker, drivers, tokens) is plumbing to make it safe and reliable.
Before you start
- Cloud account with billing enabled
- A project just for this (keeps costs & permissions tidy)
- A bucket for outputs (e.g., reactivid-images-prod)
- A domain for a friendly API URL (optional but recommended)
- A note of: project ID, bucket, VM name, region, and your secret token
Step 1: Create the GPU VM
Choose a nearby region/zone for lower latency. A balanced starter on many clouds is an L4-based instance (e.g., g2-standard-4 on GCP; on AWS, a recent g5 or g6 class).
- CPU/RAM: 4–8 vCPU / 16–32 GB RAM is fine to start.
- Disk: 100–200 GB boot + optional local NVMe (fast) for model files.
- Network: no public ports yet (we’ll add HTTPS via a load balancer).
- Name it something obvious like img-api-01.
Tip: If you need to keep costs low, stop the VM off-hours.
Step 2: Install the basics on the VM
SSH in and install GPU drivers, Docker, and Git. After install, verify:
nvidia-smi # should list your GPU
docker --version # should print a version
git --version
Optionally test Docker GPU access:
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi
Follow your cloud’s official docs for driver + container runtime. The goal: GPU visible inside containers.
Step 3: Lay out folders & environment
On the VM:
sudo mkdir -p /opt/img-api/{models,outputs,server,keys,presets}
sudo chown -R $USER:$USER /opt/img-api
Create /opt/img-api/.env:
MODEL_NAME=Qwen-Image
BUCKET_BACKEND=gcs # or s3
BUCKET_NAME=reactivid-images-prod
REGION=asia-southeast1
AUTH_BEARER=YOUR_SUPER_SECRET_TOKEN
SERVICE_ACCOUNT_JSON=/opt/img-api/keys/sa.json # (GCS) path on VM
S3_BUCKET_URL= # (S3) e.g., s3://reactivid-images-prod
S3_REGION= # (S3) e.g., ap-southeast-1
S3_PROFILE= # (S3) or explicit keys via env/instance role
BIND_ADDR=127.0.0.1
BIND_PORT=8000
Keep models on fast disk; outputs is scratch (finals go to bucket).
Step 4: The FastAPI app (minimal but real)
/opt/img-api/server/app.py
import os, io, json, time
from datetime import datetime
from typing import Optional, Dict, Any

from fastapi import FastAPI, HTTPException, Header
from pydantic import BaseModel
from PIL import Image


# Storage backend (stub; wire up google-cloud-storage or boto3 here)
class StorageClient:
    def __init__(self):
        self.backend = os.getenv("BUCKET_BACKEND", "gcs")
        self.bucket = os.getenv("BUCKET_NAME")
        self.region = os.getenv("REGION")
        self.sa_json = os.getenv("SERVICE_ACCOUNT_JSON", "")
        # Initialize real clients here (google-cloud-storage or boto3)

    def _key(self, style, seed):
        now = datetime.utcnow()
        return f"{now:%Y/%m/%d}/{style}/img_{seed}_{int(time.time())}.png"

    def upload_and_sign(self, img_bytes: bytes, meta: Dict[str, Any], style: str, seed: int) -> Dict[str, str]:
        key = self._key(style, seed)
        meta_key = key.replace(".png", ".json")
        # TODO: upload img_bytes and meta to the bucket, then return
        # signed URLs; placeholders are returned here for clarity.
        return {
            "image_url": f"https://storage.example.com/{key}?signed=1",
            "meta_url": f"https://storage.example.com/{meta_key}?signed=1",
        }


class Request(BaseModel):
    prompt: str
    style: Optional[str] = "clean_infographic"
    seed: Optional[int] = 42
    size: Optional[str] = "1024x1024"
    preset: Optional[str] = None


class EditRequest(BaseModel):
    image_url: str
    prompt: str
    seed: Optional[int] = 42


AUTH_TOKEN = os.getenv("AUTH_BEARER", "")
app = FastAPI()
store = StorageClient()


def require_auth(authorization: Optional[str]):
    if not AUTH_TOKEN:
        return
    if not authorization or not authorization.startswith("Bearer "):
        raise HTTPException(status_code=401, detail="Missing bearer token")
    token = authorization.split(" ", 1)[1].strip()
    if token != AUTH_TOKEN:
        raise HTTPException(status_code=403, detail="Invalid token")


def _png_bytes(pil_img: Image.Image) -> bytes:
    buf = io.BytesIO()
    pil_img.save(buf, format="PNG")
    return buf.getvalue()


def dummy_generate(prompt: str, size: str, seed: int) -> bytes:
    # Placeholder: replace with your real model call (Diffusers, etc.).
    # Here we just return a gray canvas of the requested size.
    w, h = map(int, size.lower().split("x"))
    img = Image.new("RGB", (w, h), (240, 240, 240))
    return _png_bytes(img)


@app.post("/generate")
def generate(req: Request, authorization: Optional[str] = Header(None)):
    require_auth(authorization)
    # If a preset name is provided, load its JSON (style, cfg, steps, etc.)
    preset = {}
    if req.preset:
        pth = f"/opt/img-api/presets/{req.preset}.json"
        if os.path.exists(pth):
            with open(pth, "r", encoding="utf-8") as f:
                preset = json.load(f)
    seed = int(req.seed or 42)
    size = req.size or "1024x1024"
    img_bytes = dummy_generate(req.prompt, size, seed)  # replace with real model call
    meta = {
        "prompt": req.prompt,
        "style": req.style,
        "seed": seed,
        "size": size,
        "model": os.getenv("MODEL_NAME"),
        "preset": req.preset,
        "created_utc": datetime.utcnow().isoformat() + "Z",
    }
    urls = store.upload_and_sign(img_bytes, meta, req.style or "default", seed)
    return {"status": "ok", "seed": seed, "style": req.style, **urls}


@app.post("/edit")
def edit(req: EditRequest, authorization: Optional[str] = Header(None)):
    require_auth(authorization)
    # TODO: fetch the existing image, apply variation/inpaint via model call
    seed = int(req.seed or 42)
    img_bytes = dummy_generate(req.prompt + " (edit)", "1024x1024", seed)
    meta = {
        "source": req.image_url, "prompt": req.prompt, "seed": seed,
        "model": os.getenv("MODEL_NAME"), "created_utc": datetime.utcnow().isoformat() + "Z",
    }
    urls = store.upload_and_sign(img_bytes, meta, "edit", seed)
    return {"status": "ok", "seed": seed, **urls}
A tiny uvicorn entrypoint (/opt/img-api/server/main.py):
import os
import uvicorn

if __name__ == "__main__":
    host = os.getenv("BIND_ADDR", "127.0.0.1")
    port = int(os.getenv("BIND_PORT", "8000"))
    uvicorn.run("app:app", host=host, port=port, reload=False)
Why localhost? We’ll put HTTPS in front and keep the app private.
Step 5: Dockerize the service
/opt/img-api/server/requirements.txt
fastapi==0.115.0
uvicorn==0.30.6
pydantic==2.9.0
pillow==10.4.0
# add google-cloud-storage or boto3 when wiring real storage
/opt/img-api/server/Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py main.py /app/
# For GPU access inside the container, run with --gpus all (host must have drivers)
ENV PYTHONUNBUFFERED=1
CMD ["python", "main.py"]
/opt/img-api/docker-compose.yml
services:
  api:
    build:
      context: ./server
    env_file:
      - .env
    environment:
      # Inside the container, bind all interfaces; the host-side
      # 127.0.0.1 port mapping below is what keeps the API private.
      # (Binding 127.0.0.1 inside the container would make the
      # mapped port unreachable from the host.)
      - BIND_ADDR=0.0.0.0
    volumes:
      - ./models:/models
      - ./outputs:/outputs
      - ./keys:/opt/img-api/keys:ro
      - ./presets:/opt/img-api/presets:ro
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: ["gpu"]
    # Bind to localhost only; the LB will front this
    ports:
      - "127.0.0.1:8000:8000"
    command: ["python", "main.py"]
Start it:
cd /opt/img-api
docker compose up -d --build
Health check (from the VM):
curl -s http://127.0.0.1:8000/docs | head -n 5
Step 6: Connect storage
- Create your bucket (e.g., reactivid-images-prod).
- Give the VM a service account with write access.
- Implement the real upload/sign logic in StorageClient using either:
  - GCS (google-cloud-storage) with V4 signed URLs
  - S3 (boto3) with presigned URLs

Return two links per job:
- image_url → the PNG/WEBP
- meta_url → the JSON sidecar (prompt, seed, steps, guidance, style, model, created_utc)

Step 7: Add simple security
- Bearer token: require Authorization: Bearer <token> on every request.
- Service account: VM identity or a key file to write to the bucket.
- Secrets manager: store your token/keys safely; the container reads env vars at start.
- Firewall: keep the VM private; only the HTTPS LB is public.
This is enough for a small team. Later you can add per-user keys or OAuth if needed.
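One cheap hardening step: require_auth above compares tokens with ==, and the standard library’s hmac.compare_digest does the same job in constant time. A minimal sketch (token_matches is a hypothetical helper name):

```python
import hmac


def token_matches(supplied: str, expected: str) -> bool:
    # compare_digest runs in time independent of where the strings
    # first differ, which blunts timing attacks on the token check.
    return hmac.compare_digest(supplied.encode("utf-8"),
                               expected.encode("utf-8"))
```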
Step 8: Put HTTPS in front
Create a small HTTPS load balancer (managed certificate) for api.yourdomain.com:
- Listener on 443
- Backend: your VM’s internal service at 127.0.0.1:8000 (via an internal proxy/NEG, depending on cloud)
- Close 80 (or redirect it to 443)
- Keep VM ports closed to the world
Now your API lives at https://api.yourdomain.com.
Step 9: Test the endpoints
Generate
POST https://api.yourdomain.com/generate
Headers:
Authorization: Bearer YOUR_SUPER_SECRET_TOKEN
Body (JSON):
{
  "prompt": "Flat illustration of the 50/30/20 budgeting rule, three labeled boxes and arrows, brand colors, white background",
  "style": "clean_infographic",
  "seed": 42,
  "size": "1024x1024"
}
Expected response:
{
  "status": "ok",
  "image_url": "https://signed.link/yourimage.png",
  "meta_url": "https://signed.link/yourimage.json",
  "seed": 42,
  "style": "clean_infographic"
}
Edit (optional)
POST https://api.yourdomain.com/edit
Headers:
Authorization: Bearer YOUR_SUPER_SECRET_TOKEN
Body:
{
  "image_url": "https://...existing.png",
  "prompt": "same layout, switch palette to warm oranges, thicker labels",
  "seed": 42
}
If you get a valid image back, congrats—you have a working factory.
Make it feel like a product
Presets
Create JSON presets under /opt/img-api/presets/ (the API can accept "preset": "clean_infographic").
/opt/img-api/presets/clean_infographic.json
{
  "style": "clean_infographic",
  "palette": ["#0A3", "#111", "#FFD94D"],
  "guidance": 7.5,
  "steps": 30,
  "negative": "no watermark, no extra text, no logos"
}
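The server loads the preset JSON but leaves open how it combines with the request. One reasonable rule, letting request fields override preset defaults, could look like this sketch (merge_preset is an illustrative helper, not part of the server code):

```python
def merge_preset(preset: dict, request: dict) -> dict:
    # Start from the preset's defaults, then let any non-None request
    # field win. The precedence rule is a design choice.
    merged = dict(preset)
    merged.update({k: v for k, v in request.items() if v is not None})
    return merged


params = merge_preset(
    {"guidance": 7.5, "steps": 30, "style": "clean_infographic"},
    {"prompt": "three labeled boxes", "steps": None, "seed": 42},
)
```

Here steps stays at the preset’s 30 because the request left it unset, while prompt and seed come from the request.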
Style library
Keep a small card per style (one-liner, colors, seed range). Use consistent seeds for series to keep a recognizable look across playlists.
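Consistent seeds are easier when nobody has to remember them. One approach, sketched below, derives the seed from the series name with a stable hash (the series_seed name is illustrative):

```python
import hashlib


def series_seed(series_name: str) -> int:
    # SHA-256 of the name, truncated to 32 bits: stable across runs
    # and machines, unlike Python's built-in hash().
    digest = hashlib.sha256(series_name.encode("utf-8")).hexdigest()
    return int(digest[:8], 16)
```

Every thumbnail in the “budget-basics” series then shares one seed automatically, and a new series gets a different one.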
House prompts (copy/paste)
- Thumbnail (bold & readable):
Close-up of [subject], centered, dramatic lighting, simple background, clean space for large title, high contrast, vivid complementary colors, editorial photo style, no clutter, 4k
- Title card / diagram (clear info):
Flat illustration of [concept] with 3 labeled boxes and arrows, minimal color palette (brand colors), large legible typography, white background, infographic style, no extra text
- Cutaway filler (scene-setter):
Abstract background with soft bokeh in [brand colors], gentle gradient, slight depth of field, loop-friendly, clean and non-distracting
Add a negative prompt: no watermark, no extra text, no logos.

Logs, monitoring, and cost sanity
- Request logs: prompt, style, seed, size, request ID, render time.
- Metrics: jobs/min, average render time, GPU memory usage (nvidia-smi), failures.
- Budget: stop the VM when idle; on heavy days, start a second node.
If traffic grows, add a lightweight job queue (Redis / managed queue) and one worker per GPU.
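The queue-plus-workers shape can be sketched in-process with the standard library. This is only an illustration of the pattern; a real deployment would swap queue.Queue for Redis or a managed queue and run one worker process per GPU:

```python
import queue
import threading


def worker(jobs: queue.Queue, results: list) -> None:
    # One worker per GPU: pull a job, render it, repeat until the
    # None sentinel arrives.
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        # In the real service this is the model call + bucket upload.
        results.append({"prompt": job["prompt"], "status": "done"})
        jobs.task_done()


jobs = queue.Queue()
results = []
threading.Thread(target=worker, args=(jobs, results), daemon=True).start()
for prompt in ["thumbnail A", "diagram B", "cutaway C"]:
    jobs.put({"prompt": prompt})
jobs.put(None)   # sentinel: no more work
jobs.join()      # block until every job is marked done
```

The point of the queue is back-pressure: bursty prompt batches pile up safely instead of overloading the GPU, and adding capacity means adding workers, not changing the API.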
Scaling up gently
- Image/template the VM once it works.
- Managed group / auto scale: add a second VM with one click when needed.
- Snapshots: of boot + model cache; if a node dies, you can recreate quickly.
Even if you never scale, these steps make disaster recovery painless.
Daily runbook (what your team actually does)
- Morning: power the VM on (or leave it running on busy weeks).
- Batch: send your prompt list (titles → thumbnail prompts → diagrams).
- Review: pick winners fast—don’t nitpick.
- Tag: favorite images get _final.png and a “used-in-vid” note in metadata.
- Off-hours: shut the VM down to save cost (unless your team is global).
Troubleshooting (quick fixes)
- “Model not using GPU” → driver/runtime mismatch. Check nvidia-smi inside the container; start with --gpus all.
- “OOM / out of memory” → reduce image size/steps; try a lighter model; restart the worker.
- “Unauthorized” → missing or incorrect Authorization: Bearer ... header. Rotate the token if leaked.
- “Slow responses” → the first request is a cold start; keep the service warm with a small ping; avoid mid-day restarts.
- “Images vary too much” → reuse seeds and presets; keep prompts concise & consistent.
- “Storage links don’t work” → signed URLs expired; increase TTL or fetch immediately; ensure VM clock is correct.

Safety and rights
- Don’t create look‑alike people or logos you don’t own.
- Follow platform rules & local laws.
- Keep a lightweight reuse log (prompt, seed, where used).
- If you use third‑party references, make sure you have permission.
What you can do next
- Add a simple web form: paste title → get thumbnail options.
- Auto‑generate alt text or captions for accessibility.
- Add /upscale or /caption endpoints if your workflow needs them.
- Run a second VM in another region for redundancy.
Appendix A — Example client calls
cURL (generate)
curl -s -X POST "https://api.yourdomain.com/generate" -H "Authorization: Bearer YOUR_SUPER_SECRET_TOKEN" -H "Content-Type: application/json" -d '{
"prompt": "Flat illustration of the 50/30/20 budgeting rule, three labeled boxes and arrows, brand colors, white background",
"style": "clean_infographic",
"seed": 1234,
"size": "1024x1024",
"preset": "clean_infographic"
}'
Node snippet
const res = await fetch("https://api.yourdomain.com/generate", {
  method: "POST",
  headers: {
    "Authorization": "Bearer " + process.env.IMG_API_TOKEN,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    prompt: "Cinematic close-up of a vintage compass on dark map, dramatic rim light, room for title",
    style: "cinematic_thumb",
    seed: 777,
    size: "1280x720",
    preset: "cinematic_thumb"
  })
});
const data = await res.json();
console.log(data.image_url);
Appendix B — .env template
MODEL_NAME=Qwen-Image
BUCKET_BACKEND=gcs
BUCKET_NAME=reactivid-images-prod
REGION=asia-southeast1
AUTH_BEARER=REPLACE_ME
SERVICE_ACCOUNT_JSON=/opt/img-api/keys/sa.json
BIND_ADDR=127.0.0.1
BIND_PORT=8000
.gitignore (keep secrets out of git):
/keys/*
.env
outputs/*
Summary
You stood up a single GPU VM, ran a tiny FastAPI service with an image model, saved results to a bucket, and locked it behind HTTPS with a secret token. From your app (or a small form), you can now ask for visuals on demand—thumbnails, diagrams, and cutaways—without paying per image or juggling web UIs. Keep presets tight, reuse seeds, and batch your work. That’s how you get unlimited visuals that look like your brand, on your schedule.