Running an Image Generation API for Unlimited Visuals (Beginner Guide)

Build your own “image factory” for thumbnails, title cards, diagrams, and filler visuals—fast, private, and consistent: stand up one GPU VM, run an image model behind a tiny API, and batch visuals on demand.

What you’ll build

A small, private API on a single GPU VM: you POST a prompt, the model renders an image, and the API returns a signed link to the finished file in your cloud bucket. You’ll keep model files (checkpoint, LoRA, VAE) on the VM’s local NVMe for speed, and send finished images to cloud storage where your app (or editor) can fetch them.


Why run your own API?

Cost, privacy, and consistency: you pay for GPU hours instead of per image, your prompts and outputs stay on infrastructure you control, and presets plus fixed seeds keep a recognizable house style.

When not to: if you only need a few images a week, a web tool is simpler. This guide is for regular publishing, or for teams who want a house style.


The big picture (plain-English)

  1. Your app sends a prompt to https://api.yourdomain.com/generate
    (e.g., “flat illustration of the 50/30/20 rule, three boxes, big labels, brand colors”)
  2. The VM’s model turns that prompt into an image.
  3. The API saves the image to your cloud bucket and returns a signed link (a temporary URL) so your app can download it.
  4. The API also saves a tiny metadata JSON (prompt, seed, steps, guidance, style) next to the image for reproducibility.

That’s the workflow. Everything else (Docker, drivers, tokens) is plumbing to make it safe and reliable.
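
In code, that round trip is one authenticated POST plus a download of the signed link. A minimal client sketch with Python’s requests library (the domain and token are the placeholders used throughout this guide):

import requests

resp = requests.post(
    "https://api.yourdomain.com/generate",
    headers={"Authorization": "Bearer YOUR_SUPER_SECRET_TOKEN"},
    json={"prompt": "flat illustration of the 50/30/20 rule, three boxes, big labels, brand colors"},
    timeout=300,
)
resp.raise_for_status()
image_url = resp.json()["image_url"]   # signed, temporary URL
open("out.png", "wb").write(requests.get(image_url, timeout=60).content)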


Before you start

You’ll need a cloud account with GPU quota, a domain you control (for api.yourdomain.com), a storage bucket (GCS or S3), and basic comfort with SSH and a terminal.


Step 1: Create the GPU VM

Choose a nearby region/zone for lower latency. A balanced starter on many clouds is an L4-based instance (e.g., g2-standard-4 on GCP; on AWS, a recent g5 or g6 class).

Tip: to keep costs low, stop the VM off-hours (on GCP, for example, gcloud compute instances stop VM_NAME --zone=ZONE; the disk is kept and you stop paying for the GPU).


Step 2: Install the basics on the VM

SSH in and install GPU drivers, Docker, and Git. After install, verify:

nvidia-smi          # should list your GPU
docker --version    # should print a version
git --version

Optionally test Docker GPU access:

docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi

Follow your cloud’s official docs for driver + container runtime. The goal: GPU visible inside containers.


Step 3: Lay out folders & environment

On the VM:

sudo mkdir -p /opt/img-api/{models,outputs,server,keys,presets}
sudo chown -R $USER:$USER /opt/img-api

Create /opt/img-api/.env:

MODEL_NAME=Qwen-Image
BUCKET_BACKEND=gcs            # or s3
BUCKET_NAME=reactivid-images-prod
REGION=asia-southeast1
AUTH_BEARER=YOUR_SUPER_SECRET_TOKEN
SERVICE_ACCOUNT_JSON=/opt/img-api/keys/sa.json   # (GCS) path on VM
S3_BUCKET_URL=                 # (S3) e.g., s3://reactivid-images-prod
S3_REGION=                     # (S3) e.g., ap-southeast-1
S3_PROFILE=                    # (S3) or explicit keys via env/instance role
BIND_ADDR=127.0.0.1
BIND_PORT=8000

Keep models/ on fast local disk; outputs/ is scratch space (finals go to the bucket).


Step 4: The FastAPI app (minimal but real)

/opt/img-api/server/app.py

import os, io, json, time
from datetime import datetime
from typing import Optional, Dict, Any
from fastapi import FastAPI, HTTPException, Header
from pydantic import BaseModel
from PIL import Image

# Storage backend (stub; see Step 6 for wiring real GCS/S3 clients)
class StorageClient:
    def __init__(self):
        self.backend = os.getenv("BUCKET_BACKEND", "gcs")
        self.bucket = os.getenv("BUCKET_NAME")
        self.region = os.getenv("REGION")
        self.sa_json = os.getenv("SERVICE_ACCOUNT_JSON", "")
        # Initialize real clients here (google-cloud-storage or boto3)

    def _key(self, style, seed):
        now = datetime.utcnow()
        return f"{now:%Y/%m/%d}/{style}/img_{seed}_{int(time.time())}.png"

    def upload_and_sign(self, img_bytes: bytes, meta: Dict[str, Any], style: str, seed: int) -> Dict[str, str]:
        key = self._key(style, seed)
        meta_key = key.replace(".png", ".json")
        # TODO: upload bytes and meta to bucket
        # return signed URLs; here we return placeholders for clarity
        return {
            "image_url": f"https://storage.example.com/{key}?signed=1",
            "meta_url": f"https://storage.example.com/{meta_key}?signed=1",
        }

class Request(BaseModel):
    prompt: str
    style: Optional[str] = "clean_infographic"
    seed: Optional[int] = 42
    size: Optional[str] = "1024x1024"
    preset: Optional[str] = None

class EditRequest(BaseModel):
    image_url: str
    prompt: str
    seed: Optional[int] = 42

AUTH_TOKEN = os.getenv("AUTH_BEARER", "")
app = FastAPI()
store = StorageClient()

def require_auth(authorization: Optional[str]):
    if not AUTH_TOKEN:
        return
    if not authorization or not authorization.startswith("Bearer "):
        raise HTTPException(status_code=401, detail="Missing bearer token")
    token = authorization.split(" ", 1)[1].strip()
    if token != AUTH_TOKEN:
        raise HTTPException(status_code=403, detail="Invalid token")

def dummy_generate(prompt: str, size: str, seed: int) -> bytes:
    # Placeholder: replace with your real model call (Diffusers, etc.)
    # Here we just return a plain light-gray canvas of the requested size.
    w, h = map(int, size.lower().split("x"))
    img = Image.new("RGB", (w, h), (240, 240, 240))
    return _png_bytes(img)

def _png_bytes(pil_img: Image.Image) -> bytes:
    buf = io.BytesIO()
    pil_img.save(buf, format="PNG")
    return buf.getvalue()

@app.post("/generate")
def generate(req: Request, authorization: Optional[str] = Header(None)):
    require_auth(authorization)
    # If a preset name is provided, load its JSON (style, cfg, steps, etc.).
    # Note: this stub only loads the preset; wiring the values into the
    # model call is covered in the preset-merging sketch later in this guide.
    preset = {}
    if req.preset:
        pth = f"/opt/img-api/presets/{req.preset}.json"
        if os.path.exists(pth):
            with open(pth, "r", encoding="utf-8") as f:
                preset = json.load(f)

    seed = int(req.seed or 42)
    size = req.size or "1024x1024"
    img_bytes = dummy_generate(req.prompt, size, seed)  # replace with real model call

    meta = {
        "prompt": req.prompt,
        "style": req.style,
        "seed": seed,
        "size": size,
        "model": os.getenv("MODEL_NAME"),
        "preset": req.preset,
        "created_utc": datetime.utcnow().isoformat() + "Z"
    }
    urls = store.upload_and_sign(img_bytes, meta, req.style or "default", seed)
    return {"status": "ok", "seed": seed, "style": req.style, **urls}

@app.post("/edit")
def edit(req: EditRequest, authorization: Optional[str] = Header(None)):
    require_auth(authorization)
    # TODO: fetch existing image, apply variation/inpaint via model call
    seed = int(req.seed or 42)
    img_bytes = dummy_generate(req.prompt + " (edit)", "1024x1024", seed)
    meta = {
        "source": req.image_url, "prompt": req.prompt, "seed": seed,
        "model": os.getenv("MODEL_NAME"), "created_utc": datetime.utcnow().isoformat() + "Z"
    }
    urls = store.upload_and_sign(img_bytes, meta, "edit", seed)
    return {"status": "ok", "seed": seed, **urls}

A tiny uvicorn entrypoint (/opt/img-api/server/main.py):

import os
import uvicorn

if __name__ == "__main__":
    host = os.getenv("BIND_ADDR", "127.0.0.1")
    port = int(os.getenv("BIND_PORT", "8000"))
    uvicorn.run("app:app", host=host, port=port, reload=False)

Why localhost? We’ll put HTTPS in front and keep the app private.


Step 5: Dockerize the service

/opt/img-api/server/requirements.txt

fastapi==0.115.0
uvicorn==0.30.6
pydantic==2.9.0
pillow==10.4.0
# add google-cloud-storage or boto3 when wiring real storage

/opt/img-api/server/Dockerfile

FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt

COPY app.py main.py /app/

# For GPU access inside the container, run with --gpus all (host must have drivers)
ENV PYTHONUNBUFFERED=1
CMD ["python", "main.py"]

/opt/img-api/docker-compose.yml

services:
  api:
    build:
      context: ./server
    env_file:
      - .env
    environment:
      # Inside the container the app must bind 0.0.0.0 so the published
      # port can reach it; the host-side mapping below stays on localhost.
      - BIND_ADDR=0.0.0.0
    volumes:
      - ./models:/models
      - ./outputs:/outputs
      - ./keys:/opt/img-api/keys:ro
      - ./presets:/opt/img-api/presets:ro
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: ["gpu"]
    # Bind to localhost only; the HTTPS load balancer will front this
    ports:
      - "127.0.0.1:8000:8000"
    command: ["python", "main.py"]

Start it:

cd /opt/img-api
docker compose up -d --build

Health check (from the VM):

curl -s http://127.0.0.1:8000/docs | head -n 5

Step 6: Connect storage

The StorageClient stub above returns placeholder links. Wire it to your real bucket so every job returns two signed links: one for the PNG and one for the metadata JSON stored next to it.
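
A sketch of upload_and_sign for both backends, assuming google-cloud-storage or boto3 is installed and the credentials from .env are in place (this drops into the StorageClient class, which already has os, json, Dict, and Any imported at module level):

from datetime import timedelta  # needed for GCS signed-URL expiry

def upload_and_sign(self, img_bytes: bytes, meta: Dict[str, Any], style: str, seed: int) -> Dict[str, str]:
    key = self._key(style, seed)
    meta_key = key.replace(".png", ".json")
    meta_bytes = json.dumps(meta).encode("utf-8")
    if self.backend == "gcs":
        from google.cloud import storage  # pip install google-cloud-storage
        client = storage.Client.from_service_account_json(self.sa_json)
        bucket = client.bucket(self.bucket)
        img_blob, meta_blob = bucket.blob(key), bucket.blob(meta_key)
        img_blob.upload_from_string(img_bytes, content_type="image/png")
        meta_blob.upload_from_string(meta_bytes, content_type="application/json")
        sign = lambda b: b.generate_signed_url(version="v4", expiration=timedelta(hours=1))
        return {"image_url": sign(img_blob), "meta_url": sign(meta_blob)}
    else:  # s3
        import boto3  # pip install boto3
        s3 = boto3.client("s3", region_name=os.getenv("S3_REGION") or self.region)
        s3.put_object(Bucket=self.bucket, Key=key, Body=img_bytes, ContentType="image/png")
        s3.put_object(Bucket=self.bucket, Key=meta_key, Body=meta_bytes, ContentType="application/json")
        sign = lambda k: s3.generate_presigned_url(
            "get_object", Params={"Bucket": self.bucket, "Key": k}, ExpiresIn=3600)
        return {"image_url": sign(key), "meta_url": sign(meta_key)}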

Step 7: Add simple security

What you already have from the code above: the app checks the Authorization: Bearer header against AUTH_BEARER from .env, binds only to localhost, and sits behind HTTPS (next step). This is enough for a small team. Later you can add per-user keys or OAuth if needed.
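
To mint a strong value for AUTH_BEARER, Python’s standard library is enough:

# prints a 64-character random hex token to use as AUTH_BEARER
import secrets
print(secrets.token_hex(32))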


Step 8: Put HTTPS in front

Create a small HTTPS load balancer (managed certificate) for api.yourdomain.com: point the domain’s DNS at the load balancer, attach the certificate, and forward traffic to the VM’s port 8000. (A reverse proxy such as Caddy or nginx on the VM, terminating TLS itself, is a lighter-weight alternative.)

Now your API lives at https://api.yourdomain.com.


Step 9: Test the endpoints

Generate

POST https://api.yourdomain.com/generate
Headers:
  Authorization: Bearer YOUR_SUPER_SECRET_TOKEN
Body (JSON):
{
  "prompt": "Flat illustration of the 50/30/20 budgeting rule, three labeled boxes and arrows, brand colors, white background",
  "style": "clean_infographic",
  "seed": 42,
  "size": "1024x1024"
}

Expected response:

{
  "status": "ok",
  "image_url": "https://signed.link/yourimage.png",
  "meta_url": "https://signed.link/yourimage.json",
  "seed": 42,
  "style": "clean_infographic"
}

Edit (optional)

POST https://api.yourdomain.com/edit
Headers:
  Authorization: Bearer YOUR_SUPER_SECRET_TOKEN
Body:
{
  "image_url": "https://...existing.png",
  "prompt": "same layout, switch palette to warm oranges, thicker labels",
  "seed": 42
}

If you get a valid image back, congrats—you have a working factory.


Make it feel like a product

Presets

Create JSON presets under /opt/img-api/presets/ (the API can accept "preset": "clean_infographic").

/opt/img-api/presets/clean_infographic.json

{
  "style": "clean_infographic",
  "palette": ["#0A3", "#111", "#FFD94D"],
  "guidance": 7.5,
  "steps": 30,
  "negative": "no watermark, no extra text, no logos"
}
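
The /generate stub loads this JSON but does not apply it yet. A sketch of merging preset values into the model call; real_generate is the hypothetical wrapper from the Diffusers sketch in Step 4, extended here to accept these knobs:

# inside /generate, after `preset` is loaded:
steps = int(preset.get("steps", 30))
guidance = float(preset.get("guidance", 7.5))
negative = preset.get("negative", "")
style = req.style or preset.get("style", "default")   # request fields win over preset defaults
img_bytes = real_generate(req.prompt, size, seed,
                          steps=steps, guidance=guidance, negative=negative)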

Style library

Keep a small card per style (one-liner, colors, seed range). Use consistent seeds for series to keep a recognizable look across playlists.

House prompts (copy/paste)

Keep a short list of proven templates (see “Prompts that work for faceless channels” below) and always add a negative prompt: no watermark, no extra text, no logos.


Logs, monitoring, and cost sanity

Tail the service with docker compose logs -f api, watch GPU load with nvidia-smi, and set a billing alert so an always-on GPU never surprises you. If traffic grows, add a lightweight job queue (Redis or a managed queue) and one worker per GPU, as sketched below.
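
A minimal queue sketch assuming redis-py (pip install redis) and a Redis instance reachable on the VM; generate_and_upload is a hypothetical function wrapping the model call plus the bucket upload:

import json
import redis

r = redis.Redis(host="127.0.0.1", port=6379)

def enqueue(job: dict) -> None:
    # producer: the API pushes a job instead of generating inline
    r.lpush("jobs", json.dumps(job))

def worker() -> None:
    # consumer: run one of these per GPU
    while True:
        _, raw = r.brpop("jobs")            # blocks until a job is available
        job = json.loads(raw)
        generate_and_upload(job)            # hypothetical: model call + upload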


Scaling up gently

Keep everything reproducible: the Dockerfile, compose file, presets, and .env template in git; models in a bucket you can re-pull. A second GPU VM (or a replacement after a failure) is then a clone-and-compose away. Even if you never scale, these steps make disaster recovery painless.


Daily runbook (what your team actually does)

  1. Morning: power the VM on (or leave it running on busy weeks).
  2. Batch: send your prompt list (titles → thumbnail prompts → diagrams); a batch sketch follows this runbook.
  3. Review: pick winners fast—don’t nitpick.
  4. Tag: favorite images get _final.png and a “used-in-vid” note in metadata.
  5. Off-hours: shut the VM down to save cost (unless your team is global).
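
A batch sketch using Python’s requests (pip install requests); the endpoint, token variable, and prompt list are examples:

# batch_generate.py: send a prompt list to /generate and collect signed links
import os
import requests

API = "https://api.yourdomain.com/generate"
HEADERS = {"Authorization": "Bearer " + os.environ["IMG_API_TOKEN"]}

jobs = [
    {"style": "clean_infographic", "seed": 42,
     "prompt": "Flat illustration of the 50/30/20 budgeting rule, three labeled boxes and arrows"},
    {"style": "cinematic_thumb", "seed": 777, "size": "1280x720",
     "prompt": "Close-up of a vintage compass on a dark map, big empty space for title"},
]

for job in jobs:
    resp = requests.post(API, json=job, headers=HEADERS, timeout=300)
    resp.raise_for_status()
    data = resp.json()
    print(data["seed"], data["image_url"])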

Prompts that work for faceless channels

Thumbnails
Close-up of [subject], centered, dramatic lighting, simple background, big empty space for title, high contrast, bold complementary colors, editorial photo style, no clutter, 4k

Title cards / diagrams
Flat illustration of [concept] with 3 labeled boxes and arrows, minimal palette (brand colors), large legible typography, white background, infographic style, no extra text

Cutaways
Abstract motion background, soft bokeh in [brand colors], gentle gradient, loop-friendly, clean and non-distracting

Negative prompt (when supported): no watermark, no extra text, no logos


Troubleshooting (quick fixes)

  1. nvidia-smi fails on the host → GPU drivers aren’t installed correctly (revisit Step 2).
  2. The docker run --gpus all test fails → the container runtime can’t see the GPU (Step 2).
  3. curl http://127.0.0.1:8000/docs errors → the service isn’t running; check docker compose logs.
  4. 401/403 from the API → missing or wrong Authorization: Bearer token (Step 7).

Safety and rights

Check the license of any model you deploy (some restrict commercial use), and avoid generating trademarked logos or real people’s likenesses.


What you can do next

Swap the dummy generator for a real model call, wire the storage stubs to your bucket, grow the preset library, and add a job queue when volume demands it.


Appendix A — Example client calls

cURL (generate)

curl -s -X POST "https://api.yourdomain.com/generate" \
  -H "Authorization: Bearer YOUR_SUPER_SECRET_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Flat illustration of the 50/30/20 budgeting rule, three labeled boxes and arrows, brand colors, white background",
    "style": "clean_infographic",
    "seed": 1234,
    "size": "1024x1024",
    "preset": "clean_infographic"
  }'

Node snippet

const res = await fetch("https://api.yourdomain.com/generate", {
  method: "POST",
  headers: {
    "Authorization": "Bearer " + process.env.IMG_API_TOKEN,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    prompt: "Cinematic close-up of a vintage compass on dark map, dramatic rim light, room for title",
    style: "cinematic_thumb",
    seed: 777,
    size: "1280x720",
    preset: "cinematic_thumb"
  })
});
const data = await res.json();
console.log(data.image_url);

Appendix B — .env template

MODEL_NAME=Qwen-Image
BUCKET_BACKEND=gcs
BUCKET_NAME=reactivid-images-prod
REGION=asia-southeast1
AUTH_BEARER=REPLACE_ME
SERVICE_ACCOUNT_JSON=/opt/img-api/keys/sa.json
BIND_ADDR=127.0.0.1
BIND_PORT=8000

.gitignore (keep secrets out of git):

/keys/*
.env
outputs/*

Summary

You stood up a single GPU VM, ran a tiny FastAPI service with an image model, saved results to a bucket, and locked it behind HTTPS with a secret token. From your app (or a small form), you can now ask for visuals on demand—thumbnails, diagrams, and cutaways—without paying per image or juggling web UIs. Keep presets tight, reuse seeds, and batch your work. That’s how you get unlimited visuals that look like your brand, on your schedule.