
Deploying with Docker

This document provides instructions for building a self-contained Docker image for Youtu Parsing that includes all model weights and dependencies.

Requirements:

  • Docker installed on your system
  • Sufficient disk space (~15GB for the image)
  • NVIDIA GPU with CUDA 12.x support (required for running the container)
  • NVIDIA Container Toolkit installed, so the container can be started with --gpus all

Setting Up the Build Directory

First, create a directory for building the Docker image and download the model weights:

mkdir youtu-parsing-docker && cd youtu-parsing-docker

# Download the model weights
git lfs install
git clone https://huggingface.co/tencent/Youtu-Parsing
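
To make sure Git LFS actually fetched the weight shards (rather than leaving small pointer files that would later be copied into the image and fail at load time), you can run an optional check before building. The snippet below is an illustrative Python sketch, not part of the project; the directory name and the size threshold are assumptions based on the layout shown later in this guide:

import os

weights_dir = "Youtu-Parsing"  # directory created by the git clone above

for name in sorted(os.listdir(weights_dir)):
    if name.endswith(".safetensors"):
        size_mb = os.path.getsize(os.path.join(weights_dir, name)) / (1024 * 1024)
        # Git LFS pointer files are only a few hundred bytes; real shards are several GB
        status = "OK" if size_mb > 100 else "looks like an LFS pointer, run `git lfs pull`"
        print(f"{name}: {size_mb:.0f} MB ({status})")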

Creating the Parsing Server Script

Create a file named youtu_parsing_server.py with the server implementation. You can copy this from the local deployment guide or create it with the following content:

import base64
import io
import os
import asyncio
from contextlib import asynccontextmanager
from PIL import Image
from fastapi import FastAPI, HTTPException
from fastapi.responses import JSONResponse
from pydantic import BaseModel

from youtu_hf_parser import YoutuOCRParserHF as youtu


# Global parser instance
parser = None
# Lock for serial request processing
processing_lock = None


class ImageRequest(BaseModel):
    """Request model for receiving base64 encoded image data"""
    image: str


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Application lifecycle management, initializes parser and lock"""
    global parser, processing_lock

    # Get parameters from global configuration
    model_path = os.getenv("YOUTU_PARSING_MODEL_PATH", "/app/Youtu-Parsing")
    enable_angle_correct = (os.getenv("YOUTU_ENABLE_ANGLE_CORRECT", "true").lower() == "true")

    # Initialize parser
    print(f"Initializing the parser with model_path={model_path} and enable_angle_correct={enable_angle_correct}...")
    parser = youtu(model_path, enable_angle_correct=enable_angle_correct)
    processing_lock = asyncio.Lock()

    print("Parser initialized and ready")
    yield

    # Clean up resources
    print("Shutting down server")


# Create FastAPI application
app = FastAPI(title="Youtu OCR Parser Server",
              description="Server for parsing images with Youtu OCR Parser",
              lifespan=lifespan)


def base64_to_image(base64_str: str) -> Image.Image:
    """
    Convert base64 string to PIL Image object
    
    Args:
        base64_str: Base64 encoded image string
        
    Returns:
        PIL Image object
    """
    # Remove possible data URI prefix (e.g., data:image/png;base64,)
    if "," in base64_str:
        base64_str = base64_str.split(",")[1]

    # Decode base64 data
    image_data = base64.b64decode(base64_str)

    # Convert to PIL Image
    image = Image.open(io.BytesIO(image_data))

    return image.convert("RGB")


@app.post("/parse")
@app.post("/")
async def parse_image(request: ImageRequest) -> JSONResponse:
    """
    Parse image and return result
    
    This endpoint receives base64 encoded image data, processes it using parser._parse_single_image,
    and returns the parsing result. Ensures only one image is processed at a time.
    
    Args:
        request: Request containing base64 encoded image
        
    Returns:
        JSONResponse: Response containing parsing result
    """
    # Acquire lock to ensure only one request is processed at a time
    async with processing_lock:
        try:
            # Convert base64 string to PIL Image
            image = base64_to_image(request.image)

            # Call parser._parse_single_image method
            result, _, _ = await asyncio.to_thread(parser._parse_single_image, image)

            # Return result
            return JSONResponse(content=result)

        except Exception as e:
            return JSONResponse(status_code=500,
                                content={
                                    "status": "error",
                                    "message": str(e)
                                })


@app.get("/health")
async def health_check() -> JSONResponse:
    """Health check endpoint"""
    return JSONResponse(content={
        "status": "healthy",
        "parser_initialized": parser is not None
    })


if __name__ == "__main__":
    import uvicorn
    import argparse

    # Create command line argument parser
    arg_parser = argparse.ArgumentParser(description="Youtu OCR Parser Server")

    # Add configuration parameters
    arg_parser.add_argument("--host",
                            type=str,
                            default="0.0.0.0",
                            help="Server host address to listen on (default: 0.0.0.0)")

    arg_parser.add_argument("--port",
                            type=int,
                            default=8501,
                            help="Server port to listen on (default: 8501)")

    arg_parser.add_argument("--model_path",
                            type=str,
                            default="/app/Youtu-Parsing",
                            help="Youtu model path (default: /app/Youtu-Parsing)")

    arg_parser.add_argument("--enable_angle_correct",
                            action="store_true",
                            default=True,
                            help="Enable angle correction (default: True)")

    arg_parser.add_argument("--workers",
                            type=int,
                            default=1,
                            help="Number of worker processes (default: 1)")

    # Parse command line arguments
    args = arg_parser.parse_args()

    # Set global configuration
    os.environ["YOUTU_PARSING_MODEL_PATH"] = args.model_path
    os.environ["YOUTU_ENABLE_ANGLE_CORRECT"] = str(args.enable_angle_correct).lower()

    # Output configuration information
    print(f"Configuration:")
    print(f"  Host: {args.host}")
    print(f"  Port: {args.port}")
    print(f"  Model Path: {args.model_path}")
    print(f"  Angle Correction: {'Enabled' if args.enable_angle_correct else 'Disabled'}")
    print(f"  Worker Processes: {args.workers}")
    print("")

    # Start server
    uvicorn.run("youtu_parsing_server:app",
                host=args.host,
                port=args.port,
                reload=False,
                workers=args.workers)

Creating the Dockerfile

Create a file named Dockerfile with the following content:

# Use official PyTorch image with CUDA 12 support
FROM pytorch/pytorch:2.6.0-cuda12.6-cudnn9-devel

# Set working directory
WORKDIR /app

# Set environment variables to prevent interactive prompts during build
ENV DEBIAN_FRONTEND=noninteractive

# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    curl \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Install youtu-parsing from git
RUN pip install --no-cache-dir git+https://github.com/TencentCloudADP/youtu-parsing.git#subdirectory=youtu_hf_parser

# Install Flash Attention V2 for CUDA 12.x + Python 3.10 + PyTorch 2.6 + Linux x86_64
RUN pip install --no-cache-dir https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

# Install additional dependencies for the server
RUN pip install --no-cache-dir \
    fastapi \
    uvicorn \
    pydantic \
    Pillow

# Copy the parsing server script
COPY youtu_parsing_server.py /app/youtu_parsing_server.py

# Copy the model weights into the container
COPY Youtu-Parsing/ /app/Youtu-Parsing/

# Set environment variables
ENV YOUTU_PARSING_MODEL_PATH=/app/Youtu-Parsing
ENV YOUTU_ENABLE_ANGLE_CORRECT=true
ENV PORT=8501

# Expose the server port
EXPOSE 8501

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=120s --retries=3 \
    CMD curl -f http://localhost:8501/health || exit 1

# Set the entrypoint
ENTRYPOINT ["python", "/app/youtu_parsing_server.py"]

# Default command arguments (angle correction is controlled via YOUTU_ENABLE_ANGLE_CORRECT)
CMD ["--model_path", "/app/Youtu-Parsing", "--port", "8501", "--host", "0.0.0.0"]

Building the Docker Image

Build the Docker image with the following command:

docker build -t youtu-parsing:latest .

This process may take several minutes as it downloads the base image, installs dependencies (including Flash Attention), and copies the model weights.

Running the Docker Container

Run the container with GPU support:

docker run --gpus all -p 8501:8501 youtu-parsing:latest

You can customize the server parameters:

docker run --gpus all -p 8501:8501 youtu-parsing:latest \
    --model_path /app/Youtu-Parsing \
    --port 8501 \
    --host 0.0.0.0 \
    --enable_angle_correct \
    --workers 1

To disable angle correction:

docker run --gpus all -p 8501:8501 \
    -e YOUTU_ENABLE_ANGLE_CORRECT=false \
    youtu-parsing:latest

Pushing to a Container Registry

To deploy on a remote machine, push the image to a container registry:

# Tag the image for your registry
docker tag youtu-parsing:latest your-registry.com/youtu-parsing:latest

# Push to the registry
docker push your-registry.com/youtu-parsing:latest

Running on a Remote Machine

On the remote machine, pull and run the image:

# Pull the image
docker pull your-registry.com/youtu-parsing:latest

# Run the container with GPU support
docker run --gpus all -p 8501:8501 your-registry.com/youtu-parsing:latest

The Youtu Parsing service will be available at http://<remote-machine-ip>:8501.
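
Before sending images, you can confirm the service is ready by polling the /health endpoint. The following is a minimal Python sketch; it assumes the requests package is installed on the client machine:

import requests

base_url = "http://localhost:8501"  # use http://<remote-machine-ip>:8501 for a remote host

resp = requests.get(f"{base_url}/health", timeout=10)
resp.raise_for_status()
print(resp.json())  # expected: {"status": "healthy", "parser_initialized": true}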

API Endpoints

Once running, the following endpoints are available:

Endpoint       Method   Description
/parse or /    POST     Parse a base64-encoded image
/health        GET      Health check endpoint

Example Usage

To parse an image, send a POST request with a base64-encoded image:

# Encode an image to base64 and send a request
# macOS: base64 -i your_image.png    Linux (GNU coreutils): base64 -w 0 your_image.png
IMAGE_BASE64=$(base64 -i your_image.png)
curl -X POST http://localhost:8501/parse \
    -H "Content-Type: application/json" \
    -d "{\"image\": \"$IMAGE_BASE64\"}"

Final Directory Structure

Before building, your youtu-parsing-docker directory should have the following structure:

youtu-parsing-docker/
├── Dockerfile
├── youtu_parsing_server.py
└── Youtu-Parsing/
    ├── assets/
    ├── __init__.py
    ├── chat_template.json
    ├── config.json
    ├── configuration_siglip2.py
    ├── configuration_youtu_vl.py
    ├── generation_config.json
    ├── image_processing_siglip2_fast.py
    ├── model-00001-of-00002.safetensors
    ├── model-00002-of-00002.safetensors
    ├── model.safetensors.index.json
    ├── modeling_siglip2.py
    ├── modeling_youtu_vl.py
    ├── preprocessor_config.json
    ├── processing_youtu_vl.py
    ├── special_tokens_map.json
    ├── tokenizer.json
    └── tokenizer_config.json
