Set Up the Build Directory

First, create a directory for building the Docker image and download the model weights:

mkdir youtu-parsing-docker && cd youtu-parsing-docker

# Download the model weights
git lfs install
git clone https://huggingface.co/tencent/Youtu-Parsing

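Create the Parsing Server Script

Create a file named youtu_parsing_server.py containing the server implementation. You can copy it from the local deployment guide or create it with the following content: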

import base64
import io
import os
import json
import asyncio
from contextlib import asynccontextmanager
from typing import Dict, List
from PIL import Image
from fastapi import FastAPI, HTTPException
from fastapi.responses import JSONResponse
from pydantic import BaseModel

from youtu_hf_parser import YoutuOCRParserHF as youtu


# Global parser instance
parser = None
# Lock for serial request processing
processing_lock = None


class ImageRequest(BaseModel):
    """Request model for receiving base64 encoded image data"""
    image: str


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Application lifecycle management, initializes parser and lock"""
    global parser, processing_lock

    # Get parameters from global configuration
    model_path = os.getenv("YOUTU_PARSING_MODEL_PATH", "/app/Youtu-Parsing")
    enable_angle_correct = (os.getenv("YOUTU_ENABLE_ANGLE_CORRECT", "true") == "true")

    # Initialize parser
    print(f"Initializing the parser with model_path={model_path} and enable_angle_correct={enable_angle_correct}...")
    parser = youtu(model_path, enable_angle_correct=enable_angle_correct)
    processing_lock = asyncio.Lock()

    print("Parser initialized and ready")
    yield

    # Clean up resources
    print("Shutting down server")


# Create FastAPI application
app = FastAPI(title="Youtu OCR Parser Server",
              description="Server for parsing images with Youtu OCR Parser",
              lifespan=lifespan)


def base64_to_image(base64_str: str) -> Image.Image:
    """
    Convert base64 string to PIL Image object
    
    Args:
        base64_str: Base64 encoded image string
        
    Returns:
        PIL Image object
    """
    # Remove possible data URI prefix (e.g., data:image/png;base64,)
    if "," in base64_str:
        base64_str = base64_str.split(",")[1]

    # Decode base64 data
    image_data = base64.b64decode(base64_str)

    # Convert to PIL Image
    image = Image.open(io.BytesIO(image_data))

    return image.convert("RGB")


@app.post("/parse")
@app.post("/")
async def parse_image(request: ImageRequest) -> JSONResponse:
    """
    Parse image and return result
    
    This endpoint receives base64 encoded image data, processes it using parser._parse_single_image,
    and returns the parsing result. Ensures only one image is processed at a time.
    
    Args:
        request: Request containing base64 encoded image
        
    Returns:
        JSONResponse: Response containing parsing result
    """
    # Acquire lock to ensure only one request is processed at a time
    async with processing_lock:
        try:
            # Convert base64 string to PIL Image
            image = base64_to_image(request.image)

            # Call parser._parse_single_image method
            result, _, _ = await asyncio.to_thread(parser._parse_single_image, image)

            # Return result
            return JSONResponse(content=result)

        except Exception as e:
            return JSONResponse(status_code=500,
                                content={
                                    "status": "error",
                                    "message": str(e)
                                })


@app.get("/health")
async def health_check() -> JSONResponse:
    """Health check endpoint"""
    return JSONResponse(content={
        "status": "healthy",
        "parser_initialized": parser is not None
    })


if __name__ == "__main__":
    import uvicorn
    import argparse

    # Create command line argument parser
    arg_parser = argparse.ArgumentParser(description="Youtu OCR Parser Server")

    # Add configuration parameters
    arg_parser.add_argument("--host",
                            type=str,
                            default="0.0.0.0",
                            help="Server host address to listen on (default: 0.0.0.0)")

    arg_parser.add_argument("--port",
                            type=int,
                            default=8501,
                            help="Server port to listen on (default: 8501)")

    arg_parser.add_argument("--model_path",
                            type=str,
                            default="/app/Youtu-Parsing",
                            help="Youtu model path (default: /app/Youtu-Parsing)")

    arg_parser.add_argument("--enable_angle_correct",
                            action="store_true",
                            default=True,
                            help="Enable angle correction (default: True)")

    arg_parser.add_argument("--workers",
                            type=int,
                            default=1,
                            help="Number of worker processes (default: 1)")

    # Parse command line arguments
    args = arg_parser.parse_args()

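    # Set global configuration.
    # --enable_angle_correct is store_true with default=True, so it can only ever
    # be True. An existing YOUTU_ENABLE_ANGLE_CORRECT environment variable
    # therefore takes precedence, which is what allows
    # `-e YOUTU_ENABLE_ANGLE_CORRECT=false` to disable angle correction.
    os.environ["YOUTU_PARSING_MODEL_PATH"] = args.model_path
    os.environ.setdefault("YOUTU_ENABLE_ANGLE_CORRECT", str(args.enable_angle_correct).lower())
    angle_correct_enabled = os.environ["YOUTU_ENABLE_ANGLE_CORRECT"] == "true"

    # Output configuration information
    print("Configuration:")
    print(f"  Host: {args.host}")
    print(f"  Port: {args.port}")
    print(f"  Model Path: {args.model_path}")
    print(f"  Angle Correction: {'Enabled' if angle_correct_enabled else 'Disabled'}")
    print(f"  Worker Processes: {args.workers}")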
    print("")

    # Start server
    uvicorn.run("youtu_parsing_server:app",
                host=args.host,
                port=args.port,
                reload=False,
                workers=args.workers)
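
The server handles one image at a time by holding an asyncio.Lock while the blocking parser call runs in a worker thread. The following is a minimal sketch of that pattern, not the server itself: time.sleep stands in for parser._parse_single_image, and the handle function, request names, and timing are illustrative assumptions.

```python
import asyncio
import time


async def handle(lock: asyncio.Lock, name: str, log: list) -> None:
    """Simulate one request: take the lock, then run blocking work in a thread."""
    async with lock:
        log.append(f"start {name}")
        # Stand-in for the blocking parser call (an assumption for this sketch).
        await asyncio.to_thread(time.sleep, 0.05)
        log.append(f"end {name}")


async def main() -> list:
    lock = asyncio.Lock()
    log: list = []
    # Three "requests" arrive concurrently; the lock forces them to run one by one.
    await asyncio.gather(*(handle(lock, name, log) for name in ("a", "b", "c")))
    return log


log = asyncio.run(main())
# Every "start" entry is immediately followed by the matching "end" entry.
print(log)
```

Because the event loop stays free while the thread works, the /health endpoint can still answer during a long parse.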

Create the Dockerfile

Create a file named Dockerfile with the following content:

# Use official PyTorch image with CUDA 12 support
FROM pytorch/pytorch:2.6.0-cuda12.6-cudnn9-devel

# Set working directory
WORKDIR /app

# Set environment variables to prevent interactive prompts during build
ENV DEBIAN_FRONTEND=noninteractive

# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    curl \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Install youtu-parsing from git
RUN pip install --no-cache-dir git+https://github.com/TencentCloudADP/youtu-parsing.git#subdirectory=youtu_hf_parser

# Install Flash Attention V2 for CUDA 12.x + Python 3.10 + PyTorch 2.6 + Linux x86_64
RUN pip install --no-cache-dir https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

# Install additional dependencies for the server
RUN pip install --no-cache-dir \
    fastapi \
    uvicorn \
    pydantic \
    Pillow

# Copy the parsing server script
COPY youtu_parsing_server.py /app/youtu_parsing_server.py

# Copy the model weights into the container
COPY Youtu-Parsing/ /app/Youtu-Parsing/

# Set environment variables
ENV YOUTU_PARSING_MODEL_PATH=/app/Youtu-Parsing
ENV YOUTU_ENABLE_ANGLE_CORRECT=true
ENV PORT=8501

# Expose the server port
EXPOSE 8501

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=120s --retries=3 \
    CMD curl -f http://localhost:8501/health || exit 1

# Set the entrypoint
ENTRYPOINT ["python", "/app/youtu_parsing_server.py"]

# Default command arguments
CMD ["--model_path", "/app/Youtu-Parsing", "--port", "8501", "--host", "0.0.0.0", "--enable_angle_correct"]

Build the Docker Image

Build the Docker image with the following command:

docker build -t youtu-parsing:latest .

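This can take several minutes, because the build downloads the base image, installs the dependencies (including Flash Attention), and copies the model weights into the image.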

Run the Docker Container

Run the container with GPU support:

docker run --gpus all -p 8501:8501 youtu-parsing:latest

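You can customize the server parameters. Arguments placed after the image name replace the default CMD, and each worker process loads its own copy of the model: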

docker run --gpus all -p 8501:8501 youtu-parsing:latest \
    --model_path /app/Youtu-Parsing \
    --port 8501 \
    --host 0.0.0.0 \
    --enable_angle_correct \
    --workers 1

To disable angle correction:

docker run --gpus all -p 8501:8501 \
    -e YOUTU_ENABLE_ANGLE_CORRECT=false \
    youtu-parsing:latest
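
The server's lifespan hook compares YOUTU_ENABLE_ANGLE_CORRECT to the exact string "true", so any other spelling is treated as disabled. The sketch below mirrors that comparison in isolation; the helper name angle_correct_enabled is hypothetical and exists only for this illustration.

```python
import os


def angle_correct_enabled(env=None) -> bool:
    """Mirror the server's check: enabled only when the value is exactly "true"."""
    env = os.environ if env is None else env
    return env.get("YOUTU_ENABLE_ANGLE_CORRECT", "true") == "true"


for value in ("true", "false", "True", "1"):
    print(value, "->", angle_correct_enabled({"YOUTU_ENABLE_ANGLE_CORRECT": value}))
# true -> True
# false -> False
# True -> False
# 1 -> False
```

In practice this means the variable should be set to lowercase "true" or "false" and nothing else.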

Push to a Container Registry

To deploy on a remote machine, push the image to a container registry:

# Tag the image for your registry
docker tag youtu-parsing:latest your-registry.com/youtu-parsing:latest

# Push to the registry
docker push your-registry.com/youtu-parsing:latest

Run on the Remote Machine

On the remote machine, pull and run the image:

# Pull the image
docker pull your-registry.com/youtu-parsing:latest

# Run the container with GPU support
docker run --gpus all -p 8501:8501 your-registry.com/youtu-parsing:latest

The Youtu Parsing service will be available at http://<remote-machine-ip>:8501.

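API Endpoints

Once the container is running, the following endpoints are available:

Endpoint	Method	Description
`/parse` or `/`	POST	Parse a base64-encoded image
`/health`	GET	Health check endpoint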
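
Model loading can take a while (the Dockerfile's HEALTHCHECK allows a 120-second start period), so a client may want to poll /health before sending images. The sketch below uses only the standard library; the function names, the 300-second timeout, and the 5-second interval are assumptions chosen for illustration.

```python
import json
import time
import urllib.error
import urllib.request


def is_ready(payload: dict) -> bool:
    """True when the /health body reports a healthy, initialized parser."""
    return payload.get("status") == "healthy" and payload.get("parser_initialized") is True


def wait_until_ready(base_url: str = "http://localhost:8501",
                     timeout_s: float = 300.0,
                     interval_s: float = 5.0) -> bool:
    """Poll GET /health until the server is ready or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=10) as resp:
                if is_ready(json.load(resp)):
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            # Server not listening yet, or the response was not valid JSON.
            pass
        time.sleep(interval_s)
    return False
```

Calling wait_until_ready() returns True as soon as the endpoint reports an initialized parser, and False if the timeout passes first.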

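Usage Example

To parse an image, send a POST request containing the base64-encoded image:

# Encode the image as base64 and send the request.
# tr strips the line breaks that GNU base64 inserts, which would otherwise break the JSON body.
IMAGE_BASE64=$(base64 < your_image.png | tr -d '\n')
curl -X POST http://localhost:8501/parse \
    -H "Content-Type: application/json" \
    -d "{\"image\": \"$IMAGE_BASE64\"}"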
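
The same request can be sent from Python, which avoids shell quoting and command-line length limits for large images. This is a minimal standard-library sketch; the function names, the default URL, and the 600-second timeout are assumptions, and the shape of the returned JSON depends on the parser.

```python
import base64
import json
import urllib.request


def build_payload(image_bytes: bytes) -> bytes:
    """Build the JSON request body expected by the server: {"image": "<base64>"}."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": encoded}).encode("utf-8")


def parse_image(path: str,
                url: str = "http://localhost:8501/parse",
                timeout: float = 600.0):
    """POST the image at `path` to the server and return the decoded JSON response."""
    with open(path, "rb") as f:
        body = build_payload(f.read())
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Calling parse_image("your_image.png") returns the parsing result; an HTTP 500 response from the server is raised as urllib.error.HTTPError.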

Final Directory Structure

Before building, your youtu-parsing-docker directory should have the following structure:

youtu-parsing-docker/
├── Dockerfile
├── youtu_parsing_server.py
└── Youtu-Parsing/
    ├── assets/
    ├── __init__.py
    ├── chat_template.json
    ├── config.json
    ├── configuration_siglip2.py
    ├── configuration_youtu_vl.py
    ├── generation_config.json
    ├── image_processing_siglip2_fast.py
    ├── model-00001-of-00002.safetensors
    ├── model-00002-of-00002.safetensors
    ├── model.safetensors.index.json
    ├── modeling_siglip2.py
    ├── modeling_youtu_vl.py
    ├── preprocessor_config.json
    ├── processing_youtu_vl.py
    ├── special_tokens_map.json
    ├── tokenizer.json
    └── tokenizer_config.json
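
Before running docker build, it can help to confirm that the build context is complete and that the weight shards are real files rather than Git LFS pointer stubs, which is what a clone produces when Git LFS is not active. The sketch below checks a subset of the files listed above; the function name and the choice of files to check are assumptions for illustration.

```python
from pathlib import Path

# A subset of the files from the directory listing above.
REQUIRED_FILES = (
    "Dockerfile",
    "youtu_parsing_server.py",
    "Youtu-Parsing/config.json",
    "Youtu-Parsing/model.safetensors.index.json",
    "Youtu-Parsing/model-00001-of-00002.safetensors",
    "Youtu-Parsing/model-00002-of-00002.safetensors",
    "Youtu-Parsing/tokenizer.json",
)

# Git LFS pointer files are small text files that begin with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def check_build_context(root) -> list:
    """Return a list of problems found in the build directory (empty if none)."""
    root = Path(root)
    problems = [f"missing: {rel}" for rel in REQUIRED_FILES if not (root / rel).is_file()]
    model_dir = root / "Youtu-Parsing"
    if model_dir.is_dir():
        for shard in sorted(model_dir.glob("*.safetensors")):
            with shard.open("rb") as f:
                if f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                    rel = shard.relative_to(root).as_posix()
                    problems.append(f"git-lfs pointer, not weights: {rel}")
    return problems
```

Running check_build_context(".") from inside youtu-parsing-docker returns an empty list when every checked file is present and no shard is a pointer stub.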

Deploy with Docker