Youtu Parsing
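Local Deployment
This document explains how to deploy Youtu Parsing as a backend service for Youtu-RAG.
Requirements:
- A Conda environment with Python 3.10
- CUDA 12.x
- Linux x86_64
Install Youtu Parsing
First, create a conda environment and install the required dependencies: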
conda create -n youtu-parsing python=3.10 -y
conda activate youtu-parsing
pip install git+https://github.com/TencentCloudADP/youtu-parsing.git#subdirectory=youtu_hf_parser
Set Up Flash Attention
Flash Attention V2 is required for Youtu Parsing to run efficiently. Install it as follows:
# Install Flash Attention V2 for CUDA 12.x + Python 3.10 + PyTorch 2.6 + Linux x86_64
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
Note: Flash Attention installation is platform-specific. If you run into problems, see the official installation guide.
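The wheel name in the command above encodes the CUDA major version, the PyTorch version, and the Python ABI tag (cu12, torch2.6, cp310), so it can help to confirm what is actually installed before choosing a wheel. The following is a minimal sketch of such a check; `environment_report` is a hypothetical helper name and is not part of Youtu Parsing or flash-attn.

```python
import importlib.util
import platform
import sys


def environment_report() -> dict:
    """Collect the facts that determine which flash-attn wheel matches."""
    report = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "platform": f"{platform.system()} {platform.machine()}",
        "torch": None,
        "cuda": None,
        "flash_attn_installed": importlib.util.find_spec("flash_attn") is not None,
    }
    # torch is treated as optional so the check also runs before PyTorch is installed.
    if importlib.util.find_spec("torch") is not None:
        import torch

        report["torch"] = torch.__version__
        report["cuda"] = torch.version.cuda  # None for CPU-only builds
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If the reported values differ from the combination named in the install comment, a different wheel or a source build is probably needed.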
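Download the Youtu Parsing Model Weights
Download the pretrained model weights from the official repository:
git lfs install
git clone https://huggingface.co/tencent/Youtu-Parsing
Install Server Dependencies
Next, install the additional dependencies required to run the Youtu Parsing server:
pip install fastapi uvicorn pydantic Pillow
Run the Youtu Parsing Server
Save the following code as youtu_parsing_server.py: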
import base64
import io
import os
import json
import asyncio
from contextlib import asynccontextmanager
from typing import Dict, List
from PIL import Image
from fastapi import FastAPI, HTTPException
from fastapi.responses import JSONResponse
from pydantic import BaseModel
from youtu_hf_parser import YoutuOCRParserHF as youtu

# Global parser instance
parser = None
# Lock for serial request processing
processing_lock = None


class ImageRequest(BaseModel):
    """Request model for receiving base64 encoded image data"""
    image: str


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Application lifecycle management, initializes parser and lock"""
    global parser, processing_lock
    # Get parameters from global configuration
    model_path = os.getenv("YOUTU_PARSING_MODEL_PATH", "Youtu-Parsing")
    enable_angle_correct = (os.getenv("YOUTU_ENABLE_ANGLE_CORRECT", "true") == "true")
    # Initialize parser
    print(f"Initializing the parser with model_path={model_path} and enable_angle_correct={enable_angle_correct}...")
    parser = youtu(model_path, enable_angle_correct=enable_angle_correct)
    processing_lock = asyncio.Lock()
    print("Parser initialized and ready")
    yield
    # Clean up resources
    print("Shutting down server")


# Create FastAPI application
app = FastAPI(title="Youtu OCR Parser Server",
              description="Server for parsing images with Youtu OCR Parser",
              lifespan=lifespan)


def base64_to_image(base64_str: str) -> Image.Image:
    """
    Convert base64 string to PIL Image object

    Args:
        base64_str: Base64 encoded image string

    Returns:
        PIL Image object
    """
    # Remove possible data URI prefix (e.g., data:image/png;base64,)
    if "," in base64_str:
        base64_str = base64_str.split(",")[1]
    # Decode base64 data
    image_data = base64.b64decode(base64_str)
    # Convert to PIL Image
    image = Image.open(io.BytesIO(image_data))
    return image.convert("RGB")


@app.post("/parse")
@app.post("/")
async def parse_image(request: ImageRequest) -> JSONResponse:
    """
    Parse image and return result

    This endpoint receives base64 encoded image data, processes it using parser._parse_single_image,
    and returns the parsing result. Ensures only one image is processed at a time.

    Args:
        request: Request containing base64 encoded image

    Returns:
        JSONResponse: Response containing parsing result
    """
    # Acquire lock to ensure only one request is processed at a time
    async with processing_lock:
        try:
            # Convert base64 string to PIL Image
            image = base64_to_image(request.image)
            # Call parser._parse_single_image method
            # Only need the first return value (result), ignore page_angle and hierarchy_json
            result, _, _ = await asyncio.to_thread(parser._parse_single_image,
                                                   image)
            # Return result
            return JSONResponse(content=result)
        except Exception as e:
            # Catch and return error information
            # raise e
            return JSONResponse(status_code=500,
                                content={
                                    "status": "error",
                                    "message": str(e)
                                })


@app.get("/health")
async def health_check() -> JSONResponse:
    """Health check endpoint"""
    return JSONResponse(content={
        "status": "healthy",
        "parser_initialized": parser is not None
    })


if __name__ == "__main__":
    import uvicorn
    import argparse

    # Create command line argument parser
    arg_parser = argparse.ArgumentParser(description="Youtu OCR Parser Server")
    # Add configuration parameters
    arg_parser.add_argument("--host",
                            type=str,
                            default="0.0.0.0",
                            help="Server host address to listen on (default: 0.0.0.0)")
    arg_parser.add_argument("--port",
                            type=int,
                            default=8501,
                            help="Server port to listen on (default: 8501)")
    arg_parser.add_argument("--model_path",
                            type=str,
                            default="Youtu-Parsing",
                            help="Youtu model path (default: Youtu-Parsing)")
    # BooleanOptionalAction (Python 3.9+) also registers --no-enable_angle_correct.
    # With action="store_true" and default=True the option could never be turned off.
    arg_parser.add_argument("--enable_angle_correct",
                            action=argparse.BooleanOptionalAction,
                            default=True,
                            help="Enable angle correction; pass --no-enable_angle_correct to disable")
    arg_parser.add_argument("--workers",
                            type=int,
                            default=1,
                            help="Number of worker processes (default: 1)")
    # Parse command line arguments
    args = arg_parser.parse_args()
    # Pass the configuration to the app through environment variables (read in lifespan)
    os.environ["YOUTU_PARSING_MODEL_PATH"] = args.model_path
    os.environ["YOUTU_ENABLE_ANGLE_CORRECT"] = str(args.enable_angle_correct).lower()
    # Output configuration information
    print("Configuration:")
    print(f" Host: {args.host}")
    print(f" Port: {args.port}")
    print(f" Model Path: {args.model_path}")
    print(f" Angle Correction: {'Enabled' if args.enable_angle_correct else 'Disabled'}")
    print(f" Worker Processes: {args.workers}")
    print("")
    # Start server. The import string must match this file's module name
    # (youtu_parsing_server.py); "server:app" would fail to import.
    uvicorn.run("youtu_parsing_server:app",
                host=args.host,
                port=args.port,
                reload=False,
                workers=args.workers)
You can now run the server with the following command:
python youtu_parsing_server.py \
--model_path ./Youtu-Parsing \
--enable_angle_correct \
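--port 8501
API Endpoints
Once the server is running, the following endpoints are available:
| Endpoint | Method | Description |
|---|---|---|
| /parse or / | POST | Parse a base64-encoded image |
| /health | GET | Health check |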
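The server code above guards the /parse handler with an asyncio.Lock and runs the blocking parser call through asyncio.to_thread, so concurrent clients are queued rather than processed in parallel. The following is a minimal, self-contained sketch of that pattern. `fake_parse` is a hypothetical stand-in for the real parser call, and the sleep duration is an arbitrary assumption.

```python
import asyncio
import time

active = 0      # number of "parse" calls currently running
max_active = 0  # highest concurrency observed


def fake_parse(image_id: int) -> dict:
    """Stand-in for the blocking parser call (hypothetical, not the real API)."""
    global active, max_active
    active += 1
    max_active = max(max_active, active)
    time.sleep(0.05)  # simulate model inference
    active -= 1
    return {"image_id": image_id}


async def handle_request(lock: asyncio.Lock, image_id: int) -> dict:
    # Same shape as the /parse handler: take the lock, then run the blocking
    # call in a worker thread so the event loop stays responsive.
    async with lock:
        return await asyncio.to_thread(fake_parse, image_id)


async def main() -> list:
    lock = asyncio.Lock()
    # Four "clients" arrive at once; the lock lets them through one at a time.
    return await asyncio.gather(*(handle_request(lock, i) for i in range(4)))


results = asyncio.run(main())
print(results, "max concurrent parses:", max_active)
```

Because the event loop is never blocked, /health can still respond while an image is being parsed. The lock lives inside a single process, so if the server is started with more than one worker, each worker would presumably load its own copy of the model and serialize only its own requests.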
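Usage Example
To parse an image, send a POST request containing the base64-encoded image:
# Encode the image as single-line base64 and send the request
# (GNU coreutils: -w 0 disables line wrapping; on macOS use: base64 -i your_image.png)
IMAGE_BASE64=$(base64 -w 0 your_image.png)
curl -X POST http://localhost:8501/parse \
-H "Content-Type: application/json" \
-d "{\"image\": \"$IMAGE_BASE64\"}"
Check the server health: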
curl http://localhost:8501/health
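The same two requests can be issued from Python. The following is a minimal client sketch using only the standard library. It assumes the server is reachable at http://localhost:8501 as in the commands above. The helper names (`build_payload`, `parse_image`, `is_healthy`) are hypothetical, and the structure of the parsing result is not specified here, so the response is returned as decoded JSON without further interpretation.

```python
import base64
import json
import urllib.request

BASE_URL = "http://localhost:8501"  # assumes the default port from the run command


def build_payload(image_bytes: bytes) -> bytes:
    """Encode raw image bytes into the JSON body expected by /parse."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": encoded}).encode("utf-8")


def parse_image(path: str, base_url: str = BASE_URL, timeout: float = 300.0):
    """POST an image file to /parse and return the decoded JSON response.

    A 500 response from the server surfaces as urllib.error.HTTPError.
    """
    with open(path, "rb") as f:
        body = build_payload(f.read())
    request = urllib.request.Request(
        f"{base_url}/parse",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def is_healthy(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Return True when /health reports an initialized parser."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            status = json.loads(response.read().decode("utf-8"))
    except OSError:  # connection refused, timeout, HTTP error
        return False
    return status.get("status") == "healthy" and bool(status.get("parser_initialized"))


if __name__ == "__main__":
    if is_healthy():
        print(parse_image("your_image.png"))
    else:
        print("Server is not ready")
```

The generous default timeout on `parse_image` is an assumption: requests are processed one at a time, so a call may wait behind earlier ones.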