Deploying with Docker

This document provides instructions for building a self-contained Docker image for Youtu HiChunk that includes all model weights and dependencies.

Requirements:

  • Docker installed on your system
  • Sufficient disk space (~10GB for the image)
  • NVIDIA GPU with CUDA 12.x support (for running the container; see the verification check below)
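
You can confirm that Docker can see the GPU before building anything. A minimal sanity check (the CUDA image tag below is illustrative; any CUDA 12.x runtime image works):

docker --version

# Run nvidia-smi in a throwaway CUDA container to confirm GPU passthrough
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi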

Setting Up the Build Directory

First, create a directory for building the Docker image and download the model weights:

mkdir hichunk-docker && cd hichunk-docker

# Download the model weights
git lfs install
git clone https://huggingface.co/tencent/Youtu-HiChunk
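
Git LFS can silently leave pointer files behind if it is misconfigured, so it is worth confirming that the safetensors shards were actually downloaded (a quick sanity check):

cd Youtu-HiChunk
# Downloaded files are marked "*"; pointer-only files are marked "-"
git lfs ls-files
# The safetensors shards should account for most of the size
du -sh .
cd ..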

Creating the Custom vLLM Model Files

Youtu HiChunk requires custom model files to be registered with vLLM. Create the following files in your build directory.

utu_v1.py

Create a file named utu_v1.py with the Youtu HiChunk model implementation. You can copy this file from the downloaded Youtu-HiChunk directory or from the local deployment guide.

# Copy from the downloaded model directory
cp Youtu-HiChunk/utu_v1.py ./utu_v1.py

registry.py

Create a file named registry.py with the updated vLLM model registry. You can copy this file from the downloaded Youtu-HiChunk directory or from the local deployment guide.

# Copy from the downloaded model directory
cp Youtu-HiChunk/registry.py ./registry.py

Creating the Dockerfile

Create a file named Dockerfile with the following content:

# Use the official vLLM image with CUDA 12.x support
FROM vllm/vllm-openai:v0.9.1

# Set working directory
WORKDIR /app

# Install additional dependencies
RUN pip install --no-cache-dir liger_kernel transformers==4.53.0

# Copy the custom vLLM model files to the correct location
COPY utu_v1.py /usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utu_v1.py
COPY registry.py /usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/registry.py

# Copy the model weights into the container
COPY Youtu-HiChunk/ /app/HiChunk/

# Set environment variables
ENV MODEL_PATH=/app/HiChunk
ENV PORT=8501

# Expose the server port
EXPOSE 8501

# Set the entrypoint to run vLLM serve
ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]

# Default command arguments
CMD ["--model", "/app/HiChunk", \
     "--served-model-name", "HiChunk", \
     "--port", "8501", \
     "--host", "0.0.0.0", \
     "--trust-remote-code", \
     "--dtype", "bfloat16", \
     "--max-num-batched-tokens", "32768", \
     "--enforce-eager", \
     "--seed", "0"]

Building the Docker Image

Build the Docker image with the following command:

docker build -t hichunk:latest .

This process may take several minutes as it downloads the base image and copies the model weights.
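
Once the build finishes, you can confirm the image exists and check its size:

docker images hichunk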

Running the Docker Container

Run the container with GPU support:

docker run --gpus all -p 8501:8501 hichunk:latest
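
The server takes a while to load the weights; watch the container logs for the startup message. Once it is up, you can check it from the host, since vLLM's OpenAI-compatible server exposes a health endpoint and a model listing:

# Liveness check (returns HTTP 200 once the server is ready)
curl http://localhost:8501/health

# Should list "HiChunk" as the served model
curl http://localhost:8501/v1/models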

You can also customize the server parameters by overriding the CMD:

docker run --gpus all -p 8501:8501 hichunk:latest \
    --model /app/HiChunk \
    --served-model-name HiChunk \
    --port 8501 \
    --host 0.0.0.0 \
    --trust-remote-code \
    --dtype bfloat16 \
    --max-num-batched-tokens 65536 \
    --enforce-eager \
    --seed 0
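
With the server running, requests go through the standard OpenAI-compatible API. The exact prompt format HiChunk expects is model-specific (consult the model card); the request below is only a shape check showing how to address the served model:

curl http://localhost:8501/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "HiChunk",
        "prompt": "Hello",
        "max_tokens": 16
    }'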

Pushing to a Container Registry

To deploy on a remote machine, push the image to a container registry:

# Tag the image for your registry
docker tag hichunk:latest your-registry.com/hichunk:latest

# Push to the registry
docker push your-registry.com/hichunk:latest
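
If the registry requires authentication, log in before pushing (your-registry.com is a placeholder for your actual registry host):

docker login your-registry.com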

Running on a Remote Machine

On the remote machine, pull and run the image:

# Pull the image
docker pull your-registry.com/hichunk:latest

# Run the container
docker run --gpus all -p 8501:8501 your-registry.com/hichunk:latest

The Youtu HiChunk service will be available at http://<remote-machine-ip>:8501.
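
As with the local deployment, you can verify the remote service is up (substitute the machine's actual address):

curl http://<remote-machine-ip>:8501/health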

Final Directory Structure

Before building, your hichunk-docker directory should have the following structure:

hichunk-docker/
├── Dockerfile
├── utu_v1.py
├── registry.py
└── Youtu-HiChunk/
    ├── config.json
    ├── configuration_utu_v1.py
    ├── generation_config.json
    ├── model-00001-of-00002.safetensors
    ├── model-00002-of-00002.safetensors
    ├── model.safetensors.index.json
    ├── modeling_utu_v1.py
    ├── registry.py
    ├── special_tokens_map.json
    ├── tokenizer_config.json
    ├── tokenizer.json
    ├── trainer_state.json
    └── utu_v1.py
