Machine Learning Containers

July 31, 2025 · 6 min read

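Key Terms

Container - A standardized, lightweight software package that bundles an application's code with its dependencies so that it runs reliably in any environment.
Container Registry - A repository for storing, sharing, and deploying container images, usually integrated into CI/CD pipelines. Examples: Docker Hub, AWS ECR.
Container Orchestration - Automated management, scaling, and coordination of containers using platforms such as Kubernetes and Amazon ECS.
Continuous Delivery - A software development practice that automates deployment through CI/CD pipelines so that containers can be built, tested, and released quickly and reliably.
Infrastructure as Code - Managing infrastructure such as networking, compute, and storage through machine-readable definition files instead of manual processes, which makes environments reproducible.
Distroless Container - An optimized container image that contains only the application, its language runtime, and essential system libraries, and omits shells, package managers, and similar tools. This reduces the attack surface.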

Containerized Microservices

Containerized Machine

AWS ECR
AWS App Runner
AWS Cloud9 with machine learning code

FROM public.ecr.aws/lambda/python:3.8

RUN mkdir -p /app
COPY ./main.py /app/
COPY model/ /app/model/
COPY ./requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /app/requirements.txt
WORKDIR /app
EXPOSE 8080
CMD [ "main.py" ]
ENTRYPOINT [ "python" ]

# pylint: disable=no-name-in-module
# pylint: disable=no-self-argument

from fastapi import FastAPI
import uvicorn
import mlflow
import pandas as pd
from pydantic import BaseModel
from fastapi.responses import JSONResponse
from fastapi.encoders import jsonable_encoder

class Story(BaseModel):
    text: str

def predict(text):
    """Run the loaded model on a single piece of text."""
    print(f"Accepted payload: {text}")
    # The model expects a DataFrame with a single "text" column.
    data = pd.DataFrame({"text": [text]})
    result = loaded_model.predict(data)
    return result


# Load model as a PyFuncModel.
loaded_model = mlflow.pyfunc.load_model('model')
app = FastAPI()

@app.post("/predict")
async def predict_story(story: Story):
    print(f"predict_story accepted json payload: {story}")
    result = predict(story.text)
    print(f"The result is the following payload: {result}")
    payload = {"FakeNewsTrueFalse": result.tolist()[0]}
    json_compatible_item_data = jsonable_encoder(payload)
    return JSONResponse(content=json_compatible_item_data)

@app.get("/")
async def root():
    return {"message": "Hello Model"}

@app.get("/add/{num1}/{num2}")
async def add(num1: int, num2: int):
    """Add two numbers together"""

    total = num1 + num2
    return {"total": total}


if __name__ == '__main__':
    uvicorn.run(app, port=8080, host='0.0.0.0')
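Once the service above is running, the /predict endpoint can be exercised from Python as well as from curl. The following is a minimal sketch using only the standard library. The helper names `build_predict_request` and `predict_remote` are hypothetical, and the base URL is an assumption that matches the port used in main.py.

```python
import json
import urllib.request


def build_predict_request(base_url: str, text: str) -> urllib.request.Request:
    """Build a POST request whose JSON body matches the Story model."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_remote(base_url: str, text: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON reply from the service."""
    request = build_predict_request(base_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running locally, `predict_remote("http://localhost:8080", "some headline")` should return a dictionary with the `FakeNewsTrueFalse` key that the endpoint builds.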

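How do you deploy a model?

Prepare the model
- Train the model on the Databricks platform
- Download a model from the web
Package the model into an image and push it to an AWS ECR repository
Deploy the model service with AWS App Runner
Access the model service with curl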
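The packaging and push step in the list above can be sketched as a small helper that composes the image URI and the usual AWS CLI and Docker commands. This is a sketch under assumptions: the function names are hypothetical, and the account ID, region, and repository name in the usage note are placeholders rather than values from this post.

```python
from typing import List


def ecr_registry(account_id: str, region: str) -> str:
    """Return the hostname of a private ECR registry."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Return the full image URI that App Runner would later be pointed at."""
    return f"{ecr_registry(account_id, region)}/{repository}:{tag}"


def ecr_push_commands(account_id: str, region: str, repository: str, tag: str = "latest") -> List[str]:
    """Return the shell commands to log in, build, tag, and push the image."""
    registry = ecr_registry(account_id, region)
    uri = ecr_image_uri(account_id, region, repository, tag)
    return [
        # Authenticate Docker against the ECR registry.
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        # Build from the Dockerfile in the current directory.
        f"docker build -t {repository}:{tag} .",
        # Tag the local image with the registry URI, then push it.
        f"docker tag {repository}:{tag} {uri}",
        f"docker push {uri}",
    ]
```

For example, `ecr_push_commands("123456789012", "us-east-1", "fake-news")` yields four commands that can be pasted into a shell. The resulting image URI is what the App Runner service is then configured with.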

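How do you prepare a model?

Download a model from the web
Train a model with AutoML, using Databricks, SageMaker, or Azure ML Studio
Build a model from scratch
Serve the model directly through an endpoint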

Build Distroless Container

alfredodeza/rust-distroless-azure

Multi-stage builds

FROM rust:1.67.0-buster as builder

WORKDIR /usr/src/app

COPY . .

RUN cargo build --release

# Now copy it into our base image.
FROM gcr.io/distroless/cc-debian10

COPY --from=builder /usr/src/app/target/release/rust-tokenizers-api /usr/local/bin/rust-tokenizers-api
CMD ["rust-tokenizers-api"]

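Feature	Layered Builds	Multi-stage Builds
Nature	The image is a stack of read-only layers; filesystem-changing instructions (RUN, COPY, ADD) each produce a layer.	Several `FROM` stages separate the build environment from the runtime environment; only the necessary content is kept in the end.
Main goal	Optimize storage, build caching, and transfer efficiency.	Reduce the final image size and improve security (for example by removing build tools).
Syntax example	`FROM alpine` / `COPY . /app` / `RUN make`	`FROM golang AS builder` / `...` / `FROM alpine` / `COPY --from=builder /app/bin /app/bin`
Layer sharing	✅ All layers are shared (including intermediate layers)	❌ Only the layers of the final image are shared; layers from intermediate stages are not part of the final image.
Typical scenario	Regular image builds	Minimal production images (for example combined with Distroless), removing build dependencies.
Pros and cons	✅ Fast builds (layer cache) ✅ Saves disk space (shared base layers) ❌ The image may contain redundant layers (such as debugging tools) ❌ Sensitive data may remain in earlier layers	✅ Very small final image (can be just a few MB) ✅ No compiler toolchain left behind, so better security ❌ More complex build process ❌ Harder to debug (each stage has to be inspected separately)
Image size	Larger (contains all layers)	Very small (only the required files are kept)
Security	Sensitive data or tools may remain	No build tools, suitable for production
Build speed	Fast (cache friendly)	Slower when the cache is cold (every stage has to run)
Best suited for	Development and debugging, regular services	Production deployment, statically compiled languages (Go/Rust)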

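How do you save an intermediate stage of a multi-stage build?

Method 1: Pass --target at build time to select a stage, then build and tag that stage as its own image: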

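# 1. Build and tag the intermediate stage image (for example the `builder` stage)
docker build --target builder -t myapp-builder .

# 2. Run the intermediate image to inspect it
docker run -it myapp-builder sh

# 3. Continue and build the final image
docker build -t myapp-final .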

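# Stage 1: build stage
FROM golang:1.21 AS builder
COPY . /src
# CGO_ENABLED=0 produces a static binary, so it does not depend on glibc,
# which the alpine runtime image does not ship.
RUN cd /src && CGO_ENABLED=0 go build -o /app/server

# Stage 2: runtime stage
FROM alpine
COPY --from=builder /app/server /
CMD ["/server"]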
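To see which stage names are available for --target, the FROM lines of a Dockerfile can be scanned. The following is a minimal sketch with a hypothetical helper, `list_stages`. It assumes each FROM instruction sits on a single line and does not handle line continuations or parser directives.

```python
import re
from typing import List, Optional, Tuple

# Matches: FROM [--platform=...] <image> [AS <name>]
_FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(?P<image>\S+)(?:\s+AS\s+(?P<name>\S+))?\s*$",
    re.IGNORECASE,
)


def list_stages(dockerfile_text: str) -> List[Tuple[int, str, Optional[str]]]:
    """Return (index, base image, stage name or None) for every FROM line."""
    stages = []
    for line in dockerfile_text.splitlines():
        match = _FROM_RE.match(line)
        if match:
            stages.append((len(stages), match.group("image"), match.group("name")))
    return stages
```

Applied to the Go Dockerfile above, this reports a first stage named `builder` based on `golang:1.21` and an unnamed second stage based on `alpine`, so `builder` is the name to pass to --target.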

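Method 2: Export the cache with docker buildx (recommended)

Use the docker buildx cache mechanism to store the intermediate stage in a local directory or a remote registry so that later builds can reuse it:

# 1. Build and export the cache (a local directory or a remote registry is supported)
# Note: exporting a local cache may require a builder that uses the docker-container driver.
docker buildx build --target builder --cache-to type=local,dest=/tmp/cache .

# 2. Reuse the cache in later builds
docker buildx build --cache-from type=local,src=/tmp/cache .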
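When the two buildx invocations above are driven from a script, it can help to generate the argument lists in one place so the export and import paths stay in sync. The sketch below only composes the arguments. The function name `buildx_build_argv` is hypothetical, and the cache directory is whatever path the caller chooses.

```python
from typing import List, Optional


def buildx_build_argv(
    context: str,
    cache_dir: str,
    target: Optional[str] = None,
    export_cache: bool = False,
) -> List[str]:
    """Compose a `docker buildx build` command that exports or reuses a local cache."""
    argv = ["docker", "buildx", "build"]
    if target:
        # Stop at the named stage instead of building the final image.
        argv += ["--target", target]
    if export_cache:
        argv += ["--cache-to", f"type=local,dest={cache_dir}"]
    else:
        argv += ["--cache-from", f"type=local,src={cache_dir}"]
    argv.append(context)
    return argv
```

The returned list can be passed to `subprocess.run(argv, check=True)`. Passing a list rather than a single string avoids shell quoting problems in the cache path.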

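If the intermediate image was not saved but the build cache still exists, the stage can be recovered as follows:

# Show build cache disk usage
docker buildx du

# Rebuild the intermediate stage; unchanged steps are served from the local build cache
docker build --target builder -t myapp-builder .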
