
Installing nvidia-container-toolkit offline

Keywords: install nvidia-container-runtime, install NVIDIA tools

Background

Primer: how NVIDIA GPU containers work

Official docs: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/1.17.4/install-guide.html

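Installation

The machine has no internet access, so the toolkit is installed from the RPM archive published on GitHub: download it on a machine that does have access, then copy it over.

Download: https://github.com/NVIDIA/nvidia-container-toolkit/releases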

bash
# Extract the archive
tar -xvf nvidia-container-toolkit_1.18.1_rpm_x86_64.tar.gz

# Dependency tree:
#├─ nvidia-container-toolkit (version)
#│    ├─ libnvidia-container-tools (>= version)
#│    └─ nvidia-container-toolkit-base (version)
#│
#├─ libnvidia-container-tools (version)
#│    └─ libnvidia-container1 (>= version)
#└─ libnvidia-container1 (version)

cd release-v1.18.1-stable/packages/centos7/x86_64
# 1. Base library (must be installed first)
rpm -ivh libnvidia-container1-1.18.1-1.x86_64.rpm

# 2. Tools package (depends on the base library)
rpm -ivh libnvidia-container-tools-1.18.1-1.x86_64.rpm

# 3. libseccomp integration package (requires libseccomp on the system); skip it if it fails to install
rpm -ivh libnvidia-container-libseccomp2-1.18.1-1.x86_64.rpm

# 4. Toolkit base package
rpm -ivh nvidia-container-toolkit-base-1.18.1-1.x86_64.rpm

# 5. Main toolkit package
rpm -ivh nvidia-container-toolkit-1.18.1-1.x86_64.rpm

# 6. Operator extensions package
rpm -ivh nvidia-container-toolkit-operator-extensions-1.18.1-1.x86_64.rpm

# 7. Optional: debug and development packages (can be installed at any time)
rpm -ivh libnvidia-container1-debuginfo-1.18.1-1.x86_64.rpm
rpm -ivh libnvidia-container-devel-1.18.1-1.x86_64.rpm

# Verify the installation
which nvidia-container-runtime
# On success this prints the path to the binary:
# /usr/bin/nvidia-container-runtime
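
Because the install order follows the dependency tree, it can help to confirm that all four required packages are actually present before starting. The following is a minimal sketch, not part of the official procedure: `list_required_rpms` is a hypothetical helper name, and it assumes the files follow the `<name>-<version>-1.x86_64.rpm` naming used in the commands above.

```shell
# Hypothetical helper: print the four required RPMs in dependency order,
# or fail if one of them is missing from the given directory.
# The body runs in a subshell, so its variables do not leak to the caller.
list_required_rpms() (
    dir="$1"
    version="$2"
    for name in libnvidia-container1 \
                libnvidia-container-tools \
                nvidia-container-toolkit-base \
                nvidia-container-toolkit; do
        file="${dir}/${name}-${version}-1.x86_64.rpm"
        if [ ! -f "$file" ]; then
            echo "missing: $file" >&2
            return 1
        fi
        echo "$file"
    done
)
```

One possible use is `rpm -ivh $(list_required_rpms . 1.18.1)` from inside the package directory. Passing all files to a single `rpm` call lets rpm order them by dependency within one transaction, so a missing file is caught before anything is installed.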

Configuration

bash
# Contents of /etc/docker/daemon.json:
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}

# Reload the systemd manager configuration (unit files)
systemctl daemon-reload
# Restart Docker so it picks up the new daemon.json
systemctl restart docker
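
After the restart, it is worth confirming that Docker really picked up the new default runtime. The sketch below is one way to do that, under the assumption that the plain-text output of `docker info` contains a `Default Runtime: <name>` line; `default_runtime_is_nvidia` is a hypothetical helper name, not part of Docker or the toolkit.

```shell
# Hypothetical helper: reads `docker info` text on stdin and succeeds
# only if the reported default runtime is "nvidia".
default_runtime_is_nvidia() {
    grep -Eq '^[[:space:]]*Default Runtime:[[:space:]]*nvidia[[:space:]]*$'
}
```

It could be used as `docker info 2>/dev/null | default_runtime_is_nvidia && echo "nvidia is the default runtime"`. If the check fails, the usual suspects are a syntax error in `/etc/docker/daemon.json` or a Docker daemon that was not restarted.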