How to share GPUs in a Kubernetes environment

    Source: https://www.cnblogs.com/yechen2019/p/18462109

    With the rapid development of AI and large models, sharing GPU resources in the cloud has become a necessity: it reduces hardware costs, improves resource utilization, and meets the demand of model training and inference for large-scale parallel computing.

    Kubernetes' built-in resource scheduling can only allocate GPUs as whole devices ("by card count"), but what deep learning programs mostly consume during execution is GPU memory, so whole-device allocation wastes a lot of capacity.

    There are currently two approaches to GPU sharing. One splits a physical GPU into multiple virtual GPUs (vGPUs) and schedules by vGPU count; the other schedules by GPU memory.

    This article describes how to install the Kubernetes components that schedule GPU resources by GPU memory.
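
    For orientation, the two request styles look like this in a pod spec. This is a hedged sketch: nvidia.com/gpu is the whole-device resource name exposed by NVIDIA's standard device plugin, while aliyun.com/gpu-mem is the memory-denominated resource provided by the extender installed below.

    # whole-device (or vGPU) scheduling: one unit = one GPU
    resources:
      limits:
        nvidia.com/gpu: 1

    # memory-based scheduling (this article): units are GiB of GPU memory
    resources:
      limits:
        aliyun.com/gpu-mem: 3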

    System information

    • OS: CentOS Stream 8

    • Kernel: 4.18.0-490.el8.x86_64

    • Driver: NVIDIA-Linux-x86_64-470.182.03

    • Docker: 20.10.24

    • Kubernetes: 1.24.0

    1. Driver installation

    Download and install the driver from the NVIDIA website: https://www.nvidia.com/Download/index.aspx?lang=en-us
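
    After installing, a quick sanity check (assuming the install completed cleanly) is to run nvidia-smi on the host; it should list the GPUs and report the driver version, 470.182.03 for the setup above.

    nvidia-smi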

    2. Docker installation

    Install Docker or another container runtime yourself. If you use a different runtime, refer to the NVIDIA documentation for the configuration in step 3: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#installation-guide

    Note: Docker, containerd, and podman are officially supported, but this document has only been validated with Docker; watch for differences if you use another runtime.
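
    For example, if you choose containerd instead, the toolkit installed in step 3 can generate the analogous runtime configuration; this is a sketch that has not been validated in this document:

    sudo nvidia-ctk runtime configure --runtime=containerd
    sudo systemctl restart containerd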

    3. Installing the NVIDIA Container Toolkit

    1. Set up the repository and GPG key
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
       && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
    
    2. Install the toolkit
    sudo dnf clean expire-cache --refresh
    sudo dnf install -y nvidia-container-toolkit
    
    3. Update the Docker configuration to register the NVIDIA container runtime
    sudo nvidia-ctk runtime configure --runtime=docker
    
    4. Edit /etc/docker/daemon.json and set nvidia as the default container runtime (required)
    {
        "default-runtime": "nvidia",
        "runtimes": {
            "nvidia": {
                "path": "/usr/bin/nvidia-container-runtime",
                "runtimeArgs": []
            }
        }
    }
    
    5. Restart Docker and verify that the runtime works
    sudo systemctl restart docker
    sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
    

    If output like the following is returned, the configuration works:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
    | N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
    

    4. Installing the K8S GPU scheduler

    1. First apply the following YAML to deploy the scheduler extender
    # rbac.yaml
    ---
    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: gpushare-schd-extender
    rules:
      - apiGroups:
          - ""
        resources:
          - nodes
        verbs:
          - get
          - list
          - watch
      - apiGroups:
          - ""
        resources:
          - events
        verbs:
          - create
          - patch
      - apiGroups:
          - ""
        resources:
          - pods
        verbs:
          - update
          - patch
          - get
          - list
          - watch
      - apiGroups:
          - ""
        resources:
          - bindings
          - pods/binding
        verbs:
          - create
      - apiGroups:
          - ""
        resources:
          - configmaps
        verbs:
          - get
          - list
          - watch
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: gpushare-schd-extender
      namespace: kube-system
    ---
    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: gpushare-schd-extender
      namespace: kube-system
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: gpushare-schd-extender
    subjects:
      - kind: ServiceAccount
        name: gpushare-schd-extender
        namespace: kube-system
    # deployment yaml
    ---
    kind: Deployment
    apiVersion: apps/v1
    metadata:
      name: gpushare-schd-extender
      namespace: kube-system
    spec:
      replicas: 1
      strategy:
        type: Recreate
      selector:
        matchLabels:
          app: gpushare
          component: gpushare-schd-extender
      template:
        metadata:
          labels:
            app: gpushare
            component: gpushare-schd-extender
          annotations:
            scheduler.alpha.kubernetes.io/critical-pod: ''
        spec:
          hostNetwork: true
          tolerations:
            - effect: NoSchedule
              operator: Exists
              key: node-role.kubernetes.io/master
            - effect: NoSchedule
              key: node-role.kubernetes.io/control-plane
              operator: Exists
            - effect: NoSchedule
              operator: Exists
              key: node.cloudprovider.kubernetes.io/uninitialized
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          serviceAccount: gpushare-schd-extender
          containers:
            - name: gpushare-schd-extender
              image: registry.cn-hangzhou.aliyuncs.com/acs/k8s-gpushare-schd-extender:1.11-d170d8a
              env:
                - name: LOG_LEVEL
                  value: debug
                - name: PORT
                  value: "12345"
    # service.yaml
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: gpushare-schd-extender
      namespace: kube-system
      labels:
        app: gpushare
        component: gpushare-schd-extender
    spec:
      type: NodePort
      ports:
        - port: 12345
          name: http
          targetPort: 12345
          nodePort: 32766
      selector:
        # select the gpushare-schd-extender pods
        app: gpushare
        component: gpushare-schd-extender
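
    After applying the manifests, confirm that the extender pod is running and the Service is exposed, using the labels and names defined above:

    kubectl get pods -n kube-system -l app=gpushare,component=gpushare-schd-extender
    kubectl get svc -n kube-system gpushare-schd-extender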
    
    2. Add the scheduler policy configuration file under /etc/kubernetes
    #scheduler-policy-config.yaml
    ---
    apiVersion: kubescheduler.config.k8s.io/v1beta2
    kind: KubeSchedulerConfiguration
    clientConnection:
      kubeconfig: /etc/kubernetes/scheduler.conf
    extenders:
        # for some reason the Service DNS name does not work here; the NodePort address must be used
      - urlPrefix: "http://gpushare-schd-extender.kube-system:12345/gpushare-scheduler"
        filterVerb: filter
        bindVerb: bind
        enableHTTPS: false
        nodeCacheCapable: true
        managedResources:
          - name: aliyun.com/gpu-mem
            ignoredByScheduler: false
        ignorable: false
    

    Note: replace http://gpushare-schd-extender.kube-system:12345 above with {nodeIP}:{NodePort of the gpushare-schd-extender Service}, otherwise the scheduler will not be able to reach the extender.

    The NodePort can be queried with the following command:

    kubectl get service gpushare-schd-extender -n kube-system -o jsonpath='{.spec.ports[?(@.name=="http")].nodePort}'
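
    Before wiring the scheduler to the extender, it can help to confirm the NodePort actually answers. The URL below is just the configured urlPrefix plus the filter verb; the endpoint expects POST requests, so even a 404/405 status on a GET proves the extender is reachable (<nodeIP> is a placeholder for one of your node addresses):

    curl -s -o /dev/null -w '%{http_code}\n' http://<nodeIP>:32766/gpushare-scheduler/filter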
    
    3. Modify the kube-scheduler configuration /etc/kubernetes/manifests/kube-scheduler.yaml
    1. Add the following to the command section:
     - --config=/etc/kubernetes/scheduler-policy-config.yaml

    2. Mount the policy file into the pod.
    Add under volumeMounts:
    - mountPath: /etc/kubernetes/scheduler-policy-config.yaml
      name: scheduler-policy-config
      readOnly: true
    Add under volumes:
    - hostPath:
        path: /etc/kubernetes/scheduler-policy-config.yaml
        type: FileOrCreate
      name: scheduler-policy-config
    

    Note: be very careful not to make a mistake here; a malformed manifest can lead to puzzling failures.
    An abbreviated example follows:
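
    The sketch below shows only where the gpushare-related additions land; every other flag, probe, and volume in your real manifest stays exactly as kubeadm generated it:

    # /etc/kubernetes/manifests/kube-scheduler.yaml (abbreviated sketch)
    apiVersion: v1
    kind: Pod
    metadata:
      name: kube-scheduler
      namespace: kube-system
    spec:
      containers:
        - name: kube-scheduler
          command:
            - kube-scheduler
            - --config=/etc/kubernetes/scheduler-policy-config.yaml  # added
            # ...existing flags unchanged...
          volumeMounts:
            # ...existing mounts unchanged...
            - mountPath: /etc/kubernetes/scheduler-policy-config.yaml  # added
              name: scheduler-policy-config
              readOnly: true
      volumes:
        # ...existing volumes unchanged...
        - hostPath:  # added
            path: /etc/kubernetes/scheduler-policy-config.yaml
            type: FileOrCreate
          name: scheduler-policy-config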

    4. Configure RBAC and install the device plugin
    kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-rbac.yaml
    kubectl create -f https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-ds.yaml
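
    The device plugin runs as a DaemonSet in kube-system; its pods should appear on every GPU node once the label from the next step is in place:

    kubectl get pods -n kube-system -o wide | grep gpushare-device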
    

    5. Label the GPU nodes

    kubectl label node <target_node> gpushare=true
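
    The device plugin DaemonSet selects nodes by this label, so its pods are only scheduled onto labeled nodes; verify the label with:

    kubectl get nodes -l gpushare=true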
    

    6. Install the kubectl GPU plugin

    cd /usr/bin/
    wget https://github.com/AliyunContainerService/gpushare-device-plugin/releases/download/v0.3.0/kubectl-inspect-gpushare
    chmod u+x /usr/bin/kubectl-inspect-gpushare
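
    kubectl treats any executable named kubectl-<name> found on the PATH as a plugin, which is why the binary is placed in /usr/bin. Confirm it is discovered with:

    kubectl plugin list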
    

    7. Verification

    1. Query the cluster's GPU memory usage with kubectl
    # kubectl inspect gpushare
    NAME                                IPADDRESS     GPU0(Allocated/Total)  GPU Memory(GiB)
    cn-shanghai.i-uf61h64dz1tmlob9hmtb  192.168.0.71  6/15                   6/15
    cn-shanghai.i-uf61h64dz1tmlob9hmtc  192.168.0.70  3/15                   3/15
    ------------------------------------------------------------------------------
    Allocated/Total GPU Memory In Cluster:
    9/30 (30%)
    
    2. Create a workload that requests GPU memory and watch how it is scheduled
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: binpack-1
      labels:
        app: binpack-1
    spec:
      replicas: 1
      selector: # define how the deployment finds the pods it manages
        matchLabels:
          app: binpack-1
      template: # define the pods specifications
        metadata:
          labels:
            app: binpack-1
        spec:
          tolerations:
            - effect: NoSchedule
              key: cloudClusterNo
              operator: Exists        
          containers:
            - name: binpack-1
              image: cheyang/gpu-player:v2
              resources:
                limits:
                  # unit: GiB
                  aliyun.com/gpu-mem: 3
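
    After applying this Deployment (saved here as binpack-1.yaml; the filename is arbitrary), the allocated column of the inspect output should grow by 3 GiB once the pod is running:

    kubectl apply -f binpack-1.yaml
    kubectl get pods -l app=binpack-1 -o wide
    kubectl inspect gpushare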
    
    

    8. Troubleshooting

    If a component fails to come up during installation, check the device plugin pod logs:

    kubectl get po -n kube-system -o=wide | grep gpushare-device
    kubectl logs -n kube-system <pod_name>
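
    If a pod requesting aliyun.com/gpu-mem stays Pending, the scheduling side is also worth checking: the extender's log shows its filter/bind decisions, and describe shows the scheduling events (<pending_pod> is a placeholder):

    kubectl logs -n kube-system deploy/gpushare-schd-extender
    kubectl describe pod <pending_pod>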
    

    References:
    NVIDIA Container Toolkit install guide: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker
    Aliyun GPU share scheduler extender install guide: https://github.com/AliyunContainerService/gpushare-scheduler-extender/blob/master/docs/install.md