Baidu AI Cloud full-featured AI development platform BML: Using the container image service CCR on the BML platform
Using the container image service CCR on the BML platform
The platform lets users associate a container image service CCR instance with a user resource pool as that pool's image repository. Tasks submitted to the user resource pool can then use images from this repository.
Currently supported container image service CCR types:
Tasks that can currently be submitted with the container image service CCR:
Prerequisites
Create an image repository
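Enterprise edition: select a container image service CCR Enterprise edition instance that belongs to the main account and is in the same region and VPC as the resource pool, then enter the account and password to add it.
Personal edition: select a container image service CCR Personal edition instance that belongs to the main account, then enter the account and password to add it.
Submitting a custom job task with an image
In the algorithm configuration stage, if the user selects a user resource pool, the task can be submitted with a CCR image environment associated with that resource pool. (Training jobs and auto-search job tasks are submitted the same way.)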
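After selecting an image, choose the distributed framework that matches the deep learning framework in the image, where:
After filling in the code file, startup command, output path, and other information, you can submit the custom job task.
Appendix: Custom image specification
Paddle image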
RUN /bin/bash -c 'mkdir /home/bos && \
    cd /home/bos && \
    wget --no-check-certificate https://sdk.bce.baidu.com/console-sdk/linux-bcecmd-0.3.0.zip && \
    unzip linux-bcecmd-0.3.0.zip && \
    echo "export.UTF-8" >> ~/.bashrc && \
    echo "export PATH="/home/bos/linux-bcecmd-0.3.0:${PATH}"" >> ~/.bashrc && \
    source ~/.bashrc'
Alternatively, install bcecmd yourself into the /home/bos/linux-bcecmd-0.3.0 directory in the image. Make sure that running "/home/bos/linux-bcecmd-0.3.0/bcecmd" on the command line prints the bcecmd help information, that is, the system recognizes the command.
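The check described above can be scripted. The following is a minimal sketch, assuming the install path used in the Dockerfile fragment; the function name `check_bcecmd` is hypothetical and is not part of BML or bcecmd. It only confirms that the binary exists and is executable, which is a necessary condition for the command to be recognized.

```shell
#!/bin/sh
# Hypothetical helper: verifies that a bcecmd binary exists and is executable.
# The default path is the install location used in the Dockerfile fragment.
check_bcecmd() {
    bcecmd_path="${1:-/home/bos/linux-bcecmd-0.3.0/bcecmd}"
    # Reject directories, which can also carry the executable bit.
    if [ -x "$bcecmd_path" ] && [ ! -d "$bcecmd_path" ]; then
        echo "found: $bcecmd_path"
        return 0
    fi
    echo "missing or not executable: $bcecmd_path" >&2
    return 1
}
```

Inside the built image, calling `check_bcecmd` with no argument checks the default path. Running /home/bos/linux-bcecmd-0.3.0/bcecmd itself afterwards confirms that the help information is printed.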
ENV NVIDIA_VISIBLE_DEVICES ""
Sklearn image
FROM ubuntu16.04-python3

# Configure time zone
RUN apt-get update && \
    apt-get install -y tzdata && \
    ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
    dpkg-reconfigure -f noninteractive tzdata && \
    apt-get clean

RUN apt-get install -y --no-install-recommends \
    build-essential \
    libopencv-dev \
    libssl-dev \
    dnsutils \
    unzip \
    vim \
    jq \
    curl \
    wget && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# py3
RUN bash -c 'cd /tmp && \
    wget --no-check-certificate https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh && \
    bash Miniconda3-py37_4.8.3-Linux-x86_64.sh -b -p ~/miniconda3 && \
    echo "source ~/miniconda3/bin/activate" >> ~/.bashrc && \
    echo "export PATH="~/miniconda3/bin:${PATH}"" >> ~/.bashrc && \
    source ~/.bashrc && \
    rm -rf Miniconda3-py37_4.8.3-Linux-x86_64.sh' && \
    /root/miniconda3/bin/python -m pip config set global.index-url https://pypi.douban.com/simple/ && \
    /root/miniconda3/bin/python -m pip install --upgrade pip && \
    /root/miniconda3/bin/python -m pip install --upgrade setuptools && \
    /root/miniconda3/bin/python -m pip install numpy==1.17.4 && \
    /root/miniconda3/bin/python -m pip install albumentations==0.4.3 && \
    /root/miniconda3/bin/python -m pip install Cython==0.29.16 && \
    /root/miniconda3/bin/python -m pip install pycocotools==2.0.0 && \
    /root/miniconda3/bin/python -m pip install ruamel.yaml && \
    /root/miniconda3/bin/python -m pip install ujson && \
    /root/miniconda3/bin/python -m pip install scipy==1.5.3 && \
    /root/miniconda3/bin/python -m pip install scikit-learn==0.23.2 && \
    /root/miniconda3/bin/python -m pip install pandas

RUN /root/miniconda3/condabin/conda clean -p && \
    /root/miniconda3/condabin/conda clean -t
RUN rm -rf ~/.cache/pip
RUN rm -rf /usr/bin/python3 && ln -s /root/miniconda3/bin/python /usr/bin/python3

# sklearn
RUN /root/miniconda3/bin/python -m pip install xgboost==1.3.1

# Install the BOS command-line tool (bcecmd)
RUN /bin/bash -c 'mkdir /home/bos && \
    cd /home/bos && \
    wget --no-check-certificate https://sdk.bce.baidu.com/console-sdk/linux-bcecmd-0.3.0.zip && \
    unzip linux-bcecmd-0.3.0.zip && \
    echo "export.UTF-8" >> ~/.bashrc && \
    echo "export PATH="/home/bos/linux-bcecmd-0.3.0:${PATH}"" >> ~/.bashrc && \
    source ~/.bashrc'

# Add the search job SDK; install protobuf 3.20.1 for the search job
COPY rudder-autosearch-1.0.0-py3-none-any.whl /home/rudder-autosearch-1.0.0-py3-none-any.whl
RUN pip install /home/rudder-autosearch-1.0.0-py3-none-any.whl && \
    pip install protobuf==3.20.1

ENV NVIDIA_VISIBLE_DEVICES ""
ENTRYPOINT ["/bin/bash"]
For multi-node distributed PyTorch/TensorFlow jobs, additional dependency packages must be installed:
# Install Open MPI
RUN mkdir /tmp/openmpi && \
    cd /tmp/openmpi && \
    wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.0.tar.gz && \
    tar zxf openmpi-4.0.0.tar.gz && \
    cd openmpi-4.0.0 && \
    ./configure --enable-orterun-prefix-by-default && \
    make -j $(nproc) all && \
    make install && \
    ldconfig && \
    rm -rf /tmp/openmpi

# Create a wrapper for OpenMPI to allow running as root by default
RUN mv /usr/local/bin/mpirun /usr/local/bin/mpirun.real && \
    echo '#!/bin/bash' > /usr/local/bin/mpirun && \
    echo 'mpirun.real --allow-run-as-root "$@"' >> /usr/local/bin/mpirun && \
    chmod a+x /usr/local/bin/mpirun

# Configure OpenMPI to run good defaults:
#   --bind-to none --map-by slot --mca btl_tcp_if_exclude lo,docker0
RUN echo "hwloc_base_binding_policy = none" >> /usr/local/etc/openmpi-mca-params.conf && \
    echo "rmaps_base_mapping_policy = slot" >> /usr/local/etc/openmpi-mca-params.conf && \
    echo "btl_tcp_if_exclude = lo,docker0" >> /usr/local/etc/openmpi-mca-params.conf

# Install Horovod, temporarily using CUDA stubs
RUN ldconfig /usr/local/cuda/targets/x86_64-linux/lib/stubs && \
    HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_WITH_PYTORCH=1 pip install --no-cache-dir horovod && \
    ldconfig

# Set default NCCL parameters
RUN echo NCCL_DEBUG=INFO >> /etc/nccl.conf
# Install OpenSSH for MPI to communicate between containers
RUN apt-get install -y --no-install-recommends openssh-client openssh-server && \
    mkdir -p /var/run/sshd

# Allow OpenSSH to talk to containers without asking for confirmation
RUN cat /etc/ssh/ssh_config | grep -v StrictHostKeyChecking > /etc/ssh/ssh_config.new && \
    echo " StrictHostKeyChecking no" >> /etc/ssh/ssh_config.new && \
    mv /etc/ssh/ssh_config.new /etc/ssh/ssh_config
pytorch/tf image example
FROM cuda11.0.3-cudnn8-devel-ubuntu18.04

# Remove the CUDA and NVIDIA ML apt source lists
RUN rm /etc/apt/sources.list.d/cuda.list && rm /etc/apt/sources.list.d/nvidia-ml.list

# Configure time zone
RUN apt-get update && \
    DEBIAN_FRONTEND="noninteractive" TZ="Asia/Shanghai" \
    apt-get install -y tzdata && \
    ln -snf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
    dpkg-reconfigure -f noninteractive tzdata && \
    apt-get clean

# cmake must be newer than version 3.13
RUN apt-get install -y \
    build-essential \
    libopencv-dev \
    libssl-dev \
    dnsutils \
    unzip \
    vim \
    git \
    jq \
    curl \
    cmake \
    wget \
    ca-certificates \
    libjpeg-dev \
    libpng-dev && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# py3
RUN bash -c 'cd /tmp && \
    wget --no-check-certificate https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh && \
    bash Miniconda3-py37_4.8.3-Linux-x86_64.sh -b -p ~/miniconda3 && \
    echo "source ~/miniconda3/bin/activate" >> ~/.bashrc && \
    echo "export PATH="~/miniconda3/bin:${PATH}"" >> ~/.bashrc && \
    source ~/.bashrc && \
    rm -rf Miniconda3-py37_4.8.3-Linux-x86_64.sh' && \
    /root/miniconda3/bin/python -m pip config set global.index-url https://pypi.douban.com/simple/ && \
    /root/miniconda3/bin/python -m pip install --upgrade pip && \
    /root/miniconda3/bin/python -m pip install --upgrade setuptools && \
    /root/miniconda3/bin/python -m pip install numpy==1.17.4 && \
    /root/miniconda3/bin/python -m pip install albumentations==0.4.3 && \
    /root/miniconda3/bin/python -m pip install Cython==0.29.16 && \
    /root/miniconda3/bin/python -m pip install pycocotools==2.0.0 && \
    /root/miniconda3/bin/python -m pip install ruamel.yaml && \
    /root/miniconda3/bin/python -m pip install ujson && \
    /root/miniconda3/bin/python -m pip install scikit-learn==0.23.2 && \
    /root/miniconda3/bin/python -m pip install pandas

RUN /root/miniconda3/condabin/conda clean -p && \
    /root/miniconda3/condabin/conda clean -t
RUN rm -rf ~/.cache/pip
RUN rm -rf /usr/bin/python3 && ln -s /root/miniconda3/bin/python /usr/bin/python3

# torch
RUN /root/miniconda3/bin/python -m pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio===0.7.2 \
    -f https://download.pytorch.org/whl/torch_stable.html

# Install Open MPI 4.0.0
RUN mkdir /tmp/openmpi && \
    cd /tmp/openmpi && \
    wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.0.tar.gz && \
    tar zxf openmpi-4.0.0.tar.gz && \
    cd openmpi-4.0.0 && \
    ./configure --enable-orterun-prefix-by-default && \
    make -j $(nproc) all && \
    make install && \
    ldconfig && \
    rm -rf /tmp/openmpi

# Install Horovod, temporarily using CUDA stubs
# /usr/local/cuda is a symlink to the versioned CUDA directory (CUDA 11.0 in this base image)
RUN ldconfig /usr/local/cuda/targets/x86_64-linux/lib/stubs && \
    HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_WITH_PYTORCH=1 pip install --no-cache-dir horovod && \
    ldconfig

# Create a wrapper for OpenMPI to allow running as root by default
RUN mv /usr/local/bin/mpirun /usr/local/bin/mpirun.real && \
    echo '#!/bin/bash' > /usr/local/bin/mpirun && \
    echo 'mpirun.real --allow-run-as-root "$@"' >> /usr/local/bin/mpirun && \
    chmod a+x /usr/local/bin/mpirun

# Configure OpenMPI to run good defaults:
#   --bind-to none --map-by slot --mca btl_tcp_if_exclude lo,docker0
RUN echo "hwloc_base_binding_policy = none" >> /usr/local/etc/openmpi-mca-params.conf && \
    echo "rmaps_base_mapping_policy = slot" >> /usr/local/etc/openmpi-mca-params.conf && \
    echo "btl_tcp_if_exclude = lo,docker0" >> /usr/local/etc/openmpi-mca-params.conf

# Set default NCCL parameters
RUN echo NCCL_DEBUG=INFO >> /etc/nccl.conf

# Install OpenSSH for MPI to communicate between containers
RUN apt-get install -y --no-install-recommends openssh-client openssh-server && \
    mkdir -p /var/run/sshd

# Allow OpenSSH to talk to containers without asking for confirmation
RUN cat /etc/ssh/ssh_config | grep -v StrictHostKeyChecking > /etc/ssh/ssh_config.new && \
    echo " StrictHostKeyChecking no" >> /etc/ssh/ssh_config.new && \
    mv /etc/ssh/ssh_config.new /etc/ssh/ssh_config

# Install the BOS command-line tool (bcecmd)
RUN /bin/bash -c 'mkdir /home/bos && \
    cd /home/bos && \
    wget --no-check-certificate https://sdk.bce.baidu.com/console-sdk/linux-bcecmd-0.3.0.zip && \
    unzip linux-bcecmd-0.3.0.zip && \
    echo "export.UTF-8" >> ~/.bashrc && \
    echo "export PATH="/home/bos/linux-bcecmd-0.3.0:${PATH}"" >> ~/.bashrc && \
    source ~/.bashrc'

# Add the search job SDK; install protobuf 3.20.1 for the search job
COPY rudder-autosearch-1.0.0-py3-none-any.whl /home/rudder-autosearch-1.0.0-py3-none-any.whl
RUN pip install /home/rudder-autosearch-1.0.0-py3-none-any.whl && \
    pip install protobuf==3.20.1

ENV NVIDIA_VISIBLE_DEVICES ""
ENTRYPOINT ["/bin/bash"]
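To see what the mpirun wrapper in the image does, it can be reproduced outside the image. The following is a minimal sketch, assuming a stub script in place of the real Open MPI launcher and a scratch directory in place of /usr/local/bin; the function name `make_mpirun_wrapper` and the `train.py` argument are hypothetical.

```shell
#!/bin/sh
# Sketch: rebuild the mpirun wrapper from the Dockerfile in a scratch directory.
# A stub stands in for the real launcher (an assumption for illustration).
make_mpirun_wrapper() {
    bin_dir="$1"
    # Stub for mpirun.real: prints the arguments it receives.
    printf '%s\n' '#!/bin/sh' 'echo "mpirun.real $*"' > "$bin_dir/mpirun.real"
    # The same two lines the Dockerfile writes into /usr/local/bin/mpirun.
    echo '#!/bin/bash' > "$bin_dir/mpirun"
    echo 'mpirun.real --allow-run-as-root "$@"' >> "$bin_dir/mpirun"
    chmod a+x "$bin_dir/mpirun" "$bin_dir/mpirun.real"
}

demo_dir="$(mktemp -d)"
make_mpirun_wrapper "$demo_dir"
# The wrapper resolves mpirun.real through PATH, as it does under /usr/local/bin.
PATH="$demo_dir:$PATH" "$demo_dir/mpirun" -np 2 python train.py
# prints: mpirun.real --allow-run-as-root -np 2 python train.py
rm -rf "$demo_dir"
```

The wrapper forwards every argument unchanged and only prepends --allow-run-as-root, so launch commands written for plain mpirun keep working when the container runs as root.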