Docker Engine Fails to Start After a Power Loss: Common Problems and Solutions

Common troubleshooting commands

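  • Check the status of containerd: systemctl status containerd
  • Check the status of the Docker engine: systemctl status docker
  • Show the last 100 lines of the Docker engine's entries in the system journal and keep following new output: journalctl -u docker.service -f -n 100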
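
When the journal output is long, a small filter can report which of the two failure signatures covered in this article is present. The following is a minimal sketch, not part of Docker: classify_docker_log is a hypothetical helper name, and the match strings are taken from the log excerpts in this article, so they may need adjusting for other Docker versions.

```shell
#!/bin/sh
# Hypothetical helper (not a Docker command): reads log text from stdin and
# prints one label per failure signature found, or "unknown" if none match.
classify_docker_log() {
    log=$(cat)
    found=0
    # Signature of the "container load failure" problem.
    if printf '%s\n' "$log" | grep -q 'failed to load container'; then
        echo "container-load-failure"
        found=1
    fi
    # Signature of the "Page expected" panic problem.
    if printf '%s\n' "$log" | grep -q 'panic: assertion failed: Page expected'; then
        echo "page-expected-panic"
        found=1
    fi
    if [ "$found" -eq 0 ]; then
        echo "unknown"
    fi
}

# Example usage on a host with systemd and Docker:
#   journalctl -u docker.service -n 100 --no-pager | classify_docker_log
```

The function only reads text, so it can also be run against a log file copied off the affected machine.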

Common problems and fixes

Container load failure

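Viewing the Docker logs with journalctl shows that a container failed to load. This typically happens when the power loss corrupted the container's files on disk.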

dockerd[26166]: time="xxx" level=error msg="failed to load container" container=xxxxx error="invalid character '\\x00' looking for beginning of value"
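
The '\x00' in this message means the JSON parser hit a NUL byte where it expected a value to begin, which is consistent with a file that was not fully written before the power went out. As a rough check, the sketch below lists JSON files under a directory whose first byte is NUL. It assumes the damaged file is one of the per-container JSON files (such as config.v2.json) under /var/lib/docker/containers. starts_with_nul and find_nul_configs are hypothetical helper names, and checking only the first byte is a heuristic.

```shell
#!/bin/sh
# starts_with_nul FILE: succeed if the first byte of FILE is NUL (0x00).
# Hypothetical helper; an empty file does not count as a match.
starts_with_nul() {
    first=$(head -c 1 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$first" = "00" ]
}

# find_nul_configs DIR: print every *.json file under DIR that starts with NUL.
find_nul_configs() {
    find "$1" -type f -name '*.json' | while IFS= read -r f; do
        if starts_with_nul "$f"; then
            printf '%s\n' "$f"
        fi
    done
}

# Example usage (assumes the default Docker data root, run as root):
#   find_nul_configs /var/lib/docker/containers
```

The sketch only reads files and prints paths, so it is safe to run before deciding what to delete.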

Resolution steps

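1) Take the container ID that follows container= in the log and delete the corresponding container directory: rm -rf /var/lib/docker/containers/<container-id>
2) Restart Docker: systemctl restart docker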
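
If several containers fail to load, copying IDs by hand gets tedious. The sketch below pulls the IDs out of log lines shaped like the excerpt above and prints the matching directories without deleting anything. extract_container_ids and list_broken_container_dirs are hypothetical helper names, the DOCKER_ROOT variable is an assumption for hosts that use a non-default data root, and the pattern assumes container= immediately follows the msg field as in the excerpt.

```shell
#!/bin/sh
# Assumed default data root; override DOCKER_ROOT if yours differs.
DOCKER_ROOT="${DOCKER_ROOT:-/var/lib/docker}"

# Reads log text from stdin and prints each distinct container ID that
# appears in a "failed to load container" line.
extract_container_ids() {
    sed -n 's/.*msg="failed to load container" container=\([^ ]*\).*/\1/p' | sort -u
}

# Reads log text from stdin and prints the directory of each affected container.
list_broken_container_dirs() {
    extract_container_ids | while IFS= read -r id; do
        printf '%s/containers/%s\n' "$DOCKER_ROOT" "$id"
    done
}

# Example usage on the affected host:
#   journalctl -u docker.service -n 100 --no-pager | list_broken_container_dirs
```

Printing the paths first leaves room to review them before running the rm step.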

Engine exits with a "Page expected" panic

Viewing the Docker logs with journalctl shows that Docker's Go code raised a panic:

dockerd[26166]: panic: assertion failed: Page expected to be: 34, but self identifies as xxx

Resolution steps

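The fix is a bit of black magic: remove or rename the corrupted data files.
Reference: containerd/issues/3347, "Containerd is crashing with panic"
Reference: the comment beginning "Hope will help someone", quoted below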

  • Stop Docker and containerd:
    systemctl stop docker containerd
  • Cleanup containerd data directory (Docker will regenerate it at startup if needed):
    rm -rf /var/lib/containerd/
  • Find Docker’s database files. One of them (most often local-kv.db) is corrupted on your system:
    find /var/lib/docker -type f -size -5M -name '*.db' | grep -v overlay2
     - will output something like:
     /var/lib/docker/containerd/daemon/io.containerd.metadata.v1.bolt/meta.db
     /var/lib/docker/volumes/metadata.db
     /var/lib/docker/network/files/local-kv.db
     /var/lib/docker/builder/fscache.db
     /var/lib/docker/buildkit/snapshots.db
     /var/lib/docker/buildkit/metadata.db
     /var/lib/docker/buildkit/cache.db
  • Simply rename this file to .bak:
    mv /var/lib/docker/network/files/local-kv.db{,.bak}
  • Start Docker:
    systemctl start docker
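
After the engine is back up, restart the containers to verify the fix. Because the engine's .db data file was removed directly, restarting an existing container with docker restart xx can fail because its associated data can no longer be found. It is better to first remove the old containers with docker rm or docker compose down, then recreate them.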
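
The rename step above can be wrapped so that it never overwrites an earlier backup. This is a minimal sketch under that assumption: backup_db is a hypothetical helper name, and it does the same thing as the mv command quoted above, with two extra checks.

```shell
#!/bin/sh
# backup_db FILE: rename FILE to FILE.bak and print the new path.
# Hypothetical helper; refuses to run if FILE is missing or FILE.bak exists.
backup_db() {
    src=$1
    dst="$src.bak"
    if [ ! -f "$src" ]; then
        echo "not found: $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "backup already exists: $dst" >&2
        return 1
    fi
    mv -- "$src" "$dst" && printf '%s\n' "$dst"
}

# Example usage (stop docker and containerd first, run as root):
#   backup_db /var/lib/docker/network/files/local-kv.db
```

Keeping the .bak file makes it possible to put the original back if renaming it turns out not to help.
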
This article is licensed by the author under CC BY 4.0.