I recently ran across a problem where one of the Scaleway servers I rent no longer booted because it could not mount
/dev/nbd1 after a reboot:
The canonical solution to problems like these in the age of the cloud is to delete the machine and to provision a new one. But, as this particular server wasn’t provisioned automatically, I wanted to save its configuration and data. Unfortunately though, I never set a root password so I could not use the console to log in.
To get the machine to boot again, I decided to use Scaleway’s rescue boot image. However, this was not as trivial as I thought it would be. Here is the process I went through:
I first noted the boot script the server was currently used it and then changed it to the rescue variant (the actual name might change over time):
I then rebooted the server using the Off button at the top of the settings screen and choosing the Hard Reboot option. After the server has booted, I was greeted by a login prompt:
I was now able to use ssh to connect to the rescue console (184.108.40.206 being the fictional public IP address of my server):
macbook:~ user$ ssh -o StrictHostKeyChecking=no firstname.lastname@example.org root@scw01:~# mount /dev/nbd0 /mnt mount: /dev/nbd0: can't read superblock
This was not what I expected. The reason I got this error is because the rescue image doesn’t boot with the network block devices attached.
It appeared that the required
nbd-client tool wasn’t installed, so I installed it myself using apt-get:
root@scw01:~# nbd-client -bash: nbd-client: command not found root@scw01:~# apt-get update ... Fetched 15.5 MB in 10s (1,482 kB/s) Reading package lists... Done root@scw01:~# apt-get install nbd-client ...
Going back to the server settings page, I noted the IP address and port of the
I now had everything I needed to attach and mount the image myself:
root@scw01:~# nbd-client 10.6.1.1 4100 /dev/nbd0 root@scw01:~# mount /dev/nbd0 /mnt root@scw01:~# ls /mnt bin boot dev etc home initrd.img lib lib64 lost+found media mnt proc root run sbin srv sys tmp usr var vmlinuz
Success! I could finally fix whatever was broken to get my server booting again. Not really relevant to this item, but I more or less had to do this.
When I was finished, I changed the boot script back to what it was. I then unmounted and detached the network block device and rebooted the machine and all was well again:
root@scw01:~# umount /mnt root@scw01:~# nbd-client -d /dev/nbd0 disconnect, sock, done root@scw01:~# reboot
Conclusion: The rescue image is a viable method to recover an instance (or its data). And while I can understand that Scaleway doesn’t automatically attach the network block devices in rescue mode, it would have been nice if they included
nbd-client on the rescue image and documented the procedure.