Pool offline after server failure - how to rectify?

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Try zpool import -fm.
If that doesn't work, try zpool import -m -c cachefile.

After this I am out of ideas; the -m parameter was made for exactly this situation.
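Spelled out, the two commands would look roughly like this (the pool name is a placeholder, and the cachefile path is the usual TrueNAS CORE location):

Code:
zpool import -f -m <poolname>                           # -m allows import with a missing log device, -f forces it
zpool import -m -c /data/zfs/zpool.cache <poolname>     # or read the pool config from the cachefile TrueNAS keeps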
 

swinster

Dabbler
Joined
Oct 10, 2022
Messages
26
I think I figured it out from this post - https://www.truenas.com/community/threads/zpool-raidz2-0-offline-log-ssds-unavailable.99216/

I missed the pool name in the zpool import command. It should have been:

Code:
root@truenas[~]# zpool import -m Pool1
root@truenas[~]# zpool status -v     
  pool: Pool1
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-2Q
  scan: scrub repaired 960K in 04:16:01 with 0 errors on Sun Nov 27 04:16:01 2022
config:

        NAME                                            STATE     READ WRITE CKSUM
        Pool1                                           DEGRADED     0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/54ad8b2b-5264-11ed-8151-000c29f3c3dd  ONLINE       0     0     0
            gptid/54b63620-5264-11ed-8151-000c29f3c3dd  ONLINE       0     0     0
            gptid/546086c5-5264-11ed-8151-000c29f3c3dd  ONLINE       0     0     0
            gptid/549656d2-5264-11ed-8151-000c29f3c3dd  ONLINE       0     0     0
        logs
          1254988834239897657                           UNAVAIL      0     0     0  was /dev/gptid/52eab7bc-5264-11ed-8151-000c29f3c3dd

errors: No known data errors


Now I could remove the log drive:

Code:
root@truenas[~]# zpool remove Pool1 gptid/52eab7bc-5264-11ed-8151-000c29f3c3dd
root@truenas[~]# zpool status -v                                             
  pool: Pool1
 state: ONLINE
  scan: scrub repaired 960K in 04:16:01 with 0 errors on Sun Nov 27 04:16:01 2022
config:

        NAME                                            STATE     READ WRITE CKSUM
        Pool1                                           ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/54ad8b2b-5264-11ed-8151-000c29f3c3dd  ONLINE       0     0     0
            gptid/54b63620-5264-11ed-8151-000c29f3c3dd  ONLINE       0     0     0
            gptid/546086c5-5264-11ed-8151-000c29f3c3dd  ONLINE       0     0     0
            gptid/549656d2-5264-11ed-8151-000c29f3c3dd  ONLINE       0     0     0

errors: No known data errors


And it looks like we are back in business :)
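If I ever want a SLOG again, my understanding is that a new log device can be attached later with something along these lines (the gptid is a placeholder, and doing it through the GUI is probably the safer route):

Code:
zpool add Pool1 log gptid/<new-slog-partition>     # attach a new log vdev to the pool (placeholder device)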



I will test this out tomorrow. I owe you a beer. Many thanks.
 


jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
[mod note: moved to the virtualization subforum. This is basically user error resulting from virtualization. -JG]
 

swinster

Dabbler
Joined
Oct 10, 2022
Messages
26
@jgreco, TBH, I don't think virtualisation had much to do with it. It was a plain and simple user error triggered by a hardware failure. Virtualisation was the mechanism/framework behind the cockup, but I was the one who overwrote the SLOG disk - that may or may not have happened with TrueNAS running on bare metal. Ultimately, the issue was very similar to https://www.truenas.com/community/threads/zpool-raidz2-0-offline-log-ssds-unavailable.99216/, so I guess it's possible to get into this state for reasons that have nothing to do with virtualisation.

For others following up, there was one more thing I needed to do, which was to reboot the TrueNAS box so that the file permissions were recognised, but it seems the data is accessible.
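For anyone wanting to sanity-check the same thing, I believe something like this shows whether the datasets actually mounted and whether the ACLs survived (the dataset path is just an example):

Code:
zfs list -o name,mountpoint,mounted -r Pool1     # confirm the datasets mounted after the import
getfacl /mnt/Pool1/<some-dataset>                # example path - spot-check that the ACLs look right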
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
You're welcome to your opinion. Of course it would be possible to overwrite a SLOG disk on a bare metal install, but my read suggests that the error was made significantly more likely due to the use of virtualization, specifically having multiple things in competition for your SLOG device. This is a danger I've warned about in the past.
 

swinster

Dabbler
Joined
Oct 10, 2022
Messages
26
To clarify, I acquired a new(er) server when the previous server died and moved most of the disks over to it. However, the SATA header on the new(er) server's motherboard was of a different type, so the TrueNAS pool disks were not moved. During the reinstallation of ESXi (although it could have been any OS), I made the massive mistake of targeting the SLOG disk to house part of the OS file system. I even caught my mistake and thought I had backed out of it, but obviously didn't.

Of course, when I managed to revive the old server (it had a faulty power distribution board), I switched the disks containing the datastore for TrueNAS back. The original ZFS pool was still in place; however, the SLOG disk had been repartitioned.
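For what it's worth, a quick look at the disk from the TrueNAS shell before letting another installer near it would have made the problem obvious; something along these lines (device name and gptid are examples):

Code:
gpart show da1                          # example device - shows whether the layout is still the single SLOG partition
zdb -l /dev/gptid/<slog-partition>      # example gptid - an intact ZFS label means the partition still belongs to a pool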
 