Pool offline after server failure - how to rectify?

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Try zpool import -fm.
If that doesn't work, try zpool import -m -c cachefile.

After this I am out of ideas; the -m parameter was made for exactly this situation.
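Spelled out, the two commands would look roughly like this (the pool name is a placeholder, and the cachefile path is the usual TrueNAS CORE location):

Code:
zpool import -f -m <poolname>                           # -m allows import with a missing log device, -f forces it
zpool import -m -c /data/zfs/zpool.cache <poolname>     # or read the pool config from the cachefile TrueNAS keeps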
 

swinster

Dabbler
Joined
Oct 10, 2022
Messages
26
I think I figured it out from this post - https://www.truenas.com/community/threads/zpool-raidz2-0-offline-log-ssds-unavailable.99216/

I missed the pool name in the zpool import command. It should have been:

Code:
root@truenas[~]# zpool import -m Pool1
root@truenas[~]# zpool status -v     
  pool: Pool1
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-2Q
  scan: scrub repaired 960K in 04:16:01 with 0 errors on Sun Nov 27 04:16:01 2022
config:

        NAME                                            STATE     READ WRITE CKSUM
        Pool1                                           DEGRADED     0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/54ad8b2b-5264-11ed-8151-000c29f3c3dd  ONLINE       0     0     0
            gptid/54b63620-5264-11ed-8151-000c29f3c3dd  ONLINE       0     0     0
            gptid/546086c5-5264-11ed-8151-000c29f3c3dd  ONLINE       0     0     0
            gptid/549656d2-5264-11ed-8151-000c29f3c3dd  ONLINE       0     0     0
        logs
          1254988834239897657                           UNAVAIL      0     0     0  was /dev/gptid/52eab7bc-5264-11ed-8151-000c29f3c3dd

errors: No known data errors


Now I could remove the log drive:

Code:
root@truenas[~]# zpool remove Pool1 gptid/52eab7bc-5264-11ed-8151-000c29f3c3dd
root@truenas[~]# zpool status -v                                             
  pool: Pool1
 state: ONLINE
  scan: scrub repaired 960K in 04:16:01 with 0 errors on Sun Nov 27 04:16:01 2022
config:

        NAME                                            STATE     READ WRITE CKSUM
        Pool1                                           ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/54ad8b2b-5264-11ed-8151-000c29f3c3dd  ONLINE       0     0     0
            gptid/54b63620-5264-11ed-8151-000c29f3c3dd  ONLINE       0     0     0
            gptid/546086c5-5264-11ed-8151-000c29f3c3dd  ONLINE       0     0     0
            gptid/549656d2-5264-11ed-8151-000c29f3c3dd  ONLINE       0     0     0

errors: No known data errors


And it looks like we are back in business :)
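If I ever want a SLOG again, my understanding is that a new log device can be attached later with something along these lines (the gptid is a placeholder, and doing it through the GUI is probably the safer route):

Code:
zpool add Pool1 log gptid/<new-slog-partition>     # attach a new log vdev to the pool (placeholder device)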



I will test this out tomorrow. I owe you a beer. Many thanks.
 


jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
[mod note: moved to the virtualization subforum. This is basically user error resulting from virtualization. -JG]
 

swinster

Dabbler
Joined
Oct 10, 2022
Messages
26
@jgreco, TBH, I don't think virtualisation had much to do with it. It was a plain and simple user error triggered by a hardware failure. Virtualisation was the mechanism/framework behind the cockup, but I was the one who overwrote the SLOG disk - that may or may not have happened with TrueNAS running on bare metal. Ultimately, the issue was very similar to https://www.truenas.com/community/threads/zpool-raidz2-0-offline-log-ssds-unavailable.99216/, so I guess it's possible to get into this state for reasons that have nothing to do with virtualisation.

For others following up, there was one more thing I needed to do, which was to reboot the TrueNAS box so that the file permissions were recognised, but it seems the data is accessible.
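For anyone wanting to sanity-check the same thing, I believe something like this shows whether the datasets actually mounted and whether the ACLs survived (the dataset path is just an example):

Code:
zfs list -o name,mountpoint,mounted -r Pool1     # confirm the datasets mounted after the import
getfacl /mnt/Pool1/<some-dataset>                # example path - spot-check that the ACLs look right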
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
You're welcome to your opinion. Of course it would be possible to overwrite a SLOG disk on a bare metal install, but my read suggests that the error was made significantly more likely due to the use of virtualization, specifically having multiple things in competition for your SLOG device. This is a danger I've warned about in the past.
 

swinster

Dabbler
Joined
Oct 10, 2022
Messages
26
To clarify, I acquired a new(er) server when the previous server died and moved most of the disks over to it. However, the SATA header on the new(er) server's motherboard was of a different type, so the TrueNAS pool disks were not moved. During the reinstallation of ESXi (although it could have been any OS), I made the massive mistake of targeting the SLOG disk to house part of the OS file system. I even caught my mistake and thought I had backed out of it, but obviously didn't.

Of course, when I managed to revive the old server (it had a faulty power distribution board), I switched the disks containing the datastore for TrueNAS back. The original ZFS pool was still in place; however, the SLOG disk had been repartitioned.
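For what it's worth, a quick look at the disk from the TrueNAS shell before letting another installer near it would have made the problem obvious; something along these lines (device name and gptid are examples):

Code:
gpart show da1                          # example device - shows whether the layout is still the single SLOG partition
zdb -l /dev/gptid/<slog-partition>      # example gptid - an intact ZFS label means the partition still belongs to a pool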
 