disk pool throwing alert

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
OK, so data in the pool is good, disks not so much.

At least no CRC errors returned, so just deal with the disks.
 
Joined
Jun 15, 2022
Messages
674
FYI, last time I ran badblocks over an old WD 4TB it took 3 days or so, but gave the desired result: SMART 197 Currently Pending Sectors = 0
It'll generally run much faster if you tweak the settings as appropriate for your setup. I went from 3 days to something like 6 hours... I'd look it up, but the data is on the jumpdrive I accidentally blew away this past weekend. The point is the default settings aren't usually optimal, by a long shot.
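If it helps, you can watch that SMART attribute before and after a run with smartctl; /dev/sdX below is just a placeholder for whichever disk you're testing:

# Check attribute 197 (Current_Pending_Sector) and 5 (Reallocated_Sector_Ct) on the drive under test
smartctl -A /dev/sdX | grep -i -e pending -e reallocated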
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
NugentS

This NAS is my backup... I have some of the documents/photos replicated to Google, but there is still a sh4t load here that I don't want to lose.
Also, running a test on media that's going to take a week or two is simply not viable.
G
Why not?
Answer - you probably don't have enough SATA ports (or drive bays) for another drive, do you?
What about running the test on another machine?

I did say remove and replace first - testing can come afterwards
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
The family is on iPads. I'm on an MBP that I use for work, so it moves around.
Got replacement drives, first step.
G
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
OK, so data in the pool is good, disks not so much.
Going to replace this drive with the new one.
At least no CRC errors returned, so just deal with the disks.
I"m going to do a replacement on this drive that previously had the errors. - thats part of Bunker. and follow with 2nd and a 3rd replacement as the new drives arrive. aka rebuild the entire disk pool, I'm nervous about it. thinking I've just been lucky so far.

G
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
So one drive has been removed... the one with the reported errors on it...
The 2nd drive that was coming up in alerts is in the process of being replaced.
The 3rd/last drive of the disk pool will be replaced Monday, together with a new PSU that has enough direct power connectors for all drives, removing the power cable splitters.

I should then have 1, maybe 2, open SATA ports, which I can use to connect the known problem drive and do a thorough check/test on it - hopefully not a multi-week one, so I might need some assistance to tweak the test params.
G
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
For those following:
Re the bunker disk pool - I've replaced 2 out of the 3 HDDs and the alert emails seem to have stopped; I'll replace the 3rd, which will then increase the pool size.
I'll then relocate the tank pool onto bunker, releasing tank's disks and the original 4TB drives from bunker, then see with badblocks which disks are good/usable and create a new RAIDZ2 from them - should end up with a 5-wide disk pool if my assumptions are right.

G
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
Can someone point me to a good document on how to relocate a dataset from one disk pool to another, please?

G
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
It'll generally run much faster if you tweak the settings as appropriate for your setup. I went from 3 days to something like 6 hours... I'd look it up, but the data is on the jumpdrive I accidentally blew away this past weekend. The point is the default settings aren't usually optimal, by a long shot.
While we're on this, can you advise what to set... never used it before.
Once I have the data moved off, I'm going to run it disk by disk; I have 1 open port that I can use for this verification step.
Need to check all 6 x 4TBs.
G
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
zfs snapshot -r <old pool>/<dataset>@<snapshot name>
zfs send -R <old pool>/<dataset>@<snapshot name> | zfs recv -Fv <New Pool>/<dataset>

Once you're happy the data is all there and safe:
zfs destroy -f <old pool>/<dataset>

You could also set up a replication task and use it once, then do the delete.
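As a concrete sketch with the pool names from this thread - "documents" and "migrate1" are just made-up names for illustration:

# 1. Recursive snapshot of the dataset on the old pool
zfs snapshot -r tank/documents@migrate1

# 2. Send the snapshot tree to the new pool
zfs send -R tank/documents@migrate1 | zfs recv -Fv bunker/documents

# 3. Only after verifying the copy on bunker (-r also removes the child datasets/snapshots created above)
zfs destroy -rf tank/documents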
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
The new PSU and the last drive are inbound, will be here today, allowing me to change bunker from a 7.7TB pool to approx 14.6TB.
Once the rebuild is complete I'll move the data from tank to bunker, releasing all the 4TB HDDs.
G
 
Joined
Jun 15, 2022
Messages
674
While we're on this, can you advise what to set... never used it before.
Once I have the data moved off, I'm going to run it disk by disk; I have 1 open port that I can use for this verification step.
Need to check all 6 x 4TBs.
G
I should write a script for that...

badblocks options worth tweaking (an example of the assembled command follows the list):

-b block_size: set this to your drive's largest logical block size, generally labeled on the drive. Minimizes drive controller overhead.

-c blocks_at_once: number of blocks tested at a time, default 64. Generally best set between 2048 and 8192 depending on the size of the HBA cache. Minimizes communications overhead and can have a staggering effect on performance.

-e max_bad_blocks: you may want to set this to something reasonable so badblocks aborts on a bad drive. 128 is generally an upper limit, though more than 15 bad blocks often means a failing drive (dependent on multiple factors).

-p num_passes: I run 3, which is four test patterns run 3x = 12 full-disk writes + 12 full-disk reads = 24 full-disk passes, and that's a lot. Normally 1 run of 4 patterns is enough to get an idea of current drive health.

-svw: to see what's going on (this does wipe the drive; a read-only test won't detect future errors).

-o output_file: for later analysis if you want.
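Put together, a destructive run with those options would look something like this - the device path and log location are placeholders, and -b 4096 assumes a 4K-sector drive (check the drive label or smartctl -i):

# Destructive write test with progress output; logs any bad blocks found to a file
badblocks -b 4096 -c 4096 -e 64 -p 1 -svw -o /root/badblocks-sdX.log /dev/sdX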
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
So, how much CPU does badblocks eat up?

I have one drive connected/unused already on which I can start badblocks.

I have a screen/keyboard available, so I was thinking of running it there... allowing me to disconnect.

Command planned:

badblocks -b 4096 <searching for block size for ST4000VN008-2DR166> -c 256 -e 64 -p 1 -svw -o <where ?>

Please explain a bit more: -svw: to see what's going on (this does wipe the drive; a read-only test won't detect future errors)

G
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
I have one drive connected/unused already on which I can start badblocks.

I have a screen/keyboard available, so I was thinking of running it there... allowing me to disconnect.
Use tmux from SSH...

tmux new-session -d -s badblocks '/usr/sbin/badblocks -b.....'

Then at any time:

tmux attach -t badblocks

When in that session, CTRL + B, then D to disconnect. Also CTRL + C when in that session to terminate the running code.
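As a concrete sketch combining this with the command discussed above - device path, block size and log path are placeholders you would adjust for the drive under test:

# Start the destructive badblocks run detached in a tmux session named "badblocks"
tmux new-session -d -s badblocks '/usr/sbin/badblocks -b 4096 -c 4096 -e 64 -p 1 -svw -o /root/badblocks-sdX.log /dev/sdX'

# Re-attach later to check progress (CTRL + B, then D to detach again)
tmux attach -t badblocks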
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
Haven't used tmux before, but I've looked at it enough times to know you were going to say this.
So what do I install on the NAS?

And in the meantime I'll get a tmux client for the Mac.

G
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
tmux is already installed on CORE, but you cannot use it in the web shell. You have to connect via ssh.
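In practice that's just something like the following, run from your desktop (the user and hostname are whatever your NAS is set up with):

# Connect over SSH, then run the tmux commands on the NAS itself
ssh admin@truenas.local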
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
I"m on SCALE, will try quickly.

got the client installed my side.

while I got a diskpool resilvering itself onto a new HDD, think it's ok, to run badblocks on a different drive ?

G
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
Planned:

[attached screenshot: 1691424992791.png]


G
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
Got the client installed on my side.
Maybe that's a little misunderstanding... tmux is entirely server-side (and is already installed on both CORE and SCALE).

You use SSH first (or it can be the GUI shell... not sure why somebody would think that doesn't work from there).

Then in that shell, you start your command with tmux... as I indicated above... and you seem to have understood.

If you installed tmux on your Mac or PC, it's not going to be used here (but you can use it for other things if you ever connect to that system over ssh and want a long-running process).
 