disk pool throwing alert

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
OK, so data in the pool is good, disks not so much.

At least no CRC errors returned, so just deal with the disks.
 
Joined
Jun 15, 2022
Messages
674
FYI, last time I ran badblocks over an old WD 4TB it took 3 days or so, but gave the desired result: SMART 197 Currently Pending Sectors = 0
It'll generally run much faster if you tweak the settings as appropriate for your setup. I went from 3 days to something like 6 hours... I'd look it up, but the data is on the jumpdrive I accidentally blew away this past weekend. The point is the default settings aren't usually optimal, by a long shot.
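If it helps, you can watch that SMART attribute before and after a run with smartctl; /dev/sdX below is just a placeholder for whichever disk you're testing:

# Check attribute 197 (Current_Pending_Sector) and 5 (Reallocated_Sector_Ct) on the drive under test
smartctl -A /dev/sdX | grep -i -e pending -e reallocated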
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
NugentS

This NAS is my backup... I have some of the documents/photos replicated to Google, but there is still a sh4t load here that I don't want to lose.
Also, running a test on media that's going to take a week or two is simply not viable.
G
Why not?
Answer - you probably don't have enough SATA ports (or drive bays) for another drive, do you?
What about running the test on another machine?

I did say remove and replace first - testing can come afterwards
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
The family is on iPads. I'm on an MBP that I use for work, so it moves around.
Got replacement drives, first step.
G
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
OK, so data in the pool is good, disks not so much.
Going to replace this drive with the new one.
At least no CRC errors returned, so just deal with the disks.
I"m going to do a replacement on this drive that previously had the errors. - thats part of Bunker. and follow with 2nd and a 3rd replacement as the new drives arrive. aka rebuild the entire disk pool, I'm nervous about it. thinking I've just been lucky so far.

G
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
So one drive has been removed... the one with the reported errors on it...
The 2nd drive that was coming up in alerts is in the process of being replaced.
The 3rd/last drive of the disk pool will be replaced Monday, together with a new PSU that has enough direct power connectors for all drives, removing the power cable splitters.

I should then have 1, maybe 2, open SATA ports, which I can use to connect the known problem drive and do a thorough check/test on it - hopefully not a multi-week one, so I might need some assistance to tweak the test params.
G
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
For those following:
Re the bunker disk pool - I've replaced 2 out of the 3 HDDs and the alert emails seem to have stopped; I'll replace the 3rd, which will then increase the pool size.
I'll then relocate the tank pool onto bunker, releasing tank's disks and the original 4TB drives from bunker, then see with badblocks which disks are good/usable and create a new RAIDZ2 from them - should end up with a 5-wide disk pool if my assumptions are right.

G
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
Can someone point me to a good document on how to relocate a dataset from one disk pool to another, please?

G
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
It'll generally run much faster if you tweak the settings as appropriate for your setup. I went from 3 days to something like 6 hours... I'd look it up, but the data is on the jumpdrive I accidentally blew away this past weekend. The point is the default settings aren't usually optimal, by a long shot.
While we're on this, can you advise what to set... never used it before.
Once I have the data moved off, I'm going to run it disk by disk; I have 1 open port that I can use for this verification step.
Need to check all 6 x 4TBs.
G
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
zfs snapshot -r <old pool>/<dataset>@<snapshot name>
zfs send -R <old pool>/<dataset>@<snapshot name> | zfs recv -Fv <New Pool>/<dataset>

Once you're happy the data is all there and safe:
zfs destroy -f <old pool>/<dataset>

You could also set up a replication task and use it once, then do the delete.
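As a concrete sketch with the pool names from this thread - "documents" and "migrate1" are just made-up names for illustration:

# 1. Recursive snapshot of the dataset on the old pool
zfs snapshot -r tank/documents@migrate1

# 2. Send the snapshot tree to the new pool
zfs send -R tank/documents@migrate1 | zfs recv -Fv bunker/documents

# 3. Only after verifying the copy on bunker (-r also removes the child datasets/snapshots created above)
zfs destroy -rf tank/documents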
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
The new PSU and the last drive are inbound, will be here today, allowing me to change bunker from a 7.7TB pool to approx 14.6TB.
Once the rebuild is complete I'll move the data from tank to bunker, releasing all the 4TB HDDs.
G
 
Joined
Jun 15, 2022
Messages
674
While we're on this, can you advise what to set... never used it before.
Once I have the data moved off, I'm going to run it disk by disk; I have 1 open port that I can use for this verification step.
Need to check all 6 x 4TBs.
G
I should write a script for that...

badblocks options worth tweaking (an example of the assembled command follows the list):

-b block_size: set this to your drive's largest logical block size, generally labeled on the drive. Minimizes drive controller overhead.

-c blocks_at_once: number of blocks tested at a time, default 64. Generally best set between 2048 and 8192 depending on the size of the HBA cache. Minimizes communications overhead and can have a staggering effect on performance.

-e max_bad_blocks: you may want to set this to something reasonable so badblocks aborts on a bad drive. 128 is generally an upper limit, though more than 15 bad blocks often means a failing drive (dependent on multiple factors).

-p num_passes: I run 3, which is four test patterns run 3x = 12 full-disk writes + 12 full-disk reads = 24 full-disk passes, and that's a lot. Normally 1 run of 4 patterns is enough to get an idea of current drive health.

-svw: to see what's going on (this does wipe the drive; a read-only test won't detect future errors).

-o output_file: for later analysis if you want.
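Put together, a destructive run with those options would look something like this - the device path and log location are placeholders, and -b 4096 assumes a 4K-sector drive (check the drive label or smartctl -i):

# Destructive write test with progress output; logs any bad blocks found to a file
badblocks -b 4096 -c 4096 -e 64 -p 1 -svw -o /root/badblocks-sdX.log /dev/sdX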
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
So, how much CPU does badblocks eat up?

I have one drive connected/unused already on which I can start badblocks.

I have a screen/keyboard available, so I was thinking of running it there... allowing me to disconnect.

Command planned:

badblocks -b 4096 <searching for block size for ST4000VN008-2DR166> -c 256 -e 64 -p 1 -svw -o <where ?>

Please explain a bit more: -svw: to see what's going on (this does wipe the drive; a read-only test won't detect future errors)

G
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
I have one drive connected/unused already on which I can start badblocks.

I have a screen/keyboard available, so I was thinking of running it there... allowing me to disconnect.
Use tmux from SSH...

tmux new-session -d -s badblocks '/usr/sbin/badblocks -b.....'

Then at any time:

tmux attach -t badblocks

When in that session, CTRL + B, then D to disconnect. Also CTRL + C when in that session to terminate the running code.
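As a concrete sketch combining this with the command discussed above - device path, block size and log path are placeholders you would adjust for the drive under test:

# Start the destructive badblocks run detached in a tmux session named "badblocks"
tmux new-session -d -s badblocks '/usr/sbin/badblocks -b 4096 -c 4096 -e 64 -p 1 -svw -o /root/badblocks-sdX.log /dev/sdX'

# Re-attach later to check progress (CTRL + B, then D to detach again)
tmux attach -t badblocks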
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
Haven't used tmux before, but I've looked at it enough times to know you were going to say this.
So what do I install on the NAS?

And in the meantime I'll get a tmux client for the Mac.

G
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
tmux is already installed on CORE, but you cannot use it in the web shell. You have to connect via ssh.
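In practice that's just something like the following, run from your desktop (the user and hostname are whatever your NAS is set up with):

# Connect over SSH, then run the tmux commands on the NAS itself
ssh admin@truenas.local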
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
I"m on SCALE, will try quickly.

got the client installed my side.

while I got a diskpool resilvering itself onto a new HDD, think it's ok, to run badblocks on a different drive ?

G
 

georgelza

Patron
Joined
Feb 24, 2021
Messages
417
Planned:

[attached screenshot: 1691424992791.png]


G
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,700
Got the client installed on my side.
Maybe that's a little misunderstanding... tmux is entirely server-side (and is already installed on both CORE and SCALE).

You use SSH first (or it can be the GUI shell... not sure why somebody would think that doesn't work from there).

Then in that shell, you start your command with tmux... as I indicated above... and you seem to have understood.

If you installed tmux on your Mac or PC, it's not going to be used here (but you can use it for other things if you ever connect to that system over ssh and want a long-running process).
 