ZFS

caLLous

I am a FH squatter
FH Subscriber
Joined
Dec 23, 2003
Messages
18,426
Yeah, I read about that stuff (putting the ZIL on a SLOG) on the ZFS on Linux documentation site before I set off on this journey, but that's a bit hardcore for me I think! Is it something you have to do when you create the pool? I really messed up my first pool tbh; it's too late to do anything about it now, but:
  • I used partitions instead of whole disks, which is warned against in the "best practices" docs. I think it'll be OK: they're 4K drives, so I wanted to make sure the partitions were aligned before I presented them to ZFS (the partitions use the whole disks, I just arranged the start block to be aligned for 4K sectors). This was before I knew about setting ashift=12 at creation time (see the sketch at the end of this post).
  • I didn't turn on compression. It won't be very beneficial with my data (99% videos), but it's very light on the CPU and I'm storing a lot of backed-up Blu-ray discs as folders (not ISOs), so there are all sorts of playlists and text-y files that can be compressed. I turned compression on for the new pool and it saved me 6GB on a 983GB set of data, woohoo.
The only solution to these problems would be to a) win the lottery and b) buy at least 8 more 4TB disks, so I could get all the data across into a well-made pool and then have *proper* redundancy (i.e. two separate 8-disk RAIDZ2s mirroring each other).
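For what it's worth, if I were starting the pool from scratch today it would look something like this (just a sketch with made-up pool/disk names, using whole disks and setting ashift and compression up front):
Code:
  # 8-disk RAIDZ2 from whole disks, forced to 4K alignment at creation
  zpool create -o ashift=12 tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh

  # turn on lightweight compression before any data goes in
  zfs set compression=lz4 tank    # or compression=on if lz4 isn't available on your version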
 

caLLous

I am a FH squatter
FH Subscriber
Joined
Dec 23, 2003
Messages
18,426
I finally started a scrub on the primary pool...
Code:
  scan: scrub in progress since Fri Jan 24 09:53:53 2014
  954G scanned out of 28.4T at 497M/s, 16h7m to go
  0 repaired, 3.28% done
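Nothing fancy to kick it off, by the way, just the usual (assuming your pool is called tank):
Code:
  zpool scrub tank
  zpool status tank   # shows the progress line above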
 

TdC

Trem's hunky sex love muffin
Joined
Dec 20, 2003
Messages
30,801
lol at that scrub :D nice speed though :)

Thing is, it's *all* about maximising your IOPS. ZFS is super robust: you can actually modify pools in flight if you're brave (I did!), adding or removing cache and SLOG devices. It could really be worth your while to add a cache device for ZFS metadata. A SLOG, on the other hand, only gets used for sync I/O (which you can force, btw), but if you're Joe User you may or may not care about that.

What I did in my test with the USB drives was like so:
drives: 2x USB3 LaCie Rikiki 1TB
ZFS: zpool create mypool /dev/sdc /dev/sdd
tuning: zfs set atime=off mypool ; zfs set dedup=off mypool

All new stuff in this pool will inherit those settings from the pool itself, so you can forget about that (unless you want to turn things on or off for specific datasets).

So, make things:
zfs create -V 50G mypool/testvolume

NB I needed zvols, hence the -V, or I would have done things differently. Anyway, messing about with data on that I got about 450-odd IOPS, checking with zpool iostat -v 2.

I didn't have any drives left over so I made some RAM drives available: mknod /dev/ram b 1 1 #look before you leap
then initialise it: dd if=/dev/zero of=/dev/ram bs=1k count=16k
This should (iirc) cajole udev into making some /dev/ramX devices.
I had /dev/ram0 to /dev/ram15 to play with, potentially 64M each, which iirc is the max size for these kinds of devices in Linux without trickery, so make two drives for the SLOG:
for i in 0 1; do dd if=/dev/zero of=/dev/ram$i bs=1k count=64k; done

add to the zpool: zpool add mypool log mirror /dev/ram0 /dev/ram1
which adds the devices as a mirror, which is what the best practices list recommends.
You can watch the SLOG being used (even if you don't have sync traffic) by forcing syncs for everything: zfs set sync=always mypool/somevolume
and then doing whatever on that volume to generate IO.

What I also did was make RAM drives for cache:
for i in 5 6 7 8; do dd if=/dev/zero of=/dev/ram$i bs=1k count=64k; done
zpool add mypool cache /dev/ram5 /dev/ram6 /dev/ram7 /dev/ram8

ZFS always adds cache drives as a stripe, so there's nothing further to do. Look at what you made with zpool iostat -v.

If you generate IO on the volume, ZFS will use what you gave it to cache and log things. The good stuff behind this is that ZFS can batch data up to write to the actual disks efficiently, and so (presumably) perform optimally. Log devices don't have to be super big, but cache can be. I noticed I could hit something like a gig in cache with all the ramdrives in there, easy peasy. On the log I could fill about 128 megs, but there was always free space so I wasn't saturating it.

I also noticed, and this is the main point of my writing this, that I gained 150-200 IOPS, which is 25-40%! Fairly significant tbh. Anyway, if you want to test this, don't forget to remove the ramdrives afterwards, or if you want to keep them, make sure you write a start script that initialises the ramdrives very early in the init sequence, or ZFS will be very, very angry with you :)

NB: I typed all the commands above from memory so there may be some errors. Test at your own risk :)
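Removing them again afterwards should be something along these lines (again from memory; check zpool status for the actual mirror/device names first):
Code:
  # the mirrored log vdev is removed by its mirror-N name from zpool status
  zpool remove mypool mirror-1

  # cache devices are removed individually by device name
  zpool remove mypool /dev/ram5 /dev/ram6 /dev/ram7 /dev/ram8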
 

Deebs

Chief Arsewipe
Staff member
Moderator
FH Subscriber
Joined
Dec 11, 1997
Messages
9,076,920
Oh my god, I am so going to buy that case (the wheels will be removed and burned) and merge my 3 NAS into one. Just need to find somewhere that has the black one in stock. About 300 quid right?
 

caLLous

I am a FH squatter
FH Subscriber
Joined
Dec 23, 2003
Messages
18,426
Oh my god, I am so going to buy that case (the wheels will be removed and burned) and merge my 3 NAS into one. Just need to find somewhere that has the black one in stock. About 300 quid right?
Yeah, I got it through amazon.fr and it worked out to about £310* I think. You might well change your mind about the wheels when you've filled it with drives, the case itself is only 13kg but when you fill it up it weighs a bloody ton. :)

*I just checked the confirmation email and it actually came to £263.90 after exchanging from EUR. Bargain. :)
 

Deebs

Chief Arsewipe
Staff member
Moderator
FH Subscriber
Joined
Dec 11, 1997
Messages
9,076,920
Yeah, I got it through amazon.fr and it worked out to about £310* I think. You might well change your mind about the wheels when you've filled it with drives, the case itself is only 13kg but when you fill it up it weighs a bloody ton. :)

*I just checked the confirmation email and it actually came to £263.90 after exchanging from EUR. Bargain. :)
Wheels are gh3y. They will be removed. Some things in life are not allowed wheels, a NAS is one of them. I've got 16 drives to put in this beast once it arrives. My problem is how I migrate .....
 

caLLous

I am a FH squatter
FH Subscriber
Joined
Dec 23, 2003
Messages
18,426
How are they set up in the NAS? Get an HBA like the LSI 9201-16i (that price is ridiculous btw, I got mine for under £300) and, if it's all ZFS (and the pool versions match), you might be able to just export/import the pools.

Oh, and I was right the first time about the case: the confirmation email didn't include the TVA (VAT), so it was actually ~£315.
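If the pools are all ZFS, the export/import dance itself is pretty painless, something like this (pool name made up):
Code:
  # on the old box
  zpool export oldpool

  # move the drives across, then on the new box
  zpool import           # lists any pools it can see on the attached disks
  zpool import oldpool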
 

Deebs

Chief Arsewipe
Staff member
Moderator
FH Subscriber
Joined
Dec 11, 1997
Messages
9,076,920
How are they set up in the NAS? Get an HBA like the LSI 9201-16i (that price is ridiculous btw, I got mine for under £300) and, if it's all ZFS (and the pool versions match), you might be able to just export/import the pools.

Oh, and I was right the first time about the case: the confirmation email didn't include the TVA (VAT), so it was actually ~£315.
It's not a problem of how they're set up, it's more of a logistical issue: I would need to buy more drives to start the migration.
 

caLLous

I am a FH squatter
FH Subscriber
Joined
Dec 23, 2003
Messages
18,426
It's not a problem of how they're set up, it's more of a logistical issue: I would need to buy more drives to start the migration.
True. Bear in mind that Lian Li don't include a single fan with the case (not even rear exhaust fans), so you'll ideally want three fans on each side of the hard drive enclosure to push air across the drives.
 

MYstIC G

Official Licensed Lump of Coal™ Distributor
Staff member
Moderator
FH Subscriber
Joined
Dec 22, 2003
Messages
12,362
It's not a problem of how they're set up, it's more of a logistical issue: I would need to buy more drives to start the migration.
FreeNAS would just import the volumes, wouldn't it?
 

MYstIC G

Official Licensed Lump of Coal™ Distributor
Staff member
Moderator
FH Subscriber
Joined
Dec 22, 2003
Messages
12,362
I guess, but my issue is more a physical one, 16 drives to none...
I'm lost now, wouldn't you just pull the drives from your old NAS enclosures and stick them in this?
 

caLLous

I am a FH squatter
FH Subscriber
Joined
Dec 23, 2003
Messages
18,426
lol at that scrub :D
Code:
  scan: scrub in progress since Fri Jan 24 09:53:53 2014
  23.7T scanned out of 28.4T at 453M/s, 3h3m to go
  0 repaired, 83.28% done
:\
 

TdC

Trem's hunky sex love muffin
Joined
Dec 20, 2003
Messages
30,801
I revisited my setup re the iSCSI and the quad-port gigabit cards:

I split the single bond on the Debian server into two bonds of two gigabit interfaces each, and did the same with the Windows team interface. Then I set the networks across the bonds/teams to be distinct, e.g. bond0/team0 sits in network A and bond1/team1 sits in network B. After this I set up the iSCSI target (i.e. the "server") to allow access to two of the LUNs in a 4-LUN stripe from network A, and the other two from network B.

Windows now shows me having two 2Gb links up, and ATTO Disk Benchmark shows that my read speeds have stayed the same at around 400MB/s, whereas my write speeds have doubled to something in the order of nearly 200MB/s (speeds in optimal conditions, obviously), which was basically the goal of the exercise. I did this because my chosen iSCSI target software only allows one connection to a LUN at any given time, and the Windows team has exactly one primary interface, which is what gets used for writing, i.e. write speeds would only ever be 1Gb/s max. Splitting the team keeps the read speed the same, because Windows will continue to round-robin reads across the link aggregation and so use all four 1Gb links, but there are now two primary interfaces to write with, effectively doubling the write bandwidth.
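For the curious, on the Debian side the change is just two bond stanzas instead of one, roughly like this (interface names, addresses and bond mode are made up for the example; match them to your own switch/team setup):
Code:
  # /etc/network/interfaces (ifenslave installed)
  auto bond0
  iface bond0 inet static
      address 192.168.10.10
      netmask 255.255.255.0
      bond-slaves eth0 eth1
      bond-mode balance-rr
      bond-miimon 100

  auto bond1
  iface bond1 inet static
      address 192.168.20.10
      netmask 255.255.255.0
      bond-slaves eth2 eth3
      bond-mode balance-rr
      bond-miimon 100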
 

Ch3tan

I aer teh win!!
Joined
Dec 22, 2003
Messages
27,318
Worse than him being a massive geek is the fact I understood :(
 

caLLous

I am a FH squatter
FH Subscriber
Joined
Dec 23, 2003
Messages
18,426
OK, I've done it properly now! The original pool (with partitions instead of whole disks as the ZFS members, and no compression) has gone, and I've now got two pools: one 10-disk RAIDZ2 and one 6-disk RAIDZ2. That's given me about 42TB of usable space across both pools.

Also, I lost my nerve with non-ECC RAM after reading a couple of horror stories, so I've gone for a more server-y setup with a Xeon E3-1245 v3, 32GB of Kingston ECC RAM and an Asus P9D-WS mobo. I was going to wait a couple of weeks for the new E3 Xeons to land but... well... I didn't.

I haven't tried to scrub the big pool yet. :\ It's 28TB usable (was 20TB before), but it should scrub at a slightly faster rate because of the two extra disks. Right? :\
 

caLLous

I am a FH squatter
FH Subscriber
Joined
Dec 23, 2003
Messages
18,426
An rsync from the big pool to the little pool:
Code:
sent 7.31T bytes  received 126.08K bytes  247.44M bytes/sec
Better than I was expecting, it has to be said.
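(Nothing clever on the command line either, just a plain recursive copy along these lines, with made-up dataset paths:)
Code:
  rsync -avh /bigpool/media/ /littlepool/media/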
 

caLLous

I am a FH squatter
FH Subscriber
Joined
Dec 23, 2003
Messages
18,426
How might one do that then please? It was a whole dataset from the bigger pool that I was sending.
 

TdC

Trem's hunky sex love muffin
Joined
Dec 20, 2003
Messages
30,801
hmm, iirc you make a snapshot of your source dataset/volume, make sure you have a target pool, and pipe the send into a receive, something like zfs send sourcepool/sourcevol@sourcesnap | zfs recv targetpool/targetvolume. Then drop your snapshot. You can even do it to a different machine over the network.
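Spelled out it would be roughly this (dataset names invented, check the man pages before trusting my memory):
Code:
  # snapshot the source dataset
  zfs snapshot bigpool/media@migrate

  # local copy into the other pool
  zfs send bigpool/media@migrate | zfs recv littlepool/media

  # or to another machine over the network
  zfs send bigpool/media@migrate | ssh otherbox zfs recv tank/media

  # tidy up when you're happy
  zfs destroy bigpool/media@migrate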
 

caLLous

I am a FH squatter
FH Subscriber
Joined
Dec 23, 2003
Messages
18,426
Hrmph, it's actually scrubbing slower on the 8+2 pool than it did on the original 6+2 one. :\
Code:
175G scanned out of 30.1T at 368M/s, 23h39m to go
 

TdC

Trem's hunky sex love muffin
Joined
Dec 20, 2003
Messages
30,801
Which SAS/SATA cards did you have again? I bought a small, fast-ish SSD as a dedicated disk to put my cache on. Should speed things up while I save up for an LSI or something.
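Adding it should just be a one-liner, something like this (device path made up; use the /dev/disk/by-id/ name of the SSD):
Code:
  zpool add mypool cache /dev/disk/by-id/ata-SomeSSD_serialnumber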
 

caLLous

I am a FH squatter
FH Subscriber
Joined
Dec 23, 2003
Messages
18,426
Both of the big pools are running through an LSI 9201-16i. Before I, um, lost control, I had a couple of other cards (a really crappy StarTech 4-port and a better HighPoint Rocket 4-port), but this LSI one is perfect.

D'ya think an SSD cache would help with scrubbing? If it's something I'm going to need to do weekly (with consumer-level disks) and it takes just shy of 24 hours, I would like to speed it up if possible. I mean, I don't think weekly scrubs are strictly necessary (it's basically just archiving, so the drives aren't ever going to be thrashed once all the stuff is on them in the first place), and I would put money down that I'd forget anyway, but I still want it to be as swift as possible.
Code:
  scan: scrub in progress since Mon May 26 00:22:01 2014
  22.1T scanned out of 30.1T at 356M/s, 6h31m to go
  0 repaired, 73.51% done
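The forgetting part is easy enough to solve, at least; a cron entry along these lines (pool name assumed) would kick one off every Sunday night:
Code:
  # /etc/cron.d/zfs-scrub -- weekly scrub, Sundays at 01:00
  0 1 * * 0  root  /sbin/zpool scrub bigpool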
 

TdC

Trem's hunky sex love muffin
Joined
Dec 20, 2003
Messages
30,801
Cheers! I do actually believe that having a decent cache will help. My scrubs are fairly slow, but that's traceable to the relatively small number of disks (three) and them being on the internal controller. My cache and log are blindingly fast, as they're in RAM... but that also makes them comparatively small.

I chose this specific SSD as it's reasonably sized (60GB), fairly cheap (45 euros) and fairly quick (450MB/s read and write, on paper / in optimal use cases), but ofc it's really all about the IOPS, which range from about 35k to 85k. This setup, together with the soon-to-be-bought LSI and an extra disk, will sort me out... forever, really (or at least for a year or two).

Also...well, at least you're honest about your...hobby ;-)
 
