Running a Home Lab on a Single vSAN Node
March 21, 2015
This is how I managed to run my lab on a single vSAN node and manage it completely Windows-free, which is always a goal for a Mac user like me. With vSphere 6 this is a lot easier than it used to be, thanks to improvements in the Web Client (and the fact that the fat client no longer connects to vCenter) and to the new VCSA, which ships with deployment tools for the Mac.
On the storage side of things, I have always run my lab on some kind of virtual storage appliance (Nexenta, Atlantis, DataCore), but those require a lot of memory and processing power, which reduces the number of VMs I can run in my lab simultaneously.
It’s true that this approach gives me storage acceleration (which is so important in a home lab), but I sacrifice consolidation ratio and add complexity that I have to take into account during upgrades and maintenance. So I decided to change my approach and make my physical lab part of the process of learning vSAN.
If all goes as I hope, I will get storage performance without sacrificing too many resources for it, which would be awesome.
Here is my current hardware setup in terms of disks:
1 Samsung SSD 840 PRO Series
1 Samsung SSD 830
3 Seagate Barracuda ST31000524AS 1TB 7200 RPM 32MB Cache SATA 6.0Gb/s 3.5″
I also have another spare ST31000524AS that I might add later but that would require me to add a disk controller.
Speaking of which, my current controller (C602 AHCI – Patsburg) is not on the vSAN HCL, and its queue depth is listed at a pretty depressing 31 (per port). Still, I am only running a lab and I don’t really need production-grade performance numbers. I have nevertheless been looking around on eBay, and it seems that for about €100 I could get a supported disk controller, but I decided to wait a few weeks for VMware to update the HCL: I don’t want to buy something that won’t be on the vSphere 6 / vSAN 6 HCL, and I might still get the performance I need with my current setup, or at least that is what I hope.
UPDATE: The controller I was keeping an eye on still doesn’t seem to be listed in the HCL for vSAN 6, even now that the HCL is reported to be updated, so be careful with your lab purchases!
For the time being I will test this environment on my current disk controller and learn how to troubleshoot performance bottlenecks in vSAN, which is going to be a great exercise anyway.
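If you want to keep an eye on how hard those shallow queues are being pushed during the tests, esxtop already gives a rough idea; this is just a quick sketch of where to look in the standard esxtop views:
esxtop   # press 'd' for the disk adapter view: the AQLEN column is the adapter queue depth
         # press 'u' for the disk device view: the DQLEN column is the per-device queue depth (31 here)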
The first thing to do in my case was to decommission the current disks: once I deleted the VSA that was using them as RDMs, I needed to make sure that the disks had no partitions left on them, since leftover partitions create problems when claiming the disks during the vSAN setup. So I accessed my ESXi host via SSH and started playing around with the command line:
esxcli storage core device list # list block storage devices
Which gave me a list of devices that I could use with vSAN (showing one disk only):
t10.ATA_____Samsung_SSD_840_PRO_Series______________S12RNEACB06971B_____
   Display Name: Local ATA Disk (t10.ATA_____Samsung_SSD_840_PRO_Series______________S12RNEACB06971B_____)
   Has Settable Display Name: true
   Size: 244198
   Device Type: Direct-Access
   Multipath Plugin: NMP
   Devfs Path: /vmfs/devices/disks/t10.ATA_____Samsung_SSD_840_PRO_Series______________S12RNEACB06971B_____
   Vendor: ATA
   Model: Samsung SSD 840
   Revision: DXM0
   SCSI Level: 5
   Is Pseudo: false
   Status: on
   Is RDM Capable: false
   Is Local: true
   Is Removable: false
   Is SSD: true
   Is VVOL PE: false
   Is Offline: false
   Is Perennially Reserved: false
   Queue Full Sample Size: 0
   Queue Full Threshold: 0
   Thin Provisioning Status: yes
   Attached Filters:
   VAAI Status: unknown
   Other UIDs: vml.0100000000533132524e45414342303639373142202020202053616d73756e
   Is Shared Clusterwide: false
   Is Local SAS Device: false
   Is SAS: false
   Is USB: false
   Is Boot USB Device: false
   Is Boot Device: false
   Device Max Queue Depth: 31
   No of outstanding IOs with competing worlds: 32
   Drive Type: unknown
   RAID Level: unknown
   Number of Physical Drives: unknown
   Protection Enabled: false
   PI Activated: false
   PI Type: 0
   PI Protection Mask: NO PROTECTION
   Supported Guard Types: NO GUARD SUPPORT
   DIX Enabled: false
   DIX Guard Type: NO GUARD SUPPORT
   Emulated DIX/DIF Enabled: false
This is useful to identify the SSD devices, the device names and their physical paths. Here’s a recap of the useful information in my environment:
/vmfs/devices/disks/t10.ATA_____Samsung_SSD_840_PRO_Series______________S12RNEACB06971B_____
/vmfs/devices/disks/t10.ATA_____SAMSUNG_SSD_830_Series__________________S0VYNYABC03672______
/vmfs/devices/disks/t10.ATA_____ST31000524AS________________________________________5VPDP87L
/vmfs/devices/disks/t10.ATA_____ST31000524AS________________________________________5VPDP8N3
/vmfs/devices/disks/t10.ATA_____ST31000524AS________________________________________9VPC5AQ9
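If you have many devices and don’t want to scroll through the full listing, the same esxcli output can be narrowed down; a quick sketch (grep is available in the ESXi shell):
esxcli storage core device list | grep -E "Devfs Path|Is SSD"   # device paths plus the SSD flag for each device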
The Samsung 840 Pro will give me much better performance in a vSAN disk group, so I will put the 830 aside for now.
Now, for each and every disk, I checked for the presence of partitions and removed any that were left; here are the commands I ran against one disk as an example:
~ # partedUtil getptbl /vmfs/devices/disks/t10.ATA_____ST31000524AS________________________________________5VPDP87L
gpt
121601 255 63 1953525168
1 34 262177 E3C9E3160B5C4DB8817DF92DF00215AE microsoftRsvd 0
2 264192 1953519615 5085BD5BA7744D76A916638748803704 unknown 0
~ # partedUtil delete /vmfs/devices/disks/t10.ATA_____ST31000524AS________________________________________5VPDP87L 2
~ # partedUtil delete /vmfs/devices/disks/t10.ATA_____ST31000524AS________________________________________5VPDP87L 1
~ # partedUtil getptbl /vmfs/devices/disks/t10.ATA_____ST31000524AS________________________________________5VPDP87L
gpt
121601 255 63 1953525168
partedUtil is used to manage partitions: “getptbl” shows the partition table (two partitions in this case) and “delete” removes a partition; note how at the end of the delete commands I had to specify the number of the partition I wanted to operate on.
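Since the same check has to be repeated on every disk, a small loop saves some typing; a minimal sketch assuming the same device naming as above (the ESXi shell is busybox ash, so globbing and for loops work fine):
for disk in /vmfs/devices/disks/t10.ATA_____ST31000524AS*; do
  case "${disk}" in *:*) continue ;; esac   # skip the per-partition entries (":1", ":2", ...)
  echo "== ${disk}"
  partedUtil getptbl "${disk}"              # anything after the geometry line is a leftover partition
done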
At that point, with all the disks ready, I needed to change the default vSAN policy, because otherwise I wouldn’t be able to satisfy the three-node requirement; that means enabling the “forceProvisioning” setting.
Considering that at some point vSAN will need to destage writes from the SSD to the HDDs, I also decided to set “stripeWidth” to 3, so I can take advantage of all three of my HDDs when I/O involves the magnetic disks.
Please note that this is probably a good idea in a lab, while in a production environment you will need good reasons for it, since VMware encourages customers to leave the default value at “1”; problems come into play when you are sizing your environment (be careful about the component count, even if vSAN 6 raised the per-host limit from 3000 to 9000). In general you should read the “VMware Virtual SAN 6.0 Design and Sizing Guide” (http://goo.gl/BePpyI) before making any architectural decision.
To change the vSAN default policy and create the cluster I made very minor changes to the steps William Lam described here for vSAN 1.0:
esxcli vsan policy getdefault   # display the current settings
esxcli vsan policy setdefault -c cluster -p "((\"hostFailuresToTolerate\" i0) (\"forceProvisioning\" i1) (\"stripeWidth\" i3))"
esxcli vsan policy setdefault -c vdisk -p "((\"hostFailuresToTolerate\" i0) (\"forceProvisioning\" i1) (\"stripeWidth\" i3))"
esxcli vsan policy setdefault -c vmnamespace -p "((\"hostFailuresToTolerate\" i0) (\"forceProvisioning\" i1) (\"stripeWidth\" i3))"
esxcli vsan policy setdefault -c vmswap -p "((\"hostFailuresToTolerate\" i0) (\"forceProvisioning\" i1) (\"stripeWidth\" i3))"
esxcli vsan policy setdefault -c vmem -p "((\"hostFailuresToTolerate\" i0) (\"forceProvisioning\" i1) (\"stripeWidth\" i3))"
esxcli vsan policy getdefault   # check that the changes made are active
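Depending on your setup you may also want a VMkernel interface tagged for vSAN traffic (it is definitely required once more nodes join the cluster); if yours isn’t tagged yet, this is a minimal sketch assuming vmk0 is the interface you want to use:
esxcli vsan network ipv4 add -i vmk0   # tag vmk0 for vSAN traffic
esxcli vsan network list               # verify the vSAN network configuration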
This is when I created the vSAN cluster comprised of one node:
esxcli vsan cluster new
esxcli vsan cluster get
Cluster Information
   Enabled: true
   Current Local Time: 2015-03-21T10:23:14Z
   Local Node UUID: 51a90242-c628-b3bc-4f8d-6805ca180c29
   Local Node State: MASTER
   Local Node Health State: HEALTHY
   Sub-Cluster Master UUID: 51a90242-c628-b3bc-4f8d-6805ca180c29
   Sub-Cluster Backup UUID:
   Sub-Cluster UUID: 52b2e982-fd0f-bc1a-46a0-2159f081c93d
   Sub-Cluster Membership Entry Revision: 0
   Sub-Cluster Member UUIDs: 51a90242-c628-b3bc-4f8d-6805ca180c29
   Sub-Cluster Membership UUID: 34430d55-4b18-888a-00a7-74d02b27faf8
Now I was ready to add the disks to a disk group; remember that every disk group contains one SSD and one or more HDDs:
[root@esxi:~] esxcli vsan storage add -s t10.ATA_____Samsung_SSD_840_PRO_Series______________S12RNEACB06971B_____ -d t10.ATA_____ST31000524AS________________________________________5VPDP87L
[root@esxi:~] esxcli vsan storage add -s t10.ATA_____Samsung_SSD_840_PRO_Series______________S12RNEACB06971B_____ -d t10.ATA_____ST31000524AS________________________________________5VPDP8N3
[root@esxi:~] esxcli vsan storage add -s t10.ATA_____Samsung_SSD_840_PRO_Series______________S12RNEACB06971B_____ -d t10.ATA_____ST31000524AS________________________________________9VPC5AQ9
I had no errors, so I checked the vSAN storage to see what it was composed of:
esxcli vsan storage list
t10.ATA_____ST31000524AS________________________________________5VPDP87L
   Device: t10.ATA_____ST31000524AS________________________________________5VPDP87L
   Display Name: t10.ATA_____ST31000524AS________________________________________5VPDP87L
   Is SSD: false
   VSAN UUID: 527ae2ad-7572-3bf7-4d57-546789dd7703
   VSAN Disk Group UUID: 52e56e97-d27b-6d9b-d1fe-c73da8082ccc
   VSAN Disk Group Name: t10.ATA_____Samsung_SSD_840_PRO_Series______________S12RNEACB06971B_____
   Used by this host: true
   In CMMDS: true
   Checksum: 2442595905156199819
   Checksum OK: true
   Emulated DIX/DIF Enabled: false

t10.ATA_____ST31000524AS________________________________________9VPC5AQ9
   Device: t10.ATA_____ST31000524AS________________________________________9VPC5AQ9
   Display Name: t10.ATA_____ST31000524AS________________________________________9VPC5AQ9
   Is SSD: false
   VSAN UUID: 52e06341-1491-13ea-4816-c6e6338316dc
   VSAN Disk Group UUID: 52e56e97-d27b-6d9b-d1fe-c73da8082ccc
   VSAN Disk Group Name: t10.ATA_____Samsung_SSD_840_PRO_Series______________S12RNEACB06971B_____
   Used by this host: true
   In CMMDS: true
   Checksum: 1139180948185469177
   Checksum OK: true
   Emulated DIX/DIF Enabled: false

t10.ATA_____Samsung_SSD_840_PRO_Series______________S12RNEACB06971B_____
   Device: t10.ATA_____Samsung_SSD_840_PRO_Series______________S12RNEACB06971B_____
   Display Name: t10.ATA_____Samsung_SSD_840_PRO_Series______________S12RNEACB06971B_____
   Is SSD: true
   VSAN UUID: 52e56e97-d27b-6d9b-d1fe-c73da8082ccc
   VSAN Disk Group UUID: 52e56e97-d27b-6d9b-d1fe-c73da8082ccc
   VSAN Disk Group Name: t10.ATA_____Samsung_SSD_840_PRO_Series______________S12RNEACB06971B_____
   Used by this host: true
   In CMMDS: true
   Checksum: 10619796523455951412
   Checksum OK: true
   Emulated DIX/DIF Enabled: false

t10.ATA_____ST31000524AS________________________________________5VPDP8N3
   Device: t10.ATA_____ST31000524AS________________________________________5VPDP8N3
   Display Name: t10.ATA_____ST31000524AS________________________________________5VPDP8N3
   Is SSD: false
   VSAN UUID: 52f501d7-ac52-ffa4-a45b-5c33d62039a1
   VSAN Disk Group UUID: 52e56e97-d27b-6d9b-d1fe-c73da8082ccc
   VSAN Disk Group Name: t10.ATA_____Samsung_SSD_840_PRO_Series______________S12RNEACB06971B_____
   Used by this host: true
   In CMMDS: true
   Checksum: 7613613771702318357
   Checksum OK: true
   Emulated DIX/DIF Enabled: false
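If something goes wrong and you need to rebuild the disk group, esxcli can also remove disks from vSAN; a short sketch reusing the same device names (removing the SSD tears down the whole disk group, while removing a capacity disk with -d only evicts that disk):
esxcli vsan storage remove -d t10.ATA_____ST31000524AS________________________________________5VPDP87L
esxcli vsan storage remove -s t10.ATA_____Samsung_SSD_840_PRO_Series______________S12RNEACB06971B_____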
At this point I could see my “vsanDatastore” in the vSphere Client (I had no vCenter yet).
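The new datastore can also be checked from the ESXi shell; a quick sanity check, as a sketch of what I would expect to see on the host:
df -h                             # the vsanDatastore should show up here with its usable capacity
esxcli storage filesystem list    # it should also be listed here with a vsan filesystem type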
The next step will be to deploy vCenter on this datastore; I will be using VCSA and I will show you how to do it with a Mac.
Thank you for sharing, I’m trying to set up a single-node vSAN cluster for my home lab. Should I go with 1x240GB, 2x240GB or 1x480GB for the SSD layer? I will also have 3x1TB HDD.
vSAN requires one SSD per disk group.
If you have 3 magnetic disks you probably want to create one disk group, hence one SSD, so go for the biggest and fastest SSD you can find FOR WRITES, because all SSDs are good at reads.
Thank you for replying. So does it make any difference, 1×240 vs 1×480? I’m guessing I’m going to be able to put more on the SSD tier with the bigger SSD, but I want to make sure that it makes sense.
The SSD tier does not contribute to the capacity of the vSAN datastore; it is used 70% as read cache and 30% as write buffer, so you don’t decide what to place where.
The VMware best practice is to size the SSD at 10% of the magnetic tier, so in your case it needs to be at least 300GB, but the bigger the better, because you can keep more in the read cache (avoiding reads from the magnetic disks) and have more room for writes when it comes to it.
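For example, with 3x1TB of magnetic capacity the 10% guideline works out to roughly 300GB of SSD, which (assuming the 70/30 split above) vSAN would use as about 210GB of read cache and 90GB of write buffer.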
I’m running a single-node VSAN at home too. I’ve noticed no issues spinning up VMs, but when I go to migrate a VM from NAS storage to the VSAN datastore, I get an error about fault domains. Same when deploying from a template. But I can provision a new VM no problem. I have the policy all set up correctly. Can you test and see if you can deploy from a template, or migrate from another datastore to VSAN? Thanks!
Tim,
I don’t experience the problems you have.
It would be helpful to know what the exact error is and what you mean by having the policies all set up correctly.
Turned out to be a corrupted VSAN storage policy on vCenter. Had to recreate it. Was interesting, as it ignored some aspects of the policy but not others. It thought I had fault domains, which obviously I did not.