Last week I had several conversations about our Metro Availability feature and a so-called “witness”, a third instance that observes both sites. I already built a short how-to with vRealize Orchestrator, but sometimes there's no VMware 🙂
I decided to use Python to write a short “promotion” function that works as soon as the Metro site is in the “Standby” state.
For automation purposes I wanted an SNMP receiver that tracks which site goes down and sets the site to promote. Because pysnmp is much too heavy for this small use case, I just used sockets to listen on the SNMP trap port.
Now there are 3 files:
- config.py – holds the configuration parameters (user, password, site DNS/IP, protection domain)
- listen.py – contains the socket listener and sets the site parameter based on a string in the SNMP trap
- witness.py – the promote function (REST call) based on the received site
You can find them here: https://github.com/cjohannsen81/ntnx_witness
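For orientation, here is a minimal sketch of what config.py could look like. The dictionary names and keys (cred, metro, username, password, siteA, siteB, pdName) are the ones witness.py reads; every value is a placeholder I made up, and the real file in the repository may be laid out differently.

```python
# config.py (sketch): placeholder values only, replace them with your own.
# The dictionary names and keys match what witness.py looks up.

# Prism credentials used for the REST calls
cred = {
    "username": "admin",            # placeholder
    "password": "changeme",         # placeholder
}

# Metro Availability settings
metro = {
    "siteA": "siteA.example.com",   # DNS name or IP of the first cluster (placeholder)
    "siteB": "siteB.example.com",   # DNS name or IP of the second cluster (placeholder)
    "pdName": "metro-pd",           # protection domain name (placeholder)
}
```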
After starting the script with sudo python witness.py (sudo in my case because of user restrictions), the socket listener waits for a trap on port 162. I tested it with this one: snmptrap -v1 -c public 127.0.0.1 1.3.6.1.4.1.20408.4.1.1.2 127.0.0.1 1 1 123 1.3.6.1.6.3.1.1.5.2 s siteB, which carries the site name as a string. You can adjust the SNMP handling by changing the listen.py file to fit your needs.
As soon as there is a “siteA” or “siteB” string in the trap, listen.py sets the site that went down and witness.py promotes the standby site.
If necessary, it's also possible to add a “disable” function for testing or validation purposes:
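The idea behind the listener can be sketched roughly like this. It is not a copy of listen.py from the repository: only the function name receiver() is taken from witness.py, while extract_site, KNOWN_SITES and the buffer size are my own assumptions. It relies on the fact that an SNMP string value travels as raw bytes inside the datagram, so a plain substring search is enough and no SNMP decoding is needed.

```python
import socket

# Site names we expect to find as a string inside the trap (assumption:
# the same two names used elsewhere in this post).
KNOWN_SITES = ("siteA", "siteB")


def extract_site(payload):
    """Return the known site name contained in the raw trap bytes, or None."""
    for site in KNOWN_SITES:
        if site.encode("ascii") in payload:
            return site
    return None


def receiver(host="0.0.0.0", port=162):
    """Block until a datagram naming a known site arrives, then return that site."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    try:
        while True:
            payload, _addr = sock.recvfrom(4096)
            site = extract_site(payload)
            if site is not None:
                return site
    finally:
        sock.close()
```

Binding to a port below 1024 usually needs elevated privileges on Unix-like systems, which is another reason the script may have to run with sudo. Datagrams that mention neither site are simply ignored in this sketch.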
    ### Witness Script ###
    # Author: Christian Johannsen
    # Version: 0.2
    #
    # Note: Certificate verification is set False
    ###
    import listen
    import config
    import json
    import requests
    import time

    def promote(site):
        # suppress the security warnings
        requests.packages.urllib3.disable_warnings()
        # first identify the site of the 'last' signal
        if site == "siteA":
            # set base_url to remote site
            base_url = "https://" + config.metro["siteB"] + ":9440/PrismGateway/services/rest/v1/"
            requests.get(base_url, verify=False)
        elif site == "siteB":
            # set base_url to remote site
            base_url = "https://" + config.metro["siteA"] + ":9440/PrismGateway/services/rest/v1/"
            requests.get(base_url, verify=False)
        else:
            # without this, base_url would be undefined below
            raise ValueError("unknown site: %r" % (site,))
        s = requests.Session()
        s.auth = (config.cred["username"], config.cred["password"])
        s.headers.update({'Content-Type': 'application/json; charset=utf-8'})
        r = s.post(base_url + 'protection_domains/' + config.metro["pdName"] + "/promote?skipRemoteCheck=false", verify=False)
        print(r.content)

    def disable(site):
        # suppress the security warnings
        requests.packages.urllib3.disable_warnings()
        # first identify the site of the 'last' signal
        if site == "siteA":
            # set base_url to the reported site
            base_url = "https://" + config.metro["siteA"] + ":9440/PrismGateway/services/rest/v1/"
            requests.get(base_url, verify=False)
        elif site == "siteB":
            # set base_url to the reported site
            base_url = "https://" + config.metro["siteB"] + ":9440/PrismGateway/services/rest/v1/"
            requests.get(base_url, verify=False)
        else:
            # without this, base_url would be undefined below
            raise ValueError("unknown site: %r" % (site,))
        s = requests.Session()
        s.auth = (config.cred["username"], config.cred["password"])
        s.headers.update({'Content-Type': 'application/json; charset=utf-8'})
        r = s.post(base_url + 'protection_domains/' + config.metro["pdName"] + "/metro_avail_disable?skipRemoteCheck=true", verify=False)
        print(r.content)

    if __name__ == '__main__':
        site = listen.receiver()
        try:
            disable(site)
            time.sleep(20)
            promote(site)
        except Exception:
            print("Exception")
            raise
This disables the site that was reported and then promotes the opposite site 😉
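To make the "disable the reported site, promote the other one" logic easier to check without touching a cluster, the URL selection can be pulled into a small pure function. This is only a sketch: OPPOSITE, base_url and plan are hypothetical helper names of mine, not part of the script, while the port, paths and query strings are the ones used in witness.py. No network call is made.

```python
# Maps the site reported as down to the site that should be promoted.
OPPOSITE = {"siteA": "siteB", "siteB": "siteA"}


def base_url(host):
    """Build the Prism REST v1 base URL for a cluster host."""
    return "https://" + host + ":9440/PrismGateway/services/rest/v1/"


def plan(failed_site, metro):
    """Return (disable_url, promote_url) for the site reported as down.

    metro is a dict shaped like config.metro (keys siteA, siteB, pdName).
    """
    if failed_site not in OPPOSITE:
        raise ValueError("unknown site: %r" % (failed_site,))
    survivor = OPPOSITE[failed_site]
    pd_path = "protection_domains/" + metro["pdName"]
    # Disable targets the reported site, promote targets the opposite one.
    disable_url = base_url(metro[failed_site]) + pd_path + "/metro_avail_disable?skipRemoteCheck=true"
    promote_url = base_url(metro[survivor]) + pd_path + "/promote?skipRemoteCheck=false"
    return disable_url, promote_url


if __name__ == "__main__":
    # Placeholder hosts and protection domain name, for illustration only.
    example = {"siteA": "10.0.0.1", "siteB": "10.0.0.2", "pdName": "metro-pd"}
    for url in plan("siteB", example):
        print(url)
```

Printing both URLs for each site before a real test is a cheap way to confirm that the promote call really goes to the surviving cluster.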