Witness Script for Nutanix Metro Availability (Python)

Last week I had some conversations about our Metro Availability feature and a so-called “witness”, a third observing instance. I already built a short how-to with vRealize Orchestrator, but sometimes there's no VMware 🙂

I decided to use Python to write a short “promotion” function that works as soon as the Metro site has the “Standby” state.

For automation purposes I wanted an SNMP receiver that tracks which site goes down and sets the site to promote accordingly. Because pysnmp is much too heavy for this small case, I just used sockets to listen on an SNMP port.

Now there are 3 files:

  • config.py – holds the configuration parameters (user, pass, site DNS/IP, protection domain)
  • listen.py – has the socket listener and sets the site parameter based on a string in the SNMP trap
  • witness.py – the promote function (REST call) based on the received site
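
To make the role of config.py concrete, here is a minimal sketch of what it could contain. The key names (metro["siteA"], metro["siteB"], metro["pdName"], cred["username"], cred["password"]) are the ones witness.py reads; every value is a placeholder I made up, and the real file in the repository may be laid out differently.

```python
# config.py (sketch): key names match what witness.py reads,
# all values are hypothetical placeholders.

# Cluster addresses (DNS name or IP) and the Metro protection domain.
metro = {
    "siteA": "cluster-a.example.com",
    "siteB": "cluster-b.example.com",
    "pdName": "metro-pd-01",
}

# Prism credentials used for the REST calls.
cred = {
    "username": "admin",
    "password": "changeme",
}
```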

You can find them here:  https://github.com/cjohannsen81/ntnx_witness

After starting the script with:

    sudo python witness.py

(sudo in my case, because of user restrictions) the socket listener waits for a trap on port 162. I tested it with this one:

    snmptrap -v1 -c public 127.0.0.1 1.3.6.1.4.1.20408.4.1.1.2 127.0.0.1 1 1 123 1.3.6.1.6.3.1.1.5.2 s siteB

with the site as the string value. You can adjust the SNMP handling by changing the listen.py file to your needs.

As soon as the trap contains the string “siteA” or “siteB”, listen.py records that site as the one that went down and witness.py promotes the standby site.
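
The idea behind the listener can be sketched roughly like this. It is not the listen.py from the repository, just a minimal illustration under my own assumptions: the trap is not decoded as SNMP at all, the raw UDP payload is simply searched for one of the two site strings. The helper extract_site and the SITES tuple are hypothetical names; receiver is the function name witness.py calls.

```python
import socket

# Site labels expected somewhere in the trap payload (as used in this post).
SITES = ("siteA", "siteB")


def extract_site(payload):
    """Return a known site label contained in the raw payload, else None."""
    for site in SITES:
        if site.encode("ascii") in payload:
            return site
    return None


def receiver(host="0.0.0.0", port=162, bufsize=4096):
    """Block until a datagram naming a known site arrives; return the label."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Binding to port 162 normally requires elevated privileges.
        sock.bind((host, port))
        while True:
            payload, _addr = sock.recvfrom(bufsize)
            site = extract_site(payload)
            if site is not None:
                return site
    finally:
        sock.close()
```

Because the payload is only searched for a substring, any datagram containing “siteA” or “siteB” would trigger a promotion, so a sketch like this only makes sense on a trusted management network.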


If necessary, it's also possible to add a “disable” function for testing or validation purposes:

 

### Witness Script ###
# Author: Christian Johannsen
# Version: 0.2
#
# Note: Certificate verification is disabled (verify=False)
###

import listen
import config
import json 
import requests
import time

def promote(site):
    """Promote the protection domain on the site opposite to the reported one."""
    # suppress the certificate warnings caused by verify=False
    requests.packages.urllib3.disable_warnings()

    # first identify the site of the 'last' signal
    if site == "siteA":
        # set base_url to the remote site
        base_url = "https://" + config.metro["siteB"] + ":9440/PrismGateway/services/rest/v1/"
    elif site == "siteB":
        # set base_url to the remote site
        base_url = "https://" + config.metro["siteA"] + ":9440/PrismGateway/services/rest/v1/"
    else:
        # without this check base_url would be undefined below
        raise ValueError("unknown site: %r" % (site,))
    # plain request first; raises early if the cluster is unreachable
    requests.get(base_url, verify=False)

    s = requests.Session()
    s.auth = (config.cred["username"], config.cred["password"])
    s.headers.update({'Content-Type': 'application/json; charset=utf-8'})

    r = s.post(base_url + 'protection_domains/' + config.metro["pdName"] + "/promote?skipRemoteCheck=false", verify=False)
    print(r.content)

def disable(site):
    """Disable Metro Availability on the reported site itself."""
    # suppress the certificate warnings caused by verify=False
    requests.packages.urllib3.disable_warnings()

    # first identify the site of the 'last' signal
    if site == "siteA":
        # set base_url to the reported site
        base_url = "https://" + config.metro["siteA"] + ":9440/PrismGateway/services/rest/v1/"
    elif site == "siteB":
        # set base_url to the reported site
        base_url = "https://" + config.metro["siteB"] + ":9440/PrismGateway/services/rest/v1/"
    else:
        # without this check base_url would be undefined below
        raise ValueError("unknown site: %r" % (site,))
    # plain request first; raises early if the cluster is unreachable
    requests.get(base_url, verify=False)

    s = requests.Session()
    s.auth = (config.cred["username"], config.cred["password"])
    s.headers.update({'Content-Type': 'application/json; charset=utf-8'})

    r = s.post(base_url + 'protection_domains/' + config.metro["pdName"] + "/metro_avail_disable?skipRemoteCheck=true", verify=False)
    print(r.content)

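if __name__ == '__main__':
    # block until a trap names the site that went down
    site = listen.receiver()
    try:
        disable(site)
        # wait 20 seconds before promoting the opposite site
        time.sleep(20)
        promote(site)
    except Exception as exc:
        # report what failed, then re-raise so the traceback is kept
        print("Exception: %s" % exc)
        raise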

This disables the site that was reported and then promotes the opposite site 😉
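
The two functions differ only in which site they target and which endpoint they call, so the URL logic could be pulled into small helpers. This is just a sketch of that idea, not what the repository does: other_site, action_url and the ACTIONS table are names I made up, while the port, the base path and the two endpoint strings are taken from the script above. It makes no network calls, so it can be tried without a cluster.

```python
BASE = "https://{host}:9440/PrismGateway/services/rest/v1/"

# Endpoint suffixes exactly as used by promote() and disable() above.
ACTIONS = {
    "promote": "promote?skipRemoteCheck=false",
    "disable": "metro_avail_disable?skipRemoteCheck=true",
}


def other_site(site):
    """Return the label of the opposite site, or raise for unknown labels."""
    if site == "siteA":
        return "siteB"
    if site == "siteB":
        return "siteA"
    raise ValueError("unknown site: %r" % (site,))


def action_url(host, pd_name, action):
    """Build the REST URL for one action on one protection domain."""
    return (BASE.format(host=host)
            + "protection_domains/" + pd_name + "/" + ACTIONS[action])
```

With helpers like these, promoting after a “siteA” trap would mean posting to action_url(config.metro[other_site("siteA")], config.metro["pdName"], "promote").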

 
