Monitoring Azure resources with Zabbix

[Updated] Monitoring Azure resources with Zabbix

June 25, 2020
Over the last year I've been actively refactoring this solution, and I've finally updated this article. All code has moved to GitHub, where you can clone it to your PC and update it as needed.

Also, I've removed all comments as they're not relevant anymore.

 

Hey!

Today is a great day, as I'm finally going to share my approach to monitoring Azure cloud resources with Zabbix. There is no built-in solution for monitoring Azure with 3rd-party software (at least, none that I know of), so we'll need to build our own. Before you start configuring and scripting, please make sure you're familiar with the most common Zabbix features, because the task is not trivial.

Azure monitoring

Why do we need all this? Well, I've encountered a number of issues, problems and limitations while trying to use the native Azure tools: Log Analytics (to draw graphs) and Monitor (to configure alerting). Metrics could appear there with big delays, writing queries was painful, and triggering was a problem too.

So, we've decided to adopt Zabbix for monitoring of Azure...

 

Tools / prerequisites

Tools and APIs we will use in this article:

  • Azure Monitor API - through this API we will fetch metrics data from the cloud.
  • Zabbix :-)
  • PowerShell 7.0, both on Linux and on your local Windows machine.
  • Zabbix API
  • Azure Service Principal with permissions to read monitor metrics from the cloud.
  • VS Code is recommended

ALSO:
The Zabbix Azure set of PowerShell scripts will do all the work for you. You can get it from https://github.com/vicioussn/zabbix-azure. There are hard-coded values in the code, and they should be replaced with your real data (passwords, keys, etc.). I'll talk more about this at the very end of the article.

ALSO:
You need to be somewhat familiar with PowerShell to be able to modify the scripts (replace strings, comment out unneeded code, etc.).

Process overview

The following workflow was designed to achieve the goal:

  1. Zabbix hosts represent Azure Resource Groups.
  2. Discover the individual metrics of each required resource via Zabbix LLD and add them as Zabbix items to the Zabbix host. Those items will be of the 'trapper' type, which allows external tools and scripts to push metrics in.
  3. Execute a script on a schedule (for example, every 5 minutes) for each Zabbix host (i.e. Resource Group). The script will use the Azure Monitor API to pull metrics from the Azure cloud and push them to the Zabbix items.

Put the solution on your Zabbix server

To make this solution work, you must get the whole solution from the GitHub repo (development sub-folder): https://github.com/vicioussn/zabbix-azure/tree/master/development. Put all the files into /usr/lib/zabbix/externalscripts on your Zabbix server (and ensure that you and your Zabbix Server user account have permissions to create files there).

Along with the code, please create the /var/log/azure-script/ folder and assign permissions to yourself and your Zabbix server user. This folder is for script logging.
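The two steps above could look roughly like this on the server. This is a sketch, not the author's procedure: it assumes the repository has already been cloned (for example with `git clone https://github.com/vicioussn/zabbix-azure`), and the function name `install_zabbix_azure` is made up for illustration.

```shell
# Sketch: copy the 'development' folder of a local clone into the Zabbix
# externalscripts directory and create the log directory.
# Usage: install_zabbix_azure <clone_dir> <scripts_dir> <log_dir>
install_zabbix_azure() {
    clone_dir=$1
    scripts_dir=$2
    log_dir=$3

    mkdir -p "$scripts_dir" "$log_dir" || return 1
    # Copy everything, including the helpers/ sub-folder.
    cp -R "$clone_dir/development/." "$scripts_dir/" || return 1
    # Group-writable, so that both you and the Zabbix user can write logs.
    chmod 775 "$log_dir" || return 1
}

# Typical call, run as root. 'zabbix' as the service account is an assumption:
#   install_zabbix_azure ./zabbix-azure /usr/lib/zabbix/externalscripts /var/log/azure-script
#   chown -R zabbix:zabbix /usr/lib/zabbix/externalscripts /var/log/azure-script
```

Check which account your Zabbix server actually runs under before changing ownership.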

As described below in the Known problems section, the credentials for Zabbix Server and Azure are hard-coded in the scripts, so you must replace them with your own. Here's the list of files you should look into:

  • helpers/Get-ZaAzureMonitorToken.ps1 (parameters section)
  • helpers/Set-ZaAzureMonitorZabbixHostItemValue.ps1 (lines 8 and 10)

One last thing: there are scripts in the root folder with a 'manual' suffix. You can use those to run tests and debug from your local Windows machine.

Zabbix host

To be more specific, let's set up monitoring for an Azure SQL Database, to be able to continuously monitor its performance and react if DTU consumption is too high.

First of all, let's create Zabbix hosts for our resources. My approach assumes that we create a single host for each Azure Resource Group you have. This host will contain items (metrics) for every Azure resource in the Resource Group. Here is a sample screenshot:

[Screenshot: Zabbix host configuration]

Note the IP address field: it should point to the Zabbix server's IP. 127.0.0.1 will do.

Then go to the Macros tab and set the variables related to the Resource Group. They will be passed as parameters to the LLD scripts and the scheduled scripts.

  • {$RESOURCEGROUP} - the name of the Resource Group.
  • {$SUBSCRIPTIONID} - the Azure subscription ID.

[Screenshot: host Macros tab]

Now it's time to create a Zabbix template. We will link it to our newly created host later.

Zabbix template

The cloud is a very changeable thing: resources may be created dynamically, and you can be sure that sometimes you won't be informed about it. So we will use Zabbix discovery rules to always have up-to-date info about Azure SQL databases.

  • Name is whatever you want.
  • Type is External check - we'll execute a PowerShell script to discover SQL databases (and yes, that's correct: Zabbix can only run sh scripts, so a wrapper is needed).
  • Key is the name of the Bash script to execute, followed by its parameters. The last parameter in our particular case is sqlDatabase, but it can be any resource type handled in the Get-ZaAzureResourceApiParameters.ps1 file of the solution (a switch statement there handles the possible resource types).
  • Update interval - set a reasonable value here, or your server will be overloaded.

[Screenshot: discovery rule settings]

Here is the Bash script that runs PowerShell from Zabbix for this discovery rule:
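A wrapper of this kind can be very small. What follows is a minimal sketch rather than the script from the repository: the PowerShell file name `Get-ZaAzureResourceDiscovery.ps1`, the variables `PWSH_BIN` and `DISCOVERY_SCRIPT`, and the order of the first two arguments are all assumptions, so check the repository for the real names.

```shell
# Sketch of an external-check wrapper: Zabbix runs this shell script,
# which hands its arguments over to PowerShell.
# PWSH_BIN and DISCOVERY_SCRIPT are hypothetical names used for illustration.
PWSH_BIN=${PWSH_BIN:-pwsh}
DISCOVERY_SCRIPT=${DISCOVERY_SCRIPT:-/usr/lib/zabbix/externalscripts/Get-ZaAzureResourceDiscovery.ps1}

run_discovery() {
    # Expected arguments (assumed order): subscription ID, Resource Group,
    # resource type, and optionally a metric name to limit the output.
    if [ "$#" -lt 3 ]; then
        echo "usage: run_discovery <subscriptionId> <resourceGroup> <resourceType> [metricName]" >&2
        return 2
    fi
    "$PWSH_BIN" -NoProfile -NonInteractive -File "$DISCOVERY_SCRIPT" "$@"
}

# In a real wrapper file the last line would simply be:
#   run_discovery "$@"
```

With a wrapper saved as, say, `azure-discovery.sh` (a placeholder name), the discovery rule key could then be `azure-discovery.sh["{$SUBSCRIPTIONID}","{$RESOURCEGROUP}","sqlDatabase"]`.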

Here is sample output of the script.
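For orientation, Zabbix low-level discovery expects JSON whose entries map LLD macros to values. The sketch below prints one such entry using the two macros this article mentions, `{#RESOURCENAME}` and `{#PARENTRESOURCENAME}`. The real script returns more properties than these two, and the server and database names are invented.

```shell
# Sketch: print a Zabbix LLD document with a single discovered SQL database.
# Real output from the solution contains more macros than the two shown here.
# No JSON escaping is done, so the names must not contain quotes or backslashes.
lld_entry() {
    parent=$1   # e.g. the SQL server name
    name=$2     # e.g. the database name
    printf '{"data":[{"{#PARENTRESOURCENAME}":"%s","{#RESOURCENAME}":"%s"}]}\n' \
        "$parent" "$name"
}

lld_entry "sql-server-01" "orders-db"
# prints: {"data":[{"{#PARENTRESOURCENAME}":"sql-server-01","{#RESOURCENAME}":"orders-db"}]}
```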

[Screenshot: discovery script output]

 

NOTE
As you can see, with this set of parameters the script returns all available Azure Monitor metrics. You can limit the output by adding one more parameter at the end: a metricName (not DisplayName). The list of all supported metrics for all resource types is on the Azure documentation page "Supported metrics with Azure Monitor".
Output example:

[Screenshot: discovery script output limited to one metric]

 

Now you're ready to create item prototypes in the Zabbix discovery rule. Before you do that, make sure you've tested the discovery script and that it returns the expected results.

  • Name - something you will understand and be able to use in Triggers or Graphs.
  • Type - Zabbix trapper.
  • Key - the azure.resource key with parameters in the following order (most of them will be discovered):
    • Resource name
      The screenshot below has a mistake in this parameter for Azure SQL; in this particular case it should be: {#PARENTRESOURCENAME}/{#RESOURCENAME}.
      The thing to remember: if the resource type you're configuring is 'nested' (e.g. sqlServer/sqlDatabase), use the additional {#PARENTRESOURCENAME} property returned by the discovery script.
    • Resource type
    • Metric name
    • Time grain
    • Primary aggregation type
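Put together, a prototype key for a nested resource might look like the one printed below. This is a sketch: the parameter order follows the list above, while the concrete values (`sqlDatabase`, `dtu_consumption_percent`, `PT5M`, `Average`) are illustrative assumptions to check against what the scripts and Azure Monitor accept for your resource type. The helper names are hypothetical.

```shell
# Sketch: build an azure.resource item key and split it back into fields.
build_item_key() {
    # $1 resource name, $2 resource type, $3 metric name,
    # $4 time grain, $5 primary aggregation type
    printf 'azure.resource[%s,%s,%s,%s,%s]\n' "$1" "$2" "$3" "$4" "$5"
}

parse_item_key() {
    # Strip 'azure.resource[' and the trailing ']', then print one field per line.
    params=${1#"azure.resource["}
    params=${params%"]"}
    printf '%s\n' "$params" | tr ',' '\n'
}

build_item_key '{#PARENTRESOURCENAME}/{#RESOURCENAME}' sqlDatabase \
    dtu_consumption_percent PT5M Average
# prints: azure.resource[{#PARENTRESOURCENAME}/{#RESOURCENAME},sqlDatabase,dtu_consumption_percent,PT5M,Average]
```

The parsing half mirrors what the pulling script has to do later: recover the resource, metric, time grain and aggregation from each item key.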

[Screenshot: item prototype settings]

  

Zabbix items

Ok, now you're ready to link the template to the host and see what happens. You should see a number of discovered items.

[Screenshot: discovered items on the host]

 

There is one more thing to mention. Sometimes Azure Monitor will not work as expected, for example if the API is broken and returning errors. Or maybe my solution will stop working (honestly, yes, there are some issues).

To be informed about those problems, you should create an item with the azureMonitor.lastUpdated key and a Trigger that fires when the item has not been updated for some time (use the nodata() Zabbix function for that).
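As an illustration, such a trigger could use one of the expressions below. This is a sketch: the host name `Azure-RG-Host` is a placeholder, and the 15-minute threshold (three missed 5-minute runs) is an assumption to tune. The first form is the classic syntax used up to Zabbix 5.2, the second is the syntax introduced in Zabbix 5.4.

```
{Azure-RG-Host:azureMonitor.lastUpdated.nodata(15m)}=1

nodata(/Azure-RG-Host/azureMonitor.lastUpdated,15m)=1
```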

Pulling metrics from Azure Monitor API 

Here comes the most interesting part of my work. We've got items of the Trapper type, which means that something must push metrics to these items on the Zabbix server.

This is done by a script that cron executes every 5 minutes. Why 5 minutes? As mentioned above, there is a timeGrain variable in the Azure Monitor API, which defines how often a metric is updated in the cloud. You might also have tons of items in Zabbix, and pulling metrics from Azure every minute would overload your Zabbix server's CPU. So I settled on a reasonable interval for getting updates from Azure.

Ok, let's move on to the main script that pulls metrics from the Monitor API. Don't forget to replace the sensitive parameters with your own.

The main steps the script performs:

  1. Create lock-file to indicate that the process is running. This is required to avoid running two or more instances of the script at the same time.
  2. Get Zabbix host name from parameters passed to script.
  3. Log in to the Zabbix API.
  4. Get Zabbix items from Zabbix host and parse their keys.
  5. Get the Zabbix history for each item and identify when the last metric was pushed to the server. Based on that, the script generates the correct request to the Azure Monitor API. If there are no values for the item in Zabbix, the script asks the Monitor API for metrics for the last three days (Set-ZaAzureMonitorZabbixHostItemValue.ps1 file, around line 58).
  6. Log in to Azure.
  7. For each Zabbix item get metrics from Monitor API for the calculated time period.
  8. Form a text string with the metric info for each metric sample from (7) and add the string to the result array.
  9. Write result array to a file.
  10. Push the file to Zabbix server with zabbix_sender tool.
  11. Push the 'success' string to the azureMonitor.lastUpdated item.
  12. Log out from Zabbix API.
  13. Remove lock-file.
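Steps 5 and 8-10 can be sketched in a few lines of shell. This is an illustration of the idea, not the PowerShell code from the repository: the function names are hypothetical, and resuming one second after the last stored sample is an assumption about how the window is chosen.

```shell
# Sketch of steps 5 and 8-10: pick the start of the query window and
# format samples for zabbix_sender. Function names are hypothetical.

THREE_DAYS=$((3 * 24 * 3600))

# Print the epoch second from which metrics should be requested.
# $1 = clock of the last value stored in Zabbix (empty if none), $2 = now.
window_start() {
    last_clock=$1
    now=$2
    if [ -z "$last_clock" ]; then
        # No history yet: go back three days, as the solution does.
        echo $((now - THREE_DAYS))
    else
        # Assumption: continue right after the last stored sample.
        echo $((last_clock + 1))
    fi
}

# Print one line in the zabbix_sender input-file format used with -T:
#   <host> <key> <timestamp> <value>
sender_line() {
    printf '"%s" "%s" %s %s\n' "$1" "$2" "$3" "$4"
}

# The collected lines would then be pushed with something like:
#   zabbix_sender -z 127.0.0.1 -T -i /path/to/result-file
```

Because every line carries its own timestamp, a run that follows a long gap can backfill the missing history in a single zabbix_sender call.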

 

Here is how to set up cron to run the script every 5 minutes. Don't ask me how to change it to a different interval - I copied it from Stack Overflow :-).
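A crontab entry for this generally has the shape printed by the sketch below. The script path and the host name argument are placeholders, not the repository's real names, and the helper `cron_line` is hypothetical. In cron's first field, `*/N` means "every N-th minute", so changing the interval is a matter of changing that number.

```shell
# Sketch: print a crontab line that runs a command every N minutes.
cron_line() {
    minutes=$1
    shift
    printf '*/%s * * * * %s\n' "$minutes" "$*"
}

# Placeholder script path and Zabbix host name:
cron_line 5 /usr/lib/zabbix/externalscripts/pull-metrics.sh My-Resource-Group-Host
# prints: */5 * * * * /usr/lib/zabbix/externalscripts/pull-metrics.sh My-Resource-Group-Host
```

Since the script runs once per Zabbix host, expect one such line per Resource Group.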

  

Again - it's recommended that you perform test runs of the pulling script to identify issues.

Known problems

There are a number of issues I know about; they're not critical, and maybe I'll solve them in the future. Or you're welcome to contribute.

  • Sometimes the pulling script will fail and won't remove the lock-file, blocking further script runs.
  • Sometimes, when two PowerShell processes try to write the temporary azure.json file, they will corrupt it. Again, this will lead to unexpected terminations.
  • Passwords and secrets are hard-coded in the scripts. Now that Zabbix 5 has been released, they can be moved to secret Zabbix host macros. This will require some work.
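The first two problems have well-known general remedies: release the lock from an exit handler, and write files via a temporary file plus rename. The sketch below shows both ideas in shell. It is not code from the repository, the function names are hypothetical, and the same pattern would still have to be ported to the PowerShell scripts.

```shell
# Sketch: take a lock that is always released, and write a file atomically.

# Run a command while holding a lock directory. mkdir is atomic, so only
# one process can create it; the EXIT trap removes it however the command ends.
with_lock() {
    lock_dir=$1
    shift
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "another instance holds $lock_dir" >&2
        return 75
    fi
    (
        trap 'rmdir "$lock_dir"' EXIT
        "$@"
    )
}

# Write stdin to a temporary file next to the target, then rename it.
# A rename within one filesystem is atomic, so readers never see a
# half-written file.
write_atomic() {
    target=$1
    tmp_file="$target.$$.tmp"
    cat > "$tmp_file" && mv "$tmp_file" "$target"
}
```

A lock can still be left behind if the process is killed with SIGKILL or the machine loses power, so a stale-lock check based on the lock's age remains worth adding.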

Tags: script, powershell (en), azure (en), zabbix (en)
