feature: grouping

2026-02-02 12:55:32 +01:00
parent bdac561e86
commit 5c777483fc
3 changed files with 101 additions and 90 deletions


@@ -1,47 +1,31 @@
# Redfish Exporter # Redfish-Exporter
A Python-based Prometheus exporter for collecting power data (Watts, Volts, Amperes) from bare metal servers using the Redfish API. This tool supports multiple vendors (e.g., HPE, Supermicro). A Python-based Prometheus exporter for collecting power data (Watts, Volts, Amperes) from bare metal servers using the Redfish API. This tool supports multiple vendors (e.g., HPE, Supermicro) and is designed to run cross-platform on Linux and Windows.
I created this Python script to collect power data and analyse Watts, Volts and Amperes. If there is a better solution, feel free to replace me. I created this Python script to collect power data and analyse Watts, Volts and Amperes. If there is a better solution or you want more features, feel free to replace me or extend my Prometheus exporter.
---
## Table of Contents
- [Redfish Exporter](#redfish-exporter)
- [Table of Contents](#table-of-contents)
- [Description](#description)
- [Features](#features)
- [Usage](#usage)
- [Installation](#installation)
- [Requirements](#requirements)
- [Configuration](#configuration)
- [Basic Configuration](#basic-configuration)
- [Basic Configuration](#basic-configuration-1)
- [Container](#container)
- [Legacy Installation](#legacy-installation)
- [Python Dependencies](#python-dependencies)
- [Create user](#create-user)
- [Systemd Service](#systemd-service)
- [Testet on Hardware](#testet-on-hardware)
- [License](#license)
---
## Description
This tool collects power metrics from servers using the Redfish API and exposes them in a format compatible with Prometheus. It supports both modern and legacy Redfish API versions and handles authentication for different vendors.
---
## Features ## Features
- Collects power metrics: Watts, Volts, and Amperes. - Collects power metrics: Watts, Volts, and Amperes.
- Supports multiple vendors (HPE, Supermicro, etc.). - Supports multiple vendors (HPE, Supermicro, etc.).
- Supports grouping.
- Cross-platform compatibility (Linux and Windows). - Cross-platform compatibility (Linux and Windows).
- Graceful error handling and retry logic. - Graceful error handling and retry logic.
- Configurable via YAML. - Configurable via YAML.
- Docker support. - Docker support.
## Metrics Overview
| Metric                               | Type      | Description                                                     |
| ------------------------------------ | --------- | --------------------------------------------------------------- |
| redfish_up                           | Gauge     | Host status (1 = reachable, 0 = not reachable).                 |
| redfish_psu_line_input_voltage_volts | Gauge     | Voltage per power supply (labels: host, psu_serial).            |
| redfish_psu_power_input_watts        | Gauge     | Watts per power supply (labels: host, psu_serial).              |
| redfish_psu_input_amps               | Gauge     | Amperes per power supply (labels: host, psu_serial).            |
| redfish_system_info                  | Info      | System information (vendor, model, serial, Redfish version).    |
| redfish_request_latency_seconds      | Histogram | Request latency (label: host).                                  |
| redfish_errors_total                 | Counter   | Number of errors per host and error type (labels: host, error). |
## Usage ## Usage
```bash ```
usage: redfish_exporter.py [-h] [--config CONFIG] [--port PORT] [--interval INTERVAL] usage: python main.py [-h] [--config CONFIG] [--port PORT]
Redfish Prometheus Exporter Redfish Prometheus Exporter
@@ -49,28 +33,26 @@ options:
-h, --help show this help message and exit -h, --help show this help message and exit
--config CONFIG Path to config file --config CONFIG Path to config file
--port PORT Override port from config file --port PORT Override port from config file
--interval INTERVAL Override interval from config file
``` ```
# Installation # Install
## Requirements ## Requirements
Requirements: * just (optional)
* Python 3.8+
* uv
* see `pyproject.toml`
* Python 3.8+ Install the dependencies using `uv`:
* see `pyproject.toml`
Install the dependencies using:
```bash ```bash
cd /srv/redfish-exporter
uv sync uv sync
source .venv/bin/activate source .venv/bin/activate
uv lock --upgrade --refresh uv lock --upgrade --refresh
``` ```
## Configuration ## Configuration
Create a `config.yaml` file with the following structure: Create `config.yaml` with the following structure:
### Basic Configuration ### Basic Configuration
```yaml ```yaml
@@ -87,7 +69,7 @@ hosts:
- host4.example.net - host4.example.net
``` ```
### Basic Configuration ### Advanced Configuration
```yaml ```yaml
--- ---
interval: 5 interval: 5
@@ -95,15 +77,18 @@ port: 8000
username: user1 username: user1
password: secret1 password: secret1
chassis: ["1"] chassis: ["1"]
group: development # set default group for all hosts
hosts: hosts:
- fqdn: host1.example.net - fqdn: host1.example.net
username: user2 username: user2
password: secret2 password: secret2
chassis: ["0"] chassis: ["0"]
group: production # use group for specific host
- fqdn: host2.example.net - fqdn: host2.example.net
username: user3 username: user3
password: secret3 password: secret3
chassis: ["1"] chassis: ["1"]
group: stage
- fqdn: host3.example.net - fqdn: host3.example.net
username: user4 username: user4
password: secret4 password: secret4
@@ -113,30 +98,43 @@ hosts:
password: secret5 password: secret5
``` ```
The `port` and `interval` settings are optional and can be overridden by command-line arguments. Safe default values are hardcoded. The `port` and `interval` settings are optional and can be overridden by command-line arguments. Default values are hardcoded.
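The precedence behind these optional settings (command-line argument, then `config.yaml`, then hardcoded default) can be illustrated with a minimal sketch. This is not the exporter's actual code: `resolve_settings` is a hypothetical helper, the `config.yaml` default for `--config` is an assumption, and the sketch assumes only `--port` is exposed as a command-line override, as in the current usage output. The fallback values 8000 and 10 mirror the `config.get(...)` defaults in `run_exporter`.

```python
import argparse

# Hardcoded fallbacks, mirroring config.get("port", 8000) and
# config.get("interval", 10) in run_exporter.
DEFAULT_PORT = 8000
DEFAULT_INTERVAL = 10


def resolve_settings(config, argv):
    """Hypothetical helper: CLI argument > config.yaml value > hardcoded default."""
    parser = argparse.ArgumentParser(description="Redfish Prometheus Exporter")
    # The default path below is an assumption made for this sketch.
    parser.add_argument("--config", default="config.yaml", help="Path to config file")
    parser.add_argument("--port", type=int, default=None, help="Override port from config file")
    args = parser.parse_args(argv)

    # A port given on the command line wins over the value from the config file.
    port = args.port if args.port is not None else config.get("port", DEFAULT_PORT)
    # The interval is taken from the config file, falling back to the default.
    interval = config.get("interval", DEFAULT_INTERVAL)
    return {"port": port, "interval": interval}


if __name__ == "__main__":
    # config.yaml sets only the interval; the port comes from the command line.
    print(resolve_settings({"interval": 5}, ["--port", "9200"]))
```

Using `None` as the argparse default makes it possible to tell "not given on the command line" apart from an explicit value, so the config file is only consulted when the flag is absent.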
### Prometheus Configuration
```
global:
scrape_interval: 15s
evaluation_interval: 15s
# Container scrape_configs:
- job_name: "prometheus"
static_configs:
- targets: ["localhost:9090"]
- job_name: "redfish_exporter"
static_configs:
- targets: ["localhost:8000"] # Adjust to your config
metrics_path: /metrics
scrape_interval: 15s
```
# Docker / Container
To run the Redfish Exporter in a Docker container: To run the Redfish Exporter in a Docker container:
``` ```
docker buildx build -t your-tag . docker buildx build -t redfish_exporter .
docker run -it --rm --name redfish_exporter_app -p 8000:8000 your-tag:latest docker run -it --rm --name redfish_exporter_app -p 8000:8000 redfish_exporter:latest
``` ```
# Legacy Installation # Legacy Installation
## Python Dependencies
```bash ```bash
mkdir /srv/redfish-exporter mkdir /srv/redfish-exporter
# or # or
git clone https://github.com/dasbaum-ch/redfish-exporter.git /srv/redfish-exporter git clone https://github.com/dasbaum-ch/redfish-exporter.git /srv/redfish-exporter
```
## Python Dependencies
```bash
cd /srv/redfish-exporter cd /srv/redfish-exporter
uv sync uv sync --locked
source .venv/bin/activate
uv lock --upgrade --refresh
``` ```
## Create user ## Create user
@@ -145,28 +143,25 @@ sudo useradd -r -s /bin/false redfish
``` ```
## Systemd Service ## Systemd Service
1. Copy the systemd unit file: 1. Copy the systemd unit file:
```bash ```bash
sudo cp redfish-exporter.service /etc/systemd/system/redfish-exporter.service sudo cp redfish-exporter.service /etc/systemd/system/redfish-exporter.service
``` ```
1. Reload and start the service:
2. Reload and start the service:
```bash ```bash
sudo systemctl daemon-reload sudo systemctl daemon-reload
sudo systemctl enable --now redfish-exporter.service sudo systemctl enable --now redfish-exporter.service
``` ```
# Testet on Hardware
Here are some servers that I have successfully tested:
* Supermicro
* AS -5126GS-TNRT2
* Redfish 1.21.0
* AS -1124US-TNRP
* Redfish 1.8.0
* HPE
* ProLiant DL380 Gen10
* Redfish 1.6.0
# License # License
This project is licensed under the MIT License. See the LICENSE file for details. This project is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for details.
# Tested on Hardware
Here are some servers that I have successfully tested:
| Vendor | Model | Redfish Version |
| ---------- | -------------------- | --------------- |
| Supermicro | AS-5126GS-TNRT2 | 1.21.0 |
| | AS-1124US-TNRP | 1.8.0 |
| HPE | ProLiant DL380 Gen10 | 1.6.0 |


@@ -4,6 +4,7 @@ port: 8000
username: global-user username: global-user
password: global-password password: global-password
chassis: ["1"] # Strings, not integers! chassis: ["1"] # Strings, not integers!
group: production
hosts: hosts:
- fqdn: host1.example.com - fqdn: host1.example.com
username: user1 username: user1


@@ -56,6 +56,7 @@ class HostConfig:
username: str username: str
password: str password: str
chassis: list[str] | None = None chassis: list[str] | None = None
group: str = "none"
max_retries: int = 3 # 3 retries max_retries: int = 3 # 3 retries
backoff: int = 2 # wait 2 seconds backoff: int = 2 # wait 2 seconds
cool_down: int = 120 # seconds to wait after too many failures cool_down: int = 120 # seconds to wait after too many failures
@@ -92,24 +93,28 @@ REQUEST_TIME = Summary("request_processing_seconds", "Time spent processing requ
REQUEST_LATENCY = Histogram( REQUEST_LATENCY = Histogram(
"redfish_request_latency_seconds", "Time for Redfish request", ["host"] "redfish_request_latency_seconds", "Time for Redfish request", ["host"]
) )
UP_GAUGE = Gauge("redfish_up", "Host up/down", ["host"]) UP_GAUGE = Gauge("redfish_up", "Host up/down", ["host", "group"])
ERROR_COUNTER = Counter( ERROR_COUNTER = Counter(
"redfish_errors_total", "Total Redfish errors", ["host", "error"] "redfish_errors_total", "Total Redfish errors", ["host", "error"]
) )
VOLTAGE_GAUGE = Gauge( VOLTAGE_GAUGE = Gauge(
"redfish_psu_line_input_voltage_volts", "redfish_psu_line_input_voltage_volts",
"Line Input Voltage per PSU", "Line Input Voltage per PSU",
["host", "psu_serial"], ["host", "psu_serial", "group"],
) )
WATTS_GAUGE = Gauge( WATTS_GAUGE = Gauge(
"redfish_psu_power_input_watts", "Power Input Watts per PSU", ["host", "psu_serial"] "redfish_psu_power_input_watts",
"Power Input Watts per PSU",
["host", "psu_serial", "group"],
) )
AMPS_GAUGE = Gauge( AMPS_GAUGE = Gauge(
"redfish_psu_input_amps", "Current draw in Amps per PSU", ["host", "psu_serial"] "redfish_psu_input_amps",
"Current draw in Amps per PSU",
["host", "psu_serial", "group"],
) )
# set info metric # set info metric
SYSTEM_INFO = Info( SYSTEM_INFO = Info(
"redfish_system_info", "System information (model, serial, etc.)", ["host"] "redfish_system_info", "System information (model, serial, etc.)", ["host", "group"]
) )
@@ -170,7 +175,7 @@ async def fetch_with_retry(session, host: HostConfig, url: str) -> dict | None:
logging.warning( logging.warning(
"Skipping %s (in cool-down until %.1f)", host.fqdn, host.next_retry_time "Skipping %s (in cool-down until %.1f)", host.fqdn, host.next_retry_time
) )
UP_GAUGE.labels(host=host.fqdn).set(0) UP_GAUGE.labels(host=host.fqdn, group=host.group).set(0)
return None return None
# Probe vendor if not already known # Probe vendor if not already known
@@ -391,7 +396,7 @@ async def get_power_data(session, host: HostConfig):
logging.warning( logging.warning(
"Skipping %s (in cool-down until %.1f)", host.fqdn, host.next_retry_time "Skipping %s (in cool-down until %.1f)", host.fqdn, host.next_retry_time
) )
UP_GAUGE.labels(host=host.fqdn).set(0) UP_GAUGE.labels(host=host.fqdn, group=host.group).set(0)
return return
# Start time measurement # Start time measurement
@@ -402,17 +407,17 @@ async def get_power_data(session, host: HostConfig):
if not resources or not resources.chassis: if not resources or not resources.chassis:
logging.error("Could not discover any resources for %s", host.fqdn) logging.error("Could not discover any resources for %s", host.fqdn)
host.mark_failure() host.mark_failure()
UP_GAUGE.labels(host=host.fqdn).set(0) UP_GAUGE.labels(host=host.fqdn, group=host.group).set(0)
return return
host.mark_success() host.mark_success()
UP_GAUGE.labels(host=host.fqdn).set(1) UP_GAUGE.labels(host=host.fqdn, group=host.group).set(1)
chassis_url = f"https://{host.fqdn}{resources.chassis}" chassis_url = f"https://{host.fqdn}{resources.chassis}"
chassis_data = await fetch_with_retry(session, host, chassis_url) chassis_data = await fetch_with_retry(session, host, chassis_url)
if not chassis_data: if not chassis_data:
host.mark_failure() host.mark_failure()
UP_GAUGE.labels(host=host.fqdn).set(0) UP_GAUGE.labels(host=host.fqdn, group=host.group).set(0)
return return
for chassis_member in chassis_data.get("Members", []): for chassis_member in chassis_data.get("Members", []):
@@ -498,13 +503,17 @@ async def get_power_data(session, host: HostConfig):
def update_prometheus_metrics(host: HostConfig, metrics: PowerMetrics): def update_prometheus_metrics(host: HostConfig, metrics: PowerMetrics):
"""Update Prometheus metrics with PowerMetrics data.""" """Update Prometheus metrics with PowerMetrics data."""
if metrics.voltage is not None and metrics.serial: if metrics.voltage is not None and metrics.serial:
VOLTAGE_GAUGE.labels(host=host.fqdn, psu_serial=metrics.serial).set( VOLTAGE_GAUGE.labels(
metrics.voltage host=host.fqdn, psu_serial=metrics.serial, group=host.group
) ).set(metrics.voltage)
if metrics.watts is not None and metrics.serial: if metrics.watts is not None and metrics.serial:
WATTS_GAUGE.labels(host=host.fqdn, psu_serial=metrics.serial).set(metrics.watts) WATTS_GAUGE.labels(
host=host.fqdn, psu_serial=metrics.serial, group=host.group
).set(metrics.watts)
if metrics.amps is not None and metrics.serial: if metrics.amps is not None and metrics.serial:
AMPS_GAUGE.labels(host=host.fqdn, psu_serial=metrics.serial).set(metrics.amps) AMPS_GAUGE.labels(
host=host.fqdn, psu_serial=metrics.serial, group=host.group
).set(metrics.amps)
async def get_system_info(session, host: HostConfig): async def get_system_info(session, host: HostConfig):
@@ -547,7 +556,7 @@ async def get_system_info(session, host: HostConfig):
serial_number = system_data.get("SerialNumber") serial_number = system_data.get("SerialNumber")
# Your ad could be here # Your ad could be here
SYSTEM_INFO.labels(host=host.fqdn).info( SYSTEM_INFO.labels(host=host.fqdn, group=host.group).info(
{ {
"manufacturer": manufacturer, "manufacturer": manufacturer,
"model": model, "model": model,
@@ -587,7 +596,8 @@ async def run_exporter(config, stop_event):
port = config.get("port", 8000) port = config.get("port", 8000)
default_username = config.get("username") default_username = config.get("username")
default_password = config.get("password") default_password = config.get("password")
default_chassis = config.get("chassis") default_chassis = config.get("chassis", ["1"])
default_group = config.get("group", "none")
hosts = config["hosts"] hosts = config["hosts"]
interval = config.get("interval", 10) interval = config.get("interval", 10)
@@ -604,10 +614,15 @@ async def run_exporter(config, stop_event):
username=host_entry.get("username", default_username), username=host_entry.get("username", default_username),
password=host_entry.get("password", default_password), password=host_entry.get("password", default_password),
chassis=host_entry.get("chassis", default_chassis), chassis=host_entry.get("chassis", default_chassis),
group=host_entry.get("group", default_group),
) )
else: else:
hc = HostConfig( hc = HostConfig(
fqdn=host_entry, username=default_username, password=default_password fqdn=host_entry,
username=default_username,
password=default_password,
chassis=default_chassis,
group=default_group,
) )
host_objs.append(hc) host_objs.append(hc)
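The fallback chain for `group` in this hunk (per-host value, then the top-level `group`, then the literal `"none"`) can be sketched in isolation. This is a simplified illustration rather than the exporter's code: `HostConfig` is trimmed to a few of the fields visible in the diff, `build_hosts` is a hypothetical name for the loop inside `run_exporter`, and the `isinstance` check is an assumption about how dict entries are told apart from plain hostnames.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HostConfig:
    """Trimmed-down stand-in for the exporter's HostConfig (assumption)."""
    fqdn: str
    username: Optional[str] = None
    password: Optional[str] = None
    group: str = "none"


def build_hosts(config: dict) -> List[HostConfig]:
    """Hypothetical helper: resolve per-host settings against global defaults."""
    default_username = config.get("username")
    default_password = config.get("password")
    default_group = config.get("group", "none")

    host_objs = []
    for host_entry in config["hosts"]:
        if isinstance(host_entry, dict):
            # Detailed entry: per-host keys win, global values fill the gaps.
            hc = HostConfig(
                fqdn=host_entry["fqdn"],
                username=host_entry.get("username", default_username),
                password=host_entry.get("password", default_password),
                group=host_entry.get("group", default_group),
            )
        else:
            # Plain hostname string: everything comes from the global defaults.
            hc = HostConfig(
                fqdn=host_entry,
                username=default_username,
                password=default_password,
                group=default_group,
            )
        host_objs.append(hc)
    return host_objs


if __name__ == "__main__":
    example = {
        "group": "development",
        "hosts": [
            {"fqdn": "host1.example.net", "group": "production"},
            "host4.example.net",
        ],
    }
    for hc in build_hosts(example):
        print(hc.fqdn, hc.group)
```

Because every host ends up with some `group` value, the `group` label is always present on the exported metrics, which keeps label sets consistent across hosts.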