Hello guys, during some system housekeeping I found this video, which I recorded some time ago, and thought I would share it with you.
This video is about how we monitor our production Ceph clusters. In our company we prefer open source software for service monitoring. We have been using the following open source projects together for monitoring.
- Opsview: It is based on Nagios. Until March 4, 2015, Opsview was available under the GPL license; however, its open source version is now discontinued.
- Collectd: collectd gathers statistics about the system it is running on and stores this information.
- Graphite: A tool for monitoring and graphing the performance of computer systems in real time.
- Grafana: A graph and dashboard builder for visualising time series data.
Opsview Monitoring for Ceph cluster
How does it work
- The Opsview agent is a service which runs on each Ceph node.
- Opsview Core (the master server) instructs the Opsview agents to execute service checks.
- Each Opsview agent returns the result of the service check to Opsview Core.
- Opsview Core then displays the results as ‘OK’, ‘Warning’ or ‘Error’.
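To make the check flow a bit more concrete, here is a minimal sketch of the kind of service check an agent could run on a Ceph node. It is not the check from the video; it assumes the `ceph` CLI is installed on the node and relies on the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names `classify_health` and `main` are made up for this illustration.

```python
import subprocess

# Standard Nagios plugin exit codes, which Nagios-based tools understand.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_health(output):
    """Map the output of `ceph health` to a (exit_code, message) pair."""
    text = output.strip()
    if text.startswith("HEALTH_OK"):
        return OK, "OK - " + text
    if text.startswith("HEALTH_WARN"):
        return WARNING, "WARNING - " + text
    if text.startswith("HEALTH_ERR"):
        return CRITICAL, "CRITICAL - " + text
    return UNKNOWN, "UNKNOWN - unexpected output: " + text


def main():
    """Run `ceph health`, print one status line, and return the exit code."""
    try:
        result = subprocess.run(
            ["ceph", "health"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The ceph binary is missing or the cluster did not answer in time.
        print("UNKNOWN - could not run ceph health: %s" % exc)
        return UNKNOWN
    code, message = classify_health(result.stdout)
    print(message)
    return code

# A real plugin would finish with: sys.exit(main())
```

The monitoring server only needs the exit code and the single line of text, which is why the check stays this small.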
Grafana Dashboard for Ceph cluster
How does it work
- The collectd daemon runs on every Ceph node and collects system performance data as well as Ceph cluster information.
- The collectd daemon then sends this data to the Graphite server.
- The Graphite server stores this data and can also display it in the form of graphs.
- For a nice dashboard experience you should use Grafana, which gets the data from Graphite and displays it in beautiful, feature-rich dashboards.
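If you are curious what "sends this data to Graphite" looks like on the wire, the sketch below pushes one data point by hand using Graphite's plaintext protocol: one line per data point, in the form `<metric path> <value> <timestamp>`, sent to carbon's plaintext listener (TCP port 2003 by default). In our setup collectd does this for us, so this is only an illustration; the host name and metric path in the usage note are made-up examples.

```python
import socket
import time


def format_metric(path, value, timestamp):
    """Build one plaintext-protocol line: path, value, Unix timestamp, newline."""
    return "%s %s %d\n" % (path, value, int(timestamp))


def send_metric(host, path, value, port=2003, timestamp=None):
    """Send a single data point to a Graphite (carbon) server over TCP."""
    if timestamp is None:
        timestamp = time.time()
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(line.encode("ascii"))
```

For example, `send_metric("graphite.example.com", "ceph.node1.load.shortterm", 0.42)` would record a load value for a hypothetical node, and Grafana could then plot that metric path.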
If you are an IT shop, you definitely need an effective monitoring system for your services. Ceph is a clustered system with no SPOF, but it still requires monitoring. And my personal favourite: if you have monitoring and performance data, you can travel back in time and see your cluster state and several other interesting things.
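As a small illustration of the "travel back in time" part, Graphite's render API accepts `from` and `until` parameters, so you can ask for any past window of a stored metric. The sketch below only builds such a URL; the base URL and metric path in the usage note are made-up examples, and `render_url` is a name invented for this post.

```python
from urllib.parse import urlencode


def render_url(base_url, target, start, end="now"):
    """Build a Graphite render API URL returning JSON for a time window."""
    query = urlencode(
        {"target": target, "from": start, "until": end, "format": "json"}
    )
    return base_url.rstrip("/") + "/render?" + query
```

Calling `render_url("http://graphite.example.com", "collectd.node1.load", "-7d", "-6d")` gives a URL for the data from seven days ago up to six days ago, which you can open in a browser or fetch from a script.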