# #monitoringlove
# with Sensu
## DevOps meetup Dublin
## July 2015

---

## [http://sensuapp.org](http://sensuapp.org)

Notes:
- Monitoring framework/router

---

# Why Sensu?

- Simple: Ruby, JSON
- Scalable: Distributed architecture
- Flexible: Plugins in any language
- Compatible: Can use Nagios checks
- Versatile: Collects both checks and metrics
- Social: Great user community

---

# Jochen Lillich
## @geewiz

Notes:
- Founder & CTO freistil.IT

---

# www.freistilbox.com
## Hosting PaaS for Drupal and WordPress

Notes:
- Started using Nagios but hit load and queueing problems
- DevOps Days 2012, Ulf Månsson: #monitoringlove

---

# Sensu Core

Notes:
- Open Source, MIT license
- Created part-time by Sean Porter at Sonian, now maintained full-time at Heavy Water Operations

---

# Sensu Enterprise

Notes:
- Improved performance
- Metrics conversion
- Third-party integrations
- Commercial support

---

# Installation

- Omnibus packaging
- Configuration in JSON files
- [Sensu cookbook for Chef](https://github.com/sensu/sensu-chef)
- [Puppet module](https://github.com/sensu/sensu-puppet)

Notes:
- Automation: Never forget to add new machines

---

- Connects all Sensu components
- Asynchronous communication

---

# Sensu Server

- Schedules check execution
- Processes check results
- Triggers event handlers

---

# Sensu Client

- Registers automatically with the Server
- Sends keepalive information
- Receives check execution requests
- Schedules local checks
- Executes checks
- Publishes check results
- Publishes external events

---

# API

- get event data
- get agent data
- trigger check execution
- resolve events
- silence checks

Notes:
- REST-like interface

---

# Dashboard

- [Uchiwa](http://sensuapp.org/docs/latest/dashboards_uchiwa)
- [sensu-admin](https://github.com/sensu/sensu-admin)

---

Notes:
- executed by the Sensu client
- Subscription
- Nagios protocol
- triggers event only for non-zero check results
- type "event" triggers always
- client-specific values

---

# Scheduling

- Standard checks (server)
- Standalone checks (client)
- Manual checks (API)

---

```json
{
  "checks": {
    "disk_free": {
      "type": "status",
      "subscribers": [ "all" ],
      "handlers": [ "default" ],
      "command": "/usr/lib/nagios/plugins/check_disk -w :::disk_warn::: -c :::disk_crit::: -A -x /dev/shm -X nfs -i /boot",
      "interval": 60
    }
  }
}
```

Notes: `interval`, `occurrences`, `refresh`, `low_flap_threshold`, `high_flap_threshold`

---

# Checks in Chef

```ruby
sensu_check 'mysql_server' do
  command "/usr/lib/nagios/plugins/check_mysql " +
    "-u 'monitoring' " +
    "-p '#{node['mysql']['server_mon_password']}'"
  handlers ['default']
  standalone true
  interval 30
end
```

---

# Metrics check

```json
{
  "checks": {
    "load_metrics": {
      "type": "metric",
      "command": "load-metrics.rb",
      "subscribers": [ "production" ],
      "interval": 10
    }
  }
}
```

---

# Metrics output

```
$ ruby load-metrics.rb
srv3.local.load_avg.one 0.89 1365270842
srv3.local.load_avg.five 1.01 1365270842
srv3.local.load_avg.fifteen 1.06 1365270842
$ echo $?
0
```

---

# External events

```bash
echo '{ "name": "my_check", "output": "some output", "status": 0 }' > /dev/tcp/localhost/3030
```

Useful: https://github.com/solarkennedy/sensu-shell-helper

---

# Handler types

- Pipe

Notes: Pipe handlers are for executing a command (or script), passing it the event data via STDIN.

- TCP

Notes: TCP handlers are for writing event data to a TCP socket.

- UDP

Notes: UDP handlers are for writing event data to a UDP socket.

- Transport

Notes: Transport handlers are for publishing event data to a Sensu transport, such as RabbitMQ (default).

- Sets

Notes: Handler sets are for grouping handlers; a way to send the same event data to one or more handlers, or simply create an alias.
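
---

# Handler types: a sketch

The handler types above differ mainly in how the event data leaves Sensu. The following is a minimal Ruby sketch of that idea, not Sensu's own implementation: `handle_pipe`, `handle_tcp`, `handle_set` and the `EVENT` hash are hypothetical names chosen for illustration, and the event only mimics the `client`/`check` structure used in the example handler code later in this deck. UDP and transport handlers are left out.

```ruby
require 'json'
require 'open3'
require 'socket'

# Hypothetical event data, shaped like the hash the example handler reads.
EVENT = {
  client: { name: 'srv3' },
  check: { name: 'disk_free', status: 2, output: 'DISK CRITICAL' }
}.freeze

# Pipe handler: run a command and pass it the event JSON on STDIN.
# Returns the command's STDOUT and whether it exited successfully.
def handle_pipe(event, command)
  stdout, status = Open3.capture2(command, stdin_data: JSON.generate(event))
  [stdout, status.success?]
end

# TCP handler: write the event JSON to a TCP socket, then close it.
def handle_tcp(event, host, port)
  TCPSocket.open(host, port) do |socket|
    socket.write(JSON.generate(event))
  end
end

# Handler set: send the same event to every handler in the group.
# Each handler is any callable that accepts the event.
def handle_set(event, handlers)
  handlers.map { |handler| handler.call(event) }
end
```

For example, `handle_pipe(EVENT, 'cat')` hands the event to `cat`, which echoes the JSON back unchanged; a real pipe handler would be a script such as the file handler shown later. A set could be modelled as `handle_set(EVENT, [->(e) { handle_pipe(e, 'cat') }])`.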
---

# Common event handlers

- Email
- PagerDuty
- Graphite
- IRC
- Slack

---

# Example handler code

```ruby
#!/usr/bin/env ruby

require 'rubygems'
require 'json'

# Read event data
event = JSON.parse(STDIN.read, :symbolize_names => true)

# Write the event data to a file
file_name = "/tmp/sensu_#{event[:client][:name]}_" +
  "#{event[:check][:name]}"

File.open(file_name, 'w') do |file|
  file.write(JSON.pretty_generate(event))
end
```

---

# Example handler configuration

```json
{
  "handlers": {
    "file": {
      "type": "pipe",
      "command": "/etc/sensu/handlers/file.rb"
    }
  }
}
```

---

# Sensu CLI

[https://github.com/agent462/sensu-cli](https://github.com/agent462/sensu-cli)

- `sensu-cli resolve srv3 apache_http`
- `sensu-cli client delete srv3`
- `sensu-cli silence srv3 --reason "Shut up already" --expire 3600`

---

# #chatops

[https://github.com/sensu/sensu-hubot](https://github.com/sensu/sensu-hubot)

- `sensu events summarize`
- `sensu events filter severity critical`
- `sensu events filter subscription webservers`

---

# Monitoring your monitoring

- Check RabbitMQ ready queue!

---

# Scaling Sensu

---

# Scaling a single site

- Sensu Server

Notes: Run multiple sensu-server instances with the same RabbitMQ and Redis; automatic internal master election.

- Sensu API

Notes: Stateless HTTP frontend; traditional load-balancing strategies

- RabbitMQ

Notes: See [RabbitMQ clustering documentation](https://www.rabbitmq.com/clustering.html)

- Redis

Notes: Single master; multiple Redis instances for fault tolerance; see [Redis Sentinel](http://redis.io/topics/sentinel)

---

# Multi-DC operation

---

Notes: All Sensu clients execute checks locally. Their only interaction with Sensu servers is to push events onto RabbitMQ. Therefore, remote clients can connect directly to a remote RabbitMQ broker over the WAN.

Notes: + Very simple architecture, no additional infrastructure needed at remote sites

Notes: + Centralized alert handling

Notes: - Keepalive failures are now indistinguishable from WAN instability

Notes: - Lots of remote clients means lots of TCP connections over the WAN

Notes: - All clients appear to be in the same datacenter in Uchiwa

---

Notes: RabbitMQ [Federation plugin](https://www.rabbitmq.com/federation.html) or [Shovel plugin](https://www.rabbitmq.com/shovel.html)

Notes: This is picking Availability and Partition Tolerance over Consistency with RabbitMQ.

Notes: + Less infrastructure needed at remote datacenters

Notes: + All Sensu server alerts originate from a single source

Notes: - WAN instability can result in floods of client keepalive alerts; requires check dependencies

Notes: - Increased RabbitMQ configuration complexity

Notes: - All clients “appear” to be in the same datacenter in Uchiwa

---

Notes: + WAN instability does *not* lead to flapping Sensu checks

Notes: + Sensu operation continues uninterrupted during a WAN outage

Notes: + The overall architecture is easier to understand and troubleshoot

Notes: - WAN outages mean a whole datacenter can go dark and not set off alerts (cross-datacenter checks are therefore essential)

Notes: - WAN instability can lead to a lack of visibility as Uchiwa may not be able to connect to the remote Sensu APIs

Notes: - Requires all the Sensu infrastructure in every datacenter

---

# HA

- [High availability monitoring with Sensu](http://failshell.io/sensu/high-availability-sensu/)

---

# References

---

# Community plugins

https://github.com/sensu/sensu-community-plugins

- Over 600 plugins
- 80 contributors

---

# Support

- #sensu on FreeNode IRC
- sensu-users mailing list
- Commercial support from Heavy Water Operations

---

# Thank you!
## @geewiz
## jochen@freistil.it