Use Amazon SNS for Nagios Alerts
Amazon recently added SMS publishing capability to their Simple Notification Service (SNS) platform.
Using SNS to send Nagios alerts:
At Bitly, we used the individual carriers’ email-to-SMS gateways (e.g. 123456789@text.att.net) to page the on-call person and the rest of the ops team. This was not cutting it: the carriers throttled messages heavily, and because the gateways are a free service we couldn’t really complain or get the problem fixed. In some cases pages arrived hours late, which defeats the purpose of an alert during an emergency.
Once SMS capability was announced, it was pretty much a no-brainer to switch to SNS.
I wrote a quick and simple Python script with the help of boto, a popular AWS library for Python. You can find the source over at http://bit.ly/pyAmazonSNS.
This script can be used to send any kind of SNS message, not just SMS.
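For readers who want a feel for what such a script involves, here is a minimal sketch rather than the actual source linked above. It assumes the boto 2.x calls `boto.connect_sns()` and `publish(topic, message, subject)`, that credentials come from the environment or boto config, and that the single command-line argument is an SNS topic ARN. The names `send_alert`, `boto_publish` and `main` are hypothetical.

```python
import sys


def send_alert(topic_arn, message, publish, subject=None):
    """Publish one alert to an SNS topic.

    `publish` is any callable taking (topic_arn, message, subject).
    Injecting it keeps this function testable without AWS credentials.
    """
    message = message.strip()
    if not message:
        raise ValueError("refusing to publish an empty alert")
    return publish(topic_arn, message, subject)


def boto_publish(topic_arn, message, subject=None):
    """Assumed boto 2.x usage; credentials come from the environment."""
    import boto  # imported lazily so the module loads without boto installed

    conn = boto.connect_sns()
    return conn.publish(topic_arn, message, subject)


def main(argv, stdin):
    """Read the alert text from stdin and send it to the topic in argv[1]."""
    if len(argv) != 2:
        sys.stderr.write("usage: %s <topic-arn>\n" % argv[0])
        return 2
    send_alert(argv[1], stdin.read(), boto_publish)
    return 0
```

To use it as a command you would call `sys.exit(main(sys.argv, sys.stdin))` under the usual `__main__` guard. Whether a subscriber receives the message as SMS, email or HTTP depends on how they subscribed to the topic, which is why the same script works for any kind of SNS message.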
Here is a snippet of how this would tie into Nagios:
define command{
        command_name    notify-host-by-txt
        command_line    printf "%b" "$HOSTALIAS$" | send_sns.py $CONTACTPAGER$
        }
Feel free to fork it, improve on it and add features beyond the scope of our usage. Drop us a line to let us know :)
Introduction to simplehttp
Part of our engineering philosophy is to keep things fast and simple. Aim to serve one purpose and serve it well. Speak HTTP and encode in JSON. Prototype in Python and speed it up in C.
There are a few components that follow these tenets and sit at the core of our infrastructure. We’ve open-sourced them under the simplehttp moniker. They serve as the architectural foundation for higher-order functionality.
simplehttp
At the lowest level is the simplehttp library, an abstraction of libevent’s evhttp functions, aimed at trivializing the task of writing an evented HTTP server in C. It’s dead simple yet provides high-level features such as:
- Tornado-inspired options parsing
- Tornado-inspired logging
- Automatic per-endpoint stat tracking (request counts, 95th-percentile times, averages)
- Clean API to perform async HTTP requests
Built on top of simplehttp, perhaps the most important daemons are simplequeue and pubsub.
simplequeue
A rock-solid in-memory message queue for arbitrary message bodies (we use JSON), providing basic /get and /put endpoints. We use this in several key areas to serve as a work queue for asynchronous processing. During a maintenance window, or whenever backend services are degraded, it also acts as a buffer, queueing up work to be done once those services are restored.
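As a rough illustration of talking to those two endpoints from Python, consider the sketch below. The `data` query parameter, the address `127.0.0.1:8080`, and the convention that an empty response body means an empty queue are all assumptions made for the example, so check them against the simplequeue source before relying on them. The helper names are hypothetical.

```python
import urllib.parse
import urllib.request


def put_url(base, body):
    """URL that enqueues `body`; the `data` parameter name is an assumption."""
    query = urllib.parse.urlencode({"data": body})
    return "%s/put?%s" % (base.rstrip("/"), query)


def get_url(base):
    """URL that dequeues the next message."""
    return "%s/get" % base.rstrip("/")


def put(base, body, timeout=2.0):
    """Enqueue one message; True when the queue answered 200."""
    with urllib.request.urlopen(put_url(base, body), timeout=timeout) as resp:
        return resp.status == 200


def get(base, timeout=2.0):
    """Dequeue one message as bytes, or None for an (assumed) empty body."""
    with urllib.request.urlopen(get_url(base), timeout=timeout) as resp:
        return resp.read() or None
```

With a simplequeue listening locally, `put("http://127.0.0.1:8080", '{"event": "click"}')` followed by `get("http://127.0.0.1:8080")` would round-trip one message.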
We use long-lived Python “queuereaders” to poll the simplequeue and perform work. This work might be writing to a database, logging, aggregating, or anything else that you might not want to perform in a blocking fashion during a request cycle. These queuereaders have built-in backoff timers which slow down the processing rate when errors are detected to allow a struggling backend to recover gracefully and to reduce the load on the machine running the queuereader.
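The backoff behaviour can be sketched in a few lines. This is not our queuereader code, just one way to get the same effect: the doubling schedule, the parameter names, and the choice to drop a failed message are assumptions made for illustration. `fetch` stands in for a call to the simplequeue and returns `None` when the queue is empty.

```python
import time


def run_queuereader(fetch, handle, sleep=time.sleep,
                    idle_delay=0.1, base_delay=0.5, max_delay=30.0,
                    max_iterations=None):
    """Poll `fetch()` for messages and pass each one to `handle(message)`.

    After a handler error the pause before the next poll doubles, up to
    `max_delay`; the first success resets it. Returns the number of
    messages handled successfully.
    """
    backoff = 0.0
    handled = 0
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        message = fetch()
        if message is None:
            sleep(idle_delay)  # queue empty: poll again shortly
            continue
        try:
            handle(message)
        except Exception:
            # The backend is struggling, so slow down. A real reader would
            # also requeue or log the failed message; this sketch drops it.
            backoff = base_delay if backoff == 0 else min(backoff * 2, max_delay)
            sleep(backoff)
            continue
        handled += 1
        backoff = 0.0  # healthy again: back to full speed
    return handled
```

Passing `sleep` and `max_iterations` as parameters is only there to make the loop easy to exercise in a test; a long-lived reader would leave both at their defaults.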
Generally, we silo a simplequeue and its associated queuereaders on each host of the service: a simplequeue on hostA only contains messages from requests received by that host, and its queuereaders only process messages from their local simplequeue. We do this to avoid a single point of failure.
pubsub
We have many different types of data at bitly, each classified into a stream. There are streams of encodes (shortens), decodes (“clicks”), user events, etc. In order to provide a central, consistent means for developers to access data in realtime, we expose these streams via pubsub.
Publishing a message is a simple HTTP request to the /pub endpoint. In our case this usually happens in a queuereader whose sole purpose is to read off a specific simplequeue and write to a specific pubsub.
A client consumes the stream through a long-lived HTTP request to the /sub endpoint. Messages are transmitted as newline-delimited JSON.
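On the consuming side, newline-delimited JSON means a client can decode the stream one line at a time. The following is a minimal sketch using only the standard library. The function names are hypothetical, and skipping blank lines is a defensive assumption rather than documented pubsub behaviour.

```python
import json
import urllib.request


def iter_messages(stream):
    """Yield one decoded JSON object per non-empty line of a binary stream."""
    for raw in stream:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between messages
        yield json.loads(line.decode("utf-8"))


def subscribe(base_url):
    """Open the long-lived /sub request and yield messages as they arrive."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/sub") as resp:
        for message in iter_messages(resp):
            yield message
```

Because `iter_messages` accepts any iterable of byte lines, the same parser works on a live HTTP response and on an archived stream file opened in binary mode.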
To pair with pubsub the repository contains three additional utilities built on pubsubclient. ps_to_file and ps_to_http are fairly self-explanatory. One archives a pubsub stream to a file (automatically rolling the output files for you based on a configurable strftime format string), and the other writes a stream of data to destination HTTP endpoints. The latter can be used to send messages to a simplequeue, another pubsub stream, or any other HTTP endpoint. Additionally, pubsub_filtered repeats a pubsub stream and provides the option to remove or obfuscate fields, creating a filtered view on a subset of the data. At bitly we use these tools to archive our data streams and to pass data published by one application into another application (or another datacenter).
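The file-rolling idea is simple enough to sketch. ps_to_file itself is not written this way; this Python version only illustrates the technique of formatting the output path with strftime on every write and reopening when the result changes. `RollingWriter` and its parameters are hypothetical names, and using UTC is an assumption.

```python
import time


class RollingWriter(object):
    """Append lines to a file whose name comes from a strftime pattern."""

    def __init__(self, pattern, clock=time.time):
        self.pattern = pattern  # e.g. "/data/clicks.%Y-%m-%d_%H.log"
        self.clock = clock      # injectable so tests can control time
        self.path = None
        self.fh = None

    def write(self, line):
        """Write one message, rolling to a new file if the name changed."""
        path = time.strftime(self.pattern, time.gmtime(self.clock()))
        if path != self.path:
            self.close()
            self.fh = open(path, "a")
            self.path = path
        self.fh.write(line.rstrip("\n") + "\n")
        self.fh.flush()

    def close(self):
        """Close the current output file, if any."""
        if self.fh is not None:
            self.fh.close()
            self.fh = None
            self.path = None
```

The granularity of the pattern decides how often files roll: a pattern ending in `%H` produces one file per hour, one ending in `%d` one file per day.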
EOL
If any of these things sound like fun projects to hack on, bitly is hiring.
Welcome to the bitly Engineering Blog
At bitly, we have been happy to contribute to a number of open source projects, and we have even started a few of our own. We look forward to talking about those, and other engineering details, here. You can stay up-to-date by following @bitly.