<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0"><channel><atom:link rel="hub" href="http://tumblr.superfeedr.com/" xmlns:atom="http://www.w3.org/2005/Atom"/><description>The bitly engineering blog</description><title>word</title><generator>Tumblr (3.0; @wordbitly)</generator><link>http://word.bitly.com/</link><item><title>Metrics - Building Clickatron</title><description>&lt;p&gt;One of the core product features of bitly has always been metrics. Unlike tools like google analytics,
bitly metrics have always been a way to gather website metrics when you don&amp;#8217;t control the website.
Want metrics for a book you wrote on Amazon? Use a bitly link. Want metrics for a blog post? You can
use a bitly link for that, too.&lt;/p&gt;

&lt;p&gt;As bitly usage has grown, one of the systems that has evolved several times along the way is our metrics
platform. Originally implemented as log files with a hierarchical timestamp key, later metrics data was loaded into a
flat mysql table. A subsequent revision moved that data into a cluster of
&lt;a href="http://fallabs.com/tokyocabinet/"&gt;tokyocabinet&lt;/a&gt; servers. That was eventually supplemented with some additional metrics
datasets in &lt;a href="http://www.mongodb.org/"&gt;mongodb&lt;/a&gt;. By early 2011 we were using 3 different metrics systems, and had
outgrown 3 others. We started with a set of goals to build a scalable time series database application, which we now
call Clickatron.&lt;/p&gt;

&lt;h2&gt;Goals&lt;/h2&gt;

&lt;p&gt;The goals we wanted to achieve with a new metrics system were:&lt;/p&gt;

&lt;p&gt;1) Move from daily to hourly granularity (not everyone should have
to pretend they live in EST). We wanted to satisfy read requests on the fly in any
unit of time (hour, day, week, month) for any timezone from a single hourly dataset (removing
the duplication present in other systems).&lt;/p&gt;

&lt;p&gt;2) Compact on-disk format. One of the previous metrics systems had a very inefficient data structure which was causing
both bloated data set sizes and performance problems resulting from insufficient disk IO. In order to satisfy read
requests at different units of time, it was also storing several copies of the data.&lt;/p&gt;

&lt;p&gt;3) Ability to bulk-load new datasets. There were two problems with previous metrics systems that
inhibited adding new metrics datasets. Previous systems lacked proper namespacing which caused
essentially duplicate systems for each dataset, and there was not enough excess capacity
to insert new metrics going backwards in time.&lt;/p&gt;

&lt;p&gt;4) Scalability. Our existing metrics systems were at their limits in terms of handling real-time increments and we
didn&amp;#8217;t have a simple path forward to add capacity to those systems.&lt;/p&gt;

&lt;h2&gt;Architecture&lt;/h2&gt;

&lt;p&gt;As mentioned in our post about &lt;a href="http://word.bitly.com/post/20350137230/sortdb"&gt;sortdb&lt;/a&gt;, many datasets have a small subset that is heavily updated, and a
larger related dataset that does not change. Metrics data is a textbook example where updates are limited to a
small time range, and an overwhelming majority of the data is static.&lt;/p&gt;

&lt;p&gt;We chose to take advantage of this and implement an architecture that split data into two separate storage systems. A
Realtime System is responsible for per-hour data covering the past 48 hours and also handles increment operations. A
separate Archive System stores all of the older data and is updated daily with a new set of static database files from
a Rebuild System.&lt;/p&gt;

&lt;p&gt;Each of these two systems is itself a cluster of key/value databases. The keys are a combination of
namespace, record key, and time components (explained in more detail below). Data is spread across the shards
of each cluster by a simple &lt;code&gt;crc(shard_key) % number_shards&lt;/code&gt;. The shard key is composed of only parts of the
actual key in order to improve data locality on a per-request basis. For example, the shard key excludes time components
so that all data for a request is stored on the same shard regardless of the time range requested.&lt;/p&gt;
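
&lt;p&gt;A minimal sketch of that sharding rule, under stated assumptions: &lt;code&gt;zlib.crc32&lt;/code&gt; stands in for the unspecified &lt;code&gt;crc()&lt;/code&gt;, and the shard key is taken to be everything before the final dot-separated time component of a &lt;em&gt;Total Record&lt;/em&gt; key. The function names are hypothetical.&lt;/p&gt;

```python
import zlib


def shard_key(key):
    # Assumption: the time component is the last dot-separated field,
    # as in "u|jehiah.c413"; dropping it leaves the shard key.
    return key.rsplit(".", 1)[0]


def shard_for(key, number_shards):
    # zlib.crc32 is a stand-in for the crc() named in the text.
    return zlib.crc32(shard_key(key).encode("utf-8")) % number_shards


# Two different hours for the same record land on the same shard.
print(shard_for("u|jehiah.c413", 16) == shard_for("u|jehiah.c41l", 16))  # True
```

&lt;p&gt;Because the hour never reaches the hash, a query for any time range of one record touches a single shard.&lt;/p&gt;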

&lt;h3&gt;Realtime System&lt;/h3&gt;

&lt;p&gt;The Realtime System consists of a 3-position ring of storage engines with positions for &lt;code&gt;Today&lt;/code&gt;, &lt;code&gt;Yesterday&lt;/code&gt;, and
&lt;code&gt;Tomorrow&lt;/code&gt;. The storage engines in the &lt;code&gt;Today&lt;/code&gt; position handle increments. The storage engine in the &lt;code&gt;Yesterday&lt;/code&gt;
position has a 24-hour window to be exported before being cleared for re-use in the &lt;code&gt;Tomorrow&lt;/code&gt; position.&lt;/p&gt;
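
&lt;p&gt;One way to picture the ring rotation is sketched here. It assumes the three slots are numbered 0 to 2 and that a monotonically increasing day counter selects the position; both are illustrative choices, not details from the system itself.&lt;/p&gt;

```python
RING_SIZE = 3


def ring_positions(day_number):
    # day_number is any increasing day counter (for example, days since
    # the epoch); this mapping is an assumption of the sketch.
    return {
        "Yesterday": (day_number - 1) % RING_SIZE,
        "Today": day_number % RING_SIZE,
        "Tomorrow": (day_number + 1) % RING_SIZE,
    }


for day in range(3):
    print(day, ring_positions(day))
```

&lt;p&gt;With this mapping, the slot that holds &lt;code&gt;Yesterday&lt;/code&gt; on one day is the slot that serves as &lt;code&gt;Tomorrow&lt;/code&gt; on the next, which is why it must be exported and cleared within 24 hours.&lt;/p&gt;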

&lt;p&gt;Each spot on the ring corresponds to a cluster of &lt;code&gt;simpletokyo&lt;/code&gt; / &lt;code&gt;ttserver&lt;/code&gt; pairs across which data is
sharded. Records in this system are stored in &lt;code&gt;&amp;lt;key&amp;gt;,&amp;lt;value&amp;gt;&lt;/code&gt; format where each record represents
an integer value for a single hour.&lt;/p&gt;

&lt;h3&gt;Rebuild System&lt;/h3&gt;

&lt;p&gt;The Rebuild System serves 3 purposes. Its primary role is to do the export from the Realtime System every 24 hours and
combine that new dataset with all previous datasets to generate new data files for the Archive System. As part of that
export process, the Rebuild System generates a second copy of the merged data files to serve as backup. Because it
generates new data files from scratch every day, it also provides a spot to introduce new historical data
independent of the number of records added. We have been able to use this to introduce new datasets as large as 13
billion keys. Good luck waiting for that many inserts into [insert database name].&lt;/p&gt;

&lt;p&gt;The rebuild process is a set of steps that involve sorting, merging, reformatting keys, and combining records. That&amp;#8217;s
followed by a final sharding and another sort and merge operation. These steps happen largely in parallel and are each
split into smaller segments of work that get distributed across several machines. Many of these steps also intelligently
avoid redoing unnecessary work by keeping intermediate files split by time range.&lt;/p&gt;

&lt;h3&gt;Archive System&lt;/h3&gt;

&lt;p&gt;The Archive System consists of static csv files accessed through &lt;a href="http://word.bitly.com/post/20350137230/sortdb"&gt;sortdb&lt;/a&gt;. These files are spread across
hosts as needed depending on desired read performance. Multiple sortdb instances for the same data files can be used on
separate physical hosts for read availability and fault tolerance.&lt;/p&gt;

&lt;h3&gt;API&lt;/h3&gt;

&lt;p&gt;The API layer directs increment requests to the right spot on the Realtime System ring. For increments, the API layer
also writes out a local oplog for data durability in case there is a hardware failure in the Realtime System. For data
fetches, the API layer handles the potential need to pull data from both the Realtime System and the Archive System,
merging the two data sets into a single response. The components of the API involved in incrementing are written in C
using &lt;a href="https://github.com/bitly/simplehttp"&gt;simplehttp&lt;/a&gt;, and the components for querying are written in a combination of python using
&lt;a href="http://www.tornadoweb.org"&gt;tornadoweb&lt;/a&gt; and C using simplehttp.&lt;/p&gt;

&lt;h3&gt;Architecture Overview&lt;/h3&gt;

&lt;p&gt;&lt;img src="http://media.tumblr.com/tumblr_m2yrxsJ5Bd1qz94k4.png" alt="clickatron architecture overview"/&gt;&lt;/p&gt;

&lt;h2&gt;Storage Formats&lt;/h2&gt;

&lt;h3&gt;Database Key Optimizations&lt;/h3&gt;

&lt;p&gt;Two optimizations directly aimed at lowering disk storage needs are a compact time format and a global lookup
table for records with long keys. The choice to use sortdb also directly reduced our storage requirements as the data
index is stored with the data (more specifically the index &lt;em&gt;is&lt;/em&gt; the data).&lt;/p&gt;

&lt;p&gt;For time values we use a compact 4 character &lt;code&gt;YMDH&lt;/code&gt; representation. Using this format &lt;code&gt;c3p1&lt;/code&gt; corresponds to 1am on March 25th, 2012.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;YMDH == _b32(year % 100) + _b32(month) + _b32(day) + _b32(hour)
&lt;/code&gt;&lt;/pre&gt;
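
&lt;p&gt;A small sketch of that encoding follows. The digit alphabet (0-9 followed by a-v) is an assumption, chosen because it reproduces the &lt;code&gt;c3p1&lt;/code&gt; example above; the function names are hypothetical.&lt;/p&gt;

```python
from datetime import datetime

# Assumed digit alphabet: 0-9 then a-v (the one Python's int(x, 32) accepts).
B32_DIGITS = "0123456789abcdefghijklmnopqrstuv"


def encode_ymdh(dt):
    # One base-32 digit each for year % 100, month, day, and hour.
    # A single digit only covers values 0-31, so years past 2031 would not fit.
    parts = (dt.year % 100, dt.month, dt.day, dt.hour)
    return "".join(B32_DIGITS[n] for n in parts)


def decode_ymdh(ymdh, century=2000):
    year, month, day, hour = (int(ch, 32) for ch in ymdh)
    return datetime(century + year, month, day, hour)


print(encode_ymdh(datetime(2012, 3, 25, 1)))  # c3p1
print(decode_ymdh("c41l"))                    # 2012-04-01 21:00:00
```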

&lt;p&gt;Any time a key contains a reserved character (&lt;code&gt;&amp;lt;space&amp;gt;&lt;/code&gt; &lt;code&gt;:&lt;/code&gt; &lt;code&gt;,&lt;/code&gt; &lt;code&gt;.&lt;/code&gt; &lt;code&gt;|&lt;/code&gt;) or is 12 or more characters, we create a
12-character hash of it, and store a lookup record. This means that if we want to count the number of referrers from
&lt;code&gt;&lt;a href="http://www.facebook.com/"&gt;http://www.facebook.com/&lt;/a&gt;&lt;/code&gt;, instead of storing that 24-character string every time we want to count it, we count a
smaller 12-character hash such that &lt;code&gt;_hash('http://www.facebook.com/') == 'h0aMB8AuNw4='&lt;/code&gt;, and do a reverse lookup to
expand the hash back to the full value at query time. It may not seem like much, but a savings of even a few bytes for
repeated values makes a big difference on datasets with billions of records.&lt;/p&gt;
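
&lt;p&gt;The lookup rule can be sketched as follows. The actual hash function is not described here, so this example substitutes a hypothetical one (the first 8 bytes of an MD5 digest, base64-encoded to 12 characters); it will not reproduce the hash values shown in this post.&lt;/p&gt;

```python
import base64
import hashlib

RESERVED = set(" :,.|")


def needs_lookup(key):
    # Keys of 12 or more characters, or containing a reserved character,
    # are replaced by a hash.
    return len(key) >= 12 or any(ch in RESERVED for ch in key)


def short_hash(key):
    # Hypothetical stand-in hash: 8 bytes of MD5, base64-encoded (12 chars).
    digest = hashlib.md5(key.encode("utf-8")).digest()[:8]
    return base64.b64encode(digest).decode("ascii")


def compact_key(key, lookup):
    if not needs_lookup(key):
        return key
    hashed = short_hash(key)
    lookup[hashed] = key  # the Lookup Record used to expand at query time
    return hashed


lookup = {}
print(compact_key("US", lookup))  # US
stored = compact_key("http://www.facebook.com/", lookup)
print(len(stored), lookup[stored])  # 12 http://www.facebook.com/
```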

&lt;h3&gt;Record Types&lt;/h3&gt;

&lt;p&gt;Our time series database is made up of 3 types of records: &lt;strong&gt;Lookup Records&lt;/strong&gt;, which are the expansions for the
12-character hashes above; &lt;strong&gt;Total Records&lt;/strong&gt;, which are a single key with values per hour over time; and &lt;strong&gt;Subtotal
Records&lt;/strong&gt;, which represent the line item values whose sum rolls up to the &lt;em&gt;Total Record&lt;/em&gt; for each hour.&lt;/p&gt;

&lt;p&gt;We use a different data format between the Realtime System and the Archive System for storing &lt;em&gt;Total Records&lt;/em&gt; and
&lt;em&gt;Subtotal Records&lt;/em&gt;. The data format used for the Realtime System is optimized for writes while the format used for the
Archive System is optimized for compactness and reads.&lt;/p&gt;

&lt;p&gt;In the Realtime System records have one data point, but in the Archive System records are multi-column and have many
values. Many records in the Realtime System will be collapsed to a single multi-column record in the Archive System.
Records in the Realtime System are always for a single hour, but &lt;em&gt;Total Records&lt;/em&gt; in the Archive System are for all time
ranges, and &lt;em&gt;Subtotal Records&lt;/em&gt; are similarly for all subtotal keys (for an hour). This facilitates retrieval, but it
also provides for a more compact storage (the record key is stored only once for that multi-column record compared with
the Realtime System). Metrics data is very sparse (many keys only have values for a few hours) so the time values in a
&lt;em&gt;Total Record&lt;/em&gt; also serve as an index of what data to expect in the equivalent subtotal data set.&lt;/p&gt;

&lt;h3&gt;Record Formats&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Total Records - Realtime System&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A &lt;em&gt;Total Record&lt;/em&gt; as stored in the Realtime System consists of a &lt;code&gt;namespace&lt;/code&gt;, a &lt;code&gt;total key&lt;/code&gt;, an hourly &lt;code&gt;time
value&lt;/code&gt;, and the &lt;code&gt;counter value&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://media.tumblr.com/tumblr_m2yueh0Bir1qz94k4.png" alt="Total Record Realtime Format"/&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Example:&lt;/em&gt; These are two &lt;em&gt;Total Records&lt;/em&gt; in the Realtime System representing total clicks for a single user in two
separate hours. &lt;code&gt;u&lt;/code&gt; is our namespace for &amp;#8220;user clicks&amp;#8221;, &lt;code&gt;jehiah&lt;/code&gt; is my username, &lt;code&gt;c413&lt;/code&gt; and &lt;code&gt;c41l&lt;/code&gt; are time components
and these records have values of &lt;code&gt;2&lt;/code&gt; and &lt;code&gt;5&lt;/code&gt; respectively.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;u|jehiah.c413,2
u|jehiah.c41l,5
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Total Records - Archive System&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the Archive System, all &lt;em&gt;Total Records&lt;/em&gt; for the same &lt;code&gt;namespace&lt;/code&gt; + &lt;code&gt;total key&lt;/code&gt; combination are collapsed into one
record with the individual &lt;code&gt;time value&lt;/code&gt; and &lt;code&gt;counter value&lt;/code&gt; pairs as the multi-column component.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://media.tumblr.com/tumblr_m2yudxxere1qz94k4.png" alt="Total Record Archive Format"/&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Example:&lt;/em&gt; These are the same two &lt;em&gt;Total Records&lt;/em&gt; from the Realtime System as stored in the Archive System. Here the key is
only the &lt;code&gt;u&lt;/code&gt; namespace and &lt;code&gt;jehiah&lt;/code&gt; total key, but the value is multi-column pairs of time components &lt;code&gt;c413&lt;/code&gt; and &lt;code&gt;c41l&lt;/code&gt;
and values of &lt;code&gt;2&lt;/code&gt; and &lt;code&gt;5&lt;/code&gt;. In this small example, the archive record is &lt;strong&gt;27%&lt;/strong&gt; more compact.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;u|jehiah,c413:2 c41l:5
&lt;/code&gt;&lt;/pre&gt;
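
&lt;p&gt;The collapse from the realtime format to the archive format can be sketched like this. The function name is hypothetical, and the sketch assumes the time value is always the final dot-separated field of the key.&lt;/p&gt;

```python
def collapse_totals(realtime_lines):
    # Group "key.time,value" records by key and emit "key,time:value ..." rows.
    grouped = {}
    for line in realtime_lines:
        key, value = line.split(",")
        record_key, time_value = key.rsplit(".", 1)
        grouped.setdefault(record_key, []).append(time_value + ":" + value)
    return [k + "," + " ".join(cols) for k, cols in grouped.items()]


realtime = ["u|jehiah.c413,2", "u|jehiah.c41l,5"]
archive = collapse_totals(realtime)
before = sum(len(r) for r in realtime)
after = sum(len(r) for r in archive)
print(archive)        # ['u|jehiah,c413:2 c41l:5']
print(before, after)  # 30 22
```

&lt;p&gt;Going from 30 characters to 22 is a reduction of roughly 27%, in line with the figure quoted above.&lt;/p&gt;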

&lt;p&gt;&lt;strong&gt;Subtotal Records - Realtime System&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A &lt;em&gt;Subtotal Record&lt;/em&gt; as stored in the Realtime System consists of a &lt;code&gt;subtotal namespace&lt;/code&gt;, a &lt;code&gt;total
namespace&lt;/code&gt;, a &lt;code&gt;total key&lt;/code&gt;, a &lt;code&gt;subtotal key&lt;/code&gt;, an hourly &lt;code&gt;time value&lt;/code&gt;, and a &lt;code&gt;counter value&lt;/code&gt;. This record is structured such
that the (&lt;code&gt;total namespace&lt;/code&gt; + &lt;code&gt;total key&lt;/code&gt;) matches exactly to a &lt;em&gt;Total Record&lt;/em&gt;. Further, the &lt;code&gt;counter values&lt;/code&gt; are related
such that the sum of &lt;code&gt;counter values&lt;/code&gt; for all subtotal keys for a &lt;code&gt;subtotal namespace&lt;/code&gt;, &lt;code&gt;total namespace&lt;/code&gt;, &lt;code&gt;total key&lt;/code&gt;,
&lt;code&gt;time value&lt;/code&gt; exactly equals the matching value in a &lt;em&gt;Total Record&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://media.tumblr.com/tumblr_m2yudcKGDf1qz94k4.png" alt="Subtotal Record Realtime Format"/&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Example:&lt;/em&gt; Here are 4&amp;#160;&lt;em&gt;Subtotal Records&lt;/em&gt; that represent two different datasets related to the &lt;em&gt;Total Record&lt;/em&gt; above as
stored in the Realtime System. One dataset with the &lt;code&gt;c&lt;/code&gt; subtotal namespace represents the clicks by country, and the
other &lt;code&gt;r&lt;/code&gt; subtotal namespace represents the clicks by referrer. Each of these datasets (&lt;code&gt;4&lt;/code&gt; + &lt;code&gt;1&lt;/code&gt; and &lt;code&gt;3&lt;/code&gt; + &lt;code&gt;2&lt;/code&gt;) sum up
to the &lt;code&gt;5&lt;/code&gt; in the &lt;em&gt;Total Record&lt;/em&gt; above. The &lt;code&gt;US&lt;/code&gt; and &lt;code&gt;JP&lt;/code&gt; are country values in the subtotal key, and &lt;code&gt;5bmehgOB4+w=&lt;/code&gt; and
&lt;code&gt;h0aMB8AuNw4=&lt;/code&gt; are 12 character lookup values that are placeholders for longer subtotal keys. These records in the
Realtime System are per-hour so they can be incremented individually.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;c.u|jehiah.US.c41l,4
c.u|jehiah.JP.c41l,1
r.u|jehiah.5bmehgOB4+w=.c41l,3
r.u|jehiah.h0aMB8AuNw4=.c41l,2
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;strong&gt;Subtotal Records - Archive System&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the Archive System, all &lt;em&gt;Subtotal Records&lt;/em&gt; for a given hour for a &lt;code&gt;subtotal namespace&lt;/code&gt; collapse into one record, with the &lt;code&gt;subtotal
key&lt;/code&gt; and &lt;code&gt;counter value&lt;/code&gt; pairs as the multi-column data.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://media.tumblr.com/tumblr_m2yucqHtVQ1qz94k4.png" alt="Subtotal Record Archive Format"/&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Example:&lt;/em&gt; The same 4 values in the Realtime System are represented as two multi-column records in the Archive System.
Here the records are still one per hour, but all subtotal keys for that hour are stored together. For this small
example, the archive format is &lt;strong&gt;31%&lt;/strong&gt; smaller.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;c.u|jehiah.c41l,US:4 JP:1
r.u|jehiah.c41l,5bmehgOB4+w=:3 h0aMB8AuNw4=:2
&lt;/code&gt;&lt;/pre&gt;
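
&lt;p&gt;The same collapse for &lt;em&gt;Subtotal Records&lt;/em&gt; can be sketched as below, along with the check that each subtotal dataset sums to the matching total. The function names are hypothetical, and the sketch assumes no key component contains a dot (reserved characters are hashed away).&lt;/p&gt;

```python
def collapse_subtotals(realtime_lines):
    # Group per-hour subtotal records by (subtotal ns, total ns + key, hour).
    grouped = {}
    for line in realtime_lines:
        key, value = line.strip().split(",")
        sub_ns, total, sub_key, time_value = key.split(".")
        record_key = ".".join((sub_ns, total, time_value))
        grouped.setdefault(record_key, []).append((sub_key, int(value)))
    return grouped


def archive_line(record_key, cols):
    return record_key + "," + " ".join(k + ":" + str(v) for k, v in cols)


realtime = [
    "c.u|jehiah.US.c41l,4",
    "c.u|jehiah.JP.c41l,1",
    "r.u|jehiah.5bmehgOB4+w=.c41l,3",
    "r.u|jehiah.h0aMB8AuNw4=.c41l,2",
]
for record_key, cols in collapse_subtotals(realtime).items():
    # Each dataset should sum to the Total Record value for that hour (5).
    print(archive_line(record_key, cols), sum(v for _, v in cols))
```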

&lt;h3&gt;Record Visualization&lt;/h3&gt;

&lt;p&gt;This is a visualization of the data contained within a pair of records, a &lt;em&gt;Total Record&lt;/em&gt; (green) and a &lt;em&gt;Subtotal Record&lt;/em&gt; (purple),
from the Archive System. On the right are the same records but with example values filled in.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://media.tumblr.com/tumblr_m2yubwPNbB1qz94k4.png" alt="Clickatron Record Format Visualization"/&gt;&lt;/p&gt;

&lt;h2&gt;Queries&lt;/h2&gt;

&lt;p&gt;Since Clickatron stores data with hourly granularity, the API merges that hourly data on the fly to fulfill requests for
other granularities. This approach enables the API to accept an &lt;code&gt;hour_offset&lt;/code&gt; parameter in order to fulfill queries for
any timezone aligned on the hour (with apologies to India and other locales aligned on the half hour). In our public API
endpoint we look up &lt;code&gt;hour_offset&lt;/code&gt; from a more generic &lt;code&gt;timezone&lt;/code&gt; parameter. The API also uses the technique for fulfilling day
queries to respond to week and month queries, and can even respond with week units that start on Monday instead of
Sunday (we call this &amp;#8220;mweek&amp;#8221;). You can see an example of these parameters in our &lt;a href="http://dev.bitly.com/link_metrics.html"&gt;API Documentation&lt;/a&gt;
for user and link metrics.&lt;/p&gt;
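
&lt;p&gt;Rolling hourly data up into days for a given offset might look like the sketch below. It assumes stored hours are in one reference timezone, that &lt;code&gt;hour_offset&lt;/code&gt; is added to the stored hour, and that time values use the 0-9, a-v digit alphabet; the function names are hypothetical.&lt;/p&gt;

```python
from collections import Counter
from datetime import datetime, timedelta


def parse_ymdh(ymdh):
    # Assumes the 0-9, a-v digit alphabet and years 2000-2031.
    year, month, day, hour = (int(ch, 32) for ch in ymdh)
    return datetime(2000 + year, month, day, hour)


def daily_totals(archive_value, hour_offset=0):
    # archive_value is the multi-column part of an archived Total Record.
    days = Counter()
    for column in archive_value.split(" "):
        ymdh, count = column.split(":")
        local = parse_ymdh(ymdh) + timedelta(hours=hour_offset)
        days[local.date().isoformat()] += int(count)
    return dict(days)


print(daily_totals("c413:2 c41l:5"))
# {'2012-04-01': 7}
print(daily_totals("c413:2 c41l:5", hour_offset=-4))
# {'2012-03-31': 2, '2012-04-01': 5}
```

&lt;p&gt;The same hourly record yields different day buckets depending on the offset, which is what lets one dataset serve every hour-aligned timezone.&lt;/p&gt;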

&lt;h2&gt;Performance in the Real World&lt;/h2&gt;

&lt;p&gt;Clickatron has been in production for over a year now. It has successfully replaced several other metrics systems,
reduced hardware requirements, improved data granularity, and reduced query latency. As previously mentioned, we have
leveraged its rebuild functionality to bulk load datasets with as many as 13 billion data points at once with zero
service impact - directly contributing to our ability to expand the breadth of our data. Despite an increase in the
breadth of metrics, the granularity of data storage, the timespan of accumulated data, and a significant increase in
traffic, Clickatron&amp;#8217;s current persistent footprint is less than one quarter the size of the disparate systems it
replaced a year ago. Clickatron currently tracks well over 100 billion data points, a number that increases every day,
and we feel well situated to handle future needs as they arise.&lt;/p&gt;

&lt;h3&gt;EOL&lt;/h3&gt;

&lt;p&gt;If this sounds like a fun metrics system to use or to work on, &lt;a href="http://bitly.com/jobs"&gt;bitly is hiring&lt;/a&gt;.&lt;/p&gt;

&lt;div class="postmeta"&gt;
by &lt;a href="http://twitter.com/jehiah"&gt;jehiah&lt;/a&gt; (also shout out to &lt;a href="http://twitter.com/imsnakes"&gt;snakes&lt;/a&gt; who helped design and develop clickatron)
&lt;/div&gt;</description><link>http://word.bitly.com/post/21721687297</link><guid>http://word.bitly.com/post/21721687297</guid><pubDate>Tue, 24 Apr 2012 13:19:16 -0400</pubDate><category>clickatron</category><category>metrics</category></item><item><title>sortdb - static key value database</title><description>&lt;p&gt;As dataset sizes grow, you need to buy larger/faster database machines, or shard data across multiple machines.
&lt;a href="https://github.com/bitly/simplehttp/tree/master/sortdb"&gt;sortdb&lt;/a&gt; is a database server we have written to help fill a specific need for performant access to static data.&lt;/p&gt;

&lt;p&gt;One observation about many datasets is that there is often a small dataset that is heavily updated, and a larger older
related dataset that does not change. It is this second case where sortdb can be used to expose a query interface to
static datasets.&lt;/p&gt;

&lt;p&gt;sortdb has an HTTP interface built on libevent, and exposes &lt;code&gt;get&lt;/code&gt;, &lt;code&gt;mget&lt;/code&gt; and &lt;code&gt;fwmatch&lt;/code&gt; endpoints to key / value data in
a sorted &lt;code&gt;csv&lt;/code&gt; data file. Internally, the datafile is &lt;a href="http://en.wikipedia.org/wiki/Mmap"&gt;mmaped&lt;/a&gt; and a binary search is
performed to find rows in the file. It is part of
&lt;a href="http://word.bitly.com/post/13216970565/intro-to-simplehttp"&gt;simplehttp&lt;/a&gt;, a family of libraries and daemons for building
scalable web infrastructures.&lt;/p&gt;

&lt;p&gt;By mmap&amp;#8217;ing the data file, sortdb allows the operating system to keep the most important parts of the data
in memory while retrieving others from disk transparently. Because the file is searched using a binary search,
the first few steps of the search, which are shared by every lookup, are soon cached in RAM and execute very quickly. The operating system
will also share mmap&amp;#8217;d data between multiple processes. Practically, this means you can start a second sortdb process
pointed at the same data file for higher read throughput.&lt;/p&gt;
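
&lt;p&gt;The idea can be illustrated with a short Python sketch of a binary search over an mmapped, sorted, line-oriented file. This is not the sortdb implementation (which is written in C); it assumes rows are sorted in byte order by key, and the function names are hypothetical.&lt;/p&gt;

```python
import mmap
import os
import tempfile


def line_at(mm, pos):
    # Return (start, end) byte offsets of the line containing offset pos.
    start = mm.rfind(b"\n", 0, pos) + 1
    end = mm.find(b"\n", pos)
    if end == -1:
        end = len(mm)
    return start, end


def get(mm, key, sep=b","):
    # Binary search over byte offsets; each probe snaps to a whole line.
    lo, hi = 0, len(mm)
    while hi > lo:
        mid = (lo + hi) // 2
        start, end = line_at(mm, mid)
        row_key, _, value = mm[start:end].partition(sep)
        if row_key == key:
            return value
        if key > row_key:
            lo = end + 1
        else:
            hi = start
    return None


def lookup(path, key):
    with open(path, "rb") as fh:
        mm = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            return get(mm, key)
        finally:
            mm.close()


rows = [
    b"c.u|jehiah.c3hh,DE:1 None:5 US:7",
    b"u|jehiah.c3hh,13",
    b"u|jehiah.c3i0,7",
]
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\n".join(rows) + b"\n")
print(lookup(f.name, b"u|jehiah.c3hh"))  # b'13'
print(lookup(f.name, b"missing"))        # None
os.remove(f.name)
```

&lt;p&gt;Every lookup probes the middle of the file first, then one of the two quarter points, and so on, so those few pages stay hot in the page cache.&lt;/p&gt;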

&lt;p&gt;In database terms, a sorted key/value datafile is essentially a covering index where all the values are stored with the
key. Because there is no need to store a separate index of the data, this is a compact on-disk representation (removing
the need for separate indexes can often give a 20-30% savings in disk usage).&lt;/p&gt;

&lt;p&gt;Operationally, managing sortdb is easy because the data files it uses are static and read-only. This makes it easy to
copy the file to multiple servers and distribute requests to multiple sortdb instances on separate servers to handle a
higher workload, or to add redundancy. It is also possible to point sortdb to a new file with zero downtime (literally
&lt;code&gt;mv data.csv old.csv &amp;amp;&amp;amp; mv new.csv data.csv &amp;amp;&amp;amp; kill -s HUP $pid&lt;/code&gt;).&lt;/p&gt;

&lt;h2&gt;Using sortdb&lt;/h2&gt;

&lt;p&gt;At bitly we use sortdb successfully as a core component of our metrics system and are very happy with its performance
characteristics. Our long term metrics system is composed of two different storage systems. One &amp;#8220;realtime&amp;#8221; system stores
metrics for the past 48 hours and processes all increment requests. A second &amp;#8220;static&amp;#8221; storage system is built on sortdb
and stores all data prior to the past 48 hours. Every day, 24 hours of data is exported from the realtime system, and
new static data files are re-built for the static system comprising all the existing data, and the new 24 hour data
segment.&lt;/p&gt;

&lt;p&gt;A small slice of our &amp;#8220;static&amp;#8221; metrics data looks like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;c.u|jehiah.c3h5,US:2
c.u|jehiah.c3h7,US:1
c.u|jehiah.c3h9,None:2 US:3
c.u|jehiah.c3ha,None:2
c.u|jehiah.c3hb,None:2 US:3
c.u|jehiah.c3hc,None:3 US:4
c.u|jehiah.c3hd,None:5 US:4
c.u|jehiah.c3he,None:2 US:6
c.u|jehiah.c3hf,None:3 US:5
c.u|jehiah.c3hg,None:4 US:5
c.u|jehiah.c3hh,DE:1 None:5 US:7
c.u|jehiah.c3hi,CA:1 None:6 US:5
u|jehiah.c3hh,13
u|jehiah.c3i0,7
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You will notice that these are &lt;code&gt;key,value&lt;/code&gt; records where the key is a compound key often made up of a namespace and
sometimes a time component, and the value is sometimes a repeating (&lt;code&gt;key&lt;/code&gt; + &lt;code&gt;:&lt;/code&gt; + &lt;code&gt;value&lt;/code&gt;) sequence.&lt;/p&gt;

&lt;p&gt;With that dataset in a plain text file named &lt;code&gt;data.csv&lt;/code&gt;, you can point sortdb at it with this run command (setting comma
as the field separator):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ sortdb --field-separator=, --db-file=data.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now it&amp;#8217;s possible to query for a single record:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ curl 'http://127.0.0.1:8080/get?key=u|jehiah.c3hh'
13
$ curl 'http://127.0.0.1:8080/get?key=c.u|jehiah.c3hh'
DE:1 None:5 US:7
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Or forward match a range of records:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ curl 'http://127.0.0.1:8080/fwmatch?key=u|jehiah.'
u|jehiah.c3hh,13
u|jehiah.c3i0,7
&lt;/code&gt;&lt;/pre&gt;

&lt;h3&gt;EOL&lt;/h3&gt;

&lt;p&gt;If this sounds like a fun project to hack on, &lt;a href="http://bit.ly/jobs"&gt;bitly is hiring&lt;/a&gt;.&lt;/p&gt;

&lt;div class="postmeta"&gt;
by &lt;a href="http://twitter.com/jehiah"&gt;jehiah&lt;/a&gt;
&lt;/div&gt;</description><link>http://word.bitly.com/post/20350137230</link><guid>http://word.bitly.com/post/20350137230</guid><pubDate>Mon, 02 Apr 2012 10:59:31 -0400</pubDate><category>sortdb</category><category>simplehttp</category></item><item><title>Infrastructure as a Platform</title><description>&lt;h3&gt;Introduction&lt;/h3&gt;

&lt;p&gt;At bitly, infrastructure has two core responsibilities.&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;&lt;strong&gt;The obvious&lt;/strong&gt; - systems architecture, performance, scaling, technology choices, and implementation details.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The not-so-obvious&lt;/strong&gt; - new hire on-boarding, developer tools, and idioms that enable non-infrastructure engineers (science/research, application developers, etc.) to build scalable, performant, operationally sound, and easy-to-develop pieces of your product.&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;I say not-so-obvious because many of these things aren&amp;#8217;t pressing issues when teams are small, everyone is in one room, and it&amp;#8217;s crystal clear what the most important goal is from an engineering perspective.&lt;/p&gt;

&lt;p&gt;As an engineering team grows, it becomes more and more important to develop and evangelize a common way of solving problems and getting things done.  Without these guidelines you end up with inconsistent solutions that have a dramatic impact on the ability to operationally handle production systems (let alone the time cost of engineers constantly re-inventing the wheel).&lt;/p&gt;

&lt;p&gt;Maybe you&amp;#8217;re responsible 24/7 for production systems - consider any combination of the following situations:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;logs are in 14 different places&lt;/li&gt;
&lt;li&gt;an engineer chose a different one of the far too many Python web frameworks&lt;/li&gt;
&lt;li&gt;services are started by the current flavor-of-the-week process manager&lt;/li&gt;
&lt;li&gt;response formats from APIs differ&lt;/li&gt;
&lt;li&gt;techniques used to process data vary&lt;/li&gt;
&lt;li&gt;monitoring is done differently or not at all&lt;/li&gt;
&lt;li&gt;backups aren&amp;#8217;t consistent&lt;/li&gt;
&lt;li&gt;no tests&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Sounds like one helluva mess to debug.  (and it&amp;#8217;s &lt;strong&gt;always&lt;/strong&gt; at &lt;em&gt;3am&lt;/em&gt;)&lt;/p&gt;

&lt;p&gt;Let&amp;#8217;s talk about how we&amp;#8217;ve made progress over the past year solving some of these problems.&lt;/p&gt;

&lt;h3&gt;Development Workflow&lt;/h3&gt;

&lt;p&gt;A big win here was our switch to Git (and GitHub).  The freedom and flexibility this gave us to design a powerful code-flow (&lt;code&gt;develop&lt;/code&gt; -&amp;gt; &lt;code&gt;review&lt;/code&gt; -&amp;gt; &lt;code&gt;merge&lt;/code&gt; -&amp;gt; &lt;code&gt;deploy&lt;/code&gt;) has proved &lt;em&gt;extremely&lt;/em&gt; valuable.&lt;/p&gt;

&lt;p&gt;We keep our code in one repository (application code, infrastructure tools, configuration and system dependencies).  When possible we work to move things into external &lt;a href="http://github.com/bitly"&gt;open source projects&lt;/a&gt; that are paired with an install script in our main repo.  Every engineer develops in their own fork.  Code makes it into &lt;code&gt;master&lt;/code&gt; in one of two ways:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;&lt;strong&gt;cherry-picking&lt;/strong&gt; - a small, well defined changeset or high-priority bug fix can be cherry-picked into &lt;code&gt;master&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;pull request&lt;/strong&gt; - anything else follows the process of creating a branch off &lt;code&gt;master&lt;/code&gt;, developing the feature, and opening a pull request.&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;It should be noted that another member of the engineering team will review either of the above and provide constructive feedback, ultimately being the one to bring the code into master.  Code review is a first-class member of our development workflow - it is hugely beneficial to both the developer and the reviewer.  The better you are at reading code, the better you are at writing it.  For efficiency, and to respect each other&amp;#8217;s time, we often exchange commit hashes and play tit-for-tat with pull-request reviewing.&lt;/p&gt;

&lt;p&gt;Another benefit of this model is that &lt;code&gt;master&lt;/code&gt; maintains a clean history, always moving forward.  In doing so we avoid the pitfalls of coordinating a destructive history change with so many outstanding branches.  Still, we understand the value of Git&amp;#8217;s power - engineers are encouraged to re-write history to their hearts&amp;#8217; content while developing in their own branch.  When ready, a comment on the pull request stating &amp;#8220;ready for review &lt;strong&gt;@github_user&lt;/strong&gt;&amp;#8221; will notify the corresponding person to begin review at their discretion.&lt;/p&gt;

&lt;p&gt;Reviewing takes the form of comments on the pull request.  As each round of review is completed, the developer will comment &amp;#8220;resolved&amp;#8221; where appropriate and push additional commits to the branch.  This culminates in one final rebase and squash before a merge.  We find that looking back you rarely need the back-and-forth of many small commits and we prefer to see the change as a whole as it relates to the issue documented in the pull request.&lt;/p&gt;

&lt;h3&gt;Developing in a Virtual Machine&lt;/h3&gt;

&lt;p&gt;If you&amp;#8217;re &lt;strong&gt;not&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;writing code in an environment that mimics production&lt;/li&gt;
&lt;li&gt;set up by the same scripts and processes that stand up production hosts&lt;/li&gt;
&lt;li&gt;exposed to the challenges of making things play nice together&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Then as soon as your code is actually running in production, it&amp;#8217;s probably not going to go well.&lt;/p&gt;

&lt;p&gt;We believe whole-heartedly in the VM approach.  Every developer (and even designers!) has a local VM for developing changes to any component in our infrastructure, front to back.&lt;/p&gt;

&lt;p&gt;Most importantly, it forces you to design your applications to be &amp;#8220;environment aware&amp;#8221;.  Your application code should only know the &lt;strong&gt;&lt;em&gt;how&lt;/em&gt;&lt;/strong&gt;.  The &lt;strong&gt;&lt;em&gt;where&lt;/em&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;em&gt;how many&lt;/em&gt;&lt;/strong&gt; is the responsibility of configuration files that pivot on the &lt;em&gt;environment&lt;/em&gt; the code is running in.  (see &lt;a href="http://twitter.com/jehiah"&gt;@jehiah&lt;/a&gt;&amp;#8217;s &lt;a href="http://jehiah.cz/a/application-settings"&gt;post&lt;/a&gt; for more info about how we structure this)&lt;/p&gt;

&lt;p&gt;The benefits of this are easy to understand - if you can get it working, tested, and deployable in the VM then getting it running successfully in production should be as simple as a deploy.&lt;/p&gt;

&lt;p&gt;However, the choice of using a VM doesn&amp;#8217;t come without challenges:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;When you&amp;#8217;re service-oriented, as the overall architecture grows, it becomes increasingly difficult to &amp;#8220;fit&amp;#8221; &lt;em&gt;all&lt;/em&gt; services in a single VM on your local machine (RAM, etc.).&lt;/li&gt;
&lt;li&gt;Not &lt;em&gt;all&lt;/em&gt; services are equal and different engineers tend to touch different components more often.&lt;/li&gt;
&lt;li&gt;Setup, maintenance, and learning curve are a time sink.&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;We&amp;#8217;ve only begun to scratch the surface in addressing these issues.  So far, we&amp;#8217;ve started to logically group services so that they can be spun up/down on demand, giving some control over what&amp;#8217;s running simultaneously.  We&amp;#8217;ll surely post a follow-up with our findings as we continue to make progress in this area.&lt;/p&gt;

&lt;h3&gt;The Way of the Bitly&lt;/h3&gt;

&lt;p&gt;The value of choosing sane and repeatable solutions to common problems becomes clearer and clearer as things grow - whether it&amp;#8217;s data, engineering team size, or number of services in your infrastructure.&lt;/p&gt;

&lt;p&gt;Often the number of operations engineers doesn&amp;#8217;t scale linearly with the number of services they have to monitor in production.  It becomes extremely important to make their lives easy by developing good frameworks and solutions for solving problems, and by making those solutions readily available, well documented, and easy to use.&lt;/p&gt;

&lt;p&gt;We begin this education process early on with new engineers.&lt;/p&gt;

&lt;p&gt;An engineer&amp;#8217;s first day sets the tone for their future at your company.  A new engineer at bitly can expect that on the first day they&amp;#8217;ll have some shiny new hardware to play with as well as one simple goal: get their bio on the &lt;a href="http://bitly.com/pages/about"&gt;bitly about page&lt;/a&gt;.  Since we put a strong focus on working in a VM, that implies that the existing operations and infrastructure team has done its job and set up all the appropriate accounts, such as access to email, wiki, GitHub, and of course, a VM.&lt;/p&gt;

&lt;p&gt;This simple task helps new engineers familiarize themselves with the dev workflow described above and provides the satisfaction of seeing code go into production on day 1.&lt;/p&gt;

&lt;p&gt;We try to expose new engineers to all different facets of bitly infrastructure and encourage them to explore the codebase.  Fixing bugs, adding small pieces of functionality, and developing brand new services are common first week tasks.&lt;/p&gt;

&lt;p&gt;We think it&amp;#8217;s important to step away from the computer, too.  We schedule internal tech talks where we discuss the successes (and challenges) of the current state of the infrastructure.  It&amp;#8217;s an open forum where questions are answered, odd names get definitions, and bitly idioms are discussed.&lt;/p&gt;

&lt;h3&gt;EOL&lt;/h3&gt;

&lt;p&gt;There&amp;#8217;s lots more to talk about:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;how we deploy&amp;#8230;&lt;/li&gt;
&lt;li&gt;what are some of those bitly idioms&amp;#8230;&lt;/li&gt;
&lt;li&gt;how we make technology decisions&amp;#8230;&lt;/li&gt;
&lt;li&gt;meetings and communication&amp;#8230;&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;and more.  We&amp;#8217;ll save all that for future posts.&lt;/p&gt;

&lt;p&gt;As always, if any of this sounds interesting to you, &lt;a href="https://bitly.com/jobs"&gt;bitly is hiring&lt;/a&gt;.&lt;/p&gt;

&lt;div class="postmeta"&gt;
by &lt;a href="http://twitter.com/imsnakes"&gt;snakes&lt;/a&gt;
&lt;/div&gt;</description><link>http://word.bitly.com/post/19183796149</link><guid>http://word.bitly.com/post/19183796149</guid><pubDate>Mon, 12 Mar 2012 12:52:00 -0400</pubDate><category>infrastructure</category><category>bitly</category><category>git</category><category>github</category><category>engineering</category></item><item><title>Introducing Asyncdynamo</title><description>&lt;p&gt;When Amazon announced its DynamoDB service in late January, we quickly identified it as a promising candidate to meet many of our database demands: high availability and fault tolerance, low latency, and low maintenance. SSD hardware promised to grant us similar performance to what we experience with an in-memory datastore, while replication handled by Amazon promised to let our ops team sleep soundly at night (a cranky ops team is never a good thing). While it would be impractical for us to move all of our core infrastructure over to hosted services, we felt that Dynamo could be a good persistence option for applications already deployed to EC2.&lt;/p&gt;

&lt;p&gt;Since all interactions with Dynamo are carried out over HTTP, there’s no real need for a custom client library to begin performing database operations. Our applications are written primarily in Python, and &lt;a href="http://github.com/boto/boto"&gt;Boto&lt;/a&gt; provided just enough helper methods to handle authentication and proper request formatting (neither of which is a very straightforward task). However, Boto’s network calls are executed using Python’s built-in &lt;code&gt;httplib&lt;/code&gt;, whose blocking nature makes it impractical for use with &lt;a href="http://github.com/facebook/tornado"&gt;Tornado&lt;/a&gt;, an asynchronous framework.&lt;/p&gt;

&lt;p&gt;Consequently, we set out to develop a library that would make use of Boto&amp;#8217;s facilities for Dynamo-specific operations (request formatting and signing, as well as response parsing) but would leverage Tornado&amp;#8217;s async HTTP client to execute the actual requests. For the benefit of any other Tornado users hoping to make use of DynamoDB, we are proud to announce &lt;a href="http://github.com/bitly/asyncdynamo"&gt;Asyncdynamo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Asyncdynamo requires Boto and Tornado to be installed, and must be run with Python 2.7. It replaces Boto&amp;#8217;s synchronous calls to Dynamo and to Amazon STS (to retrieve session tokens) with non-blocking Tornado calls. For the end user, its interface seeks to mimic that of Boto Layer1, with each method now requiring an additional callback parameter.&lt;/p&gt;

&lt;p&gt;A little trickery was involved in making this work: Boto&amp;#8217;s methods to sign requests expect to be operating on an instance of &lt;code&gt;boto.connection.HTTPRequest&lt;/code&gt;, so we needed to trick them into accepting our &lt;code&gt;tornado.httpclient.HTTPRequest&lt;/code&gt;. Working with a dynamic language makes this kind of type-hijacking much easier than it might otherwise be, and after a short bit of trial and error we were able to successfully fire off our hybrid requests to Amazon.&lt;/p&gt;
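
&lt;p&gt;The general idea can be sketched in a few lines.  This is not Asyncdynamo&amp;#8217;s code and does not reflect Boto&amp;#8217;s real signing API: &lt;code&gt;sign_request&lt;/code&gt;, &lt;code&gt;AsyncRequest&lt;/code&gt;, and the &lt;code&gt;X-Signature&lt;/code&gt; header are hypothetical stand-ins, meant only to show why a signer that reads attributes by name will accept any object that provides them.&lt;/p&gt;

```python
import hashlib
import hmac


def sign_request(request, secret_key):
    """Hypothetical signer: it never checks the request's class, it only
    reads .method and .path and writes into .headers."""
    message = ("%s\n%s" % (request.method, request.path)).encode("utf-8")
    digest = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
    request.headers["X-Signature"] = digest
    return request


class AsyncRequest(object):
    """Stand-in for a different library's request type. It is unrelated to
    whatever class the signer was written for, but exposes the same
    attribute names, so the signer accepts it."""

    def __init__(self, method, path):
        self.method = method
        self.path = path
        self.headers = {}


if __name__ == "__main__":
    signed = sign_request(AsyncRequest("POST", "/"), b"not-a-real-key")
    print(signed.headers["X-Signature"])
```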

&lt;p&gt;Currently, Boto Layer1 equivalents of the essential operations - Get, Put, and Query - have all been implemented, and any other interaction with Dynamo is possible using a generic &lt;code&gt;make_request&lt;/code&gt; method. Once initialized with your AWS keys, Asyncdynamo will handle all authentication and session token management behind the scenes. Our immediate development plans are to fully replicate the methods offered in Boto Layer1, and in the longer term we hope to be able to add a further layer of abstraction similar to Boto Layer2.&lt;/p&gt;

&lt;p&gt;Interested in working on these kinds of projects? We&amp;#8217;re &lt;a href="http://bit.ly/jobs"&gt;hiring&lt;/a&gt;. We&amp;#8217;ll also be at PyCon this week if you&amp;#8217;d like to chat about this project, small links, big data, or anything in between.&lt;/p&gt;

&lt;div class="postmeta"&gt;
by &lt;a href="http://twitter.com/danielhfrank"&gt;danielhfrank&lt;/a&gt;
&lt;/div&gt;</description><link>http://word.bitly.com/post/18861837158</link><guid>http://word.bitly.com/post/18861837158</guid><pubDate>Tue, 06 Mar 2012 16:20:02 -0500</pubDate></item><item><title>Use Amazon SNS for Nagios Alerts</title><description>&lt;p&gt;Amazon recently added &lt;a href="https://forums.aws.amazon.com/ann.jspa?annID=1230" target="_self"&gt;SMS publishing capability&lt;/a&gt; to their Simple Notification Service (SNS) platform.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Using SNS to send Nagios alerts:&lt;/strong&gt;&lt;/p&gt;



&lt;p&gt;At Bitly, we used individual carriers’ email-to-SMS gateways (e.g. 123456789@text.att.net) to send pages to the on-call person as well as other people on the ops team. This solution was not cutting it, as messages were heavily throttled on the carrier’s side (the gateway being a free service provided by the carrier meant that we couldn’t really complain or get the problem rectified). In some cases pages arrived hours late, which defeats the purpose of an alert during an emergency.&lt;/p&gt;



&lt;p&gt;Once SMS capability was announced, it was pretty much a no-brainer to switch to SNS.&lt;/p&gt;



&lt;p&gt;I wrote a quick and simple Python script with the help of a popular AWS Python library called &lt;a href="http://code.google.com/p/boto/"&gt;boto&lt;/a&gt;. You can find the source over at &lt;a href="http://bit.ly/pyamazonsns"&gt;http://bit.ly/pyamazonsns&lt;/a&gt;.&lt;/p&gt;



&lt;p&gt;This script can be used to send any kind of SNS message, not just SMS.&lt;/p&gt;



&lt;p&gt;Here is a snippet showing how this would tie into Nagios:&lt;/p&gt;



&lt;pre&gt;
&lt;code&gt;
define command{
    command_name  notify-host-by-txt
    command_line  printf "%b" "$HOSTALIAS$" | send_sns.py $CONTACTPAGER$
}
&lt;/code&gt;
&lt;/pre&gt;


&lt;p&gt;Feel free to fork it, improve on it and add features beyond the scope of our usage. Drop us a line to let us know :)&lt;/p&gt;



&lt;div class="postmeta"&gt;by &lt;a href="http://twitter.com/sricola"&gt;sri&lt;/a&gt;&lt;/div&gt;</description><link>http://word.bitly.com/post/15571216901</link><guid>http://word.bitly.com/post/15571216901</guid><pubDate>Mon, 09 Jan 2012 12:13:00 -0500</pubDate></item><item><title>Introduction to simplehttp</title><description>&lt;p&gt;Part of our engineering philosophy is to keep things fast and simple. Aim to serve one purpose and serve it well. Speak HTTP and encode in JSON. Prototype in Python and speed it up in C.&lt;/p&gt;

&lt;!--more--&gt;

&lt;p&gt;There are a few components that follow these tenets and sit at the core of our infrastructure. We&amp;#8217;ve open-sourced them under the &lt;a href="https://github.com/bitly/simplehttp"&gt;simplehttp&lt;/a&gt; moniker. They serve as the architectural foundation for higher-order functionality.&lt;/p&gt;

&lt;h2&gt;simplehttp&lt;/h2&gt;

&lt;p&gt;At the lowest level is the &lt;code&gt;simplehttp&lt;/code&gt; library, an abstraction of &lt;a href="http://monkey.org/~provos/libevent/"&gt;libevent&lt;/a&gt;&amp;#8217;s evhttp functions, aimed at trivializing the task of writing an evented HTTP server in C. It&amp;#8217;s dead simple yet provides high-level features such as:&lt;/p&gt;

&lt;ul&gt;&lt;li&gt;&lt;a href="https://github.com/bitly/simplehttp/blob/master/simplehttp/options.c"&gt;Tornado inspired options parsing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/bitly/simplehttp/blob/master/simplehttp/log.c"&gt;Tornado inspired logging&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Automatic per-endpoint &lt;a href="https://github.com/bitly/simplehttp/blob/master/simplehttp/stat.c"&gt;stat tracking&lt;/a&gt; (request counts, 95% times, averages)&lt;/li&gt;
&lt;li&gt;Clean API to perform &lt;a href="https://github.com/bitly/simplehttp/blob/master/simplehttp/async_simplehttp.c"&gt;async HTTP requests&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;p&gt;Built on top of &lt;code&gt;simplehttp&lt;/code&gt;, perhaps the most important daemons are &lt;code&gt;simplequeue&lt;/code&gt; and &lt;code&gt;pubsub&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;simplequeue&lt;/h3&gt;

&lt;p&gt;A rock-solid in-memory message queue for arbitrary message bodies (we use JSON), providing basic &lt;code&gt;/get&lt;/code&gt; and &lt;code&gt;/put&lt;/code&gt; endpoints. We use this in several key areas to serve as a work queue for asynchronous processing.  During a maintenance window, or in situations where backend services are degraded, it also acts as a buffer, queueing up work that needs to be done when backend services are restored.&lt;/p&gt;

&lt;p&gt;We use long-lived Python &amp;#8220;queuereaders&amp;#8221; to poll the &lt;code&gt;simplequeue&lt;/code&gt; and perform work.  This work might be writing to a database, logging, aggregating, or anything else that you might not want to perform in a blocking fashion during a request cycle.  These queuereaders have built-in backoff timers which slow down the processing rate when errors are detected to allow a struggling backend to recover gracefully and to reduce the load on the machine running the queuereader.&lt;/p&gt;
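
&lt;p&gt;The backoff behaviour described above can be sketched roughly as follows.  This is an illustration under stated assumptions rather than our queuereader code: the &lt;code&gt;BackoffTimer&lt;/code&gt; class, the doubling and halving policy, and the &lt;code&gt;run_queuereader&lt;/code&gt; loop are all hypothetical, and the loop stops when the queue reports empty (a &lt;code&gt;None&lt;/code&gt; message) only so that the sketch terminates, whereas a long-lived reader would keep polling.&lt;/p&gt;

```python
import time


class BackoffTimer(object):
    """Hypothetical backoff timer: the delay doubles after each failure (up
    to max_delay) and halves after each success, returning to zero once
    it drops below min_delay."""

    def __init__(self, min_delay=0.1, max_delay=30.0):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.delay = 0.0

    def failure(self):
        self.delay = min(self.max_delay, max(self.min_delay, self.delay * 2))

    def success(self):
        self.delay = self.delay / 2
        if self.min_delay > self.delay:
            self.delay = 0.0


def run_queuereader(get_message, handle_message, timer, sleep=time.sleep):
    """Poll until get_message() returns None, pausing between messages
    for however long the timer currently asks for."""
    while True:
        message = get_message()
        if message is None:
            break
        try:
            handle_message(message)
            timer.success()
        except Exception:
            # A real reader would requeue or log the failed message here.
            timer.failure()
        if timer.delay > 0:
            sleep(timer.delay)
```

&lt;p&gt;Passing &lt;code&gt;sleep&lt;/code&gt; in as a parameter keeps the loop easy to exercise in tests without actually waiting.&lt;/p&gt;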

&lt;p&gt;Generally, we silo a &lt;code&gt;simplequeue&lt;/code&gt; and its associated queuereaders on each host of the service.  That is, a &lt;code&gt;simplequeue&lt;/code&gt; on &lt;code&gt;hostA&lt;/code&gt; will only contain messages from requests received by that host, and its queuereaders will only process messages from its local &lt;code&gt;simplequeue&lt;/code&gt;.  We do this to avoid a single point of failure.&lt;/p&gt;

&lt;h3&gt;pubsub&lt;/h3&gt;

&lt;p&gt;We have many different types of data at bitly, each classified into a stream. There are streams of &lt;strong&gt;encodes&lt;/strong&gt; (shortens), &lt;strong&gt;decodes&lt;/strong&gt; (&amp;#8220;clicks&amp;#8221;), &lt;strong&gt;user events&lt;/strong&gt;, etc. In order to provide a central, consistent means for developers to access data in realtime, we expose these streams via &lt;code&gt;pubsub&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Publishing a message is a simple HTTP request to the &lt;code&gt;/pub&lt;/code&gt; endpoint.  In our case this usually happens in a queuereader whose sole purpose is to read off a specific &lt;code&gt;simplequeue&lt;/code&gt; and write to a specific &lt;code&gt;pubsub&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;A client consumes the stream through a long-lived HTTP request to the &lt;code&gt;/sub&lt;/code&gt; endpoint.  Messages are transmitted as newline-delimited JSON.&lt;/p&gt;
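
&lt;p&gt;On the consuming side, handling a newline-delimited stream mostly comes down to buffering until a full line has arrived.  The sketch below assumes the response body shows up as arbitrary text chunks; &lt;code&gt;iter_messages&lt;/code&gt; and the &lt;code&gt;type&lt;/code&gt; field in the sample data are made up for the example and are not part of &lt;code&gt;pubsub&lt;/code&gt;.&lt;/p&gt;

```python
import json


def iter_messages(chunks):
    """Yield one decoded JSON message per complete line found in an
    iterable of text chunks, buffering any partial trailing line."""
    buffered = ""
    for chunk in chunks:
        buffered += chunk
        while "\n" in buffered:
            line, buffered = buffered.split("\n", 1)
            if line.strip():
                yield json.loads(line)


if __name__ == "__main__":
    # Chunk boundaries need not line up with message boundaries.
    stream = ['{"type": "enc', 'ode"}\n{"type": ', '"decode"}\n']
    for message in iter_messages(stream):
        print(message["type"])
```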

&lt;p&gt;To pair with &lt;code&gt;pubsub&lt;/code&gt; the repository contains three additional utilities built on &lt;code&gt;pubsubclient&lt;/code&gt;. &lt;code&gt;ps_to_file&lt;/code&gt; and &lt;code&gt;ps_to_http&lt;/code&gt; are fairly self-explanatory. One archives a &lt;code&gt;pubsub&lt;/code&gt; stream to a file (automatically rolling the output files for you based on a configurable strftime format string), and the other writes a stream of data to destination HTTP endpoints. The latter can be used to send messages to a &lt;code&gt;simplequeue&lt;/code&gt;, another &lt;code&gt;pubsub&lt;/code&gt; stream, or any other HTTP endpoint. Additionally, &lt;code&gt;pubsub_filtered&lt;/code&gt; repeats a pubsub stream and provides the option to remove or obfuscate fields, creating a filtered view on a subset of the data.  At bitly we use these tools to archive our data streams and to pass data published by one application into another application (or another datacenter).&lt;/p&gt;
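
&lt;p&gt;Rolling output files on a strftime format string is simple to picture: the current time is rendered through the format, and whenever the result changes a new file is opened.  The following is a small Python sketch of that idea, not the &lt;code&gt;ps_to_file&lt;/code&gt; implementation; &lt;code&gt;RollingWriter&lt;/code&gt; and the example format string are assumptions.&lt;/p&gt;

```python
import datetime


class RollingWriter(object):
    """Append lines to a file whose name is derived from the current time
    via a strftime format; a new file is opened when the name changes."""

    def __init__(self, filename_format):
        self.filename_format = filename_format
        self.current_name = None
        self.handle = None

    def write(self, line, now=None):
        now = now or datetime.datetime.now()
        name = now.strftime(self.filename_format)
        if name != self.current_name:
            self.close()
            self.handle = open(name, "a")
            self.current_name = name
        self.handle.write(line + "\n")

    def close(self):
        if self.handle is not None:
            self.handle.close()
            self.handle = None


if __name__ == "__main__":
    # With an hourly format, messages land in one file per hour.
    writer = RollingWriter("stream.%Y-%m-%d_%H.log")
    writer.write('{"example": true}')
    writer.close()
```

&lt;p&gt;The granularity of the format string sets the roll interval: including &lt;code&gt;%H&lt;/code&gt; rolls hourly, while stopping at &lt;code&gt;%d&lt;/code&gt; rolls daily.&lt;/p&gt;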

&lt;h3&gt;EOL&lt;/h3&gt;

&lt;p&gt;If any of these things sound like fun projects to hack on, &lt;a href="http://bit.ly/jobs"&gt;bitly is hiring&lt;/a&gt;.&lt;/p&gt;

&lt;div class="postmeta"&gt;
by &lt;a href="http://twitter.com/imsnakes"&gt;snakes&lt;/a&gt;
&lt;/div&gt;</description><link>http://word.bitly.com/post/13216970565</link><guid>http://word.bitly.com/post/13216970565</guid><pubDate>Thu, 25 Aug 2011 09:00:00 -0400</pubDate></item><item><title>Welcome to the bitly Engineering Blog</title><description>&lt;p&gt;At &lt;a href="http://bitly.com/"&gt;bitly&lt;/a&gt;, we have been happy to 
contribute to a number of open source projects, and we have even started
&lt;a href="https://github.com/bitly/"&gt;a few of our own&lt;/a&gt;. We look forward to talking
about those, and other engineering details, here. You can stay up to date by following &lt;a href="http://twitter.com/bitly"&gt;@bitly&lt;/a&gt;.&lt;/p&gt;

&lt;div class="postmeta"&gt;
by &lt;a href="http://twitter.com/jehiah"&gt;jehiah&lt;/a&gt;
&lt;/div&gt;</description><link>http://word.bitly.com/post/13216967883</link><guid>http://word.bitly.com/post/13216967883</guid><pubDate>Mon, 22 Aug 2011 15:28:00 -0400</pubDate></item></channel></rss>

