Classifying Human Traffic with Random Forest Decision Trees

At bitly, we study human behavior on the social web, and we often need to determine whether data was generated by a deliberate human action (organic data) or by a script or some other action taken without a human’s knowledge (inorganic data). bitly data scientist Brian Eoff recently gave a talk at PyData NYC 2012 on a fast approach to identifying organic vs. inorganic data in a real-time stream, using random forest decision trees implemented in Python with scikit-learn.

Video: "SciKit Random Forest", Brian Eoff (posted by Continuum Analytics on Vimeo).
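The talk's actual features and pipeline aren't reproduced here, so the following is only a minimal sketch of how a random forest classifier for organic vs. inorganic events might be trained and applied with scikit-learn's `RandomForestClassifier`. The three per-event features (`gap`, `no_ref`, `links`), the helper `synthetic_events`, and all of the data are hypothetical, invented purely for illustration. They are not bitly's feature set or data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# HYPOTHETICAL per-event features (illustration only, not bitly's real features):
#   0: seconds since the same client's previous click
#   1: 1.0 if the request carried no referrer, else 0.0
#   2: number of distinct links the client hit in the last minute
rng = np.random.default_rng(0)


def synthetic_events(n, organic):
    """Generate toy events: organic traffic is slower and less bursty."""
    if organic:
        gap = rng.exponential(30.0, n)
        no_ref = rng.binomial(1, 0.2, n)
        links = rng.poisson(2, n)
    else:
        gap = rng.exponential(0.5, n)
        no_ref = rng.binomial(1, 0.9, n)
        links = rng.poisson(40, n)
    return np.column_stack([gap, no_ref, links]).astype(float)


X = np.vstack([synthetic_events(500, True), synthetic_events(500, False)])
y = np.array([1] * 500 + [0] * 500)  # 1 = organic, 0 = inorganic

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# An ensemble of 100 decision trees, each fit on a bootstrap sample with a
# random subset of features considered at each split; n_jobs=-1 uses all cores.
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)  # mean accuracy on held-out events
print(f"held-out accuracy: {accuracy:.3f}")

# Classify a single new event as it arrives from the stream.
event = np.array([[0.2, 1.0, 55.0]])
print("organic" if clf.predict(event)[0] == 1 else "inorganic")
```

Once trained, the forest can score each incoming event independently with `predict` (or `predict_proba` if a confidence threshold is preferred), which is one reason the approach suits a real-time stream: classification is just a pass down a fixed set of shallow comparisons per tree.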