Thursday, May 15 • 2:20pm - 3:00pm
Elastic Recheck - Tools for Finding Race Conditions in OpenStack

During the Havana release cycle we discovered that Tempest was
getting comprehensive enough to expose interesting timing problems
in OpenStack in the OpenStack gate. Developers were
used to calling these "flakey tests" and ignoring the negative
results; however, we saw a pattern emerge where the same failure
signature could be seen multiple times.
These "statistical failures", where a given scenario will fail 1%
of the time, become real issues when you end up with 60+ of them in
the code base, and when you create 30,000 clouds per week.
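The arithmetic behind that claim can be sketched as follows. This is a rough illustration only: it assumes the 60 races are independent, that each one fires in exactly 1% of runs, and that each of the 30,000 clouds corresponds to one test run. None of those assumptions are stated in the abstract, and the function name is hypothetical. Under them, roughly 45% of runs would hit at least one race.

```python
def chance_of_any_failure(per_race_rate: float, race_count: int) -> float:
    """Probability that at least one of `race_count` independent races
    fires during a single run (independence is an assumption)."""
    return 1.0 - (1.0 - per_race_rate) ** race_count


if __name__ == "__main__":
    per_race_rate = 0.01     # "fail 1% of the time"
    race_count = 60          # "60+ of them in the code base"
    runs_per_week = 30_000   # assumes one run per cloud created

    p = chance_of_any_failure(per_race_rate, race_count)
    print(f"Chance a single run hits at least one race: {p:.1%}")
    print(f"Expected affected runs per week: {p * runs_per_week:.0f}")
```

Real races are unlikely to be fully independent or equally frequent, so the figures are only an order-of-magnitude guide.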
We believed we had a couple of interesting race conditions to nail
down, and started building a system based on Elasticsearch to
automatically identify these failures. This system first
started reporting back data to developers at the very end of the
Havana release.
This talk will discuss the whole problem space of finding low
percentage failures in the code base: the toolchain we build upon
for the problem, the fingerprinting and reporting approach that we
use to help the OpenStack development community prioritize these
issues, how this is informing our thinking about the OpenStack
gating system, and where we are headed in the future. Because this
whole toolchain is open source, it's something we expect others might
extend to their own projects as well.

Sean Dague

Software Engineer, HP
Sean Dague has been an Open Source developer for most of his professional life. He's part of the HP OpenStack team working to make OpenStack better, contributing to Nova, Devstack, Tempest, and the OpenStack Infrastructure. He created the Mid Hudson Valley Linux Users Group a dozen years ago, exposing hundreds to the joys of Open Source.

Thursday May 15, 2014 2:20pm - 3:00pm
Room B206

