Thursday, May 15 • 2:20pm - 3:00pm
Elastic Recheck - Tools for Finding Race Conditions in OpenStack


During the Havana release cycle we discovered that Tempest was
getting comprehensive enough to expose interesting timing
problems in OpenStack in the OpenStack gate. Developers were
used to calling these "flaky tests" and ignoring the negative
results; however, we noticed that the same failure pattern
could be seen multiple times.
These "statistical failures", where a given scenario fails 1%
of the time, become real issues when you end up with 60+ of them in
the code base and create 30,000 clouds per week.
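A rough back-of-the-envelope sketch shows why these numbers matter. It assumes, purely for illustration, that the 60 failure modes are independent, that each fires on exactly 1% of runs, and that every one of the 30,000 weekly clouds is exposed to all of them; none of these simplifications is a measured property of the gate, and the function name is hypothetical.

```python
def p_any_failure(per_bug_rate: float, bug_count: int) -> float:
    """Chance that at least one of `bug_count` independent bugs fires in a run.

    Assumption: every bug fails independently at the same rate.
    """
    return 1.0 - (1.0 - per_bug_rate) ** bug_count


# Figures taken from the abstract: 1% failure rate, 60 such bugs,
# 30,000 clouds created per week.
p = p_any_failure(0.01, 60)
expected_failed_runs = 30_000 * p

print(f"chance a single run hits at least one race: {p:.1%}")
print(f"expected affected runs per week: {expected_failed_runs:.0f}")
```

Under these assumptions, a run has roughly a 45% chance of hitting at least one of the races, even though each race on its own looks negligible.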
We believed we had a couple of interesting race conditions to nail
down, and started building a system based on Elasticsearch to
identify them automatically. This system first started reporting
data back to developers at the very end of the Havana release.
This talk will discuss the whole problem space of finding
low-percentage failures in the code base: the toolchain we built
upon for the problem, the fingerprinting and reporting approach that
we use to help the OpenStack development community prioritize these
issues, how this is informing our thinking about the OpenStack
gating system, and where we are headed in the future. Because this
whole toolchain is open source, it's something we expect others
might extend to their own projects as well.

Speakers
Sean Dague

Software Engineer, IBM
Sean Dague has been an Open Source developer for most of his professional life. He's worked on numerous Open Source projects over the years including SystemImager, OpenHPI, Xen, OpenSim, NFS Ganesha, and OpenStack. He's a core reviewer on Nova, Tempest, Devstack, Grenade, and lots...


Thursday May 15, 2014 2:20pm - 3:00pm EDT
Room B206