Python
Unclog the tubes; blocking detection in Eventlet
August 28, 2010 - 2:12pm | by R. Tyler Croy

Colleagues of mine are all very familiar with my admiration of Eventlet, a Python concurrency library, built on top of greenlet, that provides lightweight "greenthreads" which naturally yield around I/O points. For me, the biggest draw of Eventlet, besides its maturity, is how well it integrates with standard Python code. Any code that uses the built-in socket module can be "monkey-patched" (i.e. modified at runtime) to use the "green" version of the socket module, which allows Eventlet to turn regular ol' Python into code with asynchronous I/O.
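To make that concrete, here's a minimal sketch of the monkey-patching in action (the URLs, pool size, and fetch helper below are placeholders for this example, not anything from a real project):

import eventlet
eventlet.monkey_patch()  # swap the standard socket module for the "green" one at runtime

import urllib2  # now uses green sockets without a single change to its code

urls = ['http://example.com/', 'http://example.org/']  # placeholder URLs
pool = eventlet.GreenPool(10)  # run at most 10 greenthreads at once

def fetch(url):
    # each fetch runs in its own greenthread; every I/O point yields to the others
    return url, len(urllib2.urlopen(url).read())

for url, length in pool.imap(fetch, urls):
    print (url, length)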
The problem with using libraries like Eventlet is that some Python code just blocks, meaning the code will hit an I/O point and, instead of yielding, block the entire process until that network operation completes.
In practical terms, imagine you have a web crawler that uses 10 "green threads", each crawling a different site. The first greenthread (GT1) will send an HTTP request to the first site, then yield to GT2, and so on. If each HTTP request blocks for 100ms, crawling the 10 sites will block the whole process, preventing anything else from running, for a whole second. That doesn't sound too terrible, but imagine you've got 1000 greenthreads: instead of everything smoothly yielding from one thread to another, the process will lock up very often, resulting in painful slowdowns.
Starting with Eventlet 0.9.10, "blocking detection" code has been incorporated into Eventlet to make it far easier for developers to find the portions of code that can block the entire process.
import eventlet.debug
eventlet.debug.hub_blocking_detection(True)
While using the blocking detection is fairly simple, its implementation is a bit "magical" in that it's not entirely obvious how it works.
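The general idea behind it is timer-driven: arm an alarm before handing control to user code and disarm it when control comes back; if the alarm fires first, something must be blocking. Here's a rough, self-contained sketch of that idea. It is only an illustration of the technique, not Eventlet's actual implementation, and the names and the one-second resolution are made up:

import signal
import traceback

RESOLUTION = 1.0  # seconds a chunk of code may run before we complain

def _alarmed(signum, frame):
    # the timer fired before control came back, so something is blocking
    print ('Blocking detected!')
    traceback.print_stack(frame)

def run_step(func):
    signal.signal(signal.SIGALRM, _alarmed)
    signal.setitimer(signal.ITIMER_REAL, RESOLUTION)  # arm the timer
    try:
        func()  # if this blocks for longer than RESOLUTION, _alarmed fires
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm on the way out

if __name__ == '__main__':
    import time
    run_step(lambda: time.sleep(2))  # deliberately blocks, trips the detector

Eventlet does something along these lines around the hub's own switch points, which is why flipping the flag on is all your application code has to do.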
Being a Libor, Addendum
May 18, 2010 - 8:00am | by R. Tyler Croy

A couple of weeks ago I wrote a post on how to "Be a Libor", trying to codify a few points I feel like I learned about building a successful engineering team at Slide. Shortly after the post went live, I discovered that Libor had been promoted to CTO at Slide.
Over coffee today Libor offered up some finer points on the post in our discussion about building teams. It is important, according to Libor, to maintain a "mental framework" within which the stack fits; guiding decisions with a consistent world-view or ethos about building on top of the foundation laid. This is not to say that you should solve all problems with the same hammer, but rather if the standard operating procedure is to build small single-purpose utilities, you should not attack a new problem with a giant monolithic uber-application that does thirty different things (hyperbole alert!).
Libor also had a fantastic quote from the conversation with regards to approaching new problems:
Just because there are multiple right answers, doesn't mean there's no wrong answers
Depending on the complexity of the problems you're facing, there are likely a number of solutions, but you can still get it wrong, particularly if you don't remain consistent with your underlying mental framework for the project/organization.
As usual, my discussions with Libor are interesting and enjoyable; he's one of the most capable, thoughtful engineers I know, so I'm interested to see how Slide Engineering progresses under his careful hand as the new CTO. I hope you join me in wishing him the best of luck in his role, moving from wrangling coroutines to herding cats.
Is programming with Twisted really as awful as it sounds?
May 12, 2010 - 8:45am | by R. Tyler Croy

Early this week Can forwarded this post on Quora to me, which asks the question:
Is programming with Twisted really as awful as it sounds?
Yes. Yes. YES IT IS. HOLY CRAP IT'S AWFUL
Here are some good alternatives:
- Eventlet, my preference
- gevent, an alternative to Eventlet tied to libevent
- Java, because let's face it, if you're using Twisted, you've already decided not to write Python, so use something with proper threading support.
That is all.
How-to: Using Avro with Eventlet
May 7, 2010 - 8:45am | by R. Tyler Croy

Working on the plumbing behind a sufficiently large web application, I find myself building services to meet my needs more often than not. Typically I try to build single-purpose services, following in the unix philosophy, cobbling together more complex tools based on a collection of distinct building blocks. In order to connect these services, a solid, fast, and easy-to-use RPC library is a requirement; enter Avro.
Note: You can skip ahead and just start reading some source code by cloning my eventlet-avro-example repository from GitHub.
Avro is part of the Hadoop project and has two primary components: data serialization and RPC support. Some time ago I chose Avro for serializing all of Apture's metrics and logging information, giving us a standardized framework for recording new events and processing them after the fact. It was not until recently that I started to take advantage of Avro's RPC support when building services with Eventlet. I've talked about Eventlet before, but to recap:
Eventlet is a concurrent networking library for Python that allows you to change how you run your code, not how you write it
What this means in practice is that you can write highly concurrent network-based services while keeping the code "synchronous" and easy to follow. Underneath Eventlet is the "greenlet" library, which implements coroutines for Python and allows Eventlet to switch between coroutines, or "green threads", whenever a network call blocks.
Eventlet meets Avro RPC in an unlikely (in my opinion) place: WSGI. Instead of building its own transport layer for RPC calls, Avro sits on top of HTTP, POST'ing binary data to the server and processing the response. Since Avro can sit on top of HTTP, we can use eventlet.wsgi to build a fast, simple RPC server.
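Here's a minimal sketch of the shape such a server takes, wiring avro.ipc's Responder, FramedReader, and FramedWriter into a WSGI callable. The "Echo" protocol, the responder class, and the port number are stand-ins for illustration; the repository linked above has a complete, working example.

import eventlet
from eventlet import wsgi

from avro import ipc, protocol
from StringIO import StringIO

# a toy protocol, made up for this example
ECHO_PROTOCOL = protocol.parse("""{
    "protocol": "Echo",
    "namespace": "example",
    "messages": {
        "echo": {
            "request": [{"name": "message", "type": "string"}],
            "response": "string"
        }
    }
}""")

class EchoResponder(ipc.Responder):
    def __init__(self):
        ipc.Responder.__init__(self, ECHO_PROTOCOL)

    def invoke(self, message, request):
        # dispatch on the message name; this protocol only knows "echo"
        return request['message']

def handle(environ, start_response):
    # Avro RPC requests arrive as framed binary data in the POST body
    call_request = ipc.FramedReader(environ['wsgi.input']).read_framed_message()
    call_response = EchoResponder().respond(call_request)

    out = StringIO()
    ipc.FramedWriter(out).write_framed_message(call_response)

    start_response('200 OK', [('Content-Type', 'avro/binary')])
    return [out.getvalue()]

wsgi.server(eventlet.listen(('127.0.0.1', 8000)), handle)

On the client side the same protocol can be driven over plain HTTP with avro.ipc's HTTPTransceiver and Requestor, so nothing about the server betrays that it is running on green threads.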
Be a Libor
April 30, 2010 - 6:45am | by R. Tyler Croy

I reflect occasionally on how I've gotten to where I am right now, specifically to how I made the jump from "just some kid at a Piggly Wiggly in Texas" as Dave once said, to the guy who knows stuff about things. I often think about what pieces of the Slide engineering environment were influential to my personal growth and how I can carry those forward to build as solid an engineering organization at Apture.
The two pillars of engineering at Slide, at least in my naive world-view, were Dave and Libor. I joined Dave's team when I joined Slide, and I left Libor's team when I left Slide. Dave ran the client team and did exceptionally well at filling a void that existed at Slide, bridging engineering prowess with product management. Libor often furrowed his brow and built some of the large distributed systems that gave Slide an edge when dealing with incredible growth. In my first couple of years I did my best to emulate Dave; engineers would always vie for Dave's time, asking questions and working through problems until they could return to their desks with the confidence that they understood the forces involved and could solve the task at hand. Now that I'm at Apture, I'm trying to emulate Libor.
(Note: I do not intend to idolize either of them, but cite important characteristics)
To understand the Libor role, the phrase "the buck stops here" is useful. A Libor is the end of the line for engineering questions; unlike in some organizations, the "question-chain-of-command" is not the same as the org-chart. If a problem or question progresses up the stack to a Libor, and between an engineer and a Libor the pair cannot solve the problem, you're screwed.
What does it take to be a Libor, you may be thinking:
Pyrage: Static isn't just something on the radio
February 26, 2010 - 5:45am | by R. Tyler Croy

Dealing with statics in Python is something that has bitten me enough times that I have become quite pedantic about them when I see them. I'm sure you're thinking "But Dr. Tyler, Python is a dynamic language!" It is indeed, but that does not mean there aren't static variables.
The funny thing about static variables in Python, in my opinion, is that once you understand a bit about scoping and what you're dealing with, they make far more sense. Let's take this static class variable for example:
>>> class Foo(object):
...     my_list = []
...
>>> f = Foo()
>>> b = Foo()
You're trying to be clever, defining your class variables with their default values outside of your __init__ function. Understandable, unless you ever intend on mutating that variable.
>>> f.my_list.append('O HAI')
>>> print b.my_list
['O HAI']
>>>
Still feeling clever? If that's what you wanted, I bet you do, but if you wanted each instance to have its own internal list, you've inadvertently introduced a bug where any and every time something mutates my_list, it will change for every single instance of Foo. The reason this occurs is that my_list is tied to the class object Foo and not to the instance of the Foo object (f or b). In effect f.__class__.my_list and b.__class__.my_list are the same object; in fact, the __class__ objects of both those instances are the same as well.
>>> id(f.__class__)
7680112
>>> id(b.__class__)
7680112
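If what you actually wanted was one list per instance, the usual fix is to create the list inside __init__ so every instance gets its own object. A quick sketch of the difference:

>>> class Foo(object):
...     def __init__(self):
...         self.my_list = []    # a brand new list for each instance
...
>>> f = Foo()
>>> b = Foo()
>>> f.my_list.append('O HAI')
>>> print b.my_list
[]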
When using default/optional parameters for methods you can also run afoul of statics in Python, for example:
>>> def somefunc(data=[]):
...     data.append(1)
...     print ('data', data)
...
>>> somefunc()
('data', [1])
>>> somefunc()
('data', [1, 1])
>>> somefunc()
('data', [1, 1, 1])
>>>
This comes down to a scoping issue as well; functions and methods in Python are first-class objects. In this case, the default list lives in the somefunc.func_defaults tuple, and that same list object is mutated every time the function is called. Bad programmer!
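The idiomatic way around this is to default to None and build the fresh list inside the function body, so every call that omits the argument gets its own list:

>>> def somefunc(data=None):
...     if data is None:
...         data = []       # a fresh list on every call, not a shared default
...     data.append(1)
...     print ('data', data)
...
>>> somefunc()
('data', [1])
>>> somefunc()
('data', [1])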
It all seems simple enough, but I still consistently see these mistakes in plenty of different Python projects (both pony-affiliated, and not). When these bugs strike they're difficult to spot, frustrating to deal with ("who the hell is changing my variable!") and most importantly, easily prevented with a little understanding of how Python scoping works.
PYRAGE!