Franchu's blog

Bad things are going to happen. MTBF vs MTTR

Last month I attended the first edition of a great conference in Rotterdam called Joy Of Coding. One of the highlights for me was the closing keynote, “Practicing Joy”, in which Chad Fowler gave a very candid talk on how he came to enjoy software and life. If you don’t know who he is, go and get his book The Passionate Programmer: Creating a Remarkable Career in Software Development (Pragmatic Life). I read it a few years ago when I had to make a big decision in my life, and it really helped me put things in perspective.

Among the many things that Chad mentioned in his talk, there was a short sentence that I scribbled down on a piece of paper and that I remembered today after reading the interesting post “When it Comes to Chaos, Gorillas Before Monkeys”, which discusses some members of the Netflix Simian Army.

Bad things are going to happen. Do you plan for MTBF or MTTR?

MTBF (mean time between failures) and MTTR (mean time to recovery) are two metrics that can shape how we design our systems, how we think about failures and how we plan for continuity of service. The statement “MTTR is more important than MTBF (for most types of F)” summarizes my point of view well.
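To make the two metrics a bit more concrete, here is a minimal sketch using the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The numbers are made up for illustration and do not come from the talk.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the service is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical baseline: one failure every 500 hours, 4 hours to recover.
baseline = availability(500, 4)

# Planning for MTBF: failures happen half as often.
doubled_mtbf = availability(1000, 4)

# Planning for MTTR: failures happen just as often, but recovery is twice as fast.
halved_mttr = availability(500, 2)

# Availability only depends on the ratio MTTR / MTBF, so both options
# improve on the baseline by the same amount.
print(baseline, doubled_mtbf, halved_mttr)
```

The formula on its own does not favor either metric: halving MTTR buys exactly as much availability as doubling MTBF. If you accept that bad things are going to happen anyway, the practical question is which of the two is easier to improve for your system.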

Review: The Signal and the Noise: Why So Many Predictions Fail but Some Don't

I’ve finished reading the book The Signal and the Noise: Why So Many Predictions Fail but Some Don’t by Nate Silver.

For those of you who have not heard about him, he is the mind behind the FiveThirtyEight blog, where he correctly predicted the winner in all 50 states and the District of Columbia for the 2012 US Presidential Election. He became a celebrity by basing his predictions on multiple poll data sources and the use of Bayesian statistics. His methodology performed much better than the predictions of the political pundits.

The book itself is a light read. It does not delve into deep statistical topics but provides a broad view of a range of fields where statistics and big data have (or are poised to have) a big impact.

If you want to read a pair of reviews and a commentary from a respected statistician, see Andrew Gelman’s take on the matter: Two reviews of Nate Silver’s new book, from Kaiser Fung and Cathy O’Neil.

My review is based on some quotes from the book that I consider to be good food for thought.

One of the most important feedbacks in the market is between what he calls fear and greed. Some investors have little appetite for risk and some have plenty, but their preferences balance out: if the price of a stock goes down because a company’s financial position deteriorates, the fearful investor sells his shares to a greedy one who is hoping to bottom-feed. Greed and fear are volatile quantities, however, and the balance can get out of whack. When there is an excess of greed in the system, there is a bubble. When there is an excess of fear, there is a panic.

I found this description of the origins of bubbles and panics extremely easy to understand.

Political news, and especially the important news that really affects the campaign, proceeds at an irregular pace. But news coverage is produced every day. Most of it is filler, packaged in the form of stories that are designed to obscure its unimportance. Not only does political coverage often lose the signal - it frequently accentuates the noise.

This is something we tend to forget. We live in a world where information travels faster than ever and where the pressure on news agencies and media to provide breaking news all the time is creating a race to the bottom in the quality they offer. Additionally, the barrier to entry for creating media with a global reach is extremely low: nowadays anyone can create content and share their opinion disguised as fact. This unimportant news and the bogus online content are indeed accentuating the noise, making the act of finding useful information more difficult than it should be.

Ultimately, the right attitude is that you should make the best forecast possible today - regardless of what you said last week, last month, or last year. Making a new forecast does not mean that the old forecast just disappears. (Ideally, you should keep a record of it and let people evaluate how well you did over the whole course of predicting an event.) But if you have reason to think that yesterday’s forecast was wrong, there is no glory in sticking to it. “When the facts change, I change my mind,” the economist John Maynard Keynes famously said. “What do you do, sir?”

And this is the beauty of Bayesian statistics: new facts help us refine our predictions.
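As a small illustration of that refinement, here is a sketch of Bayes' rule applied repeatedly. The scenario and all the probabilities are hypothetical and are not taken from the book.

```python
def bayes_update(prior: float, p_evidence_if_true: float,
                 p_evidence_if_false: float) -> float:
    """Posterior probability of a hypothesis after seeing one piece of evidence."""
    numerator = prior * p_evidence_if_true
    evidence = numerator + (1 - prior) * p_evidence_if_false
    return numerator / evidence


# Hypothetical numbers: we start undecided (50%) about whether a candidate
# will win. A favorable poll shows up 70% of the time when the candidate
# goes on to win and 40% of the time when they do not.
belief = 0.5
beliefs = [belief]
for _ in range(3):  # three favorable polls in a row
    belief = bayes_update(belief, 0.7, 0.4)
    beliefs.append(belief)

print(beliefs)
```

Each new poll moves the forecast a little further, and yesterday's forecast simply becomes today's prior.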

Goodhart’s law, after the London School of Economics professor who proposed it, holds that once policy makers begin to target a particular variable, it may begin to lose its value as an economic indicator. For instance, if the government artificially takes steps to inflate housing prices, they might well increase, but they will no longer be good measures of overall economic health. At its logical extreme, this is a bit like the observer effect (often mistaken for a related concept, the Heisenberg uncertainty principle): once we begin to measure something, its behavior starts to change.

Not much to add. I particularly like it when ideas from one field find a similar formulation in another one.

… as a default, just as we perceive more signal than there really is when we make predictions, we also tend to attribute more skill than is warranted to successful predictions when we assess them later. Part of the solution is to apply more rigor in how we evaluate predictions. The question of how skillful a forecast is can often be addressed through empirical methods; the long run is achieved more quickly in some fields than in others. But another part of the solution - and sometimes the only solution when the data is very noisy - is to focus more on process than on results. If the sample of predictions is too noisy to determine whether a forecaster is much good, we can instead ask whether he is applying the attitudes and aptitudes that we know are correlated with forecasting success over the long run. (In a sense, we’ll be predicting how good his predictions will be.)

For lack of better ways to assess performance, at least we should make sure that we are doing things right.

All in all, I found it a very enjoyable book, accessible to the layman, that offers a different perspective on today’s world. If you are looking for deep statistical analysis instead, this is not the book for you.

General principles for good REST API design

I’ve stumbled across a nice StackOverflow question. While the question is good, one of the answers is excellent! I wanted to share it here to have it at hand for reference purposes. If you find it useful, please take a moment and upvote the original by Bob Aman.

General principles for good URI design:

  • Don’t use query parameters to alter state
  • Don’t use mixed-case paths if you can help it; lowercase is best
  • Don’t use implementation-specific extensions in your URIs (.php, .py, .pl, etc.)
  • Don’t fall into RPC with your URIs
  • Do limit your URI space as much as possible
  • Do keep path segments short
  • Do prefer either /resource or /resource/; create 301 redirects from the one you don’t use
  • Do use query parameters for sub-selection of a resource; i.e. pagination, search queries
  • Do move stuff out of the URI that should be in an HTTP header or a body

(Note: I did not say “RESTful URI design”; URIs are essentially opaque in REST.)
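A couple of the URI principles above can be sketched in a few lines. This is only an illustration: the helper names, the choice of the form without a trailing slash as canonical, and the page / per_page parameter names are my own assumptions, not part of the original answer.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit


def canonical_path(path: str) -> str:
    """Lowercase the path and drop any trailing slash (assumed canonical form)."""
    path = path.lower()
    if len(path) > 1:
        path = path.rstrip("/")
    return path or "/"


def redirect_target(path: str) -> Optional[str]:
    """Return the path to 301-redirect to, or None if it is already canonical."""
    canonical = canonical_path(path)
    return None if canonical == path else canonical


def page_params(url: str, default_size: int = 20) -> Tuple[int, int]:
    """Read pagination from query parameters: sub-selection, no state change."""
    query = parse_qs(urlsplit(url).query)
    page = int(query.get("page", ["1"])[0])
    size = int(query.get("per_page", [str(default_size)])[0])
    return page, size


print(redirect_target("/Articles/"))                 # needs a redirect
print(redirect_target("/articles"))                  # already canonical
print(page_params("/articles?page=3&per_page=50"))
```

In a real service the redirect check would live in the routing layer and the handler would only ever see canonical paths.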

General principles for HTTP method choice:

  • Don’t ever use GET to alter state; this is a great way to have the Googlebot ruin your day
  • Don’t use PUT unless you are updating an entire resource
  • Don’t use PUT unless you can also legitimately do a GET on the same URI
  • Don’t use POST to retrieve information that is long-lived or that might be reasonable to cache
  • Don’t perform an operation that is not idempotent with PUT
  • Do use GET for as much as possible
  • Do use POST in preference to PUT when in doubt
  • Do use POST whenever you have to do something that feels RPC-like
  • Do use PUT for classes of resources that are larger or hierarchical
  • Do use DELETE in preference to POST to remove resources
  • Do use GET for things like calculations, unless your input is large, in which case use POST
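The difference between POST and the idempotent methods is easier to see with a toy example. The class below is a hypothetical in-memory stand-in for a collection such as /notes; it is a sketch of the semantics, not a real HTTP server.

```python
import itertools


class NoteStore:
    """In-memory stand-in for a collection resource such as /notes."""

    def __init__(self):
        self._notes = {}
        self._ids = itertools.count(1)

    def get(self, note_id):
        # GET: read only, never alters state.
        return self._notes.get(note_id)

    def post(self, body):
        # POST: the server picks the id; repeating the call creates another note.
        note_id = next(self._ids)
        self._notes[note_id] = dict(body)
        return note_id

    def put(self, note_id, body):
        # PUT: replaces the entire resource; repeating it changes nothing more.
        self._notes[note_id] = dict(body)

    def delete(self, note_id):
        # DELETE: removing twice leaves the same end state as removing once.
        self._notes.pop(note_id, None)


store = NoteStore()
first = store.post({"text": "hello"})
second = store.post({"text": "hello"})   # same body, but a second resource

store.put(first, {"text": "goodbye"})
store.put(first, {"text": "goodbye"})    # safe to retry

store.delete(second)
store.delete(second)                     # safe to retry as well
```

Note that after the PUT, a GET on the same id returns exactly what was stored, which is the property the list asks for before choosing PUT.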

General principles of web service design with HTTP:

  • Don’t put metadata in the body of a response that should be in a header

  • Don’t put metadata in a separate resource unless including it would create significant overhead

  • Do use the appropriate status code

    • 201 Created after creating a resource; resource must exist at the time the response is sent
    • 202 Accepted after performing an operation successfully or creating a resource asynchronously
    • 400 Bad Request when someone does an operation on data that’s clearly bogus; for your application this could be a validation error; generally reserve 500 for uncaught exceptions
    • 401 Unauthorized when someone accesses your API either without supplying a necessary Authorization header or when the credentials within the Authorization are invalid; don’t use this response code if you aren’t expecting credentials via an Authorization header.
    • 403 Forbidden when someone accesses your API in a way that might be malicious or if they aren’t authorized
    • 405 Method Not Allowed when someone uses POST when they should have used PUT, etc
    • 413 Request Entity Too Large when someone attempts to send you an unacceptably large file
    • 418 I'm a teapot when attempting to brew coffee with a teapot
  • Do use caching headers whenever you can

    • ETag headers are good when you can easily reduce a resource to a hash value
    • Last-Modified should indicate to you that keeping around a timestamp of when resources are updated is a good idea
    • Cache-Control and Expires should be given sensible values
  • Do everything you can to honor caching headers in a request (If-None-Match, If-Modified-Since)

  • Do use redirects when they make sense, but these should be rare for a web service
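To illustrate the caching advice, here is a minimal sketch of a conditional GET based on ETag and If-None-Match. The function names and the max-age value are my own choices. It assumes the request headers arrive as a plain dict with exact header names, and it only handles strong validators (it ignores weak W/ ETags).

```python
import hashlib
from typing import Dict, Tuple


def make_etag(body: bytes) -> str:
    """Reduce a representation to a quoted hash value usable as an ETag."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def conditional_get(body: bytes,
                    request_headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body), honoring If-None-Match when present."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}  # arbitrary value
    raw = request_headers.get("If-None-Match", "")
    candidates = [tag.strip() for tag in raw.split(",")]
    if etag in candidates or "*" in candidates:
        # The client already has this representation: send no body.
        return 304, headers, b""
    return 200, headers, body


# First request: no validator, full response.
status, headers, payload = conditional_get(b"hello", {})

# Second request: the client sends back the ETag it received.
revalidated = conditional_get(b"hello", {"If-None-Match": headers["ETag"]})

# The resource changed in the meantime: the old ETag no longer matches.
changed = conditional_get(b"hello, world", {"If-None-Match": headers["ETag"]})
```

The second call returns 304 Not Modified with an empty body, and the third returns 200 with the new representation.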
