PHP Post-Facebook
Facebook is famously built in PHP, and I think it is really interesting to watch what they are doing technically with that massive codebase. This week Facebook announced Hack, a new language that specifically targets HHVM, the HipHop Virtual Machine that grew out of the HipHop project Facebook first released in 2010. Facebook obviously has massive scale challenges and little ability to cache content, so they are having to redefine how PHP works.
So what’s interesting about this? Compare it to Twitter. Twitter was originally built in Ruby on the very dynamic and popular Rails framework. Twitter had massive scaling problems, with near-daily “fail whales” displayed during outages. These are now a thing of the past because Twitter brought in a ton of engineering talent and effectively engineered Ruby on Rails out of their environment. They replaced the entire architecture with code written in much more performant environments. In short, they outgrew their house and moved to a new one. Imagine what would have happened to the Rails ecosystem if they had instead decided to reinvent it to scale to what they needed.
Facebook isn’t the first huge PHP-based website, of course. Wikipedia is one of the largest sites on the Internet and is built on PHP. All WordPress blogs are built on PHP. However, both of these examples have really good caching scenarios. Wikipedia uses Varnish to cache every request it can, and as a result greatly reduces its dependency on PHP. WordPress uses caching in the same way. Both avoid calling PHP for the vast majority of their requests by doing this. This is probably the most widely accepted pattern for scaling PHP, which doesn’t have a good track record for scaling: just bypass PHP as much as possible. For Facebook, however, this isn’t an option. They want to personalize every single request, so caching just doesn’t work for them.
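To make that difference concrete, here is a rough sketch in Python (not PHP, and not anything Wikipedia or Facebook actually runs). The PageCache class is a hypothetical stand-in for a caching layer like Varnish, and render_page stands in for the expensive PHP backend. Every name here is made up for illustration. It simply counts how often the backend gets called when everyone sees the same page versus when every visitor gets their own.

```python
from typing import Callable, Dict, Optional, Tuple


class PageCache:
    """Hypothetical stand-in for a reverse-proxy cache sitting in front of PHP."""

    def __init__(self, render: Callable[[str, Optional[str]], str]) -> None:
        self.render = render  # the expensive backend call (assumed, for illustration)
        self.store: Dict[Tuple[str, Optional[str]], str] = {}
        self.backend_calls = 0

    def get(self, url: str, user: Optional[str] = None) -> str:
        # A personalized page must be keyed by user as well as URL.
        key = (url, user)
        if key not in self.store:
            self.backend_calls += 1
            self.store[key] = self.render(url, user)
        return self.store[key]


def render_page(url: str, user: Optional[str]) -> str:
    """Pretend backend: builds a page for one URL and (optionally) one user."""
    who = user if user is not None else "anonymous"
    return f"<html>{url} for {who}</html>"


def simulate(personalized: bool, visitors: int = 1000) -> int:
    """Return how many times the backend ran for `visitors` hits on one URL."""
    cache = PageCache(render_page)
    for i in range(visitors):
        user = f"user{i}" if personalized else None
        cache.get("/home", user)
    return cache.backend_calls
```

With these assumptions, simulate(False) hits the backend once for a thousand visitors, while simulate(True) hits it a thousand times. That is the whole problem in miniature: once every response is personalized, the cache stops shielding the backend.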
I find it fun to watch Facebook doing this. Internally, Facebook has a focus on speed and believes that using PHP and PHP-like tools is part of achieving that. They can’t cache. So, rather than move that massive codebase, they are changing what it runs on. Instead of moving to a new house, they are remodeling!
First with the introduction of HHVM, and now with Hack, they are redefining the characteristics of the platform their code runs on to achieve the performance they want. I find this interesting because it is a path so rarely taken. Certainly no small startup could (should?) do this. There simply isn’t the time or money to do it, and it takes your focus off your main goal. You could look at Google and Go as something similar, but I don’t think their motives for making Go are anything like what Facebook is doing with HHVM and Hack.
I like to say that PHP is “the people’s language”. It draws the disdain of almost any developer you meet. It’s crufty, gross and houses some terrible code. Some of this is the language’s fault, but a bunch of it is also that many people first learned to program in PHP. PHP is also the language that nearly every blog and wiki you have ever visited uses. I would go so far as to suggest that more page views on the Internet are served by PHP than by any other language in existence.
The Wikimedia Foundation, the non-profit that runs Wikipedia and maintains the MediaWiki engine, is running development versions of MediaWiki on HHVM. Part of Wikipedia’s scaling plan is now coming from the byproduct of Facebook redefining how PHP works. That is really cool. By choosing to change their ecosystem instead of moving to a new one, Facebook is building a path that millions of blogs and wikis may be able to follow. That is pretty interesting, and it is why this is a path worth watching.