I have about three books that I am reading on and off but have been unable to focus on any of them for any length of time. Tom The Architect mentioned a book to me a few months ago called Building Scalable Web Sites: Building, Scaling, and Optimizing the Next Generation of Web Applications by Cal Henderson, engineering manager for the Flickr photo service, a service that I have used extensively since being turned on to it by, you guessed it, Tom The Architect.
This was the first book in a long time that I couldn’t put down, mainly because everything in it is geared towards teaching you how to create really, really big web sites and the issues involved in scaling them. It was also quite intriguing because it covers tools you use all of the time, like PHP and MySQL, for which really good books about scaling are hard to find.
Cal covers a lot of material in this book, from layering your web application architecture to creating an environment for developers to work in, which includes source control, issue tracking, coding standards and the like. This section was quite encouraging to me, as we have implemented almost everything that Cal mentions in the book (sometimes it’s nice to get some external validation). Cal then goes on to talk about internationalization and localization, data integrity and security, using email as an alternate entrance into your application, and how to build remote services.
All of this was great, but I found the next few chapters really valuable. Cal talks about identifying bottlenecks in your web application, scaling components such as MySQL (where he covers quite a few replication strategies) and scaling storage. He also covers measurements, statistics and monitoring. Finally, Cal talks about adding APIs to your application to support mobile applications, web services, etc.
Cal references quite a few freely available tools in these discussions – tools that I didn’t even know were out there, and that you can use to simplify your monitoring environment. I was most intrigued by the Spread Toolkit, a self-described “unified message bus for distributed applications” that allows you to unify logging across your applications. Anyone who has tried to debug an issue on a site that has more than one box will appreciate knowing about this tool.
This is the first technology book that I’ve read in a long time that hit the sweet spot between talking about real issues that I have been facing and possible solutions. I highly recommend grabbing this book and at the very least keeping it on your bookshelf for future reference. This is one that’s going to be a constant companion for me in the coming months.