Thoughts on Twitter architecture and pricing
Om Malik wrote an interesting post about Twitter pricing yesterday, but I think he’s a little off. I don’t blame him, considering his background is not computer science. And besides, it started a really interesting conversation. Before we start talking about Twitter pricing plans, we need to agree on what is actually hurting Twitter technically. Ideally, scaling issues should be orthogonal to your business plan; if you are successful, lots of people use your product, and that’s a good problem to have. Generally, you don’t want to tax your best users.
So on to the technology. Here’s the clue that we’ll start with:
Twitter is, fundamentally, a messaging system. Twitter was not architected as a messaging system, however. For expediency’s sake, Twitter was built with technologies and practices that are more appropriate to a content management system.
From Twitter’s post on architecture and the problems they are facing
When I read “content management system”, I’m thinking “blogging platform”. My guess is that Twitter is built to be a massively multi-user blogging and blog reading system - every user gets a blog to publish posts with and a blog reader to aggregate the posts of their friends. Considering Evan Williams was a co-founder of Blogger, I think that’s a pretty reasonable guess.
So if you think of it that way, then the obvious way to architect the system is publishing via RSS and aggregating via RSS. When you write a new tweet, your message gets stored in the database. (Yes, shoving all of that data into a database is a really difficult engineering problem in itself. Assuredly they will partition across multiple databases if they don’t already.) The massive pain comes in when pulling in your friends’ tweets. Let’s talk through how it works. Your Twitter homepage is acting like an RSS reader, so first it will look up all of the feeds it needs to check - all of the people you follow. Then, for every person you follow, an RSS feed will be read or generated. The resulting set of RSS feeds will be merged back together and sorted chronologically. The result is your Twitter homepage.
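To make the guess concrete, here is a minimal sketch of that read path in Python. It is an illustration of the model described above, not Twitter's actual code: the `following` and `tweets` dictionaries are hypothetical in-memory stand-ins for the database, and `read_feed` and `build_homepage` are names made up for this example.

```python
import heapq
from itertools import islice

# Hypothetical stand-ins for the database (assumptions, not Twitter's schema).
# following: user -> list of users they follow
# tweets:    user -> list of (timestamp, text), stored newest first
following = {
    "alice": ["bob", "carol"],
}
tweets = {
    "bob": [(105, "lunch"), (100, "hello")],
    "carol": [(103, "shipping today")],
}

def read_feed(user):
    """Read one user's feed, newest first. This is one lookup per followed user."""
    return [(ts, user, text) for ts, text in tweets.get(user, [])]

def build_homepage(user, limit=20):
    """Pull model: fetch every followed feed, then merge them chronologically."""
    feeds = [read_feed(followed) for followed in following.get(user, [])]
    # Each feed is already sorted newest first, so a k-way merge is enough.
    merged = heapq.merge(*feeds, reverse=True)
    return list(islice(merged, limit))

homepage = build_homepage("alice")
```

The thing to notice is that the work done by `build_homepage` grows with the number of people you follow, and it is repeated on every page load whether or not anyone has posted.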
Notice here that this is what is called a “pull” or “poll” model - you are checking for new posts whether there are new posts or not. This can generate a ton of unnecessary load on servers and databases, not to mention network traffic costs. With the advent of Twitter applications, there are now clients constantly polling Twitter to see if there is anything new to display. Ping, ping, ping. All to see if there is something new afoot.
Which brings us around to pricing. It is not, as Om suggested, Scoble’s fault for having 25,000 people following him. The cost is not sending one of his messages 25,000 times. No, actually it’s Scoble’s fault for following 21,000 people and constantly checking for new tweets from those people. It’s also the fault of power users like him using applications that aggressively use the Twitter API to check for new tweets - most likely the same people who use those applications are following large numbers of people.
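A back-of-the-envelope comparison shows why the following count matters more than the follower count under a pull model. The follower and following counts come from the paragraph above; the refresh rate and tweet rate are made-up assumptions purely for illustration, and both function names are hypothetical.

```python
def feed_reads_per_hour(following_count, refreshes_per_hour):
    """Pull model: every refresh reads one feed per followed account."""
    return following_count * refreshes_per_hour

def deliveries_per_hour(follower_count, tweets_per_hour):
    """For comparison: one delivery per follower for each tweet sent."""
    return follower_count * tweets_per_hour

# Assumed rates (not measured): a client polling once a minute,
# and a user posting five tweets an hour.
reads = feed_reads_per_hour(following_count=21_000, refreshes_per_hour=60)
sends = deliveries_per_hour(follower_count=25_000, tweets_per_hour=5)
```

With those assumed rates, the polling side comes to 1,260,000 feed reads an hour against 125,000 deliveries, about ten times as much work, and most of those reads find nothing new.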
As with all scaling problems, the first idea is “cache more!”. And sure, you can cache the heavy Twitter producers. But Scoble isn’t following just the big Twitter users - he’s following everyone he can, because that’s how he believes he can get an edge on news and trends. Can the long tail be cached? Doubtful - there are too many users who fall into that category. Can you charge those who follow more than, say, 1,000 people? Maybe $10 a month for every thousand people you follow, with the first 1,000 free? That could work, but it’s risky. Would Scoble, in the face of paying $200 a month, permanently switch to Pownce? Or FriendFeed if they built a Twitter clone? How many would follow?
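For what it’s worth, here is a sketch of that pricing idea. It assumes the charge applies per full thousand followed beyond the free tier, which is my reading rather than a worked-out plan; `monthly_fee` and its parameters are hypothetical names. Under that reading, following 21,000 people comes to $200 a month, or $210 if the first thousand were billed too.

```python
def monthly_fee(following_count, free_tier=1000, price_per_thousand=10):
    """Charge per full thousand followed beyond the free tier (assumed rounding down)."""
    billable = max(0, following_count - free_tier)
    return (billable // 1000) * price_per_thousand

fee_for_heavy_follower = monthly_fee(21_000)
fee_for_typical_user = monthly_fee(300)
```

Most users would never see a bill under this scheme, which is the point: the fee only targets the accounts that generate the heaviest read load.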
The solution, of course, is to do exactly what Twitter says they are doing - switch to a different model and scale horizontally (“throw more machines at it”). I’m interested to see how it turns out for them.