Archive for twitter

Thoughts on Twitter architecture and pricing

Om Malik wrote an interesting post about Twitter pricing yesterday, but I think he’s a little off. I don’t blame him, considering his background is not computer science. And besides, it started a really interesting conversation. Before we start talking about Twitter pricing plans, we need to come to an agreement about what is technically hurting Twitter. Ideally, scaling issues should be orthogonal to your business plan; if you are successful, lots of people use your product, and that’s a good problem to have. Generally, you don’t want to tax your best users.

So on to the technology. Here’s the clue that we’ll start with:

Twitter is, fundamentally, a messaging system. Twitter was not architected as a messaging system, however. For expediency’s sake, Twitter was built with technologies and practices that are more appropriate to a content management system.

From Twitter’s post on architecture and the problems they are facing 

When I read “content management system”, I’m thinking “blogging platform”. My guess is that Twitter is built to be a massively multi-user blogging and blog reading system - every user gets a blog to publish posts with and a blog reader to aggregate the posts of their friends. Considering Evan Williams was the founder of Blogger, I think it’s pretty reasonable.

So if you think of it that way, then the obvious way to architect the system is publishing via RSS and aggregating via RSS. When you write a new tweet, your message gets stored in the database. (Yes, shoving all of that data into a database is a really difficult engineering problem in itself. Assuredly they will partition across multiple databases if they don’t already.) The massive pain comes in when pulling in your friends’ tweets. Let’s talk through how it works. Your Twitter homepage is acting like an RSS reader, so first it will look up all of the feeds it needs to check - all of the people you follow. Then, for every person you follow, an RSS feed will be read or generated. The resulting set of RSS feeds will be merged back together and sorted chronologically. The result is your Twitter homepage.
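To make that concrete, here is a minimal sketch of the read path as I'm guessing it works. Everything in it is an assumption for illustration: the `Tweet` type, the in-memory `FEEDS` and `FOLLOWING` tables standing in for the database, and the function names. It is not Twitter's actual code.

```python
import heapq
from itertools import islice
from typing import NamedTuple


class Tweet(NamedTuple):
    timestamp: int  # seconds since some epoch
    author: str
    text: str


# Hypothetical stand-ins for the database tables.
FEEDS = {
    "alice": [Tweet(100, "alice", "first"), Tweet(300, "alice", "third")],
    "bob": [Tweet(200, "bob", "second")],
}
FOLLOWING = {"me": ["alice", "bob"]}


def fetch_feed(user):
    """Read (or generate) one user's feed, newest first."""
    return sorted(FEEDS.get(user, []), key=lambda t: t.timestamp, reverse=True)


def build_homepage(user, limit=20):
    """Pull model: one feed read per followed user, then merge and sort."""
    feeds = [fetch_feed(followed) for followed in FOLLOWING.get(user, [])]
    # Each feed is already newest-first, so a k-way merge keeps that order.
    merged = heapq.merge(*feeds, key=lambda t: t.timestamp, reverse=True)
    return list(islice(merged, limit))
```

The thing to notice is that the list comprehension in `build_homepage` does one feed read for every person you follow, every time the page is built.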

Notice here that this is what is called a “pull” or “poll” model - you are checking for new posts whether there are new posts or not. This can generate a ton of unnecessary load on servers and databases, not to mention network traffic costs. With the advent of Twitter applications, these applications are constantly polling Twitter to see if there is anything new to publish. Ping, ping, ping. All to see if there is something new afoot.

Which brings us around to pricing. It is not, as Om suggested, Scoble’s fault for having 25,000 people following him. The cost is not sending one of his messages 25,000 times. No, actually it’s Scoble’s fault for following 21,000 people and constantly checking for new tweets from those people. It’s also the fault of power users like him using applications that aggressively use the Twitter API to check for new tweets - most likely the same people who use those applications are following large numbers of people.
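A back-of-the-envelope sketch of the two costs, under the pull model described above. The follower and following counts come from the post; the polling rate and tweet rate are numbers I made up purely for illustration, so treat the output as a shape, not a measurement.

```python
def feed_reads_per_hour(following: int, polls_per_hour: int) -> int:
    """Pull model: every poll reads one feed per followed user."""
    return following * polls_per_hour


def deliveries_per_hour(followers: int, tweets_per_hour: int) -> int:
    """The cost Om's argument points at: one delivery per follower per tweet."""
    return followers * tweets_per_hour


# Assumed rates (hypothetical): a client polling once a minute,
# and a heavy user posting five tweets an hour.
reads = feed_reads_per_hour(following=21_000, polls_per_hour=60)
sends = deliveries_per_hour(followers=25_000, tweets_per_hour=5)
print(reads, sends)  # 1260000 125000
```

With these assumed rates the polling side is roughly ten times the publishing side, and it is paid whether or not anyone has posted anything new.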

As with all scaling problems, the first idea is “cache more!”. And sure, you can cache the heavy Twitter producers. But Scoble isn’t following just the big Twitter users - he’s following everyone he can, because that’s how he believes he can get an edge on news and trends. Can the long tail be cached? Doubtful - there are too many users who fall into that category. Can you charge those who follow more than, say, 1,000 people? Maybe $10 a month for every thousand people you follow, with the first 1,000 free? That could work, but it’s risky. Would Scoble, in the face of paying $200 a month for following 21,000 people, permanently switch to Pownce? Or Friendfeed if they built a Twitter clone? How many would follow?
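The pricing idea is simple enough to write down. This is only a sketch of the scheme floated above; the constant and function names are mine, and since I didn't say how a partial thousand would be billed, the code assumes only full thousands beyond the free tier are charged.

```python
FREE_FOLLOWS = 1_000      # first 1,000 follows are free
PRICE_PER_THOUSAND = 10   # dollars per month per additional thousand


def monthly_fee(following: int) -> int:
    """Dollars per month for following `following` people.

    Assumption: partial thousands are not billed (round down).
    """
    billable = max(0, following - FREE_FOLLOWS)
    return (billable // 1_000) * PRICE_PER_THOUSAND
```

For example, `monthly_fee(2_500)` charges for one full thousand beyond the free tier, and anyone at or under 1,000 follows pays nothing.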

The solution, of course, is to do exactly what Twitter says they are doing - switch to a different model and scale horizontally (“throw more machines at it”). I’m interested to see how it turns out for them.


Connecting online with @people via OpenID

I think the best innovation from Twitter was messaging with @username. If I remember right, Twitter didn’t support that at first - it was a grassroots invention by Twitter users that was picked up and officially supported. Facebook did something similar even before Twitter: when you write a post on your Facebook blog (haven’t seen much uptake there), you can choose from your friends list which friends are mentioned in the post. Kinda kludgy solution though, since you have to scroll through hundreds of people and click a bunch of checkboxes.

It’s obvious from the evolution of @replies on Twitter that this is something very organic and natural to humans. The internet is not a solitary vacuum; all software is social.

So here’s a thought: let’s bring @replies to the rest of the web. Whether I’m writing a post on my blog or commenting on a Flickr photo or sharing an item on Google Reader, I should be able to use @username. This serves two purposes.

1. Who is being referred to?

One is to let everyone reading your comment understand who you are talking to. This is a basic tenet of face-to-face group communication - you turn to a specific person in the group, address them by name, and speak. Sometimes, like at a big dinner party, you might not know all the guests, leaving you guessing as to who is whom and what their background is. On the internet, we can do better. By linking to some kind of profile, the comment reader can read up on who is being pulled into the conversation and better understand context.

2. Who is referring to you?

Here’s something the internet can do that can’t happen in real life - being able to read the record of all conversations that made reference to you. Twitter does this with their “Replies” page. Why not off Twitter as well?

How #1 could be implemented

This is a really difficult engineering problem, and I won’t pretend I’ve got all the answers, but I’ll do my best. There are a number of existing web sites that vend OpenID accounts, including Yahoo, Blogger, and LiveJournal. Here’s the list. All of these services support some kind of “Profile” page, where the user can publish information about themselves. So we have a decentralized way of naming people (OpenID) and we have a way to look up information about that person (hosted profile). What’s missing is browser support for interpreting the @reply markup.

What’s that you say? No one is going to use awkward OpenID URLs to name people? You’re right. So, browsers will also need hooks into your Address Book, so that they know which “John” you are referring to. This could have the same auto-complete UI that email clients already support - as soon as you start typing @John, a small drop down appears next to your cursor showing the various people you know who match “John”. You pick the right one, and the markup is entered for you, linking to John’s profile.
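Here is a rough sketch of that auto-complete step. The address book contents, the OpenID URLs, and the `class="mention"` link markup are all invented for illustration; no browser or standard defines them.

```python
from html import escape

# Hypothetical address book: display name -> OpenID URL.
ADDRESS_BOOK = {
    "John Smith": "https://john.example.com/",
    "John Doe": "https://example.org/people/jdoe",
    "Jane Roe": "https://jane.example.net/",
}


def complete(typed: str) -> list[str]:
    """Return address book names matching what was typed after the '@'."""
    needle = typed.lstrip("@").casefold()
    return sorted(n for n in ADDRESS_BOOK if n.casefold().startswith(needle))


def render_mention(name: str) -> str:
    """Emit link markup pointing at the chosen person's profile URL."""
    url = escape(ADDRESS_BOOK[name], quote=True)
    return f'<a class="mention" href="{url}">@{escape(name)}</a>'
```

Typing `@John` would offer "John Doe" and "John Smith"; picking one inserts a link to that person's OpenID URL, so the writer never has to see the URL itself.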

How #2 could be implemented

The last bit of this is discovering all the places people are referring to you. This is tricky, and both of the ideas I have come with weaknesses. One idea uses another open technology called XMPP, the Jabber protocol. Here’s how it could work. When your browser publishes “@John”, it will use XMPP to send a message to John’s OpenID server notifying John of the reference to his name. When John logs into his OpenID-supporting service of choice, he can be shown all of the messages that have been pushed to him.
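For the XMPP idea, the notification itself could be an ordinary message stanza. The sketch below only builds the XML; it does not open a connection or send anything. The addresses and the wording of the body are assumptions, and nothing here is an existing OpenID or XMPP extension.

```python
import xml.etree.ElementTree as ET


def build_mention_stanza(to_jid: str, from_jid: str, page_url: str) -> str:
    """Build a plain XMPP <message/> stanza announcing a mention.

    Assumption: John's OpenID provider exposes a Jabber address for him.
    """
    message = ET.Element(
        "message", {"to": to_jid, "from": from_jid, "type": "normal"}
    )
    body = ET.SubElement(message, "body")
    body.text = f"You were mentioned at {page_url}"
    return ET.tostring(message, encoding="unicode")


stanza = build_mention_stanza(
    "john@openid.example.com",          # hypothetical address
    "me@example.org",                   # hypothetical address
    "https://blog.example.org/post/1",  # page containing "@John"
)
```

A real client would still need an XMPP session to deliver this, which is part of why this idea is the heavier of the two.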

The other idea is for the OpenID server to support an HTTP POST whose payload would be the URL where the reference was made. The OpenID server would log all traffic to that special URL and pass it on to John once he logs in.
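A sketch of the HTTP variant, covering both ends. The endpoint URL, the `url` form field, and the `MentionInbox` class are hypothetical names of my own. The request is only constructed, never sent.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_mention_post(endpoint: str, page_url: str) -> Request:
    """Browser side: a POST whose payload is the URL of the mention."""
    data = urlencode({"url": page_url}).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(endpoint, data=data, headers=headers, method="POST")


class MentionInbox:
    """Server side: log incoming mentions and hand them over at login."""

    def __init__(self):
        self._pending = {}

    def receive(self, user: str, body: bytes) -> None:
        url = parse_qs(body.decode("ascii"))["url"][0]
        self._pending.setdefault(user, []).append(url)

    def on_login(self, user: str) -> list[str]:
        # Deliver everything queued for this user, then clear the queue.
        return self._pending.pop(user, [])
```

In use, the browser would send `build_mention_post("https://openid.example.com/mentions/john", page_url)`, the server would call `receive("john", request_body)`, and John would get the queued URLs from `on_login("john")` the next time he signs in.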

Thoughts?

Anyone have thoughts on this? Obviously a few big (some might call them “unlikely”) changes need to happen. First, browsers need to add support for OpenID-based @name markup. Second, browsers need to know how to send XMPP messages (or invoke a hidden URL hosted by the OpenID server, which might be easier). Lastly, OpenID servers need to process these incoming messages and present them to the user in some helpful way.

Naturally, I imagine there are a host of security concerns to work through, especially with browsers pushing URLs around. Still, I think this would create a very interesting social ecosystem. What do you think?
