Your Ad Here

Social Networking and the handshake problem

reposted from blog.philcrissman.com. Since I’m moving back here, I may repost a few of my last few months blog posts, just so they are all in one place.

There was a lot of discussion awhile back about Facebook and its limit of 5,000 friends. The discussion, in case you (mercifully) missed it, was largely driven by social networking power users like Scoble.

I didn’t think too much about it at the time. Between now and then, I’ve thought a few times about social networks, and how they are implemented. Not the front end, but the back matter; the model. The database; users and connections.

A social network is fundamentally a graph. Most social networks are analogous to non-directed graphs – that is, there is no distinction which “direction” a connection goes. If you are friends with someone in Facebook or MySpace, they are also friends with you. Pownce and Twitter have taken a directed-graph approach, in which friendship (following) is not automatically mutual. I could follow you, but you might not follow me. And so on.

Like any data structure, there are multiple ways of implementing a graph. Fundamentally it consists of vertices (the users, or nodes) and edges (the connections between the vertices). If we need to represent a directed graph (like Twitter) we have potentially two edges between any two vertices. In a non-directed graph (Facebook, LinkedIn, etc) only one edge will suffice. As far as implementation goes, a common solution would be to have a table for vertices(users) and a table for edges(connections or friendships).

The Facebook limit of 5,000 friends poses an interesting question. What exactly is the overhead of 5,000 friends? If only a few people have 5,000 friends, it’s not too bad; 5,000 edges. But if you’re implementing this, you need to consider a worst-case scenario. What if everyone decides to have 5000 friends?

As a simple example, suppose 5,000 people in Facebook are all part of the same organization, and all decide to befriend each other. Let’s say that everyone in the Oracle network decides they need to connect to every other Oracle person on Facebook. For simplicity, let’s say this network has 5,000 members, so we don’t need to worry about breaking Facebook’s limit.

Assuming that the users and connection are stored in two tables as mentioned earlier, our “users” table would have 5,000 records.

How many records would the “connections” table have?

The answer is simpler than it might seen; it’s the same as the handshake problem. The handshake problem asks, if there are N people in the room, and everyone shakes hands with everyone else one time, how many handshakes have there been? The answer is the triangular number for N – 1.

Triangular numbers are like a factorial, but using addition instead of multiplication. So where N! would be 1 * 2 * 3 * 4 * … * N, the triangular number for N would be 1 + 2 + 3 + 4 + … + N. We’ll call it Tn. By this reasoning, T1 is 0, T2 is 1, T3 is 3, T4 is 6, and so on. For small graphs it’s very easy to confirm this by just drawing a few dots, drawing a line between every dot and counting the lines.

Is there a proof that for a graph of N vertices we’d need Tn – 1 edges to connect all of them? I’m a little rusty with formal proofs, but this one is fairly simple. If you have N nodes, you could make sure they are all connected by starting with one, and then drawing a line from it to each other node; next you would move to the next node, and draw a line to every other node to which was not already connected, and so on, until you run out of nodes.

If you were to actually do this, your first node, N1 would have N – 1 lines, as there would a line between it and every other node, excepting itself. When you move to N2, it will only need N – 2 lines, as it is already connected to N1… each consecutive node Nk would need one less edge to connect it to the remaining nodes, and you would wind up with edges numbering (N – 1) + (N – 2) + (N – 3) + … + 1, or Tn – 1. Okay, as a formal proof it’s sloppy, but hopefully you can follow it.

Fortunately, there’s a simple formula to calculate Tn : n(n -1)/2.

So, if 5,000 people are all connected to each other, that requires 12,497,500 connections.

Okay… one step back here. Facebook has (last statistic I could find) about 67 million users. That means if each and every one of them were connected to their maximum amount of friends, it would require a database table of 167,466,500,000 records to store all the connections. Yes, that’s nearly 167.5 billion.

Not only that, but if the maximum friend count was increased by only one more friend each, to 5001, and every Facebook user took advantage of it, that would actually add approximately 133 million connections to the table. Just by allowing ONE more friend per person.

Now, I’m not a DBA, but those numbers are huge for database tables. I’m pretty certain that most commercial databases would need to split a table like this up into multiple tables, for performance reasons if nothing else. I’m not going to attempt to figure out actual performance times to search a table of that size, but it’s not insignificant, especially if millions of others are searching the same table. In fact, looking at how high those number get even with the current maximum of 5,000 friends, I think it’s safe to say that Facebook’s engineers are counting on the hope that 98% of their users will never create this many connections.

Now one argument could be that, even if Facebook doubled their maximum friends allowed, still only a VERY small percentage of their users would accumulate this many friends, thus keeping the database records to a much more manageable number. That said, Scoble once reported that they started to see scaling issues over 5,000 friends, so it’s entirely possible that they even this is not practical—I’m going to assume that the engineers know what they’re talking about.

Another problem is, even if we are certain that 98%(ish) of the userbase won’t connect to nearly this many people, there is still the latent possibility that they could. There’s also the possibility that a worm or script could do this for them, which is not necessarily as far-fetched as it sounds.

Without knowing the details of Facebook’s architecture, I’m still going to have to side in with Facebook’s decision—I’d say allowing more than 5,000 per person friends is probably a bad decision, architecturally. That’s not to say there’s no way to do it effectively, but I can see there being some big challenges. Sorry, Scoble. I think I’m with Facebook on this particular decision.

Note: When I originally posted this recently on http://blog.philcrissman.com (currently static Mephisto-driven blog), I received a comment pointing out that we don’t need to keep all the edges in one table, they could be in a flat file instead of a table, etc. I’m inclined to agree, I can think of some alternate ways of doing this. This wasn’t (originally) intended to be a literal meditation on how/why Facebook works under the hood. I was more interested in the way the numbers skyrocket if we allow everyone to be connected in a social network type of graph — Facebook was simply an easy example because of the well-publicized 5,000 member limit. Regardless of the exact way the graph is implemented, I still think the potential number of edges poses some challenges — unless we simply assume that most people will just not be connected to that many other people.

1 Response to “Social Networking and the handshake problem”


  1. 1 ahmer

    nice article .. thanks

Comments are currently closed.