March 28th, 2006


The TrustFlow algorithm

TrustFlow is a "trust metric" algorithm, which uses human-generated information about trustworthiness in a human-sized community of a few hundred people to generate guesses about trustworthiness in an Internet-sized community of millions of people; this is useful in applications like ranking search hits and preventing spam and vandalism. TrustFlow is unique among "attack resistant" trust metrics in that it can load information about who trusts who incrementally as needed; this makes it well suited for distributed use.

We apply TrustFlow to LiveJournal by treating the decision to place someone on your friends list as an assertion of trust in them. This isn't 100% true and leads to some strange artifacts, but it works well enough to produce interesting results.

One user is special: the trust root is the source from which all trust emanates.

Each user starts with a virtual bucket into which "trust juice" is poured; all buckets have one litre capacity, and they all start out empty. Trust juice pours from heaven into the bucket of the trust root. After one litre has been poured, this bucket is full to overflowing; when it starts to overflow, gutters around the edge of the bucket carry the trust juice in equal quantity to the buckets of all their friends.

Suppose they have ten friends. After eleven litres have been poured, the buckets of the trust root and of all their friends will be full, and start to overflow. At this point things get interesting. Now the juice pours from heaven into the trust root's bucket, overflows into the bucket of one of their friends, and then overflows into the buckets of the trust root's friends of friends. People who are friended by many friends of friends will get more juice per second, but the friends links of people who list few friends will carry more juice than the links of those with many.

Eventually one of the buckets of the friends of friends will fill up; they then are the first people on the list that TrustFlow displays. Their score is the number of litres of trust juice that got poured into the trust root before their bucket filled up, and any more trust juice they receive overflows to be shared among their friends. And so on, until 200 more people have filled their buckets.

One final detail. Sometimes the juice hits "dead ends". If Bob and Carol are the only people on each other's friends lists, then once their buckets are both full they will have no-one to pass trust juice on to. In this instance the juice just "backs up" - no more juice flows to either of them. Each person with a full bucket shares the juice they receive evenly between all of their friends who have got someone with a non-full bucket to pass it on to, either directly or indirectly.

That's the sketch of the workings. If you want to actually implement this, or indeed any trust metric, it helps to understand a little about graph theory. TrustFlow analyzes a digraph of trust. Each arc is an assertion of trust; for example, each vertex might be a LiveJournal user, and there is an arc from A to B when A has B on their friends list. You'll also need to know what an eigenvector is; calculating the inbound flow on each node requires finding an eigenvector for a flow matrix. I use an iterative algorithm and I usually only need one iteration to update the flow; you could probably do things even more efficiently if you only recalculated the parts that needed to be recalculated.

TrustFlow frequently asked questions

TrustFlow for LiveJournal

Frequently asked questions

What do the results mean?
TrustFlow is making a guess at who is "near" your friends list; who might be on it, but isn't. It does this by looking at your friends list, and the friends list of your friends, and so on.
Is this based on who reads my journal, or interests, or what?
No. TrustFlow looks only at who is on whose friends list to make the determination; no other information is taken into account. In particular, it doesn't know anything about whose journals you are actually reading, or who is reading your journal, except what friends lists tell you.
Are the people listed in any order?
Yes; the numbers are a measure of "distance", so the first person listed is "closest".
How exactly does it determine who to list? What do the numbers mean?
A description of the algorithm appeared earlier in this journal.
I get an error!
Please read the error carefully before telling me about it, and please quote the error exactly - otherwise how can I do anything about it?
I've changed my friends list, but the results haven't changed
When it fetches friends data, it keeps that data for around 24 hours; if you wait about that long after a change you'll see that change reflected in the results.
It doesn't work at all - it lists my worst enemy first!
That means it's working. That person is someone quite close to your circle of friends, who you would list except that you've deliberately decided not to. It can't tell that you don't like them; all it can tell is that they are close enough to your circle of friends that you would list them if you didn't feel that way.

More will be added here as the questions are asked...