Reputation Documents and WikiPedia

Bayle Shanks wrote about a proposed distributed reputation document standard, which I found very interesting. But I think the way 'reputation type' is handled needs more thought. Specifically, I think it would be useful to be able to link not just to a person and a type of reputation, but to a specific action, role, judgement or opinion identified with that person.

Why would this be useful? Consider WikiPedia.

Why should there be just one reputation system for WikiPedia? Will everyone agree that contribution X is better than contribution Y? It would be more elegant if the raw reputation data were available (in Bayle's document format, or otherwise), and then people could apply their own criteria to that raw data to produce a view of WikiPedia edits customised to their own tastes.

The problem I foresee with this is access to the data and computation. Say I wanted to view the WikiPedia entry on blues music. What would my computer (or the intermediate server creating the view for me) need to get?

1. The text itself
2. Who made which edit / contributed what text
3. The category of the text (entertainment:music:blues in this case)
4. Any reputation certificates of those contributors in this area
(and can reputation certificates cover negative reputations?)
5. The additional information needed to judge the worth of those granting those certificates

The problem is 5. Effectively I would have to download (or have pre-downloaded, perhaps via some subscription mechanism like Usenet or BitTorrent) the complete web of trust - all the certificates for all of the contributors to the WikiPedia. Maybe there is some pre-generated abstract I could compute in downtime that would allow me to judge the trustworthiness of certificates from a new entry I am visiting? Thoughts?

Douglas Reay
douglasr (at)

"Reputation budget" standard document format proposal

Here is a first draft of a proposal for a distributed reputation document standard. It is not a specific trust metric, just a proposal that everyone choose a standard document format for posting lists of:

* who they trust (/repute/certify etc)
* in what WAY they trust them (I certify this person as a friend / I certify this person as trustworthy to do business with / I certify this person as someone who knows a lot about ancient history, etc.)
* and how MUCH they trust them.

In addition, each document cites a URI that specifies the "reputation type" - the context of social norms/internet standards in which the document is meant to be interpreted. Each different "reputation type" is in essence its own internet standard.
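To make the shape of such a document concrete, here is one possible rendering in JSON. This is purely illustrative - the proposal does not fix a concrete syntax, and every field name and URI below is invented for the example:

```python
import json

# Hypothetical "reputation budget" document. The field names and all
# URIs here are invented for illustration; the proposal fixes no syntax.
reputation_document = {
    "author": "http://example.org/people/alice",
    # URI identifying the "reputation type" standard under which
    # this document is meant to be interpreted.
    "reputation_type": "http://example.org/rep-types/subject-expertise",
    "certifications": [
        {
            "who": "http://example.org/people/bob",      # who they trust
            "way": "knows a lot about ancient history",  # in what WAY
            "how_much": 0.8,                             # how MUCH (0-1)
        },
    ],
}

print(json.dumps(reputation_document, indent=2))
```

A spider could then fetch such documents, group the certifications by reputation type, and feed each group to whatever trust metric understands that type.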

I am not aware of any other generic distributed reputation list document standard proposals, but I don't know much about this stuff so please tell me if you know of any. Two proposed extensions to FOAF that cover atomic certifications (not lists of certifications) are 1 and 2.

The idea is that once people put up documents like this, others will be able to spider them, just as "social search engines" can spider FOAF files now.

more details on the "reputation budget" proposal at CommunityWiki

What do you think?

TrustFlow frequently asked questions

TrustFlow for LiveJournal

Frequently asked questions

What do the results mean?
TrustFlow is making a guess at who is "near" your friends list; who might be on it, but isn't. It does this by looking at your friends list, and the friends list of your friends, and so on.
Is this based on who reads my journal, or interests, or what?
No. TrustFlow looks only at who is on whose friends list to make the determination; no other information is taken into account. In particular, it doesn't know anything about whose journals you are actually reading, or who is reading your journal, except what friends lists tell it.
Are the people listed in any order?
Yes; the numbers are a measure of "distance", so the first person listed is "closest".
How exactly does it determine who to list? What do the numbers mean?
A description of the algorithm appeared earlier in this journal.
I get an error!
Please read the error carefully before telling me about it, and please quote the error exactly - otherwise how can I do anything about it?
I've changed my friends list, but the results haven't changed
When it fetches friends data, it keeps that data for around 24 hours; if you wait about that long after a change you'll see that change reflected in the results.
It doesn't work at all - it lists my worst enemy first!
That means it's working. That person is someone quite close to your circle of friends, who you would list except that you've deliberately decided not to. It can't tell that you don't like them; all it can tell is that they are close enough to your circle of friends that you would list them if you didn't feel that way.

More will be added here as the questions are asked...


The TrustFlow algorithm

TrustFlow is a "trust metric" algorithm, which uses human-generated information about trustworthiness in a human-sized community of a few hundred people to generate guesses about trustworthiness in an Internet-sized community of millions of people; this is useful in applications like ranking search hits and preventing spam and vandalism. TrustFlow is unique among "attack resistant" trust metrics in that it can load information about who trusts who incrementally as needed; this makes it well suited for distributed use.

We apply TrustFlow to LiveJournal by treating the decision to place someone on your friends list as an assertion of trust in them. This isn't 100% true and leads to some strange artifacts, but it works well enough to produce interesting results.

One user is special: the trust root is the source from which all trust emanates.

Each user starts with a virtual bucket into which "trust juice" is poured; all buckets have one litre capacity, and they all start out empty. Trust juice pours from heaven into the bucket of the trust root. After one litre has been poured, this bucket is full to overflowing; when it starts to overflow, gutters around the edge of the bucket carry the trust juice in equal quantity to the buckets of all their friends.

Suppose they have ten friends. After eleven litres have been poured, the buckets of the trust root and of all their friends will be full, and start to overflow. At this point things get interesting. Now the juice pours from heaven into the trust root's bucket, overflows into the buckets of their friends, and then overflows onward into the buckets of the trust root's friends of friends. People who are friended by many friends of friends will get more juice per second, and the friends links of people who list few friends will carry more juice than the links of those with many.

Eventually one of the buckets of the friends of friends will fill up; its owner is the first person on the list that TrustFlow displays. Their score is the number of litres of trust juice that were poured into the trust root before their bucket filled up, and any more trust juice they receive overflows to be shared among their friends. And so on, until 200 more people have filled their buckets.

One final detail. Sometimes the juice hits "dead ends". If Bob and Carol are the only people on each other's friends lists, then once their buckets are both full they will have no-one to pass trust juice on to. In this instance the juice just "backs up" - no more juice flows to either of them. Each person with a full bucket shares the juice they receive evenly between all of their friends who can pass it on, directly or indirectly, to someone with a non-full bucket.

That's the sketch of the workings. If you want to actually implement this, or indeed any trust metric, it helps to understand a little about graph theory. TrustFlow analyzes a digraph of trust. Each arc is an assertion of trust; for example, each vertex might be a LiveJournal user, and there is an arc from A to B when A has B on their friends list. You'll also need to know what an eigenvector is; calculating the inbound flow on each node requires finding an eigenvector for a flow matrix. I use an iterative algorithm and I usually only need one iteration to update the flow; you could probably do things even more efficiently if you only recalculated the parts that needed to be recalculated.
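A toy version of the bucket model can be written down directly from the description above. This is a hedged sketch, not the real implementation: it pours in discrete increments rather than continuously, it lists every user in the order their bucket fills (the real tool excludes the root and their existing friends), and it uses naive recursion for the dead-end check rather than the eigenvector machinery.

```python
def trustflow(friends, root, n_results=5, step=0.25, capacity=1.0):
    """Toy TrustFlow: `friends` maps each user to their friends list.
    Returns (litres poured when the bucket filled, user) in fill order."""
    level = {u: 0.0 for u in friends}
    order = []
    poured = 0.0

    def can_absorb(user, seen):
        # True if this bucket has room, or juice can flow on from it
        # to someone whose bucket does (the "dead end" check).
        if level[user] < capacity:
            return True
        return any(f not in seen and can_absorb(f, seen | {f})
                   for f in friends[user])

    def pour(user, amount, seen):
        if level[user] < capacity:
            absorbed = min(capacity - level[user], amount)
            level[user] += absorbed
            amount -= absorbed
            if level[user] >= capacity:
                order.append((round(poured, 4), user))
        if amount <= 1e-12:
            return
        # Overflow is shared evenly among friends who can still absorb,
        # directly or indirectly; if none can, the juice backs up.
        open_friends = [f for f in friends[user]
                        if f not in seen and can_absorb(f, seen | {f})]
        if not open_friends:
            return
        share = amount / len(open_friends)
        for f in open_friends:
            pour(f, share, seen | {f})

    limit = min(n_results, len(friends))
    while len(order) < limit and poured < 100 * capacity:
        poured += step
        pour(root, step, {root})
    return order
```

For a root with two friendless friends, the root's bucket fills after one litre and each friend's after three, matching the "eleven litres for ten friends" arithmetic above.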

A trust metric enabled Wikipedia

I love the idea of a trust metric enabled Wikipedia. There's plenty of questions to chew on.

How would a trust metric enabled Wikipedia work? Obviously every user will have to list trust information about other users, but what do you do with that information?

Here's the easy way: Jimbo is the root of trust from which the trust metric runs. If your trust is above a certain threshold according to that metric, you can edit, otherwise you can't. If you don't like it, set up your own wiki.

Can we do better than that? Supposing we move away from the model in which a single version of the page is the "current" version, and old versions are there only as a historical record? That can be anything from a small step to a giant leap.

As a small step, we could allow untrusted users to edit the page, but normal Wikipedia visitors only see the version that has most recently been edited by a trusted user. Each trusted user is expected to review any untrusted edits they may implicitly be including when they edit a page, which is what users do now in any case for the most part; if they don't like the edits, they could revert back to the last trusted version, and normal users will never see the edits they rejected. This resembles the article validation proposals currently going forward on Wikipedia, but backed by a real trust metric.
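That scheme needs almost no machinery. As a hedged sketch - the data model here, a flat list of (version, editor) pairs plus a trusted set produced by the trust metric, is invented for illustration:

```python
def visible_version(history, trusted):
    """Return the most recent version made by a trusted editor.

    `history` is a list of (version_id, editor) pairs, oldest first.
    Untrusted edits after the last trusted one simply aren't shown
    to normal visitors until a trusted user reviews and re-edits.
    """
    for version_id, editor in reversed(history):
        if editor in trusted:
            return version_id
    return None  # no trusted version yet
```

With history `[(1, 'alice'), (2, 'mallory'), (3, 'bob'), (4, 'mallory')]` and trusted set `{'alice', 'bob'}`, visitors see version 3; Mallory's later edit waits for review.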

Supposing we allow the article to fork? Wikipedia already "sort-of" allows forks, in that you can choose any version of an article as the basis for your next edit; but your new edit becomes the current version, and no metadata record of which edit you started from is kept, so you can't do useful things with the forks. Supposing we explicitly record the tree of versions? This would allow the sophisticated tools provided by any modern version control system to synthesize a new version of the article from whichever subset of the edits the user thought appropriate, making it much easier for trusted users to winnow the wheat from the chaff when choosing an edit to make current.
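Recording the tree of versions needs only one extra piece of metadata per edit: which version the editor started from. A minimal, hypothetical sketch:

```python
class Version:
    """One article revision. `parent` is the revision the editor
    started from, so the full history forms a tree rather than a line."""
    def __init__(self, vid, parent, editor, text):
        self.vid = vid
        self.parent = parent
        self.editor = editor
        self.text = text

def lineage(version):
    """The chain of version ids from the original down to this version."""
    chain = []
    while version is not None:
        chain.append(version.vid)
        version = version.parent
    return chain[::-1]
```

Two edits based on the same parent are then visibly a fork, and merge tools can work from their common ancestor, exactly as version control systems do.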

Now we approach the "giant leap". When winnowing the wheat from the chaff, I don't have to consider only the binary trusted/untrusted decision of the trust metric; I can use the gradations of trust it provides as a guide to how much good faith to assume in a given edit. Coming closer to the leap, I don't have to use Jimbo as the root of trust for these decisions - the point of a trust metric like Trustflow is that I can afford to do the trust calculations for myself.

The "giant leap" comes when we take it to the next level. Under these circumstances, is there any need for all users to agree on a single "current version"? The "default view" of Wikipedia might have Jimbo as the root of trust, but I might like to choose someone else - myself, for example. What effect would this have on NPOV disputes? Would each side of a contentious debate end up with their own persistent fork, or would a consensus version emerge? Would the good done by decentralization outweigh the bad done by automatic self-reinforcing reading bias?

Given this outline, when I go to view an article on a topic, how do I choose which version to view? What tradeoffs between newness and trust will I make? Will there be problems of information moved from one article to another "slipping through the cracks" if I trust the old version of one but the new version of another?

Should we be trying to use domain-specific information? One user may be trusted when editing on cryptography but not on animal welfare, say. Can we synthesize domains from the information available about what links to what and who is editing it? Can we capture in our trust information that people who know about crypto trust X but people who know about animal welfare don't, and use that to fine-tune our trust decisions depending on the subject area of the article we're assessing? Can we do so without anyone having to explicitly identify domains?

Finally (for now), can we do all this in a distributed fashion, so that we are all hosting our own intricately interlinked and interrelated versions of Wikipedia, each drawing edits from each other but reflecting our own unique spin on the world?

(no subject)

Raph Levien wrote an interesting blog entry in December about trust metrics and Wikipedia.

Time for a trust metric enabled Wikipedia?

I see that Wikipedia is having some well-publicized troubles with vandalism and the like. This will be a somewhat bittersweet response.

The success of wikis has taken a lot of people by surprise, but I think I get it now. The essence of wiki nature is to lower the barrier to making improvements to content. The vast majority of access-controlled systems out there err strongly on the side of making it too hard. The idea of a wiki is to err on the side of making it too easy, and to lessen the pain (somewhat) of undoing the damage when that turns out to be a mistake. In cases where that doesn't work out, I think the solution is to make the decision process of whether to grant write access a bit more precise, so you can still err on the side of trusting too much, but you don't have to err quite as often or as badly.

In that regard, the trust metrics designed and implemented for Advogato are a near-perfect match for the needs of a Wikipedia-like project, but for the most part, nobody is paying much attention to my ideas.

I think he's entirely right. Raph discusses some responses he got here, but I'd be interested to know what people here think!

(no subject)

I joined Outfoxed (user IDs recompiler and vladg).

One of my biggest pet peeves is that it IS centralized even though it claims not to be. By default, when most people join, they trust Stan (very very very bad) and Outfoxed (bad but not TOO terrible). Since the default trust path length is 3 nodes, and Stan and Outfoxed are within 2 hops of anyone (even unwanted and hostile parties), nearly everything ends up trusted. The guy was in a hurry to write something cool and gather data for his master's project, and he fucked up. There may have been a lot of growth on the system, but the trust matrix is broken from the very start (compare how PGP key signing works). One thing they could try in a month or so to fix it: break the trust relationship between Stan and everyone who isn't actually his friend, and hope people have started trusting other users on the system.

One fun thing I found is rating the trust of processes on your computer. Out of boredom I added a bunch of random processes to trusted. Then I decided to artificially fix things for myself: I created a second account on another system, gave it a high level of trust in my primary account and my trusted friends, and set both accounts to ignore/not trust Stan.
I urge everyone to write to the University Stan is in and voice your concerns. There is no shortcut to good data no matter how many front page stories on slashdot you get.

Hosting and other useful stuff

Around a year after the last version went live, I'm nearly ready to make a new version of TrustFlow available. This version is far faster (it uses iterative eigenvector approximation instead of the previous "token-passing" algorithm) and is written in Python; it also has a somewhat slicker user interface (a progress bar!). I'll make the source code available at the same time as I make it live.

I'd like to ask a couple of things of the trustmetrics community.

Hosting: I got an offer of hosting a long time ago, but I don't know if it's still extant. Can anyone offer to host this thing? You'll need a decent sized machine, because it was very popular last time and it might be again; bandwidth charges shouldn't be too bad since it's nearly all text. I'll make sure your name is in lights next to mine on the relevant pages :-)

Animated GIF: Can anyone out there design animated GIFs? I want a nice one to put on the "In Progress" page to indicate "yes, progress is being made on calculating your TrustFlow results". It should be very small, so it doesn't cost much in bandwidth to serve it. I have in mind something like a representation of a network of people, with trust pulsing along the arcs and lighting up the nodes, but feel free to design whatever you think is best.


(no subject)

If you scraped FOAF pages, once per second, you could gather 86,400 people per day. Cache them locally. You could refresh the bulk of the active users every week. You could then do all kinds of manipulations locally, and never really be that far out of whack with LJ.
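The arithmetic (one fetch per second is 60 × 60 × 24 = 86,400 users a day) can be sketched as a polite scraping loop. The fetch function here is a stand-in for whatever actually retrieves and parses a FOAF page; it is injected so the rate-limiting logic is testable without the network:

```python
import time

SECONDS_PER_DAY = 60 * 60 * 24  # 86,400: one fetch/second for a day

def scrape(usernames, fetch, delay=1.0, sleep=time.sleep):
    """Fetch each user's FOAF document at most once per `delay` seconds
    and cache the results locally in a dict.

    `fetch(name)` is a stand-in for the real HTTP GET + FOAF parse;
    `sleep` is injectable so tests can skip the real waiting.
    """
    cache = {}
    for name in usernames:
        cache[name] = fetch(name)
        sleep(delay)  # be polite: one request per `delay` seconds
    return cache
```

A weekly refresh of the active users would then just be another pass of the same loop over the cached names.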