Trust metrics' Journal

Below are the 20 most recent journal entries recorded in Trust metrics' LiveJournal:

Monday, September 18th, 2006
1:15 pm
Reputation Documents and WikiPedia
Bayle Shanks wrote about a proposed distributed reputation document standard, which I found very interesting. But I think the way 'reputation type' is handled needs more thought. Specifically, I think it would be useful to be able to link not just to a person and a type of reputation, but to a specific action, role, judgement or opinion identified with that person.

Why would this be useful? Consider WikiPedia.

Why should there be just one reputation system for WikiPedia? Will everyone agree that contribution X is better than contribution Y? It would be more elegant if the raw reputation data were available (in Bayle's document format, or otherwise) and then people could apply their own criteria to that raw data to produce a view of WikiPedia edits customised to their own tastes.

The problem I foresee with this is access to the data and computation. Say I wanted to view the WikiPedia entry on blues music. What would my computer (or the intermediate server creating the view for me) need to get?

1. The text itself
2. Who made which edit / contributed what text
3. The category of the text (entertainment:music:blues in this case)
4. Any reputation certificates of those contributors in this area
(and can reputation certificates cover negative reputations?)
5. The additional information needed to judge the worth of those granting those certificates

The problem is item 5. Effectively I would have to download (or have pre-downloaded, perhaps via a Usenet- or BitTorrent-like subscription) the complete web of trust - all the certificates for all of the contributors to WikiPedia. Maybe there is some pre-generated abstract I could compute in downtime, that would allow me to judge the trustworthiness of certificates from a new entry I am visiting? Thoughts?
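To make items 3 to 5 concrete, here is a minimal sketch of how a reader-side view might score one contributor. Everything in it is an assumption made for illustration: the `Certificate` fields, the rule that a certificate for `entertainment:music` also covers `entertainment:music:blues`, and the idea that my opinion of each issuer is a single number. Negative values stand in for negative reputations.

```python
from collections import namedtuple

# Hypothetical certificate shape: issuer vouches for subject in a category.
Certificate = namedtuple("Certificate", "issuer subject category value")


def covers(cert_category, text_category):
    """Assumed rule: a certificate covers its own category and any sub-category."""
    return (text_category == cert_category
            or text_category.startswith(cert_category + ":"))


def contributor_score(certificates, my_weights, contributor, category):
    """Sum the certificates about one contributor, weighted by how much
    *I* trust each issuer (issuers I don't know count for nothing)."""
    return sum(
        my_weights.get(c.issuer, 0.0) * c.value
        for c in certificates
        if c.subject == contributor and covers(c.category, category)
    )


certs = [
    Certificate("ann", "dave", "entertainment:music", 1.0),
    Certificate("eve", "dave", "entertainment:music:blues", -1.0),
    Certificate("ann", "dave", "science", 1.0),  # wrong area, ignored
]
my_weights = {"ann": 0.5, "eve": 0.25}
score = contributor_score(certs, my_weights, "dave", "entertainment:music:blues")
```

The expensive part is not this sum but obtaining `my_weights` for issuers I have never met, which is exactly the item 5 problem.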

Douglas Reay
douglasr (at) chiark.greenend.org.uk
Saturday, April 1st, 2006
11:40 pm
"Reputation budget" standard document format proposal
Here is a first-draft of a proposal for a distributed reputation document standard. Not a specific trust metric, but just a proposal that everyone chooses a standard document format for posting lists of:

* who they trust (/repute/certify etc)
* in what WAY they trust them (I certify this person as a friend/i certify this person as a trustworthy person to do business with/i certify this person as someone who knows a lot about ancient history, etc)
* and how MUCH they trust them.

In addition, each document cites a URI that specifies the "reputation type", which means that context of social norms/internet standards in which the document is meant to be interpreted. Each different "reputation type" is in essence its own internet standard.
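As a rough illustration of the three list items plus the "reputation type" URI, such a document might be modelled like this. The class names, field names and the example.org URI are all made up for this sketch; the actual proposal is the one described at CommunityWiki.

```python
from dataclasses import dataclass, field


@dataclass
class Certification:
    subject: str   # who is trusted
    way: str       # in what WAY they are trusted
    amount: float  # how MUCH they are trusted


@dataclass
class ReputationDocument:
    author: str
    reputation_type: str  # URI naming the norms the document is read under
    certifications: list = field(default_factory=list)

    def certify(self, subject, way, amount):
        self.certifications.append(Certification(subject, way, amount))

    def trust_in(self, subject):
        """Total amount this document assigns to one subject."""
        return sum(c.amount for c in self.certifications if c.subject == subject)


doc = ReputationDocument(
    author="alice",
    reputation_type="http://example.org/reputation-types/ancient-history",  # placeholder
)
doc.certify("bob", "knows a lot about ancient history", 0.8)
```

A spider would then collect many such documents and group them by `reputation_type` before feeding them to whatever trust metric it likes.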

I am not aware of any other generic distributed reputation list document standard proposals, but I don't know much about this stuff so please tell me if you know of any. Two proposed extensions to FOAF that cover atomic certifications (not lists of certifications) are 1 and 2.

The idea is that once people put up documents like this, others will be able to spider them, just as "social search engines" can spider FOAF files now.

more details on the "reputation budget" proposal at CommunityWiki

What do you think?
11:34 pm
Trust metrics wiki
I see that this has been linked only indirectly from here, so I thought people in this community would like to know about Trust Metrics Wiki.
Tuesday, March 28th, 2006
2:01 pm
TrustFlow frequently asked questions

TrustFlow for LiveJournal

Frequently asked questions

What do the results mean?
TrustFlow is making a guess at who is "near" your friends list; who might be on it, but isn't. It does this by looking at your friends list, and the friends list of your friends, and so on.
Is this based on who reads my journal, or interests, or what?
No. TrustFlow looks only at who is on whose friends list to make the determination; no other information is taken into account. In particular, it doesn't know anything about whose journals you are actually reading, or who is reading your journal, except what friends lists tell it.
Are the people listed in any order?
Yes; the numbers are a measure of "distance", so the first person listed is "closest".
How exactly does it determine who to list? What do the numbers mean?
A description of the algorithm appeared earlier in this journal.
I get an error!
Please read the error carefully before telling me about it, and please quote the error exactly - otherwise how can I do anything about it?
I've changed my friends list, but the results haven't changed
When it fetches friends data, it keeps that data for around 24 hours; if you wait about that long after a change you'll see that change reflected in the results.
It doesn't work at all - it lists my worst enemy first!
That means it's working. That person is someone quite close to your circle of friends, who you would list except that you've deliberately decided not to. It can't tell that you don't like them; all it can tell is that they are close enough to your circle of friends that you would list them if you didn't feel that way.

More will be added here as the questions are asked...

12:23 pm
The TrustFlow algorithm
TrustFlow is a "trust metric" algorithm, which uses human-generated information about trustworthiness in a human-sized community of a few hundred people to generate guesses about trustworthiness in an Internet-sized community of millions of people; this is useful in applications like ranking search hits and preventing spam and vandalism. TrustFlow is unique among "attack resistant" trust metrics in that it can load information about who trusts who incrementally as needed; this makes it well suited for distributed use.

We apply TrustFlow to LiveJournal by treating the decision to place someone on your friends list as an assertion of trust in them. This isn't 100% true and leads to some strange artifacts, but it works well enough to produce interesting results.

One user is special: the trust root is the source from which all trust emanates.

Each user starts with a virtual bucket into which "trust juice" is poured; all buckets have one litre capacity, and they all start out empty. Trust juice pours from heaven into the bucket of the trust root. After one litre has been poured, this bucket is full to overflowing; when it starts to overflow, gutters around the edge of the bucket carry the trust juice in equal quantity to the buckets of all their friends.

Suppose they have ten friends. After eleven litres have been poured, the buckets of the trust root and of all their friends will be full, and start to overflow. At this point things get interesting. Now the juice pours from heaven into the trust root's bucket, overflows into the bucket of one of their friends, and then overflows into the buckets of the trust root's friends of friends. People who are friended by many friends of friends will get more juice per second, but the friends links of people who list few friends will carry more juice than the links of those with many.

Eventually one of the buckets of the friends of friends will fill up; they then are the first people on the list that TrustFlow displays. Their score is the number of litres of trust juice that got poured into the trust root before their bucket filled up, and any more trust juice they receive overflows to be shared among their friends. And so on, until 200 more people have filled their buckets.

One final detail. Sometimes the juice hits "dead ends". If Bob and Carol are the only people on each other's friends lists, then once their buckets are both full they will have no-one to pass trust juice on to. In this instance the juice just "backs up" - no more juice flows to either of them. Each person with a full bucket shares the juice they receive evenly between all of their friends who have got someone with a non-full bucket to pass it on to, either directly or indirectly.

That's the sketch of the workings. If you want to actually implement this, or indeed any trust metric, it helps to understand a little about graph theory. TrustFlow analyzes a digraph of trust. Each arc is an assertion of trust; for example, each vertex might be a LiveJournal user, and there is an arc from A to B when A has B on their friends list. You'll also need to know what an eigenvector is; calculating the inbound flow on each node requires finding an eigenvector for a flow matrix. I use an iterative algorithm and I usually only need one iteration to update the flow; you could probably do things even more efficiently if you only recalculated the parts that needed to be recalculated.
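The bucket description above can be simulated directly. What follows is a small sketch written from that description only, not the real TrustFlow source: the function names, the dictionary graph format and the tolerances are assumptions. It computes the flow by repeatedly pushing juice one hop further until almost all of it has landed in a non-full bucket, which is a plain iterative stand-in for the eigenvector calculation, and it treats dead ends by only passing juice to friends who can still reach a non-full bucket.

```python
def live_full_nodes(graph, full):
    """Full buckets that can still reach a non-full bucket, directly or indirectly."""
    live = set()
    changed = True
    while changed:
        changed = False
        for node in full - live:
            if any(f not in full or f in live for f in graph.get(node, [])):
                live.add(node)
                changed = True
    return live


def absorb_rates(graph, root, full, live):
    """Litres per litre poured that end up in each non-full bucket."""
    rates = {}
    mass = {root: 1.0}  # all juice enters at the trust root
    while mass and sum(mass.values()) > 1e-12:
        next_mass = {}
        for node, m in mass.items():
            if node not in full:
                rates[node] = rates.get(node, 0.0) + m  # bucket absorbs it
                continue
            targets = [f for f in graph.get(node, [])
                       if f not in full or f in live]
            for f in targets:  # overflow is shared equally
                next_mass[f] = next_mass.get(f, 0.0) + m / len(targets)
        mass = next_mass
    return rates


def trustflow(graph, root, limit=200):
    """Return (user, litres poured when their bucket filled) in fill order."""
    level, full, order, poured = {}, set(), [], 0.0
    while len(order) < limit:
        rates = absorb_rates(graph, root, full, live_full_nodes(graph, full))
        rates = {n: r for n, r in rates.items() if r > 0}
        if not rates:
            break  # the juice has backed up all the way to the root
        dt = min((1.0 - level.get(n, 0.0)) / r for n, r in rates.items())
        poured += dt
        for n, r in rates.items():
            level[n] = level.get(n, 0.0) + r * dt
        for n in sorted(n for n in rates if level[n] >= 1.0 - 1e-9):
            full.add(n)
            order.append((n, poured))
    return order


graph = {"root": ["a", "b"], "a": ["c"], "b": ["c", "d"]}
result = trustflow(graph, "root")
```

With this tiny graph the root fills after 1 litre and both friends after 3, matching the "one litre each" arithmetic in the text. Unlike the real display, this sketch also reports the root and their direct friends, and it recomputes the whole flow after every fill rather than updating it incrementally.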
Thursday, March 9th, 2006
4:36 pm
A trust metric enabled Wikipedia
I love the idea of a trust metric enabled Wikipedia. There are plenty of questions to chew on.

How would a trust metric enabled Wikipedia work? Obviously every user will have to list trust information about other users, but what do you do with that information?

Here's the easy way: Jimbo is the root of trust from which the trust metric runs. If your trust is above a certain threshold according to that metric, you can edit, otherwise you can't. If you don't like it, set up your own wiki.

Can we do better than that? Supposing we move away from the model in which a single version of the page is the "current" version, and old versions are there only as a historical record? That can be anything from a small step to a giant leap.

As a small step, we could allow untrusted users to edit the page, but normal Wikipedia visitors only see the version that has most recently been edited by a trusted user. Each trusted user is expected to review any untrusted edits they may implicitly be including when they edit a page, which is what users do now in any case for the most part; if they don't like the edits, they could revert back to the last trusted version, and normal users will never see the edits they rejected. This resembles the article validation proposals currently going forward on Wikipedia, but backed by a real trust metric.
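A minimal sketch of that "small step", assuming each revision is a dictionary with an `author` field, that revisions are stored oldest first, and that the trust metric has already produced a score per user. None of these names come from MediaWiki; they are placeholders for illustration.

```python
def visible_revision(revisions, trust, threshold):
    """Return the newest revision whose author is trusted, or None.

    revisions: list of dicts, oldest first (assumed layout)
    trust:     mapping of username -> trust score from the metric
    threshold: minimum score needed to count as a trusted user
    """
    for revision in reversed(revisions):
        if trust.get(revision["author"], 0.0) >= threshold:
            return revision
    return None


history = [
    {"id": 1, "author": "trusted_tina", "text": "Blues is a music genre."},
    {"id": 2, "author": "anon123", "text": "Blues is rubbish."},
    {"id": 3, "author": "anon456", "text": "Blues is a colour."},
]
trust = {"trusted_tina": 0.9, "anon123": 0.1}
shown = visible_revision(history, trust, threshold=0.5)
```

Here normal visitors keep seeing revision 1 until a trusted user edits on top of, or reverts, the untrusted changes.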

Supposing we allow the article to fork? Wikipedia already "sort-of" allows forks, in that you can choose any version of an article as the basis for your next edit, but your new edit will become the current version and no metadata record of what edit you started from is kept, so you can't do useful things with the forks. Supposing we explicitly record the tree of versions? This would allow the sophisticated tools provided by any modern version control system to synthesize a new version of the article from whichever subset of the edits the user thought was appropriate, making it much easier for the trusted users to winnow the wheat from the chaff when choosing an edit to make current.

Now we approach the "giant leap". When winnowing the wheat from the chaff, I don't have to consider only the binary trusted/untrusted decision of the trust metric; I can use the gradations of trust it provides as a guide to how much good faith to assume in a given edit. Coming closer to the leap, I don't have to use Jimbo as the root of trust for these decisions - the point of a trust metric like Trustflow is that I can afford to do the trust calculations for myself.

The "giant leap" comes when we take it to the next level. Under these circumstances, is there any need for all users to agree on a single "current version"? The "default view" of Wikipedia might have Jimbo as the root of trust, but I might like to choose someone else - myself, for example. What effect would this have on NPOV disputes? Would each side of a contentious debate end up with their own persistent fork, or would a consensus version emerge? Would the good done by decentralization outweigh the bad done by automatic self-reinforcing reading bias?

Given this outline, when I go to view an article on a topic, how do I choose which version to view? What tradeoffs between newness and trust will I make? Will there be problems of information moved from one article to another "slipping through the cracks" if I trust the old version of one but the new version of another?

Should we be trying to use domain-specific information? One user may be trusted when editing on cryptography but not on animal welfare, say. Can we synthesize domains from the information available about what links to what and who is editing it? Can we capture in our trust information that people who know about crypto trust X but people who know about animal welfare do not, and use that to fine-tune our trust decisions depending on the subject area of the article we're assessing? Can we do so without anyone having to explicitly identify domains?

Finally (for now), can we do all this in a distributed fashion, so that we are all hosting our own intricately interlinked and interrelated versions of Wikipedia, each drawing edits from each other but reflecting our own unique spin on the world?
Tuesday, February 28th, 2006
10:45 pm
Raph Levien wrote an interesting blog entry in December about trust metrics and Wikipedia.

Time for a trust metric enabled Wikipedia?

I see that Wikipedia is having some well-publicized troubles with vandalism and the like. This will be a somewhat bittersweet response.

The success of wikis has taken a lot of people by surprise, but I think I get it now. The essence of wiki nature is to lower the barrier to making improvements to content. The vast majority of access-controlled systems out there err strongly on the side of making it too hard. The idea of a wiki is to err on the side of making it too easy, and to lessen the pain (somewhat) of undoing the damage when that turns out to be a mistake. In cases where that doesn't work out, I think the solution is to make the decision process of whether to grant write access a bit more precise, so you can still err on the side of trusting too much, but you don't have to err quite as often or as badly.

In that regard, the trust metrics designed and implemented for Advogato are a near-perfect match for the needs of a Wikipedia-like project, but for the most part, nobody is paying much attention to my ideas. ( Read more... )

I think he's entirely right. Raph discusses some responses he got here, but I'd be interested to know what people here think!
Tuesday, June 21st, 2005
12:03 pm
I joined Outfoxed (user Id recompiler and vladg).
One of my biggest pet peeves is that it IS centralized even though it claims not to be. By default, when most people join they trust Stan (very very very bad) and Outfoxed (bad but not TOO terrible). Since the default length of trust is 3 nodes, Stan and Outfoxed are 2 hops away from anyone (even unwanted and hostile parties). The guy is in a hurry to write something cool and gather data for his masters project and he fucked up. There may have been a lot of growth on the system but the trust matrix is broken from the very start (compare how PGP key signing works). One thing they could try, in a month or so, is to break the trust relationship between Stan and everyone who's not actually his friend, and hope people have started trusting other users on the system. One fun thing I found is rating the trust of processes on your computer. Out of boredom I added a bunch of random processes to trusted. Then I decided to artificially try to fix it for myself by creating a 2nd account on another system, giving it a high level of trust in my primary account and my trusted friends, and then setting both accounts to ignore/not trust Stan.
I urge everyone to write to the University Stan is in and voice your concerns. There is no shortcut to good data no matter how many front page stories on slashdot you get.
Wednesday, August 11th, 2004
9:22 am
Hosting and other useful stuff
Around a year after the last version went live, I'm nearly ready to make a new version of TrustFlow available. This version is far faster (it uses iterative approximation eigenvector finding instead of the previous "token-passing" algorithm) and is written in Python; it also has a somewhat slicker user interface (a progress bar!). I'll make the source code available at the same time as I make it live.

I'd like to ask a couple of things of the trustmetrics community.

Hosting: I got an offer of hosting a long time ago, but I don't know if it's still extant. Can anyone offer to host this thing? You'll need a decent sized machine, because it was very popular last time and it might be again; bandwidth charges shouldn't be too bad since it's nearly all text. I'll make sure your name is in lights next to mine on the relevant pages :-)

Animated GIF: Can anyone out there design animated GIFs? I want a nice one to put on the "In Progress" page to indicate "yes, progress is being made on calculating your TrustFlow results". It should be very small, so it doesn't cost much in bandwidth to serve it. I have in mind something like a representation of a network of people, with trust pulsing along the arcs and lighting up the nodes, but feel free to design whatever you think is best.

Friday, February 27th, 2004
10:06 pm
If you scraped FOAF pages once per second, you could gather 86,400 people per day. Cache them locally. You could refresh the bulk of the active users every week. You could then do all kinds of manipulations locally, and never really be that far out of whack with LJ.
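The arithmetic is 24 x 60 x 60 = 86,400 requests per day, or 604,800 per week. A throttled fetch-and-cache loop along those lines might look like the sketch below. The `fetch` argument is a stand-in for whatever actually downloads one FOAF page; no real LiveJournal URL or API is assumed, and the `sleep` and `clock` parameters exist only so the loop can be exercised without waiting.

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60   # one request per second -> 86,400 people per day
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY


def crawl(usernames, fetch, cache, min_interval=1.0,
          sleep=time.sleep, clock=time.monotonic):
    """Fetch each user not already cached, at most one request per min_interval."""
    last_request = None
    for name in usernames:
        if name in cache:
            continue  # already held locally, no request needed
        if last_request is not None:
            wait = min_interval - (clock() - last_request)
            if wait > 0:
                sleep(wait)
        cache[name] = fetch(name)
        last_request = clock()
    return cache


# Dry run with stand-ins for the network and the clock.
sleeps = []
cache = crawl(["a", "b", "c"], fetch=str.upper, cache={"b": "cached"},
              sleep=sleeps.append, clock=lambda: 0.0)
```

Refreshing weekly would additionally need a timestamp per cache entry, which this sketch leaves out.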
Tuesday, December 30th, 2003
10:24 pm
Status update
The LiveJournal TrustFlow system is down, and may be so for a while. It is a victim of its own popularity: it made so many requests on LiveJournal's servers for people's friends list data that it endangered the performance and stability of LiveJournal.

However, I am working with the LJ staff towards a solution which allows me to fetch the same data with much less impact on LJ's servers - this solution is known as the LiveJournal Data Protocol. Once this is ready TrustFlow will be modified to use it, and will go live once again.

Sorry for the inconvenience!

More information, as always, in the trustmetrics community.
Tuesday, September 16th, 2003
7:01 pm
Progress on getting TrustFlow working again
As I mentioned earlier, TrustFlow has been taken down because of the load it causes to LiveJournal. The good news is that LJ are proposing to add new capabilities to their software so that TrustFlow can start up again while causing less load on their servers.

For more information, see the proposed new protocol and discussions on 2003-09-09 led by ciphergoth and 2003-09-10 led by bradfitz.

This should be a big improvement for everyone who wants to write scripts that analyze LJ data!

Saturday, September 6th, 2003
10:40 am
TrustFlow shut down by LiveJournal?
I could be wrong, but it looks as if the LiveJournal staff are denying access to LJ data from the server that TrustFlow uses. This will probably be because it was causing undue strain on the LJ servers, but it could be for other reasons.

I'm sure you're as disappointed as I am, but in truth I have the greatest faith in the people who run LJ and I'm sure that if they have done this on purpose it will be for a good reason.

I'll investigate further, but for the moment TrustFlow is out of action. Sorry!

Update: I just got a very positive email from Brad. They did block it, but they're interested in working out how TF and LiveJournal can be fixed so TF's impact on LJ is greatly reduced. Once again, Brad rocks!

The latest news about how work is going will be posted to the trustmetrics community.
Monday, August 18th, 2003
12:25 pm
Slashdot, and more Bram
TrustFlow for LiveJournal has so far had nearly 70,000 different users, and over 0.6M hits. Users curemytragedy and seph were so curious to know their rankings that they reloaded the page 595 and 817 times respectively.

Ask Slashdot | Distributed Trust Metrics asks whether there could be a trust metric that works across websites. The question is vague and I don't think the questioner is all that clear on the ideas behind trust metrics, but if you assume he wants attack resistance, doesn't want a Central Authority of Trust (ie each site can choose its own trust root) and each user should register only once, then I think something like TrustFlow quickly becomes the only option.

More Bram Cohen musings. I don't understand his latest proposal - if everyone devotes all of their water to filling one person's bucket, that person's bucket will fill at rate 1, so long as they are reachable.
You can subscribe to Bram's diary with an RSS feed: bram_advogato. Raph Levien, Advogato founder and trust metric pioneer, is here as raph but his main diary is his Advogato one, available as raph_advogato. Aside: I hate the Advogato diary system; you can't comment on other people's entries and there's no "earlier" button on the diary aggregation page. So there's no good way to track who's talking about what you write. I hope Bram is reading this, is all.

Another aside: I think the best possible way to choose multiple winners is to let voters rank entire combinations of winners, and use a Condorcet method to choose the winning combination - or to allow them to give scores to individual candidates and infer a ranking of combinations from that. Computationally expensive though...
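A brute-force sketch of the second variant, where voters score individual candidates and a ranking of combinations is inferred from those scores. The inference rule used here (a voter prefers the combination with the higher total score) is only one possible assumption, and the function name is invented. It enumerates every combination and every pair of combinations, which shows where the computational expense comes from.

```python
from itertools import combinations


def condorcet_combination(ballots, seats):
    """Return the combination that beats every other one head-to-head, or None.

    ballots: list of dicts mapping candidate -> score given by one voter
    seats:   number of winners to choose
    """
    candidates = sorted({c for ballot in ballots for c in ballot})
    combos = list(combinations(candidates, seats))

    def value(ballot, combo):
        # Assumed inference rule: a combination is worth the sum of its scores.
        return sum(ballot.get(c, 0) for c in combo)

    def beats(x, y):
        prefer_x = sum(value(b, x) > value(b, y) for b in ballots)
        prefer_y = sum(value(b, y) > value(b, x) for b in ballots)
        return prefer_x > prefer_y

    for x in combos:
        if all(beats(x, y) for y in combos if y != x):
            return x
    return None  # no Condorcet winner (a cycle or a tie)


ballots = [
    {"A": 5, "B": 4, "C": 0},
    {"A": 5, "B": 0, "C": 4},
    {"A": 0, "B": 5, "C": 4},
]
winners = condorcet_combination(ballots, seats=2)
```

In this example the pair A and B wins both of its head-to-head contests two votes to one. A real Condorcet method would also need a rule for the case where no combination beats all the others.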
Friday, August 15th, 2003
11:26 am
TrustFlow FAQ

Updated 2006-04-10: this is about the old version of TrustFlow, which is no longer in operation. See the FAQ for the new version.

Frequently asked questions about TrustFlow for LiveJournal. Please note that I will be very rude to anyone who asks a question answered here so please read carefully before posting.

It doesn't work for me!

It doesn't work for me - it keeps saying it's overloaded
That's because it's overloaded. I'm sorry, there's nothing I can do about it - you'll just have to keep trying.
But it works consistently for my friend!
That's because once it's worked out the list for your friend, it keeps the results for several hours so it doesn't have to re-do the calculation.
It says I have no friends listed, but I have very many!
If you have enough friends listed, LiveJournal doesn't put them on your friends page directly. If that happens, my script won't pick them up - it'll treat it as though you have no friends. I don't plan to fix this (a) because it's work, and (b) because such people will load the computer too much. Sorry.
I get an error!
Please read the error carefully before telling me about it. If the error is "it's overloaded", see above. Otherwise, please quote the error exactly - otherwise how can I do anything about it?

What is it, anyway?

What do the results mean?
TrustFlow is making a guess at who is "near" your friends list; who might be on it, but isn't. It does this by looking at your friends list, and the friends list of your friends, and so on.
Is this based on who reads my journal, or interests, or what?
No. TrustFlow looks only at who is on whose friends list to make the determination; no other information is taken into account. In particular, it doesn't know anything about whose journals you are actually reading, or who is reading your journal, except what friends lists tell it.
Are the people listed in any order?
Yes; the first person listed is "closest".
How exactly does it determine who to list?
A description of the algorithm appeared earlier in this journal.
It doesn't work at all - it lists my worst enemy first!
That means it's working. That person is someone quite close to your circle of friends, who you would list except that you've deliberately decided not to. It can't tell that you don't like them; all it can tell is that they are close enough to your circle of friends that you would list them if you didn't feel that way.
It's almost a copy of the friends list of one of my friends!
Does that person have relatively few friends listed? They get extra influence on who it lists as a result.

Use the source, Luke!

Is source code available?
Source code is linked from here.
Can I make a mirror?
Please do, and please publicise it here - thanks! Note that you'll need a Unix system, and you'll need to be something of a Perl hacker to make it go. Also you may get more hits than you expect. However, I'll be glad to help you set it up.
You could improve the algorithm if you...
Feel free to pick up the code and play with it. For me LJ is only an example dataset, so I'm not interested in LJ-specific tweaks. I also want to be able to prove good things about it, like attack resistance, so any changes have to meet that criterion. Finally the algorithm is conceptually very simple, and I'm keen to keep it that way.

Irritations and niggles

It lists deleted journals and communities without marking them as such
You're right, and this should be fixed. However I shan't make time to fix it now. If you want to fix it, let me know, and I'll discuss what a clean fix might look like. It's not too hard but there are issues.
I changed my friends list but it hasn't taken the change into account
It can take up to 30 hours to take these changes into account. Sorry!
Can you make it work for DeadJournal/uJournal/etc?
I'm having enough trouble supporting LJ users. However, it would be pretty easy for someone to make a mirror that worked on DJ or whatever.

Please feel free to post any questions not already answered here - thanks!

11:12 am
More from Bram
More interesting writing from Bram Cohen, discussing adding negative certs to my metric. My instinct is that negative certs are hard to get right but would make it a lot more accurate and useful.
Thursday, August 14th, 2003
8:39 am
Been looking at Bram's proposal, trying to understand it. Here it is in two forms: a new, object-oriented implementation, and his original one commented.

I could have got either of these completely wrong.
class Person:
    def __init__(self, name):
        self.name = name
        # Each arc is a [times_used, person] pair.
        self.arcs = []
        self.already_selected = False

    def connect(self, people):
        self.arcs += [[0, p] for p in people]

    def _rank_single(self, passed):
        # Depth-first search for the first person not yet selected;
        # "passed" records who has been visited during this search.
        if self in passed:
            return None
        passed[self] = True
        if not self.already_selected:
            return self
        arcs = self.arcs[:]
        for arc in arcs:
            result = arc[1]._rank_single(passed)
            if result is not None:
                arc[0] += 1
                return result
        return None

    def ranks(self):
        result = []
        while True:
            chosen = self._rank_single({})
            if chosen is None:
                return result
            chosen.already_selected = True
Read more...Collapse )
Wednesday, August 13th, 2003
7:23 pm
One other thing:
Sorry! The load on this computer is too great to calculate the results for username. It's only a very little computer and many people are trying to use it at the moment. I'm afraid there's nothing I can do to fix this, so please just keep trying.
Why do people keep telling me that it's not working for them and quoting this message? Have they tried reading it?

How in the name of God could this message be any clearer? It's a PII 350 with 64Mb of RAM for fuck's sake, of course it can't handle everyone on LJ hammering on it. It's limited to doing 5 trust metric calculations at once, and refuses any further requests. Note that once it has the results for someone, it will store them for three hours, so that's why it seems to work consistently for some people.
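The behaviour described here (refuse new work beyond five calculations at once, but serve stored results for three hours) can be sketched as below. This is an illustration only, not the actual Perl code: the class name, the `compute` callback and the injected `clock` are assumptions, and a real server would need the `running` counter to be shared safely between concurrent requests.

```python
import time


class ThrottledCalculator:
    def __init__(self, compute, max_running=5, ttl_seconds=3 * 60 * 60,
                 clock=time.time):
        self.compute = compute          # the expensive trust metric calculation
        self.max_running = max_running  # calculations allowed at once
        self.ttl_seconds = ttl_seconds  # how long results are stored
        self.clock = clock
        self.running = 0
        self.cache = {}                 # username -> (expiry time, result)

    def request(self, username):
        entry = self.cache.get(username)
        if entry is not None and entry[0] > self.clock():
            return entry[1]             # stored result: costs nothing
        if self.running >= self.max_running:
            raise RuntimeError("overloaded: please just keep trying")
        self.running += 1
        try:
            result = self.compute(username)
        finally:
            self.running -= 1
        self.cache[username] = (self.clock() + self.ttl_seconds, result)
        return result


now = [0.0]
calls = []


def fake_metric(username):
    calls.append(username)
    return username + "-results"


calc = ThrottledCalculator(fake_metric, clock=lambda: now[0])
first = calc.request("alice")
second = calc.request("alice")  # served from the cache
```

Because the cache is checked before the load limit, someone whose results are already stored keeps getting an answer even while everyone else is turned away.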

If you want to fix this situation, download the source and host a mirror. There is nothing I can do about it.
7:09 pm
A quick explanation of TrustFlow
TrustFlow does not look at interests, who reads your journal, or any other such thing. It looks only at "friends" lists. It's trying to determine who is closest to who based on who lists who as a friend.

Imagine this. Everyone on LJ has a bucket that can hold 500 tokens. We start off with every bucket empty. The order in which TrustFlow lists people is the order in which each person manages to fill their own bucket.

You are the source of tokens; you get tokens one at a time. First, you put them into your own bucket, until that bucket is full. Bang, you are top of your own list. You then give more tokens to your friends, in strict rotation; imagine a parent handing out a bag of sweets to kids "one for you, and one for you and one for you, and a second for you and a second for you...". Each of your friends places the token in their own bucket. Finally, after you've given out 500 tokens for each of your friends, all their buckets fill one after another, and they all join the list.

Since all that is totally predictable, we don't bother to list you or your friends, because it's after that that things get interesting.

You get a token. Since your bucket is full, you pass it to a friend. But their bucket is full too, so they pass it to a friend in turn; like you, they issue the tokens they get to their friends in strict rotation. Each token ends up in the bucket of someone who isn't on the list. Keep going like this, and eventually someone's bucket will fill up. They become the next person to join the list. From then on they, like you and your friends, pass on any more tokens they get to their friends. Keep going until we have 50 full buckets, and list the order the buckets fill up.

That's basically it. There's a slight complication to do with dead ends but that doesn't matter much for the basic understanding.
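For the curious, the token-passing description can be simulated in a few lines. This is a sketch from the explanation above rather than the real script: the function name and the graph format are invented, and the handling of dead ends (a token that reaches a full bucket with no friends is simply thrown away, and a token that wanders too long is abandoned) is an assumption, since that complication is not spelled out here.

```python
def token_trustflow(graph, root, capacity=500, list_size=50,
                    max_hops=1000, max_tokens=1_000_000):
    """Return users in the order their buckets fill, starting with the root."""
    tokens = {}       # user -> tokens currently in their bucket
    next_friend = {}  # user -> index of the friend due the next token
    order = []
    for _ in range(max_tokens):
        if len(order) >= list_size:
            break
        holder = root  # every token starts with you
        for _ in range(max_hops):
            if tokens.get(holder, 0) < capacity:
                tokens[holder] = tokens.get(holder, 0) + 1
                if tokens[holder] == capacity:
                    order.append(holder)  # bucket just filled: joins the list
                break
            friends = graph.get(holder, [])
            if not friends:
                break  # dead end: token discarded (assumed behaviour)
            i = next_friend.get(holder, 0)
            next_friend[holder] = (i + 1) % len(friends)  # strict rotation
            holder = friends[i]
    return order


graph = {
    "you": ["alice", "bob"],
    "alice": ["a1", "a2"],
    "bob": ["b1", "b2", "b3", "b4"],
}
order = token_trustflow(graph, "you", capacity=4, max_tokens=1000)
```

In this toy graph Alice's two friends fill their buckets before any of Bob's four, because Alice shares the same supply of tokens between fewer people. The sketch lists you and your friends too, whereas the real output leaves them off.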

Here are some of the consequences.
  • Supposing Alice and Bob are on your friends list. Alice has two friends; Bob has 20. All other things being equal, each of Alice's friends gets 10 tokens for every token one of Bob's friends gets, because Alice and Bob are receiving just as many tokens but Alice is sharing them between fewer people.
  • If many of your friends list me, I will get more tokens.
  • If you list Alice and Bob, and Alice lists Bob, then in addition to the tokens he gets from you, Bob will sometimes get tokens from Alice. As a consequence Bob's friends get a few more tokens.
  • Who your friends are matters only once you get on the list. It doesn't matter if my friends list is just the same as yours, it doesn't affect my ranking at all; my friends list matters only once my ranking is decided.
  • The "Trust" in "TrustFlow" is there because it's really meant for a different situation - one where you list people to indicate that you trust them. That's emphatically not the case on LiveJournal - LJ was just a handy platform on which to try the ideas out. It doesn't measure trust here, more the sorts of qualities which lead to putting someone on your friends list, which is generally acquaintance and interestingness.
  • The "Flow" is there because the trust starts from you, and "flows out" down the links to your friends, and then onto their friends.
I hope that helps answer some of the questions people have had!
Saturday, August 9th, 2003
7:29 pm
Bram's new twist
Bram Cohen discusses my metric in his journal, including proposing a modification. I haven't had time to analyse this in detail yet, but it seems interesting!