Wednesday, March 21, 2012

Google PageRank Equation Derivation


We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:


PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

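Note that the PageRanks are meant to form a probability distribution over web pages, so that the sum of all web pages' PageRanks is one. (Strictly, that holds when the (1-d) term is divided by the total number of pages N; with the formula exactly as written above, and every page having at least one outgoing link, the PageRanks sum to N instead.)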

This is taken from Google's original paper: The Anatomy of a Large-Scale Hypertextual Web Search Engine. They later go on to give an intuitive explanation of the formula:

PageRank can be thought of as a model of user behavior. We assume there is a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back" but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank. And, the d damping factor is the probability at each page the "random surfer" will get bored and request another random page.


So how does one get from their intuition to the formula? From their description, a user gets to A by either randomly choosing A from anywhere, or clicking a link to A from pages T1...Tn. We can state that as:
Probability of being at page A = Probability of: being anywhere and randomly choosing A, or being at page T1 and clicking link to A, or being at T2 and clicking link to A, ..., or being at Tn and clicking link to A.

Probability theory states that if you have two mutually exclusive situations, where one or the other can occur but not both, then the probability of either occurring is the sum of the probabilities of each occurring:

Pr(A or B occurring) = Pr(A occurring) + Pr(B occurring)

Also, if the two situations are independent of each other, the probability of both occurring is the product of the probabilities of each occurring individually:

Pr(A and B occurring) = Pr(A occurring) * Pr(B occurring)
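
If you want to convince yourself of these two rules, here is a small sketch that counts outcomes of two dice. The dice, the event names, and the helper prob are just illustrations I picked; they are not from the paper.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two dice: (first, second).
rolls = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a function that returns True/False for a roll."""
    return Fraction(sum(1 for roll in rolls if event(roll)), len(rolls))

def first_is_1(roll):
    return roll[0] == 1

def first_is_2(roll):
    return roll[0] == 2

def second_is_6(roll):
    return roll[1] == 6

# "Or" rule: the first die cannot show 1 and 2 at once, so the probabilities add.
either = prob(lambda roll: first_is_1(roll) or first_is_2(roll))
sum_rule_holds = either == prob(first_is_1) + prob(first_is_2)

# "And" rule: the two dice do not influence each other, so the probabilities multiply.
both = prob(lambda roll: first_is_1(roll) and second_is_6(roll))
product_rule_holds = both == prob(first_is_1) * prob(second_is_6)

print(sum_rule_holds, product_rule_holds)  # prints: True True
```

The "or" rule only works here because the two events cannot happen together, and the "and" rule only works because one die tells you nothing about the other.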

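Now we are ready to build the equation. Following the way the formula uses d, we take d as the probability that the surfer follows a link on the current page and 1-d as the probability that they get bored and jump to a random page: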
  1. Pr(A) = Pr(randomly clicking and reaching A or not randomly clicking and reaching A)
  2. = Pr(randomly clicking and reaching A) + Pr(not randomly clicking and reaching A)
  3. = 1-d + d * Pr(reaching A from the current page without randomly clicking)
  4. = 1-d + d * Pr(being at page T1 and clicking link to A, or being at T2 and clicking link to A, ..., or being at Tn and clicking link to A)
  5. = 1-d + d ( Pr(being at page T1 and clicking link to A) + Pr(being at T2 and clicking link to A) + ... + Pr(being at Tn and clicking link to A) )
  6. = 1-d + d ( Pr(being at page T1) * Pr(clicking link to A from T1) + Pr(being at page T2) * Pr(clicking link to A from T2) + ... + Pr(being at page Tn) * Pr(clicking link to A from Tn) )
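  7. = 1-d + d ( PR(T1)/C(T1) + PR(T2)/C(T2) + ... + PR(Tn)/C(Tn) ), since Pr(being at page Ti) is just PR(Ti) and each of the C(Ti) links on Ti is equally likely to be clicked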
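
To see the formula in action, here is a minimal sketch that applies it over and over to a tiny made-up web until the numbers settle. The three pages, the function name pagerank, the starting value of 1.0, and the fixed number of passes are all my own assumptions for illustration; this is not Google's implementation.

```python
def pagerank(links, d=0.85, iterations=100):
    """Repeatedly apply PR(A) = (1-d) + d * sum(PR(T)/C(T)) to every page.

    links maps each page to the list of pages it links out to.
    """
    pages = list(links)
    ranks = {page: 1.0 for page in pages}  # assumed starting guess
    for _ in range(iterations):
        new_ranks = {}
        for page in pages:
            # Sum PR(T)/C(T) over every page T that links to this page.
            incoming = sum(
                ranks[t] / len(links[t]) for t in pages if page in links[t]
            )
            new_ranks[page] = (1 - d) + d * incoming
        ranks = new_ranks
    return ranks

# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy web, C comes out on top because it collects links from both A and B, and A comes second because it receives all of C's rank through C's single outgoing link. With this form of the formula the three values add up to 3, the number of pages.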

Tuesday, March 20, 2012

Make a Home Entertainment System Out Of Your Old Laptop

If you are like me, you don’t like throwing away old electronics, especially things that cost you a pretty penny in the past. You may not realize it, but your old laptop (or desktop) is ready to be turned into an extra TV set for little or no cost. Forget about buying some special gizmo to let you search the internet with your television. There is no better way to search the internet than with a computer. I haven’t purchased a television in years, actually, and I don’t plan to.
First off, your laptop is pretty much ready to do this as is. Most relatively recent laptops have wireless cards in them, so you can either run an Ethernet cord from your router or, better, use the wireless and set up your mobile TV set anywhere in the house. You can use your laptop to play DVDs, as it most likely has a built-in DVD drive.
Where I watch the majority of my media, though, is Netflix streaming. This pretty much turns my box into a movie-watching machine. Now, you can’t view a laptop screen from across the room, but here is where you can probably reuse something else you have no use for: an old monitor. Connect your monitor to the VGA port and switch your laptop to duplicate the screen on both its display and the external monitor. I usually place the laptop behind the monitor, turned to the side, so I don’t see the same image twice.

For sound, you can buy a very cheap set of external speakers. I use Logitech’s Z5 Multimedia Speakers. What you will probably want most, though, is a wireless remote for sound. I chose the Logitech set specifically because it has a wireless remote. However, if your old laptop happens to be a MacBook (as mine is), it also comes with a remote control. Another option, if you have a smartphone of some sort, is one of the free apps that let you remote into your computer. If your phone is on WiFi, then you can actually control the desktop of your laptop from the couch.
As an added bonus, with iTunes or Windows Media Player installed, you can turn this setup into a remote music player. Again, smartphones tend to have apps for controlling media from a distance. Now, this doesn’t cover television itself, but this is the internet generation; who needs that sort of old-fashioned media?

Enjoy!

Monday, March 19, 2012

How Does PageRank Really Work?

There is a lot of mysticism on the web about what PageRank really is. Let's take a look behind the scenes of Google to see how they actually do it. How do we obtain these secrets? We simply have to do a little reading of the original paper Google's founders wrote on the subject. Here is a snippet from the paper that really gets the idea down:
PageRank can be thought of as a model of user behavior. We assume there is a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back" but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank. And, the d damping factor is the probability at each page the "random surfer" will get bored and request another random page.
Even that is a bit rocket-sciency, but we'll break it up to see what it means. What they are trying to explain here is how a person finds your website at all. PageRank says that there are two possibilities:
  1. Someone can directly type the URL into the address bar at the top of their browser - best of luck arranging for that to happen!
  2. Someone can arrive at your page through a link from another page.
Incidentally, Google itself is a web page, so just being listed by Google increases your PageRank, although only by a little; we will see why.

Let's forget about option one and concentrate on the second one.
The probability that the random surfer visits a page is its PageRank.
In other words, your PageRank is how likely it is that someone will reach your webpage if they were just randomly clicking on web page links (essentially a monkey at a typewriter). Say your visitor started at an incredibly popular website that linked directly to your article. I'm sure you understand that would be very good for you, because people can find you easily and most likely will. Why? Because many people go to the original website, and since there is a link there to you, they may very well come to your site. No rocket science there. This is what PageRank does its best to calculate.

So to get extra PageRank you need to have popular sites linking in to you. You probably knew that before reading this. What might not be so clear is that the more links there are on the site linking to you, the less that link is worth for improving your PageRank. That monkey randomly clicking links has a smaller chance of clicking through to you if there are more links to choose from. A prime example here is Google itself. They basically point to every page on the web, so even though they have an enormous PageRank, their link to you is essentially worthless. Now we see why being listed by Google will not significantly increase your PageRank.
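
To put rough numbers on this, here is a small sketch of what a single link passes along under the formula PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)): each linking page T contributes d * PR(T)/C(T). The function name link_value and the PageRank and link counts below are made up purely for illustration.

```python
def link_value(page_rank, outgoing_links, d=0.85):
    """The share of a linking page's PageRank passed through ONE of its links: d * PR(T) / C(T)."""
    return d * page_rank / outgoing_links

# Hypothetical giant directory: huge PageRank, but it links to ten thousand pages.
from_directory = link_value(page_rank=100.0, outgoing_links=10_000)

# Hypothetical modest blog: small PageRank, but only two outgoing links.
from_small_blog = link_value(page_rank=5.0, outgoing_links=2)

print(from_directory < from_small_blog)  # prints: True
```

With these made-up numbers, the link from the modest blog is worth far more than the link from the giant directory, even though the directory's own PageRank is twenty times higher.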

The best situation is to have an incredibly popular site point to you, with that link being the ONLY link on their site. Fat chance! This of course means that when you are the owner of a high-PageRank site you have something valuable to offer, namely who you link to. Google did a good job with this one; that makes sense too.

But this is only one half of the story... what can you do with your site to increase your own PageRank? It comes back to that explanation from the original PageRank paper.
The probability that the random surfer visits a page is its PageRank.
Now it's true that linking out to a random place doesn't directly affect your PageRank. The PageRank is the chance that someone comes to your website, not leaves it. But there is an indirect effect. If someone left your site, wouldn't you want them to come back? Remember we are talking about someone randomly clicking on links here. If you are going to have an outgoing link, the best place to send the random surfer is to another site that is both popular and points back to you! This is the idea behind backlinking. But beware, this doesn't actually increase your PageRank; it's just the most effective way to keep your PageRank high when having outgoing links.

Why? Here are some scenarios to think about:
  1. A high-PageRank site points to you and you don't point back.
  2. A high-PageRank site points to you and you do point back.
  3. You point to a high-PageRank site, then they point to you.
In the first situation, from PageRank's point of view, once someone randomly comes to your site they don't leave. In the second one, they come to your site and you send them back from whence they came, but they might not come back to you at all. The second one is worse than the first. PageRank is the chance that someone is at your site, not someone else's. The third one is the common practice. This way you build your network and might get the attention of another site, which then links to you. This actually puts you in situation two; it doesn't matter who linked to whom first as far as PageRank is concerned. Of course, you could then delete your link to them and get into situation one... if you so choose.