Google's AdSense Algorithm

Published 11.26.2005 by jm

Google recently filed a patent for user-targeted (or attention-targeted) search results, which would change the ranking of Google's organic results for each individual user based on that user's search behavior, location, sites visited, and even 'typing behavior'. How could Google build the user profiles needed to serve customized organic (non-paid) results? By tracking through its network of desktop apps, advertising, Gmail, and other network services.

I bring this up to help explain Google's AdSense algorithm that prevents cheating. My information comes from several trial-and-error accounts and from articles like the ones above. Before I get to all that, let's break down the actual click-through. First, right-click a Google link and copy it. Paste it into Notepad, and you will see a long (200-ish character) link. This link contains all the juicy information Google uses to detect fraudulent click-throughs.

The first variable of note is "ai". If you copy the link and paste it, you will see "&ai=", followed by a hundred or so characters. This is what you should be concerned with. "AI" may stand for something in the Google universe, but for all intents and purposes we can think of it as artificial intelligence, since it is how Google instantly evaluates hundreds of thousands of clicks without human interaction. This variable, AI, breaks down further into two halves.
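If you would rather not pick the value out by eye, a few lines of Python can pull it from a copied link. This is only a sketch: the link below is a made-up placeholder (the host, path, and the other parameters are my assumptions, not a real ad link), and extract_ai is just a name I chose. Only the "ai" parameter name comes from the links themselves.

```python
from urllib.parse import urlsplit, parse_qs

def extract_ai(link):
    """Return the value of the 'ai' query parameter, or None if it is absent."""
    query = urlsplit(link).query
    values = parse_qs(query).get("ai")
    return values[0] if values else None

# Placeholder link for illustration only; not a real ad link.
sample = "http://ads.example.com/click?sa=l&ai=AbCdEf123____8BxyZ789&num=1"
print(extract_ai(sample))  # prints AbCdEf123____8BxyZ789
```

One caveat: parse_qs decodes the value, so a literal "+" in a link would come back as a space. Compare against the raw link if the lengths look off.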

The first half is the information of the user (the person clicking through). It is an encrypted message that says who you are. Variables like time are mixed in, so that it rarely comes out the same way twice. The randomness is an attempt to prevent exploiters from figuring out the algorithm. The first half of the AI variable ensures an AdSense participant is not clicking over and over to generate more revenue.

The second half identifies the person you are sending clicks to. This is so Google knows who to charge and who to pay. The second half varies much less from one load to the next. When I hit the reload button on my browser 10 times, the same second half showed up seven times. In my case, the first two letters of the second half were always "8B" or "wG," showing there is definitely a similarity between them, even if the rest of the characters are different. This suggests the algorithm is in fact not very hard to break. I am sure a code breaker could do this in a matter of minutes, and a hobbyist like you or me could work the algorithm out in several hours. I don't have the patience for this, but after studying the links for just ten minutes, you can figure out how to break down the AI variable.

Here's the kicker. The first half and second half may not be visibly separated. Imagine a sentence that runs together without periods or spaces. Occasionally you will have "____" separating the two parts. Other times, the only way you can see the two halves is by noticing that at some point the characters in two copy/pasted links (taken before and after reloading the browser) start to echo each other.
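The comparison described above can be roughed out in Python. This is a sketch under my own assumptions, not a decoder: split_ai is a name I made up, and the fallback only works when two reloads produced exactly the same second half (which happened in seven of my ten reloads). It treats the longest shared ending of the two values as the second half, so a chance match in the last characters of the first halves would throw the split off by a character or two.

```python
def split_ai(ai_a, ai_b, marker="____"):
    """Guess the (first_half, second_half) split of ai_a.

    If the marker is present, split on it. Otherwise compare ai_a with a
    second value (ai_b) taken after a reload and treat the longest shared
    ending as the second half.
    """
    if marker in ai_a:
        first, second = ai_a.split(marker, 1)
        return first, second
    # Walk backwards from the end while the two strings still agree.
    n = 0
    while n < min(len(ai_a), len(ai_b)) and ai_a[-1 - n] == ai_b[-1 - n]:
        n += 1
    cut = len(ai_a) - n
    return ai_a[:cut], ai_a[cut:]

# Made-up values for illustration only.
print(split_ai("qwe8Bxyz", "rty8Bxyz"))  # prints ('qwe', '8Bxyz')
```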

This is just half of it. After a valid click is made, a response from the server gets sent back to your computer to verify all the values above. As a result, if you copy the link and send it to someone else to click on, Google will recognize it as a fraudulent click because the embedded IP address and the address of the clicker don't match.
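To make the IP check concrete, here is a generic illustration of how a server could bind a link to the visitor it was generated for. This is not Google's method, which I have not seen. Everything here is hypothetical: the SECRET key, the function names, and the token layout. It also signs the values with an HMAC instead of encrypting them, which is a swap I made to keep the sketch short.

```python
import hashlib
import hmac

SECRET = b"server-side-secret"  # hypothetical key, known only to the server

def make_token(client_ip, publisher_id):
    """Sign the IP and publisher so neither can be altered afterwards."""
    payload = f"{client_ip}|{publisher_id}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"

def is_valid_click(token, clicker_ip):
    """Accept the click only if the token is intact and the IPs match."""
    try:
        embedded_ip, publisher_id, sig = token.split("|")
    except ValueError:
        return False  # wrong number of fields: malformed token
    payload = f"{embedded_ip}|{publisher_id}"
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected) and embedded_ip == clicker_ip
```

With a scheme like this, a link generated for one address and clicked from another fails the last comparison, which matches what happens when you pass a copied link to a friend.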

Okay, now that they have verified your IP, next comes the big scary deal. Their automated algorithm knows who you are based on the webpages you have visited that bear the Google mark (Blogger, Google Maps, etc.). The machine that tries to compile user-targeted search results is grabbing your IP address whenever it can. Now, this isn't a conspiracy-theory-motivated remark. This is all automated and not monitored by a human. Fine-tuning may be monitored by a human, but imagine the sheer number of IP addresses they manage.

Combine that with AdSense. They know if you have clicked 40 links, as they have recorded your IP address for fraud reasons. If you click ten links in a row at the same site, they will discount several of those clicks based on the fuzzy logic above. If you own the AdSense account, and you are clicking your own links, they only allow the first click to count towards any statistics.
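A toy version of that kind of discounting might look like the sketch below. The rules are my own simplified stand-ins, not Google's: the one-hour window, the owner_ip_by_site lookup, and the function name are all assumptions. It counts a click from an ordinary visitor only if that IP has not had a counted click on the same site within the window, and it counts a site owner's IP only once, ever.

```python
def count_valid_clicks(clicks, owner_ip_by_site, window_seconds=3600):
    """Count clicks that survive the discount rules.

    clicks: iterable of (timestamp, ip, site) tuples, oldest first.
    owner_ip_by_site: dict mapping a site to its owner's IP (hypothetical).
    """
    last_counted = {}  # (ip, site) -> timestamp of the last click that counted
    valid = 0
    for ts, ip, site in clicks:
        key = (ip, site)
        previous = last_counted.get(key)
        if previous is not None:
            if owner_ip_by_site.get(site) == ip:
                continue  # owner: only the first click ever counts
            if ts - previous < window_seconds:
                continue  # repeat click inside the window is discounted
        last_counted[key] = ts
        valid += 1
    return valid
```

Ten rapid clicks from one visitor on one site would count as a single click under these rules. The real system apparently discounts only some of the repeats, so treat this as the strictest possible version.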

Now, if you were banned for fraudulent clicks and try signing up again, even if all the information you provide is new (brother's SSN, friend's address, et cetera), they will know you are the same person if, for instance, you log into Blogger with the same account.

Google has an automated version of the 1984 monitoring theory. That may be a bad comparison, because participation is voluntary. You can very well opt out of being monitored by not using Google. This system isn't meant to intrude but to protect the advertisers. I must say, though, they know exactly what you are doing if you are on the internet.

This is why judges are debating subpoenaing Google whenever a murder suspect is on trial. As Slashdot says, "the details are a bit scant, but it seems that the content of Google searches were used to help establish intent in a murder trial. Police in the future may simply serve a subpoena to Google to find out what you've been thinking about. While this use of that information makes sense, at what point does your privacy give way to public concerns? Should police be able to search through your search history for "questionable" searches before you've been arrested for a crime, and what effect would this have on the health of society?"

I stopped using AdSense for this very reason. Scary stuff.

If the human body were never exposed to ailments, it would be impressively vulnerable to the slightest cold. If our country were never exposed to hacking, it would be oppressively vulnerable to cyber terrorism. Without the creation of malicious hacking, Afghanistan could have destroyed America's economy with a ping flood. This is why I encourage malicious hacking as an ethical practice. Without strengthening our defenses, we are weak. This site is focused on security through knowledge. I detest the fact that so many companies are being exploited because malicious hackers know their security holes before they do. For that reason, I hope to educate about where the exploits lie. This isn't a 100% information base, as I only publish things I have been able to implement myself. No credit is needed anywhere. However, if you are a publisher, I would appreciate credit. I am an advocate of open source, so copy and paste and call it your own if you like. If my work is good enough for you to plagiarize, then that is my biggest compliment. If my work is good enough, I will be approached and asked to write more ... this is natural selection of the digital age.
