
The man in the middle

Over the last few weeks I was tinkering with an old HTTP proxy script I wrote about a year ago. The script is no rocket science, and you get the same or probably even more functionality from any other HTTP proxy. But implementing the server in Perl allows me to extend, modify and adjust it to my needs. I wanted to analyse the traffic caused by people who want to be anonymised and are sitting behind an identity-obscuring proxy server, to find out what they are (bots, scripts, humans), what they do and why they want to obscure their identity.
In this post you find an hourly updated statistic built from the data collected during two days, plus some additional info about what this statistic tells us.



Generated on October 13 2010 13:11:26

Total requests 1115784
 
Proxy port Total requests
8000 277183
8080 265029
3128 573559


Basic HTTP authentication

About 90% of the clients using the Basic HTTP method try to authenticate on servers with pornographic content. Most of these authentication requests belong to a login hacking attack and don’t contain valid user credentials.

Among all these login hacking requests we also find successful login attempts. Mostly these authentication requests were typed in by humans and not by scripts, and they didn’t authenticate on a porn server. If we filter out all the login hacking attempts we get a handful of valid user accounts.

Requests URL
1570 www.fetishliza.com
1478 members.teamskeet.com
1116 www.southern-charms3.com
611 sexstationtv.com
516 members.korny.adultbouncer.com
509 southeastsoles.com
449 nudesandnature.com
449 strapon-hell.com
388 www.humiliatrix.com
339 www.young-goddess.com
239 members.glamour.cz
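
To illustrate why Basic HTTP authentication is so easy to read on a proxy, here is a minimal Python sketch (the proxy itself is written in Perl, so this is not its actual code). The function name `decode_basic_auth` is made up for this example. It only shows that the `Authorization` header carries the user name and password as plain Base64, which any node in the middle can decode.

```python
import base64
import binascii
from typing import Optional, Tuple


def decode_basic_auth(header_value: str) -> Optional[Tuple[str, str]]:
    """Return (user, password) from an Authorization header value, or None.

    Hypothetical helper for illustration; not taken from the original proxy.
    """
    scheme, _, token = header_value.strip().partition(" ")
    if scheme.lower() != "basic" or not token:
        return None  # some other scheme, or no credentials at all
    try:
        raw = base64.b64decode(token.strip(), validate=True)
    except (binascii.Error, ValueError):
        return None  # not valid Base64, e.g. junk sent by a script
    decoded = raw.decode("utf-8", errors="replace")
    # Credentials have the form "user:password"; the password may contain ':'
    user, sep, password = decoded.partition(":")
    if not sep:
        return None
    return user, password
```

Calling it with the header value `Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==` returns the pair `("Aladdin", "open sesame")`.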


HTML GET authentication

With the GET login requests we encounter a similar situation as with the Basic HTTP authentication. Most of the requests belong to login hacking attempts. Many of these attempts are executed on Yahoo servers, probably because Yahoo, unlike Google, doesn’t identify automated login attempts. If you browse through the logs and ignore the sites with more than 2 or 3 requests, chances are good that you find valid requests typed in by humans.

Requests URL
928 195.122.131.36
178 one-cpm.fr.nf
169 195.122.131.24
158 n4.login.re3.yahoo.com
132 login.korea.yahoo.com
117 195.122.131.30
102 l10.member.sp1.yahoo.com
101 login.india.yahoo.com
99 login.vip.kr3.yahoo.com
97 l16.member.sg1.yahoo.com
96 l09.member.tw1.yahoo.com
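
The rule of thumb above (ignore every site with more than 2 or 3 login requests) can be sketched in a few lines of Python. The function name `rarely_targeted_hosts` and the input format (one host name per logged login request) are assumptions made for this example, not the format of the real log.

```python
from collections import Counter
from typing import Iterable, List


def rarely_targeted_hosts(login_hosts: Iterable[str], max_requests: int = 3) -> List[str]:
    """Return the hosts that received at most `max_requests` login requests.

    Hosts with many requests are most likely targets of login hacking
    scripts; the rare ones are candidates for logins typed in by humans.
    """
    counts = Counter(login_hosts)
    return sorted(host for host, n in counts.items() if n <= max_requests)
```

This is only a first filter. The remaining hosts still have to be checked by hand.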


HTML POST authentication

The POST requests don’t really differ from the GET login requests. Ignore the sites with many login attempts and focus on the others with only a few requests. Here, too, you will probably stumble on valid user account data.

Requests URL
2312 209.222.7.232
1087 174.140.154.23
718 209.222.7.235
580 hotfile.com
522 megaporn.com
496 79.143.184.247
372 209.222.148.141
327 174.140.154.12
165 174.140.154.18
147 174.140.154.14
106 m.upcoming.yahoo.com


Most active clients

We have not yet linked the clients to the servers or URLs, and a reverse lookup of a client is mostly not possible. With the help of a WhoIs lookup we can at least find out the client’s country code and determine which countries have the most active clients.

Requests Client (Country code)
13228 216.245.196.122 (US)
9507 109.87.45.228 ()
8791 109.86.246.136 ()
8349 208.115.219.10 (US)
8278 74.63.192.66 (US)
6032 173.203.240.43 ()
5924 81.24.89.14 (ru)
4247 89.250.157.196 (RU)
3887 221.233.192.72 (CN)
3783 86.62.248.210 (qa)
3582 91.207.6.26 (UA)


Most requested servers

Looking at a server’s hostname we can estimate what function the server may have. Considering our top 10 list, it is not the typical stuff like mail or news that people want to get while sitting behind an anonymising proxy. Instead, advertisement seems to be the main reason for using an HTTP proxy. You can see as well that Google is a popular server even behind a proxy. But after evaluating the passed search strings it looks like the users rather want to check if the proxy server works properly than to search for stuff on the net. And the weird search strings tell us that the requests were executed automatically by a script and not by humans.

Requests URL
22276 login.icq.com
17425 www.google.com
16060 ad.yieldmanager.com
14892 content.yieldmanager.com
10282 ad.reduxmedia.com
3078 home.uasar.org.ua
2835 ak1.abmr.net
2220 ad.xtendmedia.com
2176 www.adparlor.com
1995 ad.spot200.com
1972 www.besthitsnow.com
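
A ranking like the one above can be produced by counting the host part of every requested URL. The following is a minimal Python sketch of that idea, not the code of the Perl proxy. The function name `top_hosts` and the assumption that the log yields one absolute URL per request are made up for this example.

```python
from collections import Counter
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit


def top_hosts(urls: Iterable[str], limit: int = 10) -> List[Tuple[str, int]]:
    """Count requests per host name and return the `limit` busiest hosts."""
    counts: Counter = Counter()
    for url in urls:
        host = urlsplit(url).hostname  # drops scheme, port, path and query
        if host:
            counts[host] += 1
    return counts.most_common(limit)
```

Because the port is dropped, `http://login.icq.com:443` and `http://login.icq.com/` would be counted as the same server.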


Most requested URLs by a system

When this page was created the most requested URLs were web bugs, login hacking attempts and mainly URLs to ad servers containing either banners or JavaScript code that requests banners. The big picture gets clearer and we see that advertisers seem to appreciate the obscuring services of anonymising proxy servers.

Requests URL
22276 http://login.icq.com:443 …
11911 http://content.yieldmanager.com/ak/q.gif …
1901 http://snandart.com:443 …
1836 http://proxylist.co:443 …
1509 http://www.google.com/intl/de/ads/ …
1476 http://members.teamskeet.com/ …
1363 http://www.google.de/about.html …
1297 http://botmasternet.com/proxy/http/engine.php …
1286 http://www.google.com/accounts/TOS?loc=DE …
1185 http://www.google.com:443 …
910 http://flashsexclips.com/proxy5/check.php …


Most communicating systems

This overview shows which system likes which server and how often a request was sent from one to the other. The eye-catching thing here is that the source address is mostly located in China or in the USA and the requested server hosts advertisement… images, banners, scripts, etc.

Requests Source Destination
5924 81.24.89.14 login.icq.com
4247 89.250.157.196 login.icq.com
3783 86.62.248.210 login.icq.com
3478 81.4.136.2 login.icq.com
3474 216.245.196.122 content.yieldmanager.com
3078 93.126.101.119 home.uasar.org.ua
3026 204.124.183.90 www.google.com
2917 216.245.196.122 ad.yieldmanager.com
2726 62.228.153.82 login.icq.com
2705 173.236.70.187 www.google.com
2636 74.63.192.66 ad.reduxmedia.com


Most called URLs by a system

This overview shows which system likes which URL and how often a URL on a specific server was requested by a particular client system. The situation here is the same as in the paragraph above. The client sits somewhere in the USA or China and the destination server is involved in advertisement.

 
Requests Source URL
5924 81.24.89.14 http://login.icq.com:443 …
4247 89.250.157.196 http://login.icq.com:443 …
3783 86.62.248.210 http://login.icq.com:443 …
3478 81.4.136.2 http://login.icq.com:443 …
2726 62.228.153.82 http://login.icq.com:443 …
2672 216.245.196.122 http://content.yieldmanager.com/ak/q.gif …
1836 173.234.51.29 http://proxylist.co:443 …
1568 74.63.192.66 http://content.yieldmanager.com/ak/q.gif …
1509 208.115.219.10 http://content.yieldmanager.com/ak/q.gif …
1476 187.132.45.238 http://members.teamskeet.com/ …
1238 84.19.161.108 http://snandart.com:443 …

Most called destination ports

As the proxy server supports the CONNECT method, clients are allowed to establish a TCP connection to any port. CONNECT is normally used to tunnel HTTPS through a proxy server. Spammers like to use it to reach SMTP servers, and both people and bots like this method for connecting to IRC servers. This is the reason why, besides ports 80 and 443, other and sometimes rather exotic ports are listed.

Requests Dest. port
1072189 80 (www)
39426 443 (https)
2730 25 (smtp)
485 6667 (ircd)
153 6112 (starcraft)
123 6668 (ircd)
120 6666 (ircd)
83 7000 (afs3-fileserver)
70 8080 (webcache)
58 33033 ()
48 81 ()
43 6669 (ircd)
29 6665 (ircd)
22 8018 ()
16 12350 ()
15 2866 ()
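
To show where these port numbers come from, here is a small Python sketch that pulls the target host and port out of a CONNECT request line such as `CONNECT login.icq.com:443 HTTP/1.1`. The names `parse_connect_target`, `is_unusual_tunnel` and `WEB_PORTS` are hypothetical and only serve this example. The real proxy is a Perl script and may do this differently.

```python
from typing import Optional, Tuple

# Ports a CONNECT tunnel is normally expected to use (assumption for this sketch)
WEB_PORTS = {80, 443}


def parse_connect_target(request_line: str) -> Optional[Tuple[str, int]]:
    """Return (host, port) for a CONNECT request line, or None for anything else."""
    parts = request_line.split()
    # Expected form: CONNECT host:port HTTP/x.y (method matched leniently)
    if len(parts) != 3 or parts[0].upper() != "CONNECT":
        return None
    host, sep, port = parts[1].rpartition(":")
    if not sep or not host or not port.isdecimal():
        return None
    port_number = int(port)
    if not 0 < port_number <= 65535:
        return None
    return host, port_number


def is_unusual_tunnel(request_line: str) -> bool:
    """True if the line is a CONNECT request to a port other than 80 or 443."""
    target = parse_connect_target(request_line)
    return target is not None and target[1] not in WEB_PORTS
```

A proxy that only wants to tunnel HTTPS could use such a check to refuse CONNECT requests to port 25 or 6667.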


The bottom line

At the beginning I thought it would be easy to fish user accounts out of the data streams. But after some tests I noticed that the major part of the traffic was automated and related to advertisement in one way or another. There is not much sensitive data to catch. In a second step I tried to redirect all the clients to the Megapanzer web page to see how the traffic load changes and if some users would start browsing the page. But this plan didn’t work out as expected either.

So obviously humans don’t like to use HTTP proxies which they have to configure somewhere in the browser properties. Either it is too complicated or there is an easier way to use a proxy, web proxies for example. You can find real user traffic, but only in a very low quantity. The automated traffic also often originates from login hacking scripts. A proxy suppressing the client’s real identity makes the attackers feel safer.

The heavy users are the advertisers. They are responsible for the major part of the requests passing through the proxy, which sometimes made my internet link collapse. But for what reason actually? Why don’t they connect directly to the destination servers, so they don’t rely on an unstable and unreliable node in between? After pondering for a while and searching for a plausible answer, the only reason I can imagine is to keep the click rate on their advertisements higher than it really is. An advertiser like xapads.com or defaultimg.com can then promise its customers a high number of clicks and views per day, which makes it more valuable as an advertising partner. Or the customers pay these ad companies according to the “cost per thousand impressions” (CPM) model. Then the clicks are generated by scripts running somewhere on a server in China or in the USA. For example, if you have a list containing 1000 proxy servers and your customers pay you $20 CPM, the advertiser “could” earn this money in one day, for instance by sending one request through each proxy. $20 * 30 makes $600 a month. Serving ten customers for 30 days makes a nice amount at the end of the month.
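
The arithmetic of this guess can be written down as a small Python sketch. It assumes that every proxy in the list produces exactly one billable impression per day. That figure is an assumption of this example, chosen so that 1000 proxies give 1000 impressions. The function name `cpm_revenue` is made up as well.

```python
def cpm_revenue(impressions: int, cpm_dollars: float) -> float:
    """Revenue in dollars for a number of impressions at a given CPM rate."""
    return impressions / 1000 * cpm_dollars


proxies = 1000
impressions_per_proxy_per_day = 1  # assumption: one impression per proxy per day
cpm = 20.0                         # dollars per 1000 impressions

daily = cpm_revenue(proxies * impressions_per_proxy_per_day, cpm)
monthly = daily * 30               # one customer, 30 days
ten_customers = monthly * 10       # ten customers, 30 days

print(daily, monthly, ten_customers)  # 20.0 600.0 6000.0
```

With more than one request per proxy per day the numbers scale up accordingly.
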
But this is only an assumption. Any better ideas? Suggestions?
