2005-07-21
With the emergence of Napster in the fall of 1999, peer to peer (P2P) applications and their user base have grown rapidly in the Internet community. With the popularity of P2P and the bandwidth it consume, there is a growing need to identify P2P users within the network traffic.
In this paper the author will propose a new method based on traffic behavior that helps identify P2P users, and even helps to distinguish what type of P2P applications are being used.
Current Technology
When it comes to identifying P2P users, currently there are only two choices: port based analysis and protocol analysis. Here is a brief review of both.Port based analysis
Port based analysis is the most basic and straightforward method to detect P2P users in network traffic. It is based on the simple concept that many P2P applications have default ports on which they function. When these applications are run, they use these ports to communicate with outside. The following is a example list:
Morpheus 6346/6347 TCP/UDP
BearShare default 6346 TCP/UDP
Edonkey 4662/TCP
EMule 4662/TCP 4672/UDP
Bittorrent 6881-6889 TCP/UDP
WinMx 6699/TCP 6257/UDP
To perform port based analysis, administrators just need to observe the network traffic and check whether there are connection records using these ports. If a match is found, it may indicate a P2P activity. Port based analysis is almost the only choice for network administrators who don't have special software or hardware (such as an IDS) to monitor traffic.
Port matching is very simple in practice, but its limitations are obvious. Most P2P applications allow users to change the default port numbers by manually selecting whatever port(s) they like. Additionally, many newer P2P applications are more inclined to use random ports, thus making the ports unpredictable. Also there is a trend for P2P applications begin to masquerade their function ports within well-known application ports such as port 80. All these issues make port based analysis less effective.
Protocol analysis
Despite the poor results found using simple port matching, an administrator has another choice: application layer protocol analysis.With this approach, an application or piece of equipment monitors traffic passing through the network and inspects the data payload of the packets according to some previously defined P2P application signatures. Many of today's commercial and open source P2P application identification solutions are based on this approach, and include the L7-filter, Cisco's PDML, Juniper's netscreen-IDP, Alteon Application Switches, Microsoft common application signatures, and NetScout. They each do their detection work by doing regular expression matches on the application layer data, in order to determine whether a special P2P application is being used.
Because protocol analysis focuses on the packet payload and raises alerts only on a definite match, any client-side tricks that use non-default or dynamic ports to avoid detection by P2P applications will fail. Using this approach, the result is normally more accurate and believable, but it still has some shortcomings. Here are some points to remember with protocol analysis of P2P networks:
- P2P applications are evolving continuously, and therefore signatures can change. Static signature based matching requires new signatures to be effective when these changes occur.
- With more and more P2P identification and control products on the market, P2P developers tend to tunnel around any controls placed in their way. They could easily achieve this by encrypting the traffic, such as by using SSL, making protocol analysis much more difficult.
- Signature-based identification means that the product should read and process all network traffic, which brings up the issue of how to maintain network stability in a large network. The product may burden network equipment heavily or even cause network failures. If it works inline, what will you do when the product fails?
- Signature-based identification at the application level (L7) is also highly resource- intensive. The higher bandwidth network, the more cost and resources you need to inspect it. Suppose you inspect a 1Gbit or even 10Gbit network link, how much investment must you make to get an appropriate product?
Most importantly, if your organization cannot afford the special appliances or applications that perform protocol analysis, is port matching your only alternative? Fortunately, the answer is no. An approach based on traffic behavior patterns proves to be both functional and cost-effective.
