2006-11-27
Introduction
Web 2.0 applications combine several technologies such as Asynchronous JavaScript and XML (AJAX), Flash, JavaScript Object Notation (JSON), Simple Object Access Protocol (SOAP) and Representational State Transfer (REST). All these technologies, along with cross-domain information access, add to the complexity of the application. We are also seeing a shift toward empowering the end-user's browser by loading libraries into it.
All these changes mean new scanning challenges for tools and professionals. The key learning objectives of this article are to understand the following concepts and techniques:
- Scanning complexity and challenges in new generation Web applications
- Web 2.0 client-side scanning objectives and methodology
- Web 2.0 vulnerability detection (XSS in RSS feeds)
- Cross-domain injection with JSON
- Countermeasures and defense through browser-side filtering
Web 2.0 scanning complexities
Next-generation Web 2.0 applications are complex by nature and pose new scanning challenges. The complexity can be attributed to the following factors:
- Rich client interface - AJAX and Flash give applications rich interfaces built from complex JavaScript and ActionScript code, making it difficult to identify application logic and critical resources buried in these scripts.
- Information sources - Applications consume information from various sources and build mashups [ref 1] within sites. An application may aggregate RSS feeds or blogs from different locations and build a large repository of information in a single place.
- Data structures - Applications exchange data using XML, JSON [ref 2], JavaScript arrays and proprietary structures.
- Protocols - Aside from the simple HTTP GET and POST, applications can choose from an array of different protocols such as SOAP, REST and XML-RPC.
Our target application may be accessing RSS feeds from multiple sites, exchanging information with blogs using JSON, and communicating with a stock exchange portal's Web service over SOAP. All these services are bundled in the form of Rich Internet Applications (RIA) using AJAX and/or Flash.
Web 2.0 application scanning challenges
Application scanning challenges can be divided into two parts:
- Scanning server-side application components - One of the biggest challenges when scanning Web 2.0 applications is to identify buried resources on the server. When scanning traditional applications, a crawler can be run that looks for the string "href" in order to identify and profile Web application assets. In the case of Web 2.0 applications, however, one needs to identify backend Web services, third-party mashups, backend proxies, etc. The author has addressed some of these challenges in a previous article [ref 3].
- Scanning client-side application components - A Web 2.0 application can load several JavaScripts, Flash components and widgets in the browser. These scripts and components use the XMLHttpRequest object to communicate with the backend Web server. It is also possible to access cross-domain information from within the browser itself. Cross-site scripting (XSS) attacks [ref 4] are potential threats to the application user: the Web 2.0 framework uses various client-side scripts and consumes information from "untrusted third-party sources." AJAX and JSON technologies, cross-domain access and dynamic DOM manipulation techniques are adding new dimensions to old XSS attacks [ref 5]. Client-side component scanning and vulnerability detection in Web 2.0 are new challenges on the horizon, and the scope of this article is restricted to this scanning category.
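To illustrate why a crawler that only looks for "href" misses Web 2.0 resources, here is a minimal Python sketch that collects classic href links and, separately, URLs passed as string literals to XMLHttpRequest's open() method. The sample page and the /services/stock.asmx endpoint are hypothetical, and the regular expressions are a heuristic: URLs assembled at run time will not be found.

```python
import re

# Traditional crawling: collect the targets of href attributes.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)

# Web 2.0 addition: collect literal URLs passed to XMLHttpRequest.open().
# Heuristic only; URLs built dynamically in script are missed.
XHR_OPEN_RE = re.compile(
    r'\.open\(\s*["\'](GET|POST|PUT|DELETE)["\']\s*,\s*["\']([^"\']+)["\']',
    re.IGNORECASE,
)


def find_href_links(html):
    """Return every href target found in the page."""
    return HREF_RE.findall(html)


def find_xhr_endpoints(source):
    """Return (method, url) pairs for literal XMLHttpRequest.open() calls."""
    return [(method.upper(), url) for method, url in XHR_OPEN_RE.findall(source)]


# Hypothetical page: one ordinary link plus one backend Web service call.
sample_page = """
<a href="/about.aspx">About</a>
<script>
  var req = new XMLHttpRequest();
  req.open("POST", "/services/stock.asmx", true);
</script>
"""

print(find_href_links(sample_page))     # ['/about.aspx']
print(find_xhr_endpoints(sample_page))  # [('POST', '/services/stock.asmx')]
```

An href-only crawler would report just /about.aspx; the backend Web service is visible only when the script content is examined as well.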
Client-side scanning objectives
To understand these scanning objectives clearly, let us take the sample scenario illustrated in Figure 1. We have a Web application running on example.com, which clients access via a Web browser.
Figure 1. Web 2.0 target application layout.
This Web application can be divided into the following sections based on usage and logic.
Application resources - These resources are deployed by example.com and can be of any type: HTML, ASP/JSP, Web services. All these resources are in a fully trusted domain and are owned by example.com.
Feed proxy - The XMLHttpRequest object cannot make direct calls to other domains. To work around this restriction, example.com sets up a proxy that gives access to third-party RSS feeds, for example, a daily news feed. Users of example.com can therefore set up any feed on the Internet for daily use.
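The restriction the proxy works around is the browser's same-origin policy: two URLs share an origin only when scheme, host and port all match. The Python sketch below applies that comparison. The /proxy path and the news.example.org feed host are hypothetical, and real browsers apply further rules (for instance document.domain relaxation) that this sketch does not model.

```python
from urllib.parse import urlsplit

# Ports assumed when a URL does not state one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(page_url, request_url):
    """True if a script on page_url may call request_url under same-origin rules."""
    return origin(page_url) == origin(request_url)


page = "http://example.com/rss/news.aspx"
print(same_origin(page, "http://example.com/proxy?feed=daily"))  # True
print(same_origin(page, "http://news.example.org/daily.xml"))    # False
```

The second call is exactly the case the feed proxy exists for: the browser refuses the direct request, so example.com fetches the feed server-side and serves it from its own origin.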
Blog access - End-users use the same application loaded by example.com to access some of the blogs on the Internet. This is possible because example.com loads certain scripts on the client's browser that allow users to access cross-domain blogs.
Here are four critical scanning objectives to determine client-side vulnerabilities.
- Technology and library fingerprinting - Web 2.0 applications can be built with many AJAX and Flash libraries. These libraries are loaded in the browser and used by the application as needed. It is important to fingerprint these libraries and map them to publicly known vulnerabilities.
- Third-party untrusted information points - In Figure 1, we have divided the Web application layout into "trusted" and "untrusted" areas. Information originating from untrusted sources needs thorough scrutiny prior to loading it in the browser. In our example this information flows via an application server proxy in the case of news feeds, and directly into the DOM in the case of blogs.
- DOM access points - The browser runs everything in its DOM context, and the loaded JavaScripts manipulate the DOM. If malicious information is passed to any one of these access points, the browser can be at risk, so identifying DOM access points is essential.
- Functions and variable traces for vulnerability detection - Once DOM access points and third-party information sources have been identified, it is important to understand the execution logic and corresponding traces in the browser in order to expose threats and vulnerabilities.
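One way to start on the DOM access point objective is to search loaded scripts for well-known sinks where data is written into the page or executed. The Python sketch below does this line by line with regular expressions. The sink list is an assumption and deliberately short, the patterns are heuristics, and the showFeed() script is a hypothetical example rather than code from the target application.

```python
import re

# Hypothetical, non-exhaustive list of DOM sinks (access points) to trace.
# Each pattern is a heuristic: it can flag comparisons such as
# ".innerHTML == x" and will miss aliased or computed property access.
DOM_SINKS = {
    "innerHTML": re.compile(r"\.innerHTML\s*\+?="),
    "outerHTML": re.compile(r"\.outerHTML\s*\+?="),
    "document.write": re.compile(r"document\.write(?:ln)?\s*\("),
    "eval": re.compile(r"\beval\s*\("),
}


def find_dom_access_points(js_source):
    """Return (line number, sink name, source line) for each suspected sink."""
    hits = []
    for line_no, line in enumerate(js_source.splitlines(), start=1):
        for sink, pattern in DOM_SINKS.items():
            if pattern.search(line):
                hits.append((line_no, sink, line.strip()))
    return hits


# Hypothetical script in which untrusted feed data reaches two sinks.
sample_js = """function showFeed(data) {
  var box = document.getElementById('feed');
  box.innerHTML = data.title;
  var obj = eval('(' + data.json + ')');
}"""

for hit in find_dom_access_points(sample_js):
    print(hit)
```

For the sample, it reports line 3 (innerHTML) and line 4 (eval). Each hit is a starting point for the function and variable tracing described in the last objective: the question is whether untrusted data can reach that line.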
Scanning client-side applications [news feeds]
In this section we shall adopt a manual approach to the scanning process. This methodology can be automated to some extent but given the complexity of the application it may be difficult to scan for all possible combinations.
The target resource: http://example.com/rss/news.aspx
Loading this resource in the browser returns the page shown in Figure 2.
Figure 2. RSS feed application widget.
The above page serves various RSS feeds configured by the end-user. Now let's walk through the steps we require for scanning.
1. Scanning for technology and fingerprints
All possible JavaScripts consumed by the browser after loading the page can be grabbed from the HTML page itself by viewing the HTML source, as listed in Figure 3, or programmatically using regular expressions.
Figure 3. All JavaScripts for the application page.
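The programmatic route mentioned above can be sketched in Python with two regular expressions: one for external scripts referenced through a src attribute, one for inline script bodies. This is a heuristic, not a full HTML parser, and the sample markup is hypothetical; it only reuses the file names discussed in this section.

```python
import re

# External scripts: <script ... src="...">. Attribute order and quoting
# vary in real pages, so this is a heuristic rather than a full parser.
SCRIPT_SRC_RE = re.compile(
    r'<script\b[^>]*\bsrc\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)

# Inline scripts: <script> tags without a src attribute, capturing the body.
INLINE_SCRIPT_RE = re.compile(
    r'<script\b(?![^>]*\bsrc\s*=)[^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL)


def list_scripts(html):
    """Return (external script URLs, non-empty inline script bodies)."""
    external = SCRIPT_SRC_RE.findall(html)
    inline = [body.strip() for body in INLINE_SCRIPT_RE.findall(html)
              if body.strip()]
    return external, inline


# Hypothetical page markup using the file names from this section.
page = """
<script type="text/javascript" src="dojo.js"></script>
<script type="text/javascript" src="rss_xml_parser.js"></script>
<script type="text/javascript" src="XMLHTTPReq.js"></script>
<script type="text/javascript">init();</script>
"""

external, inline = list_scripts(page)
print(external)  # ['dojo.js', 'rss_xml_parser.js', 'XMLHTTPReq.js']
print(inline)    # ['init();']
```

Scripts added to the page later by other scripts will not appear in the initial HTML, which is one reason a browser-side view such as the one in Figure 4 remains useful.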
If you have the Firefox plugin "Web Developer" [ref 6], you can view all scripts in a single page as shown below in Figure 4.
Figure 4. All JavaScripts along with source code.
The following information can be identified by scanning these JavaScripts:
- The application uses dojo.js, part of the Dojo AJAX development toolkit [ref 7]. File names provide vital clues when fingerprinting these technologies, and the content can be scanned further to determine the version in use. A similar technique can be employed to fingerprint Microsoft's Atlas and many other technologies. This information helps map known vulnerabilities to the underlying architecture.
- Files containing functions that are consumed by the RSS feed application can be mapped within the browser. Here is a brief list:
  - The rss_xml_parser.js file contains functions such as processRSS() and GetRSS(). These functions fetch RSS feeds from the server and process them.
  - The XMLHTTPReq.js file contains the makeGET() and makePOST() functions to process AJAX requests.
  - The dojo.js file contains several other functions.
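The mapping above can be organized automatically. The Python sketch below takes the script sources already collected, tags each file with a library name from a lookup table keyed on file name, and lists the functions each file declares. The lookup table and the script bodies are hypothetical stand-ins, only the two common declaration forms are recognized, and version detection is left out because version strings differ between libraries and releases.

```python
import re

# Hypothetical lookup table: file name -> library. Extend as needed.
KNOWN_LIBRARIES = {
    "dojo.js": "Dojo toolkit",
}

# "function name(" declarations and "name = function(" assignments.
FUNC_DECL_RE = re.compile(r"\bfunction\s+([A-Za-z_]\w*)\s*\(")
FUNC_ASSIGN_RE = re.compile(r"\b([A-Za-z_]\w*)\s*=\s*function\s*\(")


def map_functions(scripts):
    """scripts: dict of file name -> JavaScript source.

    Returns file name -> {"library": ..., "functions": sorted names}.
    """
    inventory = {}
    for name, source in scripts.items():
        funcs = FUNC_DECL_RE.findall(source) + FUNC_ASSIGN_RE.findall(source)
        inventory[name] = {
            "library": KNOWN_LIBRARIES.get(name, "unknown"),
            "functions": sorted(set(funcs)),
        }
    return inventory


# Hypothetical, heavily trimmed script bodies for illustration only.
scripts = {
    "rss_xml_parser.js": "function GetRSS(url) { }\nfunction processRSS(xml) { }",
    "XMLHTTPReq.js": "function makeGET(url) { }\nvar makePOST = function (url, body) { };",
    "dojo.js": "var dojo = {};",
}

for file_name, info in map_functions(scripts).items():
    print(file_name, info)
```

The resulting inventory shows which file each function lives in, so a later trace can go straight from a DOM access point to the script that defines the calling function.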
All this enumerated information can be organized to obtain a better picture of the process.
2. Third-party untrusted information points
We scan the HTML source for the page and locate the following code:
This code calls the function GetRSS(), which in turn makes a request to the proxy to fetch untrusted RSS feeds from various servers.
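Because everything GetRSS() receives through the proxy originates outside the trusted domain, it is worth checking feed content for active markup before it reaches the DOM. The Python sketch below parses an RSS 2.0 document and flags item title, link and description fields that contain script tags, javascript: URLs or inline event handlers. The patterns are an assumed heuristic, not a complete XSS filter, and the sample feed is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Heuristic markers of active content; an assumption, not a complete XSS filter.
SUSPICIOUS_RE = re.compile(r"<\s*script\b|javascript\s*:|\bon\w+\s*=", re.IGNORECASE)


def flag_suspicious_items(rss_xml):
    """Return (item index, field, value) for RSS 2.0 item fields with active content."""
    root = ET.fromstring(rss_xml)
    findings = []
    for index, item in enumerate(root.iter("item")):
        for field in ("title", "link", "description"):
            # findtext() returns parsed text, so &lt;script&gt; becomes <script>.
            value = item.findtext(field) or ""
            if SUSPICIOUS_RE.search(value):
                findings.append((index, field, value))
    return findings


# Hypothetical feed: the second item carries two injection attempts.
feed = """<rss version="2.0"><channel><title>Daily news</title>
<item><title>Markets open higher</title><link>http://news.example.org/1</link>
<description>Stocks rose in early trading.</description></item>
<item><title>Breaking</title><link>javascript:alert(document.cookie)</link>
<description>&lt;script&gt;alert(1)&lt;/script&gt;</description></item>
</channel></rss>"""

for finding in flag_suspicious_items(feed):
    print(finding)
```

Note that the escaped description only becomes dangerous after XML parsing decodes it, which is exactly what a client-side feed parser does before writing the text into the page.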
[ref 1] Brief on mashup (http://en.wikipedia.org/wiki/Mashup_(web_application_hybrid))
[ref 2] JavaScript Object Notation (JSON) is a lightweight data-interchange format (http://www.json.org/)
[ref 3] Hacking Web 2.0 Applications with Firefox (http://www.securityfocus.com/infocus/1879)
[ref 4] XSS threat classification (http://www.webappsec.org/projects/threat/classes/cross-site_scripting.shtml)
[ref 5] DOM Based Cross Site Scripting or XSS of the Third Kind, by Amit Klein (http://www.webappsec.org/projects/articles/071105.shtml)
[ref 6] Web developer plugin (http://chrispederick.com/work/webdeveloper/)
[ref 7] Dojo toolkit (http://www.dojotoolkit.com/)
