VSTI Prajna Project

VAST 2009 Challenge
Challenge 1: Badge and Network Traffic

Authors and Affiliations:

Edward Swing, Vision Systems & Technology, Inc.


The Prajna Project is an open-source Java toolkit designed to provide various capabilities for visualization, knowledge representation, geographic displays, semantic reasoning, and data fusion. Rather than attempt to recreate the significant capabilities provided in other tools, Prajna instead provides software bridges to incorporate other toolkits where appropriate.

For this challenge, I developed a custom application using the Prajna Project. The Prajna Project provided the utilities for reading the data files, timeline visualization components, and automated reasoning.

I also created a number of automated reasoning tools, based on my analysis of the data and the problem domain, and applied them to identify suspicious internet traffic and the primary suspects. While these tools were designed specifically for this mini-challenge, the toolkit supports a full range of semantic reasoning that could be applied in an operational system.

In addition, the GUI components for filtering the displayed data have been implemented as configurable filters. These filters extend the basic reasoning and filtering capabilities within the Prajna toolkit. Timelines that show values along the vertical axis, used here to plot the request sizes of the IP traffic over time, have been added to Prajna. Prajna will incorporate several other features that were designed for this challenge.

The Prajna Project is a toolkit developed by Edward Swing, available at https://sourceforge.net/projects/prajna/. The custom application was built at VSTI. Other VSTI programs have since incorporated some of the new components that were developed for this contest.




MC1.1: Identify which computer(s) the employee most likely used to send information to his contact in a tab-delimited table which contains for each computer identified: when the information was sent, how much information was sent and where that information was sent.

Traffic.txt File

MC1.2: Characterize the patterns of behavior of suspicious computer use.

When examining the available data, I first considered how I might identify the suspicious behavior. Since the IP addresses of the computers corresponded to the employee IDs, I began by correlating the internet traffic with the patterns of user movements into and out of the classified space. This correlation was automated, simulating software monitoring agents.
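The correlation step can be sketched as follows. This is a minimal Python illustration, not the Prajna implementation: the record layout, the field names (employee_id, time), and the toy values are all assumptions made for the example.

```python
from datetime import datetime

def in_classified(t, intervals):
    """Return True if time t falls inside any (enter, leave) interval."""
    return any(enter <= t <= leave for enter, leave in intervals)

def flag_suspicious(traffic, classified):
    """Keep traffic sent while the computer's owner was in the classified space.

    traffic    -- list of dicts with hypothetical keys 'employee_id' and 'time'
    classified -- dict mapping employee_id to a list of (enter, leave) datetimes
    """
    return [rec for rec in traffic
            if in_classified(rec["time"], classified.get(rec["employee_id"], []))]

# Toy data, invented for illustration only (not taken from the challenge files).
classified = {7: [(datetime(2008, 1, 8, 14, 0), datetime(2008, 1, 8, 15, 0))]}
traffic = [
    {"employee_id": 7, "time": datetime(2008, 1, 8, 14, 30)},  # owner away: flagged
    {"employee_id": 7, "time": datetime(2008, 1, 8, 16, 0)},   # owner back at desk
    {"employee_id": 9, "time": datetime(2008, 1, 8, 14, 30)},  # owner has no classified visits
]
flagged = flag_suspicious(traffic, classified)
```

Only the first toy record is flagged, because it is the only one sent while the computer's owner was recorded inside the classified space.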

I discovered that in the month of January, there were seventeen instances where an employee's computer had internet traffic while that employee was in the classified space. I deemed these instances suspicious. Furthermore, most of them had a common destination IP. Each transaction to that IP address also had a large request size, typically 6-10 megabytes, and used the same port, 8080. These requests were much larger than typical transactions. Many of the transactions occurred late in the day; a few occurred in the early morning.
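One way to surface a shared destination among flagged records is to count (destination, port) pairs and then check request sizes against a threshold. The sketch below is illustrative only: the field names are hypothetical, the addresses come from the reserved documentation ranges rather than the challenge data, and the 6,000,000-byte threshold is an assumption based on the lower end of the 6-10 megabyte range noted above.

```python
from collections import Counter

def common_destination(records):
    """Return the most frequent (dest_ip, port) pair and how many records use it."""
    counts = Counter((r["dest_ip"], r["port"]) for r in records)
    return counts.most_common(1)[0]

def is_large(record, threshold_bytes=6_000_000):
    """Assumed cutoff: treat requests of roughly 6 MB or more as unusually large."""
    return record["request_bytes"] >= threshold_bytes

# Toy records with placeholder addresses (documentation ranges, not real data).
flagged = [
    {"dest_ip": "203.0.113.5", "port": 8080, "request_bytes": 8_200_000},
    {"dest_ip": "203.0.113.5", "port": 8080, "request_bytes": 6_900_000},
    {"dest_ip": "198.51.100.2", "port": 80, "request_bytes": 1_200},
]
(dest, port), hits = common_destination(flagged)
large = [r for r in flagged if is_large(r)]
```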

Figure 1: IP traffic, showing volume vs. time (in green). Suspicious activity appears in the second row (in red). The times when each suspected employee entered or left the classified space are shown in blue.

The first suspicious activity, on January 4th, was an exception to this pattern. On that day, someone used a single computer over a relatively short span of time while its normal operator (employee 38) was in the classified area. This traffic might be an anomaly, perhaps an automated process. It could also mark the point when the suspect contacted his handlers for information or instructions.

By cross-referencing all suspicious activity with the locations of the users, I determined that six employees (IDs 19, 21, 27, 30, 32, and 48) were never in the classified area when the suspicious activity occurred. An employee who was in the classified area during a suspicious transaction could not have sent it. If we assume that only one employee was guilty of espionage, these six employees are therefore the prime suspects.
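The elimination logic can be expressed as a set operation: start with every employee and remove anyone who was in the classified space during at least one suspicious event. The following Python sketch assumes a single culprit, as the text does; the function name, data layout, and toy values are hypothetical.

```python
from datetime import datetime

def remaining_suspects(employees, event_times, classified):
    """Drop every employee who was in the classified space during any event.

    classified -- dict mapping an employee to a list of (enter, leave) datetimes
    """
    suspects = set(employees)
    for emp in employees:
        visits = classified.get(emp, [])
        if any(enter <= t <= leave for t in event_times for enter, leave in visits):
            suspects.discard(emp)
    return suspects

def at(hour, minute):
    """Helper for building toy timestamps on one invented day."""
    return datetime(2008, 1, 8, hour, minute)

# Toy data, invented for illustration only.
classified = {"A": [(at(14, 0), at(15, 0))], "C": [(at(16, 30), at(17, 30))]}
events = [at(14, 20), at(17, 1)]
remaining = remaining_suspects(["A", "B", "C"], events, classified)
```

In the toy data, A and C each have an alibi for one event, so only B remains.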

To examine the suspicious behavior in another way, I looked at the location of the various desks where the suspicious activity occurred to determine whether a spatial pattern might help to identify the culprit. Figure 2 shows the part of the tool which indicates the desks where suspicious activity occurred, overlaid with the image of the embassy office space. Most of the suspicious accesses occurred near the center of the office. While this information failed to identify the culprit, it suggested someone who used an office located in the center of the room.

Figure 2: Display of the offices, highlighting where suspicious IP traffic originated (in red). The grid display corresponds to the embassy office plan, which is overlaid for clarity.

Following that, I examined all other IP traffic to the same destination IP. I discovered several additional records, each with a large request size and the same port. Because the building access records do not include when an employee left the building, a computer's owner may have been away during these transactions as well, so these records could also indicate suspicious activity. Employees 19, 21, 32, and 48 were in the classified space when some of these IP transactions occurred, leaving 27 and 30 as the primary suspects.

At this point, I examined the daily records in closer detail using the tool's Daily View, as seen in Figure 3. This view shows the daily activity of the selected IP traffic (top, in green and red) matched against a specific user's activity for that day. In this display, the user's assumed time at work is shown in cyan, while their time in the classified space is shown in blue. I matched the larger set of suspicious traffic against each user's patterns of entry and exit and the time they spent in the classified space. On January 31, I discovered one suspicious transaction at 9:41am. However, employee 27 did not enter the building until after 10am. Therefore, employee 30 is our likely suspect.
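The final check compares a transaction time against each remaining suspect's first building entry for that day. The sketch below is an assumption-laden illustration: the labels X and Y stand in for the two remaining suspects, the arrival times are invented, and only the 9:41am transaction time comes from the analysis above.

```python
from datetime import datetime

def present_at(event_time, first_entry):
    """True if the employee had already badged into the building by event_time.

    first_entry is None when no building entry was recorded that day.
    """
    return first_entry is not None and first_entry <= event_time

def still_possible(first_entries, event_time):
    """Return the suspects who were in the building when the event occurred."""
    return {emp for emp, entry in first_entries.items()
            if present_at(event_time, entry)}

# 9:41am is the transaction time from the text; the year and arrival times are invented.
event = datetime(2008, 1, 31, 9, 41)
first_entries = {
    "X": datetime(2008, 1, 31, 10, 5),   # hypothetical: arrived after the event
    "Y": datetime(2008, 1, 31, 8, 50),   # hypothetical: arrived before the event
}
possible = still_possible(first_entries, event)
```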

Figure 3: Activity on Jan. 31, showing that employee 27 had not reported to work when the first of three suspicious transactions occurred on that day. Employee 30 had been present and out of the classified area.

Looking at the first time that data was transmitted to the suspicious destination, we note that it originated from the computer of employee 31 late in the day (at 5:01pm). Since employee 31 shares the office with employee 30, this further corroborates our theory.

Figure 4: First week of suspicious activity, Jan 7-11. The IP data for our suspicious destination is shown along with the traffic we initially identified (in red) as suspicious. The classified access times for employees 27, 30, and 31 are also shown (in blue). Details of the suspicious transaction on Jan 8 are shown in the detail window.

Summary of the suspected behavior:

Our suspect, employee 30, may have initiated contact on Jan. 4th, using employee 38's computer. Starting on Jan. 8th, he began transmitting large data files to a particular IP destination. He typically waited until a fellow employee entered the classified area, then used that employee's computer to upload the information. Alternatively, he accessed a colleague's computer either before they arrived at work or when they were out of the office for other reasons, such as meetings or lunch. The IP traffic shows that each transaction exceeded 6 megabytes, indicating that the suspect was uploading large documents to his contacts.
