Oculus Info Inc. – GeoTime
VAST 2009 Challenge
Challenge 1: Badge and Network Traffic

Authors and Affiliations:

Adeel Khamisa (akhamisa@oculusinfo.com)
Lynn Chien (lchien@oculusinfo.com)
William Wright (wwright@oculusinfo.com)

Tool(s):

To solve this challenge, we used GeoTime v4.01, released in 2009 by Oculus Info Inc. GeoTime supports the visualization and analysis of entities and events over time and geography. Events are represented within an X,Y,T coordinate space, in which the X,Y plane shows geographic space and the vertical axis represents time. Entity movements, events, relationships, and interactions over time within a spatial context can be easily seen and understood. Events animate through this 3-D space as time is played through. The new GeoTime includes search, link analysis, automatic pattern detection, story annotation and other analytical functions. [See Kapler, T and Wright, W. GeoTime Information Visualization, IEEE Symposium on Information Visualization 2004.]

We also used Excel and Mondrian v1.0, an open-source tool released in 2008 by Martin Theus. Mondrian is a 2-D interactive data-visualization system. Current plots include Mosaic Plot, Scatterplots and SPLOM, Maps, Barcharts, Histograms, Missing Value Plot, Parallel Coordinates/Boxplots and Boxplots y by x.

Video:

Challenge 1 Video

ANSWERS:


MC1.1: Identify which computer(s) the employee most likely used to send information to his contact in a tab-delimited table which contains for each computer identified: when the information was sent, how much information was sent and where that information was sent. 

Traffic.txt


MC1.2: Characterize the patterns of behavior of suspicious computer use.

To begin, our GeoTime analyst loaded the employee (or user) traffic data into Excel and added latitude/longitude columns, derived from the embassy floor-plan image provided, so that employees sending data from their offices, entering the building, and moving in and out of the classified section could be represented spatially. The data was also loaded into Mondrian, where plots were made of counts by socket, IP traffic per day, and traffic by destination IP and by source IP. A weighted histogram displaying the count of request sizes was also created. The Mondrian analyst began testing the hypothesis that an employee exfiltrating data would send large payloads to a limited set of external IPs by correlating request size with external IP. In the request size plot, a range of the largest request sizes was selected. In the linked source IP plot, the highlights showed that the source IPs of employees 16, 20, and 31 had sent these larger payloads. These IDs were then given to the GeoTime analyst.
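
The request-size selection described above can be approximated outside Mondrian as a binned count followed by a threshold filter. The following Python is a minimal sketch under stated assumptions, not the Mondrian analyst's actual procedure: the Request record, its field names, the bin width, the 8000 threshold, and the toy rows are all hypothetical stand-ins for the challenge's traffic file.

```python
from collections import Counter
from typing import NamedTuple


class Request(NamedTuple):
    # Hypothetical record layout for one IP traffic row (assumed field names).
    source_ip: str
    dest_ip: str
    socket: int
    request_size: int


# Invented toy records for illustration only (not the challenge data).
TRAFFIC = [
    Request("37.170.100.16", "100.59.151.133", 8080, 9000),
    Request("37.170.100.20", "100.59.151.133", 8080, 8800),
    Request("37.170.100.31", "100.59.151.133", 8080, 9500),
    Request("37.170.100.5", "37.170.30.250", 25, 3000),
    Request("37.170.100.7", "192.0.2.10", 80, 400),
]


def size_histogram(records, bin_width):
    """Count requests per request-size bin, keyed by each bin's lower edge."""
    return Counter((r.request_size // bin_width) * bin_width for r in records)


def sources_of_large_requests(records, min_size):
    """Mimic brushing the top of the size histogram: count source IPs of big requests."""
    return Counter(r.source_ip for r in records if r.request_size >= min_size)


if __name__ == "__main__":
    print(sorted(size_histogram(TRAFFIC, 1000).items()))
    print(sources_of_large_requests(TRAFFIC, min_size=8000))
```

Brushing a different range in Mondrian corresponds to changing min_size here; the histogram keys are the lower edges of each size bin.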

Within two minutes, the GeoTime analyst selected only the records associated with the three suspect IDs in Excel and sent those activity records to GeoTime. In GeoTime, IP traffic was mapped as communication events between source and destination IPs, and proximity card data was mapped as movement events within the embassy. As seen in Figure 1, request size was mapped to event size. Within one minute, the analysts detected that destination IP 100.59.151.133 had received an unusual number of large requests. They made this observation by sorting events by request size, selecting a range of the largest sizes, and cross-referencing the selected events to their destination IPs. To confirm, the GeoTime analyst then did the reverse: clicking on that destination IP in the Charts Tab highlighted its associated events in the Space-Time Viewer, and all of the large events in the Viewer and the Charts Panel were highlighted.
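
The sort, select, and cross-reference step, together with the reverse confirmation, can be sketched in Python as follows. This is an illustration under assumptions, not GeoTime's internals: the Request record, the choice of the top n events as the "range of the largest sizes", and the toy rows are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Request(NamedTuple):
    # Hypothetical record layout; field names are illustrative assumptions.
    source_ip: str
    dest_ip: str
    request_size: int


# Invented toy records for the three suspect IDs (not the challenge data).
SUSPECT_TRAFFIC = [
    Request("37.170.100.16", "100.59.151.133", 9000),
    Request("37.170.100.31", "100.59.151.133", 9500),
    Request("37.170.100.20", "100.59.151.133", 8800),
    Request("37.170.100.31", "37.170.30.250", 3000),
    Request("37.170.100.16", "192.0.2.10", 400),
]


def largest_requests(records, n):
    """Sort events by request size (descending) and keep the n largest."""
    return sorted(records, key=lambda r: r.request_size, reverse=True)[:n]


def destinations_of(events):
    """Cross-reference selected events to their destination IPs."""
    return Counter(e.dest_ip for e in events)


def events_for(records, dest_ip):
    """Reverse check: every event sent to one destination IP."""
    return [r for r in records if r.dest_ip == dest_ip]


if __name__ == "__main__":
    top = largest_requests(SUSPECT_TRAFFIC, n=3)
    print(destinations_of(top))
    # Confirm that all of the largest events belong to that single IP.
    linked = events_for(SUSPECT_TRAFFIC, "100.59.151.133")
    print(all(e in linked for e in top))
```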

Figure 1:  All events over time, by location, for the top three employees with the largest request size events highlighted.  In the charts tab on the right, destination IPs are listed.  The highlighted large request sizes all go to the only highlighted destination IP.

Meanwhile, when the highest request loads were selected, the Mondrian analyst saw that two external destination IPs, 37.170.30.250 and 100.59.151.133, were highlighted, as seen in Figure 2. However, the source IPs showed that the employees sending this data varied. When destination IP 37.170.30.250 was selected, only socket 25 was highlighted in the linked socket bar chart, indicating that this IP had been accessed exclusively through socket 25. Given that a large number of employees communicated with this IP and that socket 25 is normally used for SMTP e-mail traffic, IP address 37.170.30.250 most likely corresponds to an e-mail server, which would be expected to receive a lot of traffic, including requests with large sizes.

Conversely, when destination IP 100.59.151.133 was selected, only socket 8080 was highlighted. Because the default socket for HTTP traffic is generally 80, while 8080 is a less common alternate, exclusive use of 8080 to access this IP is suspicious. In addition, the other IP addresses accessed through socket 8080 were also accessed through socket 80. Since 100.59.151.133 is the only IP address accessed exclusively through socket 8080, suspicion regarding the traffic to this IP is further strengthened.
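
The exclusive-socket test in the two paragraphs above reduces to collecting, for each destination IP, the set of sockets used to reach it. A minimal Python sketch, assuming hypothetical Request records and invented toy rows (192.0.2.10 is a documentation-range address standing in for an ordinary web host reached on both 80 and 8080):

```python
from collections import defaultdict
from typing import NamedTuple


class Request(NamedTuple):
    # Hypothetical record layout; only the fields this check needs.
    source_ip: str
    dest_ip: str
    socket: int


# Invented toy records (not the challenge data).
TRAFFIC = [
    Request("37.170.100.16", "100.59.151.133", 8080),
    Request("37.170.100.31", "100.59.151.133", 8080),
    Request("37.170.100.5", "37.170.30.250", 25),
    Request("37.170.100.9", "37.170.30.250", 25),
    Request("37.170.100.7", "192.0.2.10", 80),
    Request("37.170.100.8", "192.0.2.10", 8080),
]


def sockets_by_destination(records):
    """Collect the set of sockets used to reach each destination IP."""
    sockets = defaultdict(set)
    for r in records:
        sockets[r.dest_ip].add(r.socket)
    return sockets


def exclusive_to_socket(records, socket):
    """Destination IPs reached through `socket` and through no other socket."""
    return sorted(
        ip for ip, used in sockets_by_destination(records).items() if used == {socket}
    )


if __name__ == "__main__":
    print(exclusive_to_socket(TRAFFIC, 25))    # in the toy data: the e-mail server
    print(exclusive_to_socket(TRAFFIC, 8080))  # in the toy data: the suspicious IP
```

Comparing each set with {socket} keeps mixed-use hosts such as 192.0.2.10 out of the result, which mirrors the observation that other 8080 destinations were also reached on 80.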

The Mondrian analyst made a list of all employee IDs that contacted 100.59.151.133 and provided it to the GeoTime analyst. Both analysts performed the remainder of the analysis in GeoTime.

Figure 2: Histogram showing number of requests at each request size interval and corresponding Destination IP traffic.  

Selecting large request sizes revealed that Destination IP 100.59.151.133 has the greatest number of large request sizes.

From Excel, the analysts sent the computer activity and proximity card movements of the employees who communicated with the suspicious IP to GeoTime. Each employee’s computer activity was cross-referenced with his or her movements in and out of the building and the classified area. For example, Figure 3 shows a communication event between IP 37.170.100.31, represented by the large purple circle, and IP 100.59.151.133, the large blue circle. The event occurs while employee 31, shown as the smaller purple circle and purple track line, is in the classified section. The duration of employee 31’s stay in the classified section is represented by the vertical purple track, and the exit by the horizontal purple track. This combination of movements and communication forms a pattern: the computer’s owner is absent from his office during the critical communication event. Sending the proximity card movements of the rest of the 59 employees to GeoTime, the analysts searched for this pattern manually, because it is not currently one of GeoTime’s Pattern Discovery features. It took the GeoTime analyst two days to find that employee 45 was the only person who was in the building, and not in the classified area, every time data was sent to the suspicious IP, even though he or she never sent information to that IP personally. We have noted this new pattern and will add it to the automated patterns in GeoTime’s Pattern Discovery feature.
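
The manual pattern search can be expressed as a replay of badge events at each send time. This Python sketch rests on several assumptions not taken from GeoTime: the event-type names, the (time, employee ID, event type) record layout, the rule that presence resets at midnight because the records are assumed to contain no building-exit event, the mapping of computer 37.170.100.N to employee N, and the toy records, which cover a single send.

```python
from datetime import datetime

# Hypothetical proximity-card records: (time, employee_id, event_type).
# Event names and the "presence resets each day" rule are assumptions.
PROX = [
    (datetime(2008, 1, 8, 8, 0), 31, "prox-in-building"),
    (datetime(2008, 1, 8, 10, 0), 31, "prox-in-classified"),
    (datetime(2008, 1, 8, 11, 0), 31, "prox-out-classified"),
    (datetime(2008, 1, 8, 7, 50), 45, "prox-in-building"),
    (datetime(2008, 1, 8, 9, 0), 12, "prox-in-building"),
    (datetime(2008, 1, 8, 10, 15), 12, "prox-in-classified"),
    (datetime(2008, 1, 8, 10, 45), 12, "prox-out-classified"),
]

# Hypothetical sends to the suspicious IP: (time, owner ID of the sending computer).
SENDS = [(datetime(2008, 1, 8, 10, 30), 31)]


def state_at(prox, emp, t):
    """Replay one employee's badge events on t's day, up to time t."""
    state = "absent"
    for when, who, kind in sorted(prox):
        if who != emp or when.date() != t.date() or when > t:
            continue
        if kind == "prox-in-classified":
            state = "classified"
        else:  # badging into the building or out of the classified area
            state = "building"
    return state


def owner_away(prox, sends):
    """Figure 3 pattern: is the sending computer's owner in the classified area?"""
    return [state_at(prox, owner, t) == "classified" for t, owner in sends]


def always_in_building_unclassified(prox, sends, employees):
    """Employees in the building but outside the classified area at every send,
    excluding anyone whose own computer did the sending."""
    candidates = set(employees)
    for t, owner in sends:
        candidates = {
            e for e in candidates if e != owner and state_at(prox, e, t) == "building"
        }
    return candidates


if __name__ == "__main__":
    print(owner_away(PROX, SENDS))
    print(always_in_building_unclassified(PROX, SENDS, employees={12, 31, 45}))
```

Intersecting the candidate set across all sends is what makes the search cheap to automate: each additional send can only shrink the set.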

Figure 3: Calendar view showing Employee 31’s activities in time and office space.

Focusing on the time when communication was sent to the suspicious IP revealed that Employee 31’s computer was compromised while he was in the classified area.

The GeoTime analyst began looking for other relationship and temporal patterns by clicking on the suspicious destination IP in the Space-Time Viewer and running a Link Analysis. First- and second-degree connections to the IP were shown immediately. By clicking on the first-degree link, the analysts could see which entities sent requests to the IP, as well as the requests themselves (18 events in total), as shown in Figure 4. Charting the events by day of the week showed that the information was always sent on Tuesdays (8 communication events) or Thursdays (10 communication events). Charting by entity showed that no computer was used more than three times to communicate with the IP, and charting by hour revealed that the requests were sent between 8am and 6pm.
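
Charting the first-degree link events by day, hour, and entity amounts to three frequency counts. A minimal Python sketch with hypothetical (time, source IP) tuples and three invented events; day names come from a fixed English tuple so the output does not depend on the system locale.

```python
from collections import Counter
from datetime import datetime

# Fixed English day names, indexed by datetime.weekday() (Monday == 0).
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

# Hypothetical first-degree link events: (send time, source computer IP).
LINK_EVENTS = [
    (datetime(2008, 1, 8, 8, 30), "37.170.100.31"),   # a Tuesday
    (datetime(2008, 1, 10, 16, 0), "37.170.100.16"),  # a Thursday
    (datetime(2008, 1, 15, 14, 0), "37.170.100.31"),  # a Tuesday
]


def chart_counts(events):
    """Return event counts by weekday, by hour of day, and by source computer."""
    by_day = Counter(DAY_NAMES[t.weekday()] for t, _ in events)
    by_hour = Counter(t.hour for t, _ in events)
    by_source = Counter(src for _, src in events)
    return by_day, by_hour, by_source


if __name__ == "__main__":
    by_day, by_hour, by_source = chart_counts(LINK_EVENTS)
    print(by_day)
    print(min(by_hour), max(by_hour))   # earliest and latest send hour
    print(max(by_source.values()))      # most uses of any single computer
```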

Figure 4: Looking for patterns using Link Analysis function and Charts.

Running a Link Analysis shows all computers that have sent information to the suspicious destination IP.  Charting by Day of Week revealed that Tuesdays and Thursdays are the only days information is sent.

By switching to the 2D spatial view, the analysts could see that the employee did not limit him/herself to a specific computer, but spread the 18 exfiltration communications across offices throughout the embassy. By switching to the Calendar view, the GeoTime analyst could see the following:

· ID 45 keeps very consistent hours, between 8am and 6pm, which matches the timing of the communications sent to the IP. These are generally sent either early in the morning or in the afternoon, and never between 10am and noon.

· The owners of the compromised computers are generally active (in the office) when the data is sent out, which suggests that ID 45 slips quickly into their offices while they are on a break or in the classified area. Only rarely are the computers compromised before their owners arrive at work in the morning.
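
Both Calendar-view observations can be checked by tallying each computer owner's badge state at the moment of each send and by counting sends that fall in the 10am to noon window. The Python below is a hypothetical sketch: the event names, the status labels, the same-day rule, the owner-ID mapping, and the three toy sends are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical mapping from badge event to the state it leaves the employee in.
STATE_AFTER = {
    "prox-in-building": "in building",
    "prox-in-classified": "in classified area",
    "prox-out-classified": "in building",
}

# Hypothetical badge records: (time, employee_id, event_type).
PROX = [
    (datetime(2008, 1, 8, 8, 0), 31, "prox-in-building"),
    (datetime(2008, 1, 8, 9, 0), 31, "prox-in-classified"),
    (datetime(2008, 1, 8, 9, 0), 12, "prox-in-building"),
    (datetime(2008, 1, 8, 8, 5), 16, "prox-in-building"),
]

# Hypothetical sends: (time, owner ID of the computer used).
SENDS = [
    (datetime(2008, 1, 8, 9, 30), 31),
    (datetime(2008, 1, 8, 15, 0), 12),
    (datetime(2008, 1, 8, 7, 30), 16),
]


def owner_status(prox, owner, t):
    """Owner's state at time t, taken from the latest same-day badge event."""
    earlier = [
        (when, kind)
        for when, who, kind in prox
        if who == owner and when.date() == t.date() and when <= t
    ]
    if not earlier:
        return "not yet arrived"
    return STATE_AFTER[max(earlier)[1]]


def status_tally(prox, sends):
    """Tally owner states across all sends to the suspicious IP."""
    return Counter(owner_status(prox, owner, t) for t, owner in sends)


def sends_between(sends, start_hour, end_hour):
    """Count sends whose hour falls in [start_hour, end_hour)."""
    return sum(start_hour <= t.hour < end_hour for t, _ in sends)


if __name__ == "__main__":
    print(status_tally(PROX, SENDS))
    print(sends_between(SENDS, 10, 12))  # the 10am to noon gap
```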

Figure 5: Looking for pattern from different views.

The 2D spatial view emphasizes spatial analysis and revealed that 12 computers throughout the embassy were compromised. The Calendar view provides detailed temporal analysis, revealing that the suspect waits until the victim is in the classified area, or away from his or her normal e-mail pattern on a break, before sending the information. Perhaps a regular large office meeting that many people must attend takes place on Tuesdays and Thursdays.
