Team GAMI: BNT Tool to visually identify suspicious events

VAST 2009 Challenge
Challenge 1: Badge and Network Traffic

Authors and Affiliations:

      Soujanya Vadapalli, International Institute of Information Technology, Hyderabad [PRIMARY contact]
      Shraddha Agrawal, International Institute of Information Technology, Hyderabad
      Sravanthi Kollukuduru, International Institute of Information Technology, Hyderabad
      Kamalakar Karlapalem, International Institute of Information Technology, Hyderabad [Faculty advisor]


We built the Badge and Network Traffic (BNT) tool to create animations of the events taking place in the embassy. Using the embassy layout, time-stamps, the prox-card and web-access entries, we animated color-based flagging of events. From the employee information table, we obtained the office location of each employee in the embassy building layout. Each block in the layout is associated with the corresponding employee.


Colors of a block and the associated events are given below:

1. WHITE (default) - No prox-in-building entry has occurred up to that point in time.

2. GREEN - A prox-in-building entry has occurred; the block is colored green from then on.

3. BLUE - A prox-in-classified entry has occurred; the block remains blue until the prox-out-classified entry occurs.

4. 'W' - A web-access has occurred from the corresponding source ip; the block is highlighted and marked with a 'W'.

5. RED - A web-access has occurred from the corresponding source ip while the block is blue; the block is marked red, indicating a suspicious event.


Other events:

1. If the prox-out-classified event occurs when block color is blue, the block's color is restored to green.

2. When a block is colored RED, a couple of plots showing the ratio of reqSize to respSize for the source ip and the dest ip are displayed for further evaluation.
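
The coloring rules above can be read as a small state machine for each block. The following is a minimal sketch of that reading, not the BNT tool's actual code. The function names and the "web-access" event label are hypothetical, and the behavior of a red block on prox-out-classified is an assumption, since the rules only state what happens to a blue block.

```python
WHITE, GREEN, BLUE, RED = "white", "green", "blue", "red"

def apply_event(color, event):
    """Return (new_color, show_w) for one block after one event."""
    if event == "prox-in-building":
        # Only a white block turns green; any later state is kept.
        return (GREEN if color == WHITE else color), False
    if event == "prox-in-classified":
        return BLUE, False
    if event == "prox-out-classified":
        # Assumption: a red block is also restored to green on exit.
        return GREEN, False
    if event == "web-access":
        # A web-access while the block is blue (or already red) is suspicious.
        return (RED if color in (BLUE, RED) else color), True
    raise ValueError(f"unknown event: {event!r}")

def replay(events, color=WHITE):
    """Apply a sequence of event names and return the color after each one."""
    colors = []
    for event in events:
        color, _ = apply_event(color, event)
        colors.append(color)
    return colors

# Illustrative sequence for one employee on one day (not taken from the data set).
day = ["prox-in-building", "web-access", "prox-in-classified",
       "web-access", "prox-out-classified"]
print(replay(day))
```

In this sketch the 'W' marker is returned as a separate flag because it overlays the block rather than replacing its color.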


Each day's events are animated and are available for viewing at the web-url given below. There is also an animation that displays only the suspicious events. The BNT tool was developed using Python, PyX (graphics API) and JavaScript.

Developers: Shraddha Agrawal, Sravanthi Kollukuduru
Tool is available here:




Mini challenge 1: BNT tool




MC1.1: Identify which computer(s) the employee most likely used to send information to his contact in a tab-delimited table which contains for each computer identified: when the information was sent, how much information was sent and where that information was sent.



MC1.2:  Characterize the patterns of behavior of suspicious computer use.

VAST Mini-challenge 1: Detailed answer

From the task description and data set provided, we identify each record (row) in the tables of prox-card logs and web access logs as an event. An event, thus, is one of the following: (a) prox-card event: an event that indicates either an entry into the embassy by an employee, an entry into the restricted zone or an exit from the restricted zone. (b) web-access event: an event that indicates a web access from a machine in the embassy (usually referred to as source ip) to a destination machine on the net (referred to as destination ip).

Given these event types and the logical constraints from the task description (i.e. an employee makes web-accesses from his allotted machine, entries into the restricted area are always recorded, and no piggy-backing is allowed there), we formulated a few logical inconsistencies that could appear in the data. Whenever an event takes place, we check whether it leads to a logical inconsistency. If it does, we flag the event as suspicious and gather other information related to that event to validate whether it is indeed suspicious.

The conditions we flag are:

1. An employee is in the restricted area and there is a web-access from his allotted machine.

2. An employee has no prox-in-building event and there is a web-access from his allotted machine. This could also mean that the employee piggy-backed into the building, but we still flag such an event as a suspicious candidate for further evaluation.

3. For each web-access, if the ratio (reqSize / respSize) is high, we flag the event for further evaluation.

4. The usual time-slots during which each user accesses the web are plotted; an access at an unusual time is flagged as suspicious.
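
The first two conditions can be checked by replaying one day's prox-card and web-access records in time order. The sketch below illustrates the idea under stated assumptions: the `Event` record, its field names, and the reason strings are hypothetical, each web-access is assumed to be already mapped from its source ip to the employee who owns that machine, and the sample events are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Event:
    time: str      # ISO timestamp, so string order equals time order
    employee: int  # employee id (web-accesses mapped via the allotted machine)
    kind: str      # prox-in-building, prox-in-classified, prox-out-classified, web-access

def flag_inconsistencies(events):
    """Replay one day's events in time order and return (event, reason) pairs."""
    in_building = set()
    in_classified = set()
    flagged = []
    for ev in sorted(events, key=lambda e: e.time):
        if ev.kind == "prox-in-building":
            in_building.add(ev.employee)
        elif ev.kind == "prox-in-classified":
            in_classified.add(ev.employee)
        elif ev.kind == "prox-out-classified":
            in_classified.discard(ev.employee)
        elif ev.kind == "web-access":
            if ev.employee in in_classified:
                # Condition 1: web-access while the employee is in the restricted area.
                flagged.append((ev, "WARA"))
            elif ev.employee not in in_building:
                # Condition 2: web-access with no prox-in-building entry so far.
                flagged.append((ev, "no prox-in-building"))
    return flagged

# Invented sample events for two employees on one day.
sample = [
    Event("2008-01-10T08:00", 1, "prox-in-building"),
    Event("2008-01-10T09:00", 1, "prox-in-classified"),
    Event("2008-01-10T09:30", 1, "web-access"),
    Event("2008-01-10T10:00", 1, "prox-out-classified"),
    Event("2008-01-10T10:30", 1, "web-access"),
    Event("2008-01-10T11:00", 2, "web-access"),
]
for ev, reason in flag_inconsistencies(sample):
    print(ev.time, ev.employee, reason)
```

Since there is no prox-out-building record among the event types, the sketch resets its state per day by being called once for each day's events.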

Observations on data

1. There are 60 employees and 60 machines, one allotted to each employee.
2. The number of unique destination ips is 20243.
3. Data transfer took place through three ports: 80, 25 and 8080.

4. All web-accesses through port 25 go to a single destination ip ''. As port 25 corresponds to the Simple Mail Transfer Protocol (SMTP), we conclude that this destination ip is probably the embassy's mail server, and we exclude the web-access requests made to it from the suspicious-events analysis.

5. Whenever the ratio of the reqSize (request size) of a web-access to its respSize (response size) is high, we conclude that the web-access is typically a heavy data transfer (upload) from the source ip to the destination ip.
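
Observations 4 and 5 together suggest a simple filter: skip the port 25 traffic and keep the web-accesses whose reqSize / respSize ratio is high. The sketch below is one way to write it. The dictionary keys, the placeholder source names, and the threshold value are assumptions made for illustration; the text above does not state a numeric cut-off.

```python
def upload_ratio(req_size, resp_size):
    """Return reqSize / respSize, or None when the response size is zero."""
    if resp_size == 0:
        return None
    return req_size / resp_size

def candidate_uploads(rows, threshold, mail_port=25):
    """Keep non-mail web-accesses whose ratio exceeds `threshold`."""
    kept = []
    for row in rows:
        if row["port"] == mail_port:
            continue  # traffic to the presumed mail server is not analyzed
        ratio = upload_ratio(row["req_size"], row["resp_size"])
        if ratio is not None and ratio > threshold:
            kept.append({**row, "ratio": ratio})
    return kept

# Invented rows; "src-a" etc. are placeholders, not addresses from the data set.
rows = [
    {"src": "src-a", "port": 8080, "req_size": 1_000_000, "resp_size": 4_000},
    {"src": "src-b", "port": 80, "req_size": 500, "resp_size": 20_000},
    {"src": "src-c", "port": 25, "req_size": 900_000, "resp_size": 1_000},
    {"src": "src-d", "port": 80, "req_size": 300, "resp_size": 0},
]
hits = candidate_uploads(rows, threshold=10)  # threshold chosen arbitrarily
print([(h["src"], h["ratio"]) for h in hits])
```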

GUI-assisted Data Analysis

1. Web-access events through ports 80 and 8080 are analyzed through plots.
2. The regular web-access patterns of each user on his respective machine are analyzed through plots.
3. Animation of events:

Because the prox-card entries and the web-access entries are time-stamped, we drew the building using the embassy layout image provided and enabled color-based flagging of the events associated with the employees. From the employee information table, we obtained the office location of each employee in the embassy building layout. Each block is associated with the corresponding employee and is colored white by default.

The colors and the associated events are listed in the tool description above. The animations are made day by day, and the list of suspicious events is compiled into a separate summary animation. The animations for all 31 days are available online here:

Suspicious events

We identified 8 cases in which a web-access is made from a source ip while the machine's corresponding employee is in the restricted area (there is a prox-in-classified entry but no prox-out-classified entry yet). We flag these events and compare each one against the usual ratio of request size to response size for its source ip. An unusually high value of this ratio indicates a heavy data transfer in the absence of the employee. All 8 of these web-accesses are made to a single destination ip:, through port 8080.

We flagged this ip address as a possible channel for an information leak from the embassy and retrieved the other web-accesses to this destination ip. There are 10 other web-accesses to this destination ip for which the prox-card entries are temporally consistent with the time-stamps of the web-accesses. For these, we closely analyzed other related information: the number of employees present in the embassy at that time, the number of employees present in the classified area at that time (as shown in Table 1 below), and the regular web-usage pattern of each source ip involved. From these web-usage patterns, we check whether an access is unusual. One example is a rare web-access around 17:00 when on most other days there is no web activity around that time. Another is a source ip whose last web-access on each day is no later than 16:00, except for one day on which there is a web-access only to this destination ip, with a heavy data transfer, between 18:00 and 19:00.
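
The check for unusual web-usage times was done visually from plots. As a rough programmatic counterpart, the sketch below flags the accesses of one source ip that fall in an hour of the day that is active on very few days. The function name, the `max_days` parameter, and the sample timestamps are assumptions made for illustration, not part of the BNT tool.

```python
from collections import defaultdict
from datetime import datetime

def rare_hour_accesses(timestamps, max_days=1):
    """Return the accesses whose hour of day is active on at most `max_days` days."""
    days_by_hour = defaultdict(set)
    for ts in timestamps:
        days_by_hour[ts.hour].add(ts.date())
    return [ts for ts in timestamps if len(days_by_hour[ts.hour]) <= max_days]

# Invented access times for one source ip over three days.
accesses = [
    datetime(2008, 1, 8, 10, 5), datetime(2008, 1, 8, 15, 40),
    datetime(2008, 1, 9, 10, 20), datetime(2008, 1, 9, 15, 10),
    datetime(2008, 1, 9, 18, 30),
    datetime(2008, 1, 10, 10, 0), datetime(2008, 1, 10, 15, 55),
]
for ts in rare_hour_accesses(accesses):
    print(ts.isoformat())
```

Here only the 18:30 access is reported, because the 10:00 and 15:00 hours are active on all three days. Bucketing by hour is a coarse choice; a finer time-slot or a per-day "last access" comparison would follow the same pattern.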

A total of 18 web-accesses to this destination ip are found in the web log entries and all these accesses have an unusual pattern; we thus identify these 18 events as the suspicious events.

Table 1: List of suspicious events and some statistics*

For destination ip : and port :8080

SNo  Source ip  Date        Time   Ratio        Why/How
1.              21/31/2008  9:41   251.9230144  WARA^
2.              1/17/2008   12:12  150.6416902  WARA, No prox-in entry for employee 40
3.              1/29/2008   16:08  116.6890521  WARA, Employee 40 is also in restricted area
4.              1/31/2008   13:10  806.6132764  WARA
5.              1/15/2008   16:14  274.6528527  WARA
6.              1/10/2008   16:01  693.8860461  WARA
7.              1/29/2008   15:41  339.075055   WARA, Employee 57 is also in restricted area
8.              1/10/2008   14:27  293.2205243  WARA
9.              1/8/2008    17:01  727.291      35 employees had left after 17:00 (prox-out-classified)
10.             1/15/2008   17:03  664.1519     43 employees had left after 17:00 (based on prox-out-classified)
11.             1/17/2008   17:57  232.7596     38 employees had left after 17:00 (based on prox-out-classified)
12.             1/22/2008   8:50   236.4215     No prox-in-building, has the first access to dest ip; 22 entered before 8:50 am (prox-in-building)
13.             1/22/2008   17:41  528.876      40 employees left after 17:00 (prox-out-classified)
14.             1/24/2008   9:46   329.0354     41 employees had entered building before 9:46 am
15.             1/24/2008   10:26  246.0819     None
16.             1/24/2008   17:07  229.8254     39 employees left after 17:00, 21 was in restricted area
17.             1/29/2008   16:38  142.2871     21 was in restricted area
18.             1/31/2008   16:02  28.1967      9 was in restricted area

^ WARA - Web access from source ip while the corresponding employee is in the restricted area

* All these entries are also accompanied by the ratio plots and the web-usage patterns of the source ips, for visual evaluation. These plots can be viewed at the BNT tool web-url mentioned above.
