Wandering Aengus



VAST 2009 Challenge

Challenge 1: Badge and Network Traffic



Authors and Affiliations:

David G. Robinson, Sandia National Laboratories, drobin@sandia.gov


Tool: FishFinder, which shares a heritage with the GibbsLDA software. FishFinder is a new tool being developed for another research effort, and I was curious whether it could be used in this application.



*MC1.1: Identify which computer(s) the employee most likely used to send information to his contact in a tab-delimited table which contains for each computer identified: when the information was sent, how much information was sent and where that information was sent.*

A total of 11 suspect static IP addresses were initially identified as having unique characteristics (see figure below), and this set was then narrowed down to three. Only the top three are listed in the table: Traffic.txt




*MC1.2:  Characterize the patterns of behavior of suspicious computer use.*


Time constraints limited the analysis, so these results represent a first cut. The IPLog3.5.csv data set was modified to simplify this initial exploration: the time of day was reduced to a 24-hour clock, and calendar dates were converted to day of week. A variation of probabilistic latent semantic analysis was used to identify the major patterns in computer usage. A full similarity analysis using, e.g., a Kullback-Leibler divergence measure was not possible in the time available. Instead, a quick look was taken using a variation of a Probability by Surprisal measure to compare the cluster distribution functions. The figure below presents the results of the Probability by Surprisal analysis used to identify the initial 11 suspect IP addresses. The final three were selected because they had unique Destination IP addresses.
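The preprocessing step and the Kullback-Leibler comparison mentioned above can be sketched roughly as follows. This is a minimal illustration only, not the FishFinder implementation: the timestamp format, the function names (simplify_timestamp, usage_distribution, kl_divergence) and the smoothing constant are all assumptions made for the example, and the Probability by Surprisal variation actually used is not reproduced here.

```python
import math
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Assumed timestamp layout; the real IPLog3.5.csv format may differ.
DEFAULT_FMT = "%Y-%m-%d %H:%M:%S"


def simplify_timestamp(stamp, fmt=DEFAULT_FMT):
    """Reduce a full timestamp to (day of week, hour on a 24-hour clock)."""
    t = datetime.strptime(stamp, fmt)
    return DAYS[t.weekday()], t.hour


def usage_distribution(stamps, fmt=DEFAULT_FMT):
    """Relative frequency of traffic in each (day of week, hour) bin."""
    counts = Counter(simplify_timestamp(s, fmt) for s in stamps)
    total = sum(counts.values())
    return {bin_: n / total for bin_, n in counts.items()}


def kl_divergence(p, q, eps=1e-9):
    """D_KL(p || q) in bits over the union of bins.

    A small eps is added to every bin (hypothetical smoothing choice)
    so that bins missing from one distribution do not produce log(0).
    """
    bins = set(p) | set(q)
    p_s = {b: p.get(b, 0.0) + eps for b in bins}
    q_s = {b: q.get(b, 0.0) + eps for b in bins}
    p_tot, q_tot = sum(p_s.values()), sum(q_s.values())
    return sum(
        (p_s[b] / p_tot) * math.log2((p_s[b] / p_tot) / (q_s[b] / q_tot))
        for b in bins
    )
```

Under these assumptions, one would build a usage distribution per source IP address and compare each against a baseline (for example, the pooled distribution over all machines); addresses with unusually large divergence would be candidates for closer inspection.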




