Center for Global Cyber Strategy (CGCS) researchers have used the data donated by the white hat groups to create anonymized profiles of the groups. One such profile has been identified by CGCS socio-psychologists as most likely to resemble the structure of the group who accidentally caused this internet outage. You have been asked to examine CGCS records and identify those groups who most closely resemble the identified profile.
Please limit your answer to seven images and 500 words.
The template and candidate graphs were visualized to support an initial comparison of their topology, composition, and temporal patterns. The highly connected financial segments of the network made some structures difficult to see, so those segments were visualized separately.
From visual inspection of the template graph several noteworthy features were identified:
With these features in mind the composition of the candidate graphs and their topologies were visualized to facilitate searching for some of these same features.
A visual comparison of the topological structure shows the most similarity between the template and candidate graphs 1, 2, and 3. Candidate graph 4 is missing a purchase between two individuals like the one present in the template. Candidate graph 5 is missing the cluster of non-traveling communicators.
Comparing the node and edge composition of the networks reveals little beyond what the network structure already shows, but the temporal comparison does expose more useful differences. In particular, the communication peaks found in the template are also found in candidate graph 1 and, to a lesser extent, candidate graph 2. The summertime window of purchasing is found in the template and in candidate graphs 1 and 2, and also as a shorter window in candidate graph 3.
From this initial comparison, candidate graphs 4 and 5 seem to be quite different from the template. Candidate graph 3 also has several differences in the finance-related areas of the network. Candidate graphs 1 and 2 both seem reasonably similar to the temporal patterns of the template, with candidate graph 1 appearing to align best after only a visual assessment.
Please limit your answer to five images and 300 words.
Many of the visualized features of the graph helped narrow focus to candidate graphs 1 and 2. By establishing topologically relevant subgraph structures and attributes we were able to check for pattern matches in the template and candidate graphs for a more analytical approach to evaluating similarity. Searching for patterns in the full graph as well provides context for their potential discriminatory power.
|Pattern Description||Template||Candidate 1||Candidate 2||Full Graph|
|Pairs of co-authors||0||0||0||1,000,000+|
|Multiple purchases in a single day||1||1||0||3,812|
|Pairs who engage in buy/sell activities at least 5 times||1||1||1||36|
|Pairs who travel to the same country with at least 1 day overlap||14||13||5||1,000,000+|
|People with more than $200k in personal income before taxes||2||6||0||2,262|
|People with more than $100k personal income before taxes||4||8||1||8,482|
|Groups of three that communicate at least 5 times each||30||4||11||8,299|
|Unique individual in a group of three communicating at least 5 times||16||7||12||1,426|
|Groups of three that communicate at least 10 times each||7||2||5||286|
|Unique individual in a group of three communicating at least 10 times||9||6||6||167|
|A buy/sell and communication happens within the span of a day||1||0||1||368,511|
|Pair that sequentially communicate, buy/sell, then communicate within the span of a week||1||0||1||43|
|Pair that travels with at least 1 day overlap and communicate with each other||3||1||3||738,393|
|People that communicate during travel||0||0||0||144|
|People who communicate with 2 others, travel, and buy/sell||1||0||1||1,885|
A few of the finance- and travel-related patterns showed closer agreement between the template graph and candidate graph 1, but many of the more communication-centric patterns showed closer agreement between the template and candidate graph 2. Because communication connections are more informative about how an event between people occurred, we are inclined to favor candidate 2 as the stronger match.
Additionally, a pair communicating, buying/selling, and then communicating again within a week is a relatively rare pattern in the full graph (43 matches), and it is shared only between the template and candidate graph 2.
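The communicate, buy/sell, communicate pattern can be illustrated with a minimal sketch. This is not the tooling used for the submission; the event tuple layout, the edge type names, the use of seconds for timestamps, and the treatment of pairs as unordered are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

WEEK_SECONDS = 7 * 24 * 3600  # assumption: timestamps are in seconds


def comm_trade_comm_pairs(events, window=WEEK_SECONDS):
    """Return unordered pairs that communicate, buy/sell, then communicate
    again, with the two communications at most `window` apart."""
    comms = defaultdict(list)
    trades = defaultdict(list)
    for a, b, kind, t in events:
        pair = frozenset((a, b))  # assumption: edge direction is ignored
        if kind == "communication":
            comms[pair].append(t)
        elif kind == "buy_sell":
            trades[pair].append(t)

    matched = set()
    for pair, trade_times in trades.items():
        times = sorted(comms.get(pair, []))
        for trade_time in trade_times:
            i = bisect_left(times, trade_time)   # times[:i] are strictly earlier
            j = bisect_right(times, trade_time)  # times[j:] are strictly later
            # The latest earlier and earliest later communication give the
            # tightest span around this trade.
            if i > 0 and j < len(times) and times[j] - times[i - 1] <= window:
                matched.add(pair)
                break
    return matched


DAY = 24 * 3600
# Toy events with made-up ids: (person, person, edge type, timestamp)
events = [
    ("a", "b", "communication", 0),
    ("a", "b", "buy_sell", 1 * DAY),
    ("b", "a", "communication", 3 * DAY),
    ("c", "d", "communication", 0),
    ("c", "d", "buy_sell", 1 * DAY),
    ("c", "d", "communication", 10 * DAY),  # too late to fit in one week
]
matched = comm_trade_comm_pairs(events)
print([sorted(pair) for pair in matched])  # [['a', 'b']]
```

Widening or narrowing `window` is one way to express the fuzzy time condition: the topology of the pattern stays fixed while the attribute constraint is relaxed.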
Describe your process and findings in no more than ten images and 500 words.
Our method for locating potential template matches is based on searching for the rarest of our defined patterns in the full graph and investigating the nodes that match those patterns and are found to be in close proximity to each other.
Our search technique involved looking for isomorphic subgraphs with fuzzy conditions on attributes such as time or weight.
The result is a count of how many times a matching subgraph was found in both the template and the full graph as well as the ids of all of the nodes that were involved in a match to that pattern.
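As a rough illustration of that output shape, the sketch below counts one simple pattern (pairs with a repeated edge type) and also returns the ids of the nodes involved. It is a simplified stand-in rather than the matcher used here; the edge tuple layout, the `buy_sell` type name, and the choice to ignore edge direction are assumptions.

```python
from collections import Counter


def pairs_with_repeated_edges(edges, edge_type, min_count):
    """Return (number of matching pairs, set of node ids in any match)."""
    counts = Counter()
    for source, target, etype in edges:
        if etype == edge_type and source != target:
            # Assumption: the pair is unordered, so A->B and B->A count together.
            counts[frozenset((source, target))] += 1
    matches = [pair for pair, n in counts.items() if n >= min_count]
    node_ids = set().union(*matches) if matches else set()
    return len(matches), node_ids


# Toy edge list with made-up ids: (source, target, edge type)
edges = (
    [(1, 2, "buy_sell")] * 5
    + [(2, 3, "buy_sell")] * 2
    + [(1, 3, "communication")] * 6
)
count, node_ids = pairs_with_repeated_edges(edges, "buy_sell", min_count=5)
print(count, sorted(node_ids))  # 1 [1, 2]
```

Running the same function against the template and against the full graph would give the two counts needed for one row of a comparison table, with `min_count` acting as the fuzzy threshold.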
|Pattern Description||Template||Full Graph|
|Pairs of co-authors||0||1,000,000+|
|Multiple purchases in a single day||1||3,812|
|Pairs who engage in buy/sell activities at least 5 times||1||36|
|Pairs who travel to the same country with at least 1 day overlap||14||1,000,000+|
|People with more than $200k in personal income before taxes||2||2,262|
|People with more than $100k personal income before taxes||4||8,482|
|Groups of three that communicate at least 5 times each||30||8,299|
|Unique individual in a group of three communicating at least 5 times||16||1,426|
|Groups of three that communicate at least 10 times each||7||286|
|Unique individual in a group of three communicating at least 10 times||9||167|
|A buy/sell and communication happens within the span of a day||1||368,511|
|Pair that sequentially communicate, buy/sell, then communicate within the span of a week||1||43|
|Pair that travels with at least 1 day overlap and communicate with each other||3||738,393|
|People that communicate during travel||0||144|
|People who communicate with 2 others, travel, and buy/sell||1||1,885|
By checking for the presence of multiple of these rare patterns the number of matches to evaluate is reduced to a number that is reasonable to visualize and analyze.
Each pattern match was analyzed individually to see the subgraphs formed by only keeping the nodes matching the pattern and any of the edges between them. They are shown below in order of decreasing rarity. Additionally any seed nodes that were directly connected to a node that matched one of the patterns are included to see which of the seeds could be involved in a template match.
Filtering the nodes to only those that matched all three patterns identifies a much smaller set. Of those nodes, 561428, 620791, and 462278 have direct connections to the seed nodes 574136 and 600971. This leads us to believe that those seed nodes are very likely to be connected to subgraphs that match the template.
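The filtering step amounts to a set intersection followed by an adjacency check against the seeds. The sketch below shows the idea on toy data; the pattern names, node ids, and seed ids are hypothetical placeholders, not values from the challenge graph.

```python
def nodes_matching_all(pattern_matches):
    """Intersect the node-id sets returned by each pattern search."""
    sets = [set(ids) for ids in pattern_matches.values()]
    return set.intersection(*sets) if sets else set()


def seed_neighbors(candidates, edges, seeds):
    """Map each candidate node to the seeds it is directly connected to."""
    candidates, seeds = set(candidates), set(seeds)
    linked = {}
    for a, b in edges:
        if a in candidates and b in seeds:
            linked.setdefault(a, set()).add(b)
        if b in candidates and a in seeds:
            linked.setdefault(b, set()).add(a)
    return linked


# Hypothetical pattern names and ids, for illustration only.
pattern_matches = {
    "pattern_a": {101, 102, 103, 104},
    "pattern_b": {102, 103, 104, 105},
    "pattern_c": {103, 104, 106},
}
seeds = {900, 901}
edges = [(103, 900), (901, 104), (105, 900), (103, 104)]

common = nodes_matching_all(pattern_matches)
linked = seed_neighbors(common, edges, seeds)
print(sorted(common))  # [103, 104]
print(linked)          # {103: {900}, 104: {901}}
```

Node 105 touches a seed but is dropped because it does not match every pattern, which is the effect the filtering is meant to have.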
Describe your process and your findings in no more than ten images and 500 words.
To rule out the possibility of an exact structural match, we attempted to match the topology of the template graph against the full graph without accounting for attributes such as timestamps or weights at all. Unsurprisingly, there were no exact matches to the template topology in the full graph.
In testing for subgraphs adjacent to seeds for answer 2 we subset the matches in the graphs based on proximity to the seeds. However, as those patterns are able to detect the presence of template features generically we can identify more nodes that are likely members of subgraphs similar to the template using the same technique.
Additionally, we can identify roughly where in the full graph these pattern matches occur by first aggregating nodes into communities. However, with this much aggregation, not much can be discerned from the resulting structure.
As the patterns are all matched against people of interest the communities in which our matches were found are all highly connected, likely due to a large number of communication interconnections. It also is the case that many of the matched patterns span these communities so it is not accurate to assume for example that there may be four distinct template matches.
The more general testing for template patterns reveals that there are nodes that may be parts of subgraphs not connected to the seed. Attempting to expand the network too far beyond the nodes in a pattern can quickly result in a visualization of a significant portion of the full graph due to its interconnectedness. But the risk of not extending to reveal context is that parts of the template that were not directly searched for cannot be visualized.
|Pattern Description||Template||Full Graph|
|Pairs who engage in buy/sell activities at least 5 times||1||36|
Starting from the rarest pattern gives 36 matching nodes. We then added a select set of adjacent nodes in the hope that patterns similar to the template would be visually apparent. Extending one step to include travel, purchasing, and financial structures provides a graph for visual comparison against the template. It is likely that a subset of this expansion matches one of the hacker groups being searched for.
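A one-step expansion restricted to selected edge types can be sketched as follows. The edge type names and ids are assumptions for the example; leaving communication edges out of the allowed set is one way to keep the expansion from pulling in a large share of the graph.

```python
def expand_one_step(core_nodes, edges, allowed_types):
    """Keep edges of an allowed type that touch a core node, and return the
    resulting node set together with the kept edges."""
    core = set(core_nodes)
    kept_edges = [
        (a, b, etype)
        for a, b, etype in edges
        if etype in allowed_types and (a in core or b in core)
    ]
    nodes = set(core)
    for a, b, _ in kept_edges:
        nodes.update((a, b))
    return nodes, kept_edges


# Toy data with made-up ids: (source, target, edge type)
edges = [
    ("p1", "p2", "buy_sell"),
    ("p1", "FR", "travel"),
    ("p2", "p3", "communication"),  # not in the allowed types
    ("p3", "p4", "travel"),         # does not touch a core node
    ("p2", "item_9", "purchase"),
]
nodes, kept_edges = expand_one_step(
    {"p1", "p2"}, edges, allowed_types={"buy_sell", "travel", "purchase"}
)
print(sorted(nodes))    # ['FR', 'item_9', 'p1', 'p2']
print(len(kept_edges))  # 3
```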
Please limit your response to 5 images and 300 words.
To make our best guess at a match for the responsible party, we started from the six people who matched all three of the rarest patterns we devised for identifying the presence of the template. Looking at their procurement channel events, some of these nodes have similar temporal patterns over the course of the year, which further suggests they were involved in the same group activity.
By extending the subgraph to include local context such as travel and purchase activity, we were able to identify pairs that had purchases between them, similar to one of the distinctive patterns found in the template graph.
In particular, 647740 is a potential match for template node 67, since both are the member of the purchasing pair that has no travel activity. Additionally, 570191 and 561428 are potential matches for template node 39, which is the member of the purchasing pair with travel activity. Of these two, 561428 is connected to one of the seed nodes, potentially indicating a higher chance of involvement.
Our greatest challenge was the difficulty in being able to query even a small region of interest to visually inspect the local structure and patterns. We overcame this by relying on pattern matching across a set of small, specific patterns to detect a region that matched multiple characteristics of the template.
Our pattern matching methods are able to do fuzzy matches on attributes of the nodes and edges, but currently cannot perform fuzzy matches on the topology. Having that ability would have made it easier to find matches using more complex patterns from the template. Matches on more complex patterns would have meant being able to more accurately extract the full subgraph that comprises a template match from the large graph.