Ma qi, 360 Enterprise Security Corp, heymarch@qq.com PRIMARY
Wei Xueshi, 360 Enterprise Security Corp, xs.wei@foxmail.com
Li Yiping, 360 Enterprise Security Corp, 835276214@qq.com
Huang Chuanming, 360 Enterprise Security Corp, josjoy0413@gmail.com
Liwenhan Xie, Peking University, xieliwenhan@pku.edu.cn
Zhiyi Yin, Peking University, 1600017832@pku.edu.cn
Xiaoru Yuan, Peking University, xiaoru.yuan@gmail.com
Student Team: NO
D3js
Visual analytic system developed by our team.
Approximately how many hours were spent working on this submission in total?
200 hours
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2018 is complete? YES
Video
Questions
1. Using the four large Kasios International data sets, combine the different sources to create a single picture of the company. Characterize changes in the company over time. According to the company's communications and purchase habits, is the company growing?
Limit your responses to 5 images and 500 words
1.1 Overall personnel picture
To get a brief overview of the company's composition, we drew eight parallel coordinate graphs, each covering the people with a nonzero degree of one kind. For example, subgraph Ci in Fig. 1-1 shows the degree distribution of people who have answered at least one call. From the graphs, we deduce that the people involved in email and call records are roughly the same, and that distinctive patterns lie in the purchase and meeting behaviors, where the number of records is small. Sellers have few communication and sale records, except for one prominent outlier. Most people involved in meetings lack any other type of connection.
Fig. 1-1: Parallel Coordinate Graph for Comparison between Different Employees. C/E/P/M stand for calls/emails/purchases/meetings respectively. "i" and "o" represent in-degree and out-degree. For instance, Ci means each item in the parallel coordinate graph corresponds to a person whose call in-degree is above zero.
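The degree values plotted on each axis can be derived with a simple pass over the records. The following is only a minimal sketch: the record layout (source, record type, target) and the sample rows are assumptions for illustration, and a degree here counts records rather than distinct partners.

```python
from collections import defaultdict

# Hypothetical records: (source_id, record_type, target_id).
records = [
    (1, "call", 2),
    (1, "call", 3),
    (2, "email", 1),
    (3, "purchase", 4),
    (5, "meeting", 6),
]

# Axis prefixes follow the caption: C/E/P/M plus "i" (in) or "o" (out).
PREFIX = {"call": "C", "email": "E", "purchase": "P", "meeting": "M"}


def degree_table(records):
    """Count in- and out-degree per person and record type."""
    degrees = defaultdict(lambda: defaultdict(int))
    for source, rtype, target in records:
        degrees[source][PREFIX[rtype] + "o"] += 1
        degrees[target][PREFIX[rtype] + "i"] += 1
    return degrees


def people_with(degrees, axis):
    """Select the people shown in one subgraph, e.g. axis='Ci'."""
    return sorted(p for p, d in degrees.items() if d.get(axis, 0) > 0)


degrees = degree_table(records)
print(people_with(degrees, "Ci"))  # [2, 3]
```

Each of the eight subgraphs then corresponds to one call of `people_with` with a different axis name.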
We further labelled people in the dataset by the communication (including calls and emails), meeting and purchase records they engaged in, and found that most people have clear division of labour in the company, whose details could be summarized as below.
Fig.1-2: Employee Component. Light colors represent the proportion of people keeping this kind of records only.
These pie charts refer to the five major roles in the company, i.e. liaison, buyer, seller, meeting initiator and meeting attendee, where light colors stand for those who hold only this kind of record. About 2/3 of the staff handle online communication (calls and emails), and about half of them also make purchases. The remaining 1/3 consists of sellers, meeting initiators, and meeting participants. Interestingly, meeting initiators and meeting participants are mutually exclusive.
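The role labelling can be sketched as follows. This is an illustrative reconstruction, not the code behind the charts: it assumes the source of a purchase record is the buyer and the source of a meeting record is the initiator, and the sample records are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (source_id, record_type, target_id).
records = [
    (1, "call", 2),
    (2, "email", 1),
    (1, "purchase", 9),
    (5, "meeting", 6),
]


def label_roles(records):
    """Attach to every person the set of roles seen in their records."""
    roles = defaultdict(set)
    for source, rtype, target in records:
        if rtype in ("call", "email"):
            roles[source].add("communication")
            roles[target].add("communication")
        elif rtype == "purchase":
            # Assumption: the source buys, the target sells.
            roles[source].add("buyer")
            roles[target].add("seller")
        elif rtype == "meeting":
            # Assumption: the source initiates, the target attends.
            roles[source].add("meeting initiator")
            roles[target].add("meeting attendee")
    return roles


def role_summary(roles):
    """Per role: (people holding it, people holding only it)."""
    summary = {}
    for person_roles in roles.values():
        exclusive = 1 if len(person_roles) == 1 else 0
        for role in person_roles:
            total, only = summary.get(role, (0, 0))
            summary[role] = (total + 1, only + exclusive)
    return summary


summary = role_summary(label_roles(records))
print(summary["communication"])  # (2, 1)
```

The second number of each pair corresponds to the light-colored share of a pie chart.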
Fig. 1-3: Pattern of active people, new staffs and staffs about to quit every month
The chart above illustrates the entering and quitting patterns in every month. "Active people monthly" shows the number of active people every month and their structure: the company enrolled about 15000-20000 new employees every month, and they seldom quit before month 30. Therefore, it is reasonable to say that the company was expanding during the observed period. "New staff structure" answers the question: "Whom did they recruit every month?" We classified the new staff by the classes mentioned above. There are 4 main classes: "meet_out", "meet_in", "communication", and "communication+buy_in". The number of people in each class increases, except for the last one.
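One way to obtain such monthly counts is to treat the month of a person's first record as the entry month and the month of the last record as the leaving month. The sketch below assumes records already reduced to hypothetical (person, month index) pairs.

```python
from collections import defaultdict

# Hypothetical activity: (person_id, month_index) for every record.
activity = [(1, 0), (1, 1), (2, 1), (2, 2), (3, 2)]


def monthly_turnover(activity):
    """Count active, new and leaving people for every month."""
    first, last = {}, {}
    active = defaultdict(set)
    for person, month in activity:
        first[person] = min(month, first.get(person, month))
        last[person] = max(month, last.get(person, month))
        active[month].add(person)

    rows = {}
    for month in sorted(active):
        people = active[month]
        rows[month] = {
            "active": len(people),
            "new": sum(1 for p in people if first[p] == month),
            "leaving": sum(1 for p in people if last[p] == month),
        }
    return rows


rows = monthly_turnover(activity)
print(rows[1])  # {'active': 2, 'new': 1, 'leaving': 1}
```

Note that in the final observed month everyone counts as leaving, so that month should be read with care.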
1.2 Overall business picture
Fig. 1-4: a line chart showing changes of four businesses over time
The numbers of calls, emails and purchases stayed steady over the two and a half years. Since more and more staff engaged in company business over time, the workload per person became less intensive. Only meeting records increased over time, which might illustrate the growing demand for meetings as the company enlarged its scale.
1.3 Communication & purchase habits
Fig. 1-5: The stacked bar charts above show personnel constructions for 4 different activities every month. Personnel are divided by the month they entered the company (i.e. time of the first record). The top bar of each month stands for newly enrolled people, while the bottom bar stands for the earliest employees.
Online communication records show low staff mobility, as most new employees stay engaged in the following months. In contrast, meeting records show high mobility: among the people newly enrolled in a month, only a few keep participating in meetings. To explain this, we infer that the company might extend its business mainly through meetings.
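The contrast in mobility can be expressed as a simple retention figure per activity type: the share of people who appear in more than one month. This is a rough sketch with hypothetical data, and the measure itself is an assumption rather than the one used for the charts.

```python
from collections import defaultdict

# Hypothetical activity: (person_id, record_type, month_index).
activity = [
    (1, "call", 0), (1, "call", 1),
    (2, "call", 1), (2, "call", 2),
    (3, "meeting", 0),
    (4, "meeting", 1),
    (5, "meeting", 1), (5, "meeting", 2),
]


def retention_by_type(activity):
    """Share of people per record type who are active in 2+ months."""
    months = defaultdict(lambda: defaultdict(set))
    for person, rtype, month in activity:
        months[rtype][person].add(month)

    result = {}
    for rtype, per_person in months.items():
        returning = sum(1 for m in per_person.values() if len(m) > 1)
        result[rtype] = returning / len(per_person)
    return result


retention = retention_by_type(activity)
print(retention["call"])  # 1.0
```

A low value, as for the meeting records in this toy data, corresponds to the high mobility described above.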
2. Combine the four data sources for group that the insider has identified as being suspicious and locate the group in the larger dataset. Determine if anyone else appears to be closely associated with this group. Highlight which employees are making suspicious purchases, according to the insider's data.
Limit your responses to 8 images and 500 words.
To investigate a small subgraph of a large dynamic network in detail, we developed a system called Traceability Analysis System. It takes each employee as a node and the records provided by the insider as instances of links between nodes. With this system, we can explore the company from an initial point, so we started with the unique suspicious purchase record and then explored the whole suspicious group step by step. On one hand, when a new node is added to the system panel, the interactions between the new node and the existing nodes are shown. On the other hand, the system provides a configuration panel that enables selection based on multiple rules, e.g. link count threshold, common neighbors, relation type, etc. Therefore, nodes closely associated with the target node or group can be filtered explicitly with a single click. As a result, we found the employees closely associated with the given suspicious group that are listed in Table 2-1.
Note: The time shown in the system is eight hours ahead of the actual time.
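For readers who want to compare the displayed times with the raw data, the offset can be removed as follows. This is a small sketch that assumes the displayed time is exactly the actual time plus eight hours, as the note states.

```python
from datetime import datetime, timedelta

# Assumption taken from the note: displayed time = actual time + 8 hours.
DISPLAY_OFFSET = timedelta(hours=8)


def to_actual(displayed):
    """Convert a timestamp as shown in the system to the actual time."""
    shown = datetime.strptime(displayed, "%Y-%m-%d %H:%M:%S")
    return shown - DISPLAY_OFFSET


print(to_actual("2017-12-04 13:05:50"))  # 2017-12-04 05:05:50
```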
As Fig. 2-1 suggests, we followed four steps.
(1) Retrieve nodes (green) that link to the suspicious group V many times.
(2) Select nodes (purple) that are associated with more than one node in V.
(3) Add both the potential employees above and the suspicious group to the timeline analysis panel, to check whether their connection times are close.
(4) Inspect the statistics of each potential employee, to see the proportion of its links that go to the suspicious group.
Fig. 2-1: Analysis Steps
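Steps (1), (2) and (4) can be approximated outside the system with a few counters. The sketch below is illustrative only: the thresholds `min_links` and `min_targets`, the record layout and the sample IDs are hypothetical, and step (3) remains a visual check on the timeline.

```python
from collections import defaultdict


def candidates(records, group, min_links=3, min_targets=2):
    """Rank people outside `group` by how closely they link to it.

    Returns rows of (person, links to group, distinct group members
    contacted, share of the person's records that touch the group).
    """
    links = defaultdict(int)
    targets = defaultdict(set)
    total = defaultdict(int)
    for source, target in records:
        for person, other in ((source, target), (target, source)):
            if person in group:
                continue
            total[person] += 1
            if other in group:
                links[person] += 1          # step (1): repeated links
                targets[person].add(other)  # step (2): several targets

    rows = []
    for person in links:
        if links[person] >= min_links or len(targets[person]) >= min_targets:
            share = links[person] / total[person]  # step (4): proportion
            rows.append((person, links[person], len(targets[person]), share))
    rows.sort(key=lambda r: (-r[2], -r[1]))
    return rows


# Hypothetical suspicious group V and undirected records.
V = {10, 11, 12}
records = [(1, 10), (1, 11), (1, 10), (12, 1), (2, 10), (2, 99), (3, 98), (2, 97)]
result = candidates(records, V)
print(result)  # [(1, 4, 3, 1.0)]
```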
Then we obtained the nodes that are very close to the suspicious group.
Fig. 2-2: The group closely associated with the suspicious group
ID | Name | Description | Closeness |
786361 | Sheilah Stachniw | connected with 8 targets, many records to V and mostly to V, and 1 suspicious purchase | **** |
981554 | Sherrell Biebel | connected with 8 targets, many records to V and mostly to V | *** |
944354 | Ferne Hards | connected with 5 targets, many records to V and mostly to V | *** |
2037156 | Martha Harris | connected with 5 targets, many records to V and mostly to V | *** |
175354 | Madeline Nindorf | connected with 4 targets, mostly link to V | *** |
1376868 | Timothy Gibson | connected with 4 targets, and 6 suspicious purchases | ** |
713701 | Jane Tyler | connected with 3 targets, and 2 suspicious purchases | ** |
713639 | Juan Walsh | connected with 3 targets | * |
713743 | Sherilyn Coopwood | connected with 3 targets | * |
713814 | Jaunita Westen | connected with 3 targets | * |
1981017 | Terrilyn Overkamp | connected with 3 targets | * |
Table 2-1: People that associate with the suspicious group closely.
To find suspicious purchase records, we added the relevant people to the timeline panel for further investigation. Combining their activities and statistics, we identified the exceptional purchase records.
Fig. 2-3: All activities of the suspicious group and all purchases of the suspicious group.
Sheilah Stachniw (786361) is associated only with people in the suspicious group and one other supplier. At a crucial point, when members of the suspicious group were communicating with each other, she made two purchases.
Fig. 2-4: Sheilah Stachniw (786361)
Timothy Gibson (1376868) holds the most suspicious purchase records. First, he had a call at 0:21 a.m. on 20th June, 2015 with Richard Fox (857138), then bought things from Gail Feindt six minutes later. Second, he sent an email to Meryl Pastuch (1690582) at 6:45 p.m. on 10th September, 2015, and within half an hour after the email he bought things from Gail Feindt twice. Moreover, he sent an email to Tobi Gatlin (969089) at 8:20 p.m. on 20th January, 2017, with a purchase from Gail Feindt around half an hour earlier. Last, he had a call with Lindsy Henion (1108217) at 6:22 a.m. on 8th December, 2017 and bought things twice in the next two hours.
Fig. 2-5: Timothy Gibson (1376868)
Jane Tyler (713701) emailed Richard Fox (857138) at 9:44 p.m. on 24th August, 2015 and bought things from Gail Feindt immediately afterwards. Similar behavior occurred at 8:18 a.m. on 6th July, 2017, when Tyler called Tobi Gatlin (969089) and then bought things from Gail Feindt.
Fig. 2-6: Jane Tyler (713701)
Source | Target | Time | Event |
Sheilah Stachniw | Gail Feindt | 2017-12-04 13:05:50 | in the crucial period |
Sheilah Stachniw | Gail Feindt | 2017-12-06 03:53:23 | in the crucial period |
Timothy Gibson | Gail Feindt | 2015-06-20 00:26:52 | immediately after a call |
Timothy Gibson | Gail Feindt | 2015-09-10 18:53:11 | after an email |
Timothy Gibson | Gail Feindt | 2015-09-19 19:22:28 | after an email |
Timothy Gibson | Gail Feindt | 2017-01-20 20:04:58 | half an hour before an email |
Timothy Gibson | Gail Feindt | 2017-12-08 07:51:43 | after a call |
Timothy Gibson | Gail Feindt | 2017-12-08 08:29:50 | after a call |
Jane Tyler | Gail Feindt | 2015-08-24 21:45:27 | immediately after an email |
Jane Tyler | Gail Feindt | 2017-07-06 08:32:15 | after a call |
Table 2-2: Suspicious purchase records.
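The pattern behind these rows, a purchase shortly before or after contact with a group member, can also be checked programmatically. The sketch below is not the procedure used in the system: the 30-minute window is a hypothetical parameter, the contact times are taken from the text with seconds set to zero, and the third purchase is an invented control row.

```python
from datetime import datetime, timedelta


def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def purchases_near_contact(purchases, contacts, group,
                           window=timedelta(minutes=30)):
    """Return purchases made within `window` of a contact between the
    buyer and a member of `group` (before or after the contact)."""
    hits = []
    for buyer, seller, p_time in purchases:
        for a, b, c_time in contacts:
            if a == buyer:
                other = b
            elif b == buyer:
                other = a
            else:
                continue
            if other in group and abs(p_time - c_time) <= window:
                hits.append((buyer, seller, p_time, other, c_time))
    return hits


purchases = [
    ("Timothy Gibson", "Gail Feindt", parse("2015-06-20 00:26:52")),
    ("Jane Tyler", "Gail Feindt", parse("2015-08-24 21:45:27")),
    ("Jane Tyler", "Gail Feindt", parse("2016-03-01 10:00:00")),  # invented control
]
contacts = [
    ("Timothy Gibson", "Richard Fox", parse("2015-06-20 00:21:00")),
    ("Jane Tyler", "Richard Fox", parse("2015-08-24 21:44:00")),
]
hits = purchases_near_contact(purchases, contacts, group={"Richard Fox"})
print(len(hits))  # 2
```

The window is symmetric on purpose, since one of the listed purchases happened before the related email.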
3. Using the combined group of suspected bad actors you created in question 2, show the interactions within the group over time.
a. Characterize the group's organizational structure and show a full picture of communications within the group.
b. Does the group composition change during the course of their activities?
c. How do the group's interactions change over time?
Limit your responses to 10 images and 1000 words
a. Organizational structure
Fig. 3-1: The whole picture of all suspects and people who connect closely with them
In Fig. 3-1, we labeled people by their position inside or outside the group. Orange stands for the known suspect group. Pink stands for people who connect with more than one person in the suspect group. Green stands for people who connect with only one person in the group, but more than once. The only blue node is the biggest supplier of goods.
We can identify the general interaction pattern between the known group and the other suspects we found:
1. Several people (857138, 1690582, 1108217) have dense edges connecting them with other suspects; we call them organizers inside the group.
2. Other people in the known group mainly communicate with a few specific nodes outside the group (981554, 175354); we call these organizers outside the group.
Then, we reduce the group to a core group and generate a layout based on each member's status in the group (Fig. 3-2).
Fig. 3-2: Group status
Pink stands for outside organizers. Orange stands for inside organizers. Grey stands for people doing communications and purchases, which is the prevailing class in the dataset. Green stands for people doing communications and initiating meetings. Yellow stands for pure liaisons.
In this layout, we noticed that the organizers belong to unique classes (e.g. 857138 is the only person in the dataset having communication, meeting-initiating, and purchase records). Their suspicious records confirm that they are key members of the whole suspect group. In particular, they have some strange meeting records with other group members. As another example, 981554 is the person closest to the known group.
b. Composition Change
To illustrate the change in inner composition, we generate a concise layout that includes the original suspicious group and several important extensions.
Color annotation for part b&c:
Orange: Original suspicious group
Pink: Closely connected suspect with purchase records
Blue: Closely connected suspect without purchase records
Green: Closely connected suspect with far more total records than other people in the graph
Fig. 3-3: Phase 1: May - November 2015
In this phase, the suspicious group didn't act much. In the concise version of the expanded group shown in Fig. 3-3, only the inside organizers (857138 and 1690582) have connections with green nodes. We consider this an incubation period before the group was really established, so other subordinate personnel didn't appear in this period.
Fig. 3-4: Phase 2: November 2015 - January 2016
In this phase, nearly all members of the original group participated in various activities. It is worth noting that the blue node is very closely related to the team members during this time. We can infer that these "outsiders" engaged in group activity at a very early time.
Fig. 3-5: Phase 3: February 2016 - June 2017
During this relatively long period, the interaction between the members of the extended team is relatively sparse, and the main interaction takes place among the main members of the group. However, this does not lead us to conclude that some members have withdrawn from the group. Basically, everyone is still in contact, but the frequency is reduced.
As this is a long period, we extract three successive months to keep the time span comparable with the other phases.
Fig. 3-6: Phase 4: July - December 2017
In the final phase, the core members of the group (the orange nodes near the center of the picture) once again became collectively active and made more contact. This phase differs from the first intensive contact phase for 2 reasons:
1. Peripheral orange nodes did not participate extensively in interactions.
2. Blue nodes seldom participated in interactions.
In conclusion, after the first activity peak (Phase 2), the composition of the group didn't change much. Most members did not disappear for long; they just engaged in activities at varying frequencies.
c. Interaction Change
To observe the specific interaction details between the members, it is necessary to reduce the number of nodes in the graph as much as possible. For this reason, we divide the expanded group into three parts, shown in Fig. 3-7:
1. Outward: People who have many more records than the others, undertaking the task of communicating outside the group.
2. Inward: People who don’t have many records. They mainly connect with other group members.
3. Supplier: i.e. 2038003.
Fig. 3-7: Overview of interactions inside the expanded suspicious group over time
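The three-part division can be approximated by looking at who receives the group's purchases and at how much of each member's activity stays inside the group. The sketch below is an assumption-laden illustration: the `inward_share` threshold, the record layout and the sample members are hypothetical, and the actual division was made by inspecting the records in the system.

```python
from collections import defaultdict


def split_group(records, group, inward_share=0.5):
    """Label each group member as supplier, inward or outward."""
    total = defaultdict(int)
    internal = defaultdict(int)
    sold = defaultdict(int)
    for source, rtype, target in records:
        if rtype == "purchase" and source in group and target in group:
            sold[target] += 1
        for person, other in ((source, target), (target, source)):
            if person in group:
                total[person] += 1
                if other in group:
                    internal[person] += 1

    parts = {}
    for person in group:
        if sold[person] > 0:
            parts[person] = "supplier"
        elif total[person] and internal[person] / total[person] >= inward_share:
            parts[person] = "inward"
        else:
            parts[person] = "outward"
    return parts


# Hypothetical group and records: (source, record_type, target).
group = {"A", "B", "C", "S"}
records = [
    ("A", "call", "B"), ("B", "email", "A"),
    ("C", "email", "X"), ("C", "call", "Y"), ("C", "call", "Z"),
    ("C", "email", "A"), ("C", "purchase", "S"),
]
parts = split_group(records, group)
print(parts["C"])  # outward
```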
Then, we import the records within the Inward group only, and also those within the whole expanded group for comparison, shown in Fig. 3-8.
Fig. 3-8 (a) Records within Inward group (b) Records of Inward group in the whole expanded group
From the two timelines in Fig. 3-8, we can clearly see the two activity peaks of the suspect group: the first one was from Nov. 2015 to Jan. 2016, and the second one was in Sep. 2017. Looking at these two peaks in a macro view, we can identify the following behavior patterns:
1. Each of the two concentrated events began with a multi-person meeting (light green). This shows that a meeting marks the beginning of a suspicious activity arranged by this group. However, the people attending the two meetings were significantly different, perhaps reflecting a difference in the purpose of the two events.
Fig. 3-9: Overview of interactions for all 4 types
2. Shortly after the first meeting, calls made up the majority of communication within the group; after the first peak of events, the main communication method became email.
Fig. 3-10: Interactions of the second peak
3. Fig. 3-10 shows the interactions of the second peak in detail. The connection between Rosalia Larroque (1847246) and Kerstin Beveal (728286) is the main theme. It should be noted that Kerstin Beveal was very active in the later period. Kerstin also had five consecutive conversations with Sherrell Biebel (981554) in mid-May 2017 and participated in the second multi-person conference, a key object of suspicion. Rosalia Larroque first made a call to Jenice Savaria (2038003), then made a purchase. In the following month, Rosalia maintained close contact with Kerstin.
4. Compared with the first peak, people from the Inward group have abnormally many connections outside the group, and most of those connections are emails. In particular, on December 4th, 2017, the team members interacted with the outside world on a large scale, which at the same time heralded the end of the group's activities.
4. The insider has provided a list of purchases that might indicate illicit activity elsewhere in the company. Using the structure of the first group noted by the insider as a model can you find any other instances of suspicious activities in the company? Are there other groups that have structure and activity similar to this one? Who are they? Each of the suspicious purchases could be a starting point for your search. Provide examples of up to two other groups you find that appear suspicious and compare their structure with the structure of the first group. The structures should be presented as temporal not just structural (i.e., the sequence of events - A is followed by B one or two days later - will be important).
Limit your responses to 10 images and 1200 words
In summary, we gained the following knowledge from the questions above and use it to answer this question.
1. For organizational structure, there are core members in the suspicious group, whose distances to the other group members range from 1 to 2 and are mostly 1. Group members associate with each other many times. Above all, the group can be divided into three parts, i.e. bargain suppliers, the outwards and the inwards (see Fig. 3-7).
a. Bargain suppliers (the blue node) are the destination of purchases.
b. Outward members (yellow and green nodes) have purchase records and many other records associating with people outside the group (the green nodes represent people with a higher proportion of outside associations).
c. Inward members mostly communicate within the group and have no purchase records. Besides, they have fewer records than the outwards.
For temporal structure, there are sudden bursts of association between the outwards and the inwards. Especially when there's a purchase, meetings and frequent communications occur together.
Therefore, we set up the following strategy for finding groups U similar to the first group V.
1. Start from a small number of nodes S.
2. Iteratively select nodes N that link tightly to S according to the criteria below (weighted by order) and add them to S.
a. large amount of repetitive edges;
b. multiple association targets in S;
c. records between N and S take up more than half of the records of N;
d. sudden association could be observed in the timeline;
e. preferably no purchase records.
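The strategy above can be sketched as a greedy expansion loop. This is a simplified illustration rather than the procedure run in the system: criteria a, b, c and e are turned into a lexicographic ranking, criterion d (sudden association in the timeline) is left to visual inspection, and `min_edges`, `max_rounds` and the sample records are hypothetical.

```python
from collections import defaultdict


def expand_group(records, seeds, min_edges=3, max_rounds=10):
    """Grow the seed set S by repeatedly adding the best-linked node N."""
    group = set(seeds)
    buyers = {s for s, rtype, _ in records if rtype == "purchase"}
    for _ in range(max_rounds):
        edges = defaultdict(int)
        targets = defaultdict(set)
        total = defaultdict(int)
        for source, _rtype, target in records:
            for person, other in ((source, target), (target, source)):
                if person in group:
                    continue
                total[person] += 1
                if other in group:
                    edges[person] += 1
                    targets[person].add(other)

        passed = [p for p in edges if edges[p] >= min_edges]
        if not passed:
            break

        def rank(p):
            return (
                edges[p],                   # a. repetitive edges
                len(targets[p]),            # b. association targets in S
                edges[p] / total[p] > 0.5,  # c. over half of N's records
                p not in buyers,            # e. preferably no purchases
            )

        group.add(max(passed, key=rank))
    return group


# Hypothetical records: (source, record_type, target).
records = [
    ("n1", "call", "s1"), ("n1", "email", "s2"), ("n1", "call", "s1"),
    ("n2", "call", "n1"), ("n2", "email", "n1"), ("n2", "call", "s1"),
    ("n3", "call", "s1"), ("n3", "call", "x"),
]
found = expand_group(records, seeds={"s1", "s2"})
print(sorted(found))  # ['n1', 'n2', 's1', 's2']
```

In the toy data, `n2` only qualifies after `n1` has joined, which is the reason for recomputing the counts in every round.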
The six extra purchase records involve four people in all, so we use these four people as our starting points. In this way we found four suspicious groups, shown in Fig. 4-1 ~ Fig. 4-4 and the tables below.
ID | Name |
320914 | Cora Cross |
1152569 | Donnetta Lapoint |
1981017 | Terrilyn Overkamp |
1141575 | Lucy Herrera |
1271503 | Jerome Jordan |
580766 | Trevor Webb |
1476791 | Alesha Aschenbrenner |
726693 | Archie Griffies |
2038138 | Tyree Barreneche |
ID | Name |
2037766 | Gregory Russell |
437025 | Beth Wilensky |
1172172 | Angelic Graetz |
758683 | Anjelica Hoger |
160091 | Cora Gonzalez |
981745 | Prudence Rosol |
32081 | Edgar McCormick |
447025 | Renae Hilbrand |
468954 | Abbey Rhead |
1371959 | Zachary Hampton |
1590376 | Olivia Brown |
1710613 | Amelia Colon |
1771151 | Karyl Snobeck |
103229 | Sherrl Brensnan |
224204 | Indira Fugua |
369655 | Layla Mostad |
709913 | Birdie Pioch |
717869 | Nikia Wilebski |
723897 | Cecilia Pichette |
755431 | Ollie Andrews |
1422630 | Amada Faul |
1424784 | Virgie Pratt |
1446867 | Fonda Bursch |
1499804 | Katharine Santos |
1505718 | Valentine Klette |
1608171 | Ilona Barros |
1714881 | Concha Goodall |
1938283 | Lupe Gullatt |
ID | Name |
2037860 | Carlos Morris |
695013 | Laure Pelkley |
312722 | Courtney Wiedemann |
1147062 | Yong Wilbert |
221502 | Arthur Fox |
1066593 | Marjorie Halbach |
1564399 | Daria Housten |
1880061 | Renae Hilbrand |
1880061 | Merlene Tessier |
9391 | Roger Beck |
170126 | Virginia Buchanan |
200423 | Sharmaine Lofredo |
206602 | Rena Jerabek |
1200868 | Jayden Walters |
1266039 | Dorothea Kulback |
1376314 | Marc Bowen |
1572181 | Alan Sedotal |
114786 | Elva Ingram |
250565 | Jessica Pokoj |
350756 | Omar Tako |
1229528 | Regina Bordoy |
1740389 | Sherlyn Delcine |
40236 | Lulu Larson |
115539 | Dina Fairy |
293560 | Kathie Matheu |
1706578 | Sha Pardoe |
1862079 | Cathleen Kucinski |
Fig. 4-1: Group 1, found from Trevor Webb (580766)
Fig. 4-2: Group 2, found from Beth Wilensky (437025)
Fig. 4-3: Group 3, found from Laure Pelkley (695013)
Fig. 4-2
As for suspicious activities, an abnormal phenomenon occurred on 4th December, 2017, when a large number of associations took place between the inward nodes of each group and people outside that group.
Fig. 4-5: Suspicious events happened around 4th December, 2017.
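Such a burst can be located by counting, per day, the records that connect an inward node with someone outside its group. The sketch below uses invented records purely to show the counting; only the date of the burst comes from the text.

```python
from collections import Counter
from datetime import datetime


def outside_contacts_per_day(records, inward, group):
    """Count daily records linking an inward node to a non-member."""
    counts = Counter()
    for source, target, ts in records:
        crosses = (
            (source in inward and target not in group)
            or (target in inward and source not in group)
        )
        if crosses:
            day = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").date()
            counts[day] += 1
    return counts


# Hypothetical group, inward members and records: (source, target, time).
group = {"in1", "in2", "out1"}
inward = {"in1", "in2"}
records = [
    ("in1", "x", "2017-11-20 09:00:00"),
    ("in1", "x", "2017-12-04 08:00:00"),
    ("y", "in2", "2017-12-04 08:30:00"),
    ("in2", "z", "2017-12-04 09:15:00"),
    ("in1", "out1", "2017-12-04 10:00:00"),  # inside the group, ignored
]
counts = outside_contacts_per_day(records, inward, group)
peak_day, peak_count = counts.most_common(1)[0]
print(peak_day, peak_count)  # 2017-12-04 3
```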
In the overview, the four groups have similar record distributions over time, which also accord with the suspicious group. First, their purchase records take place at the end of the timeline. Second, there seems to be an invisible line in the middle of the timeline, with the number of records differing greatly before and after it.
However, compared with the suspicious group in Fig. 3-7, there are many differences. Group 1 is smaller and has fewer records. Group 2 has remarkable outward nodes, and hence its timeline view is relatively more complicated. Group 3 behaves rather stably. As for Group 4, it is highly similar to the suspicious group in both organizational and temporal structure.
Fig. 4-6: Comparison of the four groups.