Entry Name:"360-PKU-Ma-MC3"

VAST Challenge 2018
Mini-Challenge 3

 

 

Team Members:

 

Ma qi, 360 Enterprise Security Corp, heymarch@qq.com PRIMARY

 

Wei Xueshi, 360 Enterprise Security Corp, xs.wei@foxmail.com

 

Li Yiping, 360 Enterprise Security Corp, 835276214@qq.com

 

Huang Chuanming, 360 Enterprise Security Corp, josjoy0413@gmail.com

 

Liwenhan Xie, Peking University, xieliwenhan@pku.edu.cn

 

Zhiyi Yin, Peking University, 1600017832@pku.edu.cn

 

Xiaoru Yuan, Peking University, xiaoru.yuan@gmail.com

 

Student Team: NO

 

Tools Used:

    D3js

    Visual analytic system developed by our team.

 

Approximately how many hours were spent working on this submission in total?

    200 hours

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2018 is complete? YES

 

Video

    video

 

 

 

Questions

1.     Using the four large Kasios International data sets, combine the different sources to create a single picture of the company. Characterize changes in the company over time. According to the company's communications and purchase habits, is the company growing?

Limit your responses to 5 images and 500 words

 

1.1 Overall personnel picture

To get a brief overview of the company's composition, we drew eight parallel coordinate graphs, one for each subset of people with a non-zero degree of a given type. For example, subgraph Ci in Fig. 1-1 shows the degree distribution of people who have answered at least one call. From these graphs we deduce that the people involved in emails and in calls are roughly the same, and that distinctive patterns appear in both purchase and meeting behaviour, where the number of records is small. Sellers have few communication and sales records, except for one prominent outlier, and most people involved in meetings lack any other type of connection.

 

Fig. 1-1: Parallel coordinate graphs comparing different employees. C/E/P/M stand for calls/emails/purchases/meetings respectively; "i" and "o" denote in-degree and out-degree. For instance, in Ci each line in the parallel coordinates corresponds to a person whose in-degree of calls is above zero.
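
The per-type degrees behind Fig. 1-1 could be computed roughly as follows. This is a minimal sketch, not the code of our system: the record layout `(source, etype, target, time)`, the type codes, and the function names `degree_table` and `subgraph` are assumptions made for illustration, and "degree" is taken here to mean the number of records rather than the number of distinct neighbors.

```python
from collections import defaultdict

# Assumed record layout (not stated in the text): (source, etype, target, time).
# Assumed type codes: 0 = call, 1 = email, 2 = purchase, 3 = meeting.
ETYPES = {0: "C", 1: "E", 2: "P", 3: "M"}

def degree_table(records):
    """Return {person: {"Ci": n, "Co": n, ...}} with in/out record counts per type."""
    table = defaultdict(lambda: defaultdict(int))
    for source, etype, target, _time in records:
        prefix = ETYPES[etype]
        table[source][prefix + "o"] += 1  # out-degree of the sender
        table[target][prefix + "i"] += 1  # in-degree of the receiver
    return table

def subgraph(table, axis):
    """Keep only people whose degree on the given axis (e.g. "Ci") is above zero."""
    return {p: dict(d) for p, d in table.items() if d.get(axis, 0) > 0}

# Toy records, invented for this example only.
records = [
    (1, 0, 2, 10),  # 1 calls 2
    (2, 1, 1, 20),  # 2 emails 1
    (3, 2, 4, 30),  # 3 buys from 4
    (5, 3, 6, 40),  # 5 meets 6
]
table = degree_table(records)
ci = subgraph(table, "Ci")
```

Each entry of `ci` is one polyline in the Ci subgraph, with one coordinate per degree axis.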

 

We further labelled people in the dataset by the communication (calls and emails), meeting and purchase records they engaged in, and found that most people have a clear division of labour in the company, summarized below.

 

Fig.1-2: Employee Component. Light colors represent the proportion of people keeping this kind of records only.

 

These pie charts refer to the five major roles in the company, i.e. liaison, buyer, seller, meeting initiator and meeting attendee, where light colors stand for those who hold only this kind of record. About 2/3 of the staff handle online communication (calls and emails), and about half of them also make purchases. The remaining 1/3 comprises sellers, meeting initiators, and meeting participants. Interestingly, the sets of meeting initiators and meeting participants are mutually exclusive.
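
The role labelling summarized in Fig. 1-2 can be sketched as below. The record layout, the type codes, the role names, and the reading of the source of a purchase as the buyer and the source of a meeting as the initiator are all assumptions made for this illustration.

```python
from collections import defaultdict

# Assumed record layout and type codes (not stated in the text):
# (source, etype, target, time), with 0=call, 1=email, 2=purchase, 3=meeting.
def label_roles(records):
    """Map each person to the set of roles implied by their records."""
    roles = defaultdict(set)
    for source, etype, target, _time in records:
        if etype in (0, 1):
            roles[source].add("liaison")
            roles[target].add("liaison")
        elif etype == 2:
            roles[source].add("buyer")    # assumption: source of a purchase buys
            roles[target].add("seller")
        elif etype == 3:
            roles[source].add("meeting_initiator")  # assumption: source initiates
            roles[target].add("meeting_attendee")
    return roles

def role_shares(roles):
    """For each role, return (people holding it, people holding only it)."""
    shares = defaultdict(lambda: [0, 0])
    for held in roles.values():
        for role in held:
            shares[role][0] += 1
            if len(held) == 1:  # the "light color" share in Fig. 1-2
                shares[role][1] += 1
    return {role: tuple(counts) for role, counts in shares.items()}

# Toy records, invented for this example only.
records = [(1, 0, 2, 10), (1, 2, 3, 20), (4, 3, 5, 30), (2, 1, 1, 40)]
roles = label_roles(records)
shares = role_shares(roles)
```

The second number in each pair corresponds to the light-colored slice of a pie chart, i.e. people who keep this kind of record only.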

 

Fig. 1-3: Pattern of active people, new staff, and staff about to quit in every month

 

The chart above illustrates the joining and leaving patterns in every month. "Active people monthly" shows the number of active people in each month and their structure: the company enrolled about 15000-20000 new employees every month, and they seldom left before month 30. It is therefore reasonable to say that the company was expanding during the observed period. "New staff structure" answers the question "Who did they recruit every month?" We classified the new staff using the classes described above. There are 4 main classes: "meet_out", "meet_in", "communication", and "communication+buy_in". The number of people in each class increases over time, except for the last one.
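
The monthly counts in Fig. 1-3 can be approximated as follows. This sketch assumes records of the form `(source, etype, target, time)` with `time` in seconds from the start of the data, treats a month as a fixed 30-day bucket, and takes a person's first and last record as their entry and exit; none of these details are taken from our system.

```python
from collections import defaultdict

SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: fixed 30-day buckets

def monthly_staff(records):
    """Return {month: (active, new, leaving)} from first/last record per person."""
    first, last = {}, {}
    active = defaultdict(set)
    for source, _etype, target, time in records:
        month = time // SECONDS_PER_MONTH
        for person in (source, target):
            first[person] = min(first.get(person, month), month)
            last[person] = max(last.get(person, month), month)
            active[month].add(person)
    new = defaultdict(int)
    leaving = defaultdict(int)
    for month in first.values():
        new[month] += 1      # person's first record falls in this month
    for month in last.values():
        leaving[month] += 1  # person's last record falls in this month
    return {m: (len(active[m]), new[m], leaving[m]) for m in sorted(active)}

# Toy records, invented for this example only.
records = [
    (1, 0, 2, 0),
    (1, 0, 3, SECONDS_PER_MONTH),
    (3, 1, 1, 2 * SECONDS_PER_MONTH),
]
summary = monthly_staff(records)
```

Note that everyone "leaves" in the final observed month under this definition, so the last few months of such a chart should be read with care.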

 

1.2 Overall business picture

 

Fig. 1-4: Line chart showing changes in the four record types over time

 

The numbers of calls, emails and purchases stayed steady over the two and a half years. Given that more and more staff engaged in company business over time, the work per person became less intensive. Only meeting records increased over time, which might illustrate the growing demand for meetings as the company scaled up.

 

1.3 Communication & purchase habits

 

Fig. 1-5: The stacked bar charts above show the personnel composition for the 4 different activities in every month. Personnel are divided by the month they entered the company (i.e. the time of their first record). The top bar of each month stands for newly enrolled people, while the bottom bar stands for the earliest employees.

 

Online communication records show low staff turnover, as most new employees stayed engaged in the following months. In contrast, meeting records show high turnover: among the people newly enrolled in a given month, only a few kept participating in meetings. To explain this, we infer that the company might expand its business mainly through meetings.

 

2.     Combine the four data sources for group that the insider has identified as being suspicious and locate the group in the larger dataset. Determine if anyone else appears to be closely associated with this group. Highlight which employees are making suspicious purchases, according to the insider's data.

Limit your responses to 8 images and 500 words.

 

To investigate a small subgraph in detail within a large dynamic network, we developed a system called the Traceability Analysis System. It treats each employee as a node and each record provided by the insider as a link between nodes. With this system, we can explore the company from an initial point, so we started with the single suspicious purchase record and then explored the whole suspicious group step by step. On the one hand, when a new node is added to the system panel, the interactions between the new node and the existing nodes are shown. On the other hand, the system provides a configuration panel that enables selection based on multiple rules, e.g. link count threshold, common neighbors, relation type, etc. Nodes that are closely associated with the target node or group can therefore be filtered explicitly with a single click. As a result, we found several employees closely associated with the given suspicious group, listed in Table 2-1.

 

Note: The time shown in the system is eight hours ahead of the actual time.

 

As Fig. 2-1 suggests, we followed four steps.

    (1) Retrieve nodes (green) that link to the suspicious group V many times.

    (2) Select nodes (purple) that are associated with more than one node in V.

    (3) Add both the candidate employees above and the suspicious group to the timeline analysis panel, to check whether their connection times are close.

    (4) Inspect the statistics of each candidate employee, to see the proportion of its links that go to the suspicious group.
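
Steps (1), (2) and (4) amount to three statistics per candidate, which can be sketched as below. The record layout, the function names and the thresholds `min_links`, `min_targets` and `min_share` are hypothetical; our system applies these rules interactively through its configuration panel rather than with fixed values, and step (3) is a visual check that is not reproduced here.

```python
def candidate_stats(records, group):
    """For each person outside the group, count all records, records shared
    with group members, and the distinct group members they touch."""
    stats = {}
    for source, _etype, target, _time in records:
        for person, other in ((source, target), (target, source)):
            if person in group:
                continue
            s = stats.setdefault(person, {"total": 0, "to_group": 0, "targets": set()})
            s["total"] += 1
            if other in group:
                s["to_group"] += 1
                s["targets"].add(other)
    return stats

def close_candidates(stats, min_links=2, min_targets=2, min_share=0.5):
    """Apply hypothetical thresholds for steps (1), (2) and (4)."""
    picked = [
        p for p, s in stats.items()
        if s["to_group"] >= min_links                  # (1) links to V many times
        and len(s["targets"]) >= min_targets           # (2) more than one node in V
        and s["to_group"] / s["total"] >= min_share    # (4) mostly linked to V
    ]
    return sorted(picked, key=lambda p: -stats[p]["to_group"])

# Toy records (source, etype, target, time), invented for this example only.
group = {1, 2}
records = [(1, 0, 9, 1), (9, 1, 2, 2), (9, 0, 1, 3), (9, 0, 8, 4), (8, 0, 7, 5)]
stats = candidate_stats(records, group)
candidates = close_candidates(stats)
```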

 

Fig. 2-1: Analysis Steps

 

We then obtained the nodes that are very close to the suspicious group.

 

Fig. 2-2: The group closely associated with the suspicious group

 

ID | Name | Description | Closeness
786361 | Sheilah Stachniw | connected with 8 targets, many records to V and mostly to V, and 1 suspicious purchase | ****
981554 | Sherrell Biebel | connected with 8 targets, many records to V and mostly to V | ***
944354 | Ferne Hards | connected with 5 targets, many records to V and mostly to V | ***
2037156 | Martha Harris | connected with 5 targets, many records to V and mostly to V | ***
175354 | Madeline Nindorf | connected with 4 targets, mostly link to V | ***
1376868 | Timothy Gibson | connected with 4 targets, and 6 suspicious purchases | **
713701 | Jane Tyler | connected with 3 targets, and 2 suspicious purchases | **
713639 | Juan Walsh | connected with 3 targets | *
713743 | Sherilyn Coopwood | connected with 3 targets | *
713814 | Jaunita Westen | connected with 3 targets | *
1981017 | Terrilyn Overkamp | connected with 3 targets | *

Table 2-1: People closely associated with the suspicious group.

 

To find suspicious purchase records, we added the relevant people to the timeline panel for further investigation. Combining their activities and statistics, we identified the exceptional purchase records.

 

Fig. 2-3: All activities of the suspicious group and all purchases of the suspicious group.

 

Sheilah Stachniw (786361) is associated only with people in the suspicious group and one other supplier. At a crucial point when the suspicious group members were communicating with each other, she made two purchases.

 

Fig. 2-4: Sheilah Stachniw (786361)

 

Timothy Gibson (1376868) holds the most suspicious purchase records. First, he had a call at 0:21 a.m. on 20th June, 2015 with Richard Fox (857138), then bought things from Gail Feindt six minutes later. Second, he sent an email to Meryl Pastuch (1690582) at 6:45 p.m. on 10th September, 2015. In the half hour after the email, he bought things from Gail Feindt twice. Moreover, he sent an email to Tobi Gatlin (969089) at 8:20 p.m. on 20th January, 2017, with a purchase from Gail Feindt recorded about half an hour earlier. Last, he made a call to Lindsy Henion (1108217) at 6:22 a.m. on 8th December, 2017 and bought things twice in the next two hours.

 

Fig. 2-5: Timothy Gibson (1376868)

 

Jane Tyler (713701) emailed Richard Fox (857138) at 9:44 p.m. on 24th August, 2015 and bought things from Gail Feindt immediately afterwards. Similar behavior occurred at 8:18 a.m. on 6th July, 2017, when Jane Tyler called Tobi Gatlin (969089) and then bought things from Gail Feindt.

 

Fig. 2-6: Jane Tyler (713701)

 

Source | Target | Time | Event
Sheilah Stachniw | Gail Feindt | 2017-12-04 13:05:50 | in the crucial period
Sheilah Stachniw | Gail Feindt | 2017-12-06 03:53:23 | in the crucial period
Timothy Gibson | Gail Feindt | 2015-06-20 00:26:52 | immediately after a call
Timothy Gibson | Gail Feindt | 2015-09-10 18:53:11 | after an email
Timothy Gibson | Gail Feindt | 2015-09-19 19:22:28 | after an email
Timothy Gibson | Gail Feindt | 2017-01-20 20:04:58 | half an hour before an email
Timothy Gibson | Gail Feindt | 2017-12-08 07:51:43 | after a call
Timothy Gibson | Gail Feindt | 2017-12-08 08:29:50 | after a call
Jane Tyler | Gail Feindt | 2015-08-24 21:45:27 | immediately after an email
Jane Tyler | Gail Feindt | 2017-07-06 08:32:15 | after a call

Table 2-2: Suspicious purchase records.
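
The "purchase shortly before or after a contact with a group member" pattern in Table 2-2 was found by inspecting the timeline panel. A programmatic version of the same check could look like the sketch below; the record layout, the type codes, the function name and the one-hour `window` are assumptions made for illustration only.

```python
from collections import defaultdict

# Assumed record layout and type codes (not stated in the text):
# (source, etype, target, time in seconds), 0=call, 1=email, 2=purchase.
def purchases_near_contact(records, group, window=3600):
    """Return (buyer, seller, time, gap) for purchases made within `window`
    seconds of the buyer's call or email with a member of `group`.
    A positive gap means the purchase came after the contact."""
    contacts = defaultdict(list)  # person -> times of contacts with group members
    for source, etype, target, time in records:
        if etype in (0, 1):
            if target in group:
                contacts[source].append(time)
            if source in group:
                contacts[target].append(time)
    hits = []
    for source, etype, target, time in records:
        if etype != 2:
            continue
        gaps = [time - c for c in contacts[source] if abs(time - c) <= window]
        if gaps:
            hits.append((source, target, time, min(gaps, key=abs)))
    return hits

# Toy records with small IDs and times, invented for this example only.
group = {7}
records = [
    (1, 0, 7, 1000),  # person 1 calls group member 7
    (1, 2, 9, 1360),  # person 1 buys from 9 six minutes later
    (5, 2, 9, 5000),  # person 5 buys from 9 with no nearby contact
]
hits = purchases_near_contact(records, group)
```

A negative gap would correspond to rows such as "half an hour before an email" in the table above.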

 

3.     Using the combined group of suspected bad actors you created in question 2, show the interactions within the group over time.

a. Characterize the group's organizational structure and show a full picture of communications within the group.

b. Does the group composition change during the course of their activities?

c. How do the group's interactions change over time?

Limit your responses to 10 images and 1000 words

 

a. Organizational structure

 

Fig. 3-1: The whole picture of all suspects and people who connect closely with them

 

In Fig. 3-1, we labeled people by their position inside or outside the group. Orange labels mark the known suspect group. Pink marks people who connect with more than one person in the suspect group. Green marks people who connect with only one person in the group, but more than once. The single blue node is the biggest supplier of goods.

 

From this we can identify the general interaction pattern between the known group and the other suspects we found:

    1. Several people (857138, 1690582, 1108217) have dense edges to other suspects; we call them organizers inside the group.

    2. Other people in the known group mainly communicate with a few particular nodes outside the group (981554, 175354); we call these nodes organizers outside the group.

 

Then, we downsize the group to a core group, and generate a layout based on their status in the group (Fig. 3-2).

 

Fig. 3-2: Group status

 

Pink stands for outside organizers. Orange stands for inside organizers. Grey stands for people with communication and purchase records, which is the prevailing class in the dataset. Green stands for people with communication records who also initiate meetings. Yellow stands for pure liaisons.

 

In this layout, we noticed that the organizers belong to unique classes (e.g. 857138 is the only person in the dataset with communication, meeting initiation, and purchase records). Their suspicious records confirm that they are key members of the whole suspect group. In particular, they have some strange meeting records with other group members. As another example, 981554 is the person closest to the known group.

 

b. Composition Change

 

To illustrate the inner composition change in this section, we generate a concise layout including the original suspicious group and several important extensions.

Color annotation for part b&c:

    Orange: Original suspicious group

    Pink: Closely connected suspect with purchase records

    Blue: Closely connected suspect without purchase records

    Green: Closely connected suspect with far more total records than other people in the graph

 

Fig. 3-3: Phase 1: May - November 2015

 

In this phase, the suspicious group was not very active. In the concise version of the expanded group shown in Fig. 3-3, only the inside organizers (857138 and 1690582) have connections with green nodes. We consider this an incubation period before the group was really established, so the other subordinate members did not appear in this period.

 

Fig. 3-4: Phase 2: November 2015 - January 2016

 

In this phase, nearly all members of the original group participated in various activities. It is worth noting that the blue nodes are very closely connected to the team members during this time. We can infer that these "outsiders" engaged in group activity at a very early stage.

 

Fig. 3-5: Phase 3: February 2016 - June 2017

 

During this relatively long period, interaction among the members of the extended team is relatively sparse, and most interaction takes place among the main members of the group. However, this does not lead us to conclude that some members withdrew from the group. Basically, everyone is still in contact, but at a reduced frequency.

As this is a long time period, we extract three successive months to keep the time span comparable with the other phases.

 

Fig. 3-6: Phase 4: July - December 2017

 

In the final phase, the core members of the group (the orange nodes near the center of the picture) once again became active together and made more contact. This phase differs from the first intensive contact phase in 2 ways:

    1. Peripheral orange nodes did not participate extensively in interactions.

    2. Blue nodes seldom participate in interactions.

In conclusion, after the first activity peak (Phase 2), the composition of the group did not change much. Most members did not disappear for long; they just engaged in activities at varying frequencies.

 

c. Interaction Change

 

To observe the specific interaction details between the members, it is necessary to reduce the number of nodes in the graph as much as possible. For this reason, we divide the expanded group into three parts, shown in Fig. 3-7:

    1. Outward: People who have many more records than the others and take on the task of communicating outside the group.

    2. Inward: People who don’t have many records. They mainly connect with other group members.

    3. Supplier: i.e. 2038003.

 

Fig. 3-7: Overview of interactions inside the expanded suspicious group over time

 

Then, for comparison, we import the records within the Inward group only and the records of the Inward group within the whole expanded group, shown in Fig. 3-8.

 

    1. Each of the two concentrated events began with a multi-person meeting (light green). This suggests that a meeting marks the beginning of a suspicious activity arranged by this group. However, the people attending the two meetings were significantly different, perhaps reflecting a difference in the purpose of the two events.

 

Fig. 3-8: (a) Records within the Inward group (b) Records of the Inward group in the whole expanded group

 

 

From the two timelines in Fig. 3-8, we can clearly see the two activity peaks of the suspect group: the first from Nov. 2015 to Jan. 2016, and the second in Sep. 2017. Looking at these two peaks from a macro view, we find the following behavior patterns:

 

Fig. 3-9: Overview of interactions for all 4 types

 

2. In the short time after the first meeting, calls made up the majority of communication within the group; after the first peak of events, email became the main communication method.

 

Fig. 3-10: Interactions of the second peak

 

3. Fig. 3-10 shows the interactions of the second peak in detail. The connection between Rosalia Larroque (1847246) and Kerstin Beveal (728286) is the main theme. It should be noted that Kerstin Beveal was very active in the later period. He also had five consecutive conversations with Sherrell Biebel (981554) in mid-May 2017 and participated in the second multi-person meeting, which makes him a key suspect. Rosalia Larroque first made a call to Jenice Savaria (2038003), then made a purchase. In the following month, Rosalia maintained close contact with Kerstin.

 

4. Compared with the first peak, people from the Inward group have an abnormally large number of connections outside the group. Furthermore, most of those connections are emails. In particular, on December 4th, 2017, the team members interacted with the outside world on a large scale, which also heralded the end of the group's activities.

 

 

4.     The insider has provided a list of purchases that might indicate illicit activity elsewhere in the company. Using the structure of the first group noted by the insider as a model can you find any other instances of suspicious activities in the company? Are there other groups that have structure and activity similar to this one? Who are they? Each of the suspicious purchases could be a starting point for your search. Provide examples of up to two other groups you find that appear suspicious and compare their structure with the structure of the first group. The structures should be presented as temporal not just structural (i.e., the sequence of events - A is followed by B one or two days later - will be important).

 

Limit your responses to 10 images and 1200 words

 

In summary, we gained the following knowledge from the questions above and use it to answer this question.

 

1. For organizational structure, there are core members in the suspicious group, whose distances to the other group members range from 1 to 2, and are mostly 1. Group members associate with each other many times. Above all, the group can be divided into three parts, i.e. bargain suppliers, the outwards and the inwards (see Fig. 3-7).

 

a. Bargain suppliers (the blue node) are the destination of purchases.

 

b. Outward members (yellow and green nodes) have purchase records and many other records associating with people outside the group (the green nodes represent people with higher proportion of outside associations).

 

c. Inward members mostly communicate within the group and have no purchase records. They also have fewer records than the outwards.

 

For temporal structure, there are sudden bursts of association between the outwards and the inwards. In particular, a purchase tends to come together with meetings and frequent communication.

 

Therefore, we set up the following strategy for finding groups U similar to the first group V.

 

1. Start from a small number of nodes S.

 

2. Select nodes N that link tightly to S according to the following criteria (weighted in order) and add them to S iteratively.

a. a large number of repeated edges;

 

b. multiple association targets in S;

 

c. records between N and S make up more than half of the records of N;

 

d. sudden association could be observed in the timeline;

 

e. preferably no purchase records.
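
The iterative expansion above can be sketched as a simple loop. This is only an illustration of criteria a to c with hypothetical hard thresholds (`min_links`, `min_targets`, `min_share`) and an assumed record layout; in practice we weighed the criteria interactively, and criteria d (sudden association in the timeline) and e (no purchase records) are not implemented here.

```python
from collections import defaultdict

def expand_group(records, seeds, min_links=3, min_targets=2,
                 min_share=0.5, max_rounds=10):
    """Grow the set S from `seeds` by repeatedly adding tightly linked nodes N."""
    group = set(seeds)
    for _ in range(max_rounds):
        total = defaultdict(int)    # all records of each outside node
        links = defaultdict(int)    # records between the node and S
        targets = defaultdict(set)  # distinct members of S the node touches
        for source, _etype, target, _time in records:
            for person, other in ((source, target), (target, source)):
                if person in group:
                    continue
                total[person] += 1
                if other in group:
                    links[person] += 1
                    targets[person].add(other)
        added = {
            p for p in links
            if links[p] >= min_links                # a. many repeated edges
            and len(targets[p]) >= min_targets      # b. multiple targets in S
            and links[p] / total[p] > min_share     # c. more than half of N's records
        }
        if not added:
            break
        group |= added
    return group

# Toy records (source, etype, target, time), invented for this example only.
records = [
    (3, 0, 1, 1), (3, 1, 2, 2), (3, 0, 1, 3),
    (4, 0, 3, 4), (4, 0, 3, 5), (4, 1, 1, 6), (4, 0, 5, 7),
    (5, 0, 6, 8), (5, 0, 6, 9), (5, 0, 6, 10),
]
found = expand_group(records, seeds={1, 2})
```

In this toy example node 4 only qualifies after node 3 has joined S, which is the reason for iterating instead of filtering once.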

 

The six extra purchase records involve four people in all, so we use these four people as our starting points. We thus find four suspicious groups, shown in Fig. 4-1 ~ Fig. 4-4 and the tables below.

 

ID | Name
320914 | Cora Cross
1152569 | Donnetta Lapoint
1981017 | Terrilyn Overkamp
1141575 | Lucy Herrera
1271503 | Jerome Jordan
580766 | Trevor Webb
1476791 | Alesha Aschenbrenner
726693 | Archie Griffies
2038138 | Tyree Barreneche

ID | Name
2037766 | Gregory Russell
437025 | Beth Wilensky
1172172 | Angelic Graetz
758683 | Anjelica Hoger
160091 | Cora Gonzalez
981745 | Prudence Rosol
32081 | Edgar McCormick
447025 | Renae Hilbrand
468954 | Abbey Rhead
1371959 | Zachary Hampton
1590376 | Olivia Brown
1710613 | Amelia Colon
1771151 | Karyl Snobeck
103229 | Sherrl Brensnan
224204 | Indira Fugua
369655 | Layla Mostad
709913 | Birdie Pioch
717869 | Nikia Wilebski
723897 | Cecilia Pichette
755431 | Ollie Andrews
1422630 | Amada Faul
1424784 | Virgie Pratt
1446867 | Fonda Bursch
1499804 | Katharine Santos
1505718 | Valentine Klette
1608171 | Ilona Barros
1714881 | Concha Goodall
1938283 | Lupe Gullatt

ID | Name
2037860 | Carlos Morris
695013 | Laure Pelkley
312722 | Courtney Wiedemann
1147062 | Yong Wilbert
221502 | Arthur Fox
1066593 | Marjorie Halbach
1564399 | Daria Housten
1880061 | Renae Hilbrand
1880061 | Merlene Tessier
9391 | Roger Beck
170126 | Virginia Buchanan
200423 | Sharmaine Lofredo
206602 | Rena Jerabek
1200868 | Jayden Walters
1266039 | Dorothea Kulback
1376314 | Marc Bowen
1572181 | Alan Sedotal
114786 | Elva Ingram
250565 | Jessica Pokoj
350756 | Omar Tako
1229528 | Regina Bordoy
1740389 | Sherlyn Delcine
40236 | Lulu Larson
115539 | Dina Fairy
293560 | Kathie Matheu
1706578 | Sha Pardoe
1862079 | Cathleen Kucinski

 

Fig. 4-1: Group 1, found from Trevor Webb (580766)

 

 

Fig. 4-2: Group 2, found from Beth Wilensky (437025)

 

 

Fig. 4-3: Group 3, found from Laure Pelkley (695013)

 

 

Fig. 4-4: Group 4

 

As for suspicious activities, an abnormal phenomenon occurred on 4th December, 2017, when a large number of associations took place between the inward nodes of each group and people outside that group.

 

Fig. 4-5: Suspicious events happened around 4th December, 2017.

 

In the overview, the four groups are similar in their record distribution over time, which also accords with the suspicious group. First, their purchase records take place at the end of the timeline. Second, there seems to be an invisible line in the middle of the timeline, with the number of records varying greatly before and after it.

 

However, compared with the suspicious group in Fig. 3-7, there are many differences. Group 1 is smaller and has fewer records. Group 2 has prominent outward nodes, and hence its timeline view is relatively more complicated. Group 3 behaves rather stably. As for Group 4, it is highly similar to the suspicious group in both organizational and temporal structure.

 

Fig. 4-6: Comparison of the four groups.