Entry Name:  "UBA - Chanta Miners - MC2"

VAST Challenge 2014
Mini-Challenge 2

 

 

Team Members:

 

Sergio Manuel Villordo, Universidad Nacional de Buenos Aires, sergiomanuel03@gmail.com   PRIMARY
Hee Joon Park, Universidad Nacional de Buenos Aires, hee@mac.com
Luciano Cabrera, Universidad Nacional de Buenos Aires, lucianocabrera@gmail.com
Juan M. Bodenheimer, Universidad Nacional de Buenos Aires, jbodenheimer@instare.com
Juan Pablo Ferrandez, Universidad Nacional de Buenos Aires, jpferrandez@gmail.com
Antonio Tralice, Universidad Nacional de Buenos Aires, atralice@gmail.com

Student Team:  YES

 

Analytic Tools Used:

 

Tableau (http://www.tableausoftware.com )

SQLite3 (http://www.sqlite.org )

Microsoft Excel

Inkscape (http://www.inkscape.org/ )

R (http://www.r-project.org/)

QGIS (http://www.qgis.org/en/site/)

PostgreSQL / PostGIS (http://www.postgresql.org/) (http://postgis.net )

SIMILE Widgets Timeline (http://simile-widgets.org/timeline/)

 

 

 

Approximately how many hours were spent working on this submission in total?

Provide an estimate of the total number of hours worked on this submission by your entire team.

 

180 h

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2014 is complete? YES

 

 

Video:

 

http://youtu.be/xNbROh350Dw

 

Toma3

 

 

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Questions

 

MC2.1 – Describe common daily routines for GAStech employees. What does a day in the life of a typical GAStech employee look like?  Please limit your response to no more than five images and 300 words.

 

To understand a normal day for a GAStech employee, we performed an exploratory data analysis of all company car and truck movements and of employee purchases.

The following graphs summarise a regular day for employees, based on the information provided.

 

Description: https://lh5.googleusercontent.com/NTeVrhAjcBBR_3pvq9L4h6aAPo0gy4pwqdBo36dwvKiwfAxF8G6Rzdxv69TlTBoif8EINuvEBU_9WpRJUmXyozhfCnTidZPk6EU9lld9q9lfwqwTBgM_AAF77lvzvL7igQ

Figure 1. Employee movements and purchases presented by employment type and day.



Description: ig2.png

 

Figure 2. Violin plot of recorded GPS movements presented by day and hour for each employment type.

 

Description: PS_freq3.png

 

Figure 3. Matrix of bar charts of employee movements plotted by day for each employment type.

 

 

Description: oni.png

 

Figure 4. Matrix visualization of movements and purchases during the day, presented by employment type.


On a regular day, employees start moving between 6-7 am and 8-9 am. From around 11 am-12 pm until 2 pm they go out for lunch. Movement generally starts again at about 5 pm and then fades until about 9 pm. There is some movement after that, but to a minor degree. Some interesting differences appear when comparing employment types or different parts of the week, for example:

- Facilities differs somewhat from this pattern: it shows more continuous activity during the day.

- At the weekend there are important changes. People move much less. Facilities shows only minor movement. Engineering and Security account for most of the GPS data.

 

These graphs also reveal some special cases that are discussed later in this report (some people move late at night or early in the morning).

Looking at a normal day of a GAStech employee from the perspective of expenses shows much the same picture as the GPS data. Many employees start the day with breakfast, coffee or some other activity in the morning (7-9 am). Some Engineering employees do not. Facilities employees show more continuous activity, not as divided into blocks as in other areas. At lunchtime the activity starts again: people go out for lunch, and we see a similar kind of activity after work (happy hour, dinner, etc.). Facilities employees differ here too, showing continuous activity during the day and almost none when others go out for dinner or drinks. There is also a peak in Information Technology expenses: this is a $10,000 expense outlier that is discussed later in the report.
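The hourly pattern described above can be approximated by counting GPS records per employment type and hour of day. The following Python sketch is only an illustration and is not the workflow behind the figures: the record layout (employment type plus timestamp string), the timestamp format and the sample rows are all assumptions.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp layout; the real GPS file may use a different one.
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M:%S"


def hourly_profile(records):
    """Count records per (employment_type, hour of day).

    `records` is an iterable of (employment_type, timestamp_string) pairs.
    """
    counts = Counter()
    for employment_type, timestamp in records:
        hour = datetime.strptime(timestamp, TIMESTAMP_FORMAT).hour
        counts[(employment_type, hour)] += 1
    return counts


# Made-up sample rows, for illustration only.
sample = [
    ("Engineering", "01/06/2014 07:15:00"),
    ("Engineering", "01/06/2014 07:45:10"),
    ("Engineering", "01/06/2014 12:05:00"),
    ("Facilities", "01/06/2014 10:30:00"),
]

profile = hourly_profile(sample)
for (employment_type, hour), n in sorted(profile.items()):
    print(f"{employment_type:12s} {hour:02d}:00  {n}")
```

Plotting these counts per weekday would give a rough equivalent of the matrix views in Figures 3 and 4.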

 

 

MC2.2 – Identify up to twelve unusual events or patterns that you see in the data. If you identify more than twelve patterns during your analysis, focus your answer on the patterns you consider to be most important for further investigation to help find the missing staff members. For each pattern or event you identify, describe

a.       What is the pattern or event you observe?

b.      Who is involved?

c.       What locations are involved?

d.      When does the pattern or event take place?

e.       Why is this pattern or event significant?

f.        What is your level of confidence about this pattern or event?  Why?

 

Please limit your answer to no more than twelve images and 1500 words.

 

 

1.      $10,000 purchase: as the graph shows, this expense appears to be an outlier, so we think that Lucas Alcazar (car ID 1) should be investigated. There is another important issue with this transaction: Alcazar's GPS data has no temporal match with the moment of purchase, and our analysis could not identify a reason for this strange behavior. A third reason for investigating is that there is a credit card transaction 10 minutes later at the place where he had left his car, which is not close to the place of the $10,000 purchase. ID 24 was at the place where this large purchase was made. For these reasons we think the whole issue deserves to be studied. In addition, ID 1 met several times with IDs 21 and 24 (security staff associated with other suspicious behaviour, see below).

 

 

 

Description: pto2_1c.png

 

2.      ID 1's movements are unusual. This person shows GPS movement at hours that are not common for GAStech employees. As seen in the first part of the report, the usual daily pattern of GAStech employees is different.

 

Description: pto2_6.png

 

3.      ID 28 strange GPS data: when this worker's GPS data is plotted, it behaves strangely: the track seems to wander and also to be offset to one side. We analysed it and could not explain the phenomenon as fast car movement or anything similar. By “displaced” we mean that the GPS points are spatially shifted from the locations this person visits (for example, the office location versus the GPS data). It is also impossible for ID 28 to walk or drive in the middle of the lake. The displacement reaches about 590 meters. What happened here? Was there a problem with the GPS equipment, or did someone manipulate the data?

 

 Description: pto2_3a.png

 

4.      The CEO/President of GAStech: he has no GPS data for most of the two weeks, but three days before the hijacking (starting on the 17th) he suddenly seems to start using the car, or his card starts to be used by someone. The credit card and loyalty card information shows no activity before the 17th. In addition, he pays a lot of money (USD 600) at the Chostus Hotel on the 18th. Why those days? What had he done before? This may not be strange in itself, but since we have no movements for him before those days, we cannot establish what his normal movements are.

Description: pto2_4c.png

 

5.      Trucks get restless: the type of movement the trucks show on the Wednesday, Thursday and Friday before the last weekend changes. Comparing the following image with the one before shows major changes. Until then, trucks did not move after 4 pm, but on the 15th or 16th they continue. Why? What are they doing? What reasons are there to continue the activity when they had not done so on the days before?

We do not see such a change of movement in other areas; only the trucks change how much they move so markedly on those days. Analysing their movement shows that they go to the airport. What for?

 

Description: pto2_5.png

 

6.      Supervising / Looking at what the C-Level does

 

As you will see, some very strange things are happening between the security staff and the company's C-level executives.

 

Description: ANALISIS21.png

From the images we share, it seems that the C-level executives are being watched, or something similar. Security people are near them at night, and they take shifts at these positions, changing places in the middle of the night. The CIO, COO, CFO and Ev. Safe. Act. receive this “special attention” from security. After the shift at (or near) the C-level executive's home, each one goes back to their own house.

Each graph shows the GPS activity for one ID, with the time of day on the Y-axis. Combined with the location map, this makes it very clear that the special-attention activity is taking place.

 

7.       Employees going to the Kronos Capitol. In the following heatmap, we show the IDs of GAStech employees who systematically go to the Kronos Capitol. One of them (ID 25) goes on Saturday 11th. The rest go on Saturday 19th. We think this is suspicious behavior that has to be investigated.

Description: pto2_9.png

 

8.      Meeting on the night of Friday 10th: in this heat map, we show the most active places in the city after 5 PM on Friday 10th. The map shows the area around Lars Azada's home as one of those with the most activity. The other two sectors of the map with intense activity are GAStech and the route to Azada's place.

 

Description: pto2_7c.png

 

9.      Guy's Gyros meeting: the CEO goes to Guy's Gyros on Sunday 19th at night. Other employees were there too: several credit card transactions were made at that place at the same time, 11 of them within a 30-minute range. That makes us wonder whether this is just a coincidence or whether the 11 people involved were coordinated in some way. It is also very strange that the area is so busy with cars at that hour (15 cars were identified in the area at that time).

Description: pto2_10.png

 

Description: pto2_8b.png

 

10. Given all of the above, we think that further investigation of the following people is needed:

CEO/President (ID 31)

Some members of the Security staff (IDs 15, 16, 21 and 24)

IT Helpdesk (ID 1)

Some Facilities staff (the people who use ID 101 and ID 106)

In addition, a putative network of contacts involving these people was detected, but more evidence is needed to confirm it.
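For pattern 3, an offset such as the roughly 590-meter displacement of ID 28 can be quantified as a great-circle distance between a known location and the recorded GPS point. The sketch below uses the haversine formula in Python. It is an illustration under assumptions, not the measurement procedure used for this entry: the function name and the two coordinate pairs are made up, and a spherical Earth of radius 6,371 km is assumed.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical coordinates, chosen only to illustrate an offset of this size.
known_location = (36.0500, 24.8700)   # e.g. where a building really is
recorded_point = (36.0553, 24.8700)   # e.g. where the GPS track places it

offset = haversine_m(*known_location, *recorded_point)
print(f"offset: {offset:.0f} m")  # roughly 590 m for these made-up points
```

If the offset is nearly the same for every stop of the vehicle, a systematic shift in the data is more plausible than real movement.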

 

 

MC2.3 – Like most datasets, the data you were provided is imperfect, with possible issues such as missing data, conflicting data, data of varying resolutions, outliers, or other kinds of confusing data.  Considering MC2 data is primarily spatiotemporal, describe how you identified and addressed the uncertainties and conflicts inherent in this data to reach your conclusions in questions MC2.1 and MC2.2.  Please limit your response to no more than five images and 300 words.

 

 

 

1.      Loyalty and credit card differences: for some transactions the price on the loyalty card differs from the price on the credit card. Looking at the differences, they appear to be typing errors in one of the two records (for example 11.51 vs. 51.51, or 27.84 vs. 7.84).

 

2.      ID 28's GPS data looks strange, as stated in the unusual patterns above; perhaps someone changed it. In this case we relied on the general path that ID 28 took, despite the strange behavior of the data.

 

3.      Kronos Mart credit card purchases: the credit card timestamps do not match the GPS or the loyalty data. It seems that each credit card transaction actually took place 12 hours earlier than the time recorded in the data: the loyalty record is dated the day before the purchase, and the person is at Kronos Mart 12 hours before it.

Description: Kmart.png

 

4.      Jack’s Magical Beans credit card purchases: the credit card timestamps do not match the GPS data. All the credit card purchases appear to be recorded at 12 PM, while people are at Jack’s Magical Beans earlier than that.

 

Description: Jack.png

 

 

5.      Dates were in different formats: some were in dd/mm and others in mm/dd, for example. We had to correct these differences to continue our analysis.

 

6.      GPS data problems: in some cases there was more than one GPS point for the same person at the same time (to the second). The exact location is then unknown, because the data implies the person is in two different locations (very close to each other) at once. In spite of this, plotting the data still showed each employee's behavior.

 

7.      Missing data: trucks vs. truck drivers. The GPS data was associated with a driver ID, but not with the truck itself. The car assignments table had no information about who was driving which truck.

 

8.      Administrative positions: no cars. These positions had no car assigned, so we had no GPS information for them. We had to use their credit card information to learn about their routines.
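Three of the issues above (points 1, 3 and 6) lend themselves to simple automated checks. The Python sketch below is illustrative only and does not reproduce how the data was actually cleaned for this entry: all function names, record layouts and sample values are assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def one_keystroke_apart(a, b):
    """True if two prices, written with 2 decimals, differ by exactly one
    substituted character or one missing character (a likely typing error)."""
    s, t = f"{a:.2f}", f"{b:.2f}"
    if len(s) == len(t):
        return sum(x != y for x, y in zip(s, t)) == 1
    if abs(len(s) - len(t)) == 1:
        longer, shorter = (s, t) if len(s) > len(t) else (t, s)
        return any(longer[:i] + longer[i + 1:] == shorter
                   for i in range(len(longer)))
    return False


def median_offset_hours(pairs):
    """Median of (card time - GPS arrival time) in hours, for visits that
    have already been paired up by hand or by another rule."""
    return median((card - gps).total_seconds() / 3600 for card, gps in pairs)


def collapse_duplicate_fixes(fixes):
    """Average all GPS points that share the same (car_id, timestamp)."""
    grouped = defaultdict(list)
    for car_id, timestamp, lat, lon in fixes:
        grouped[(car_id, timestamp)].append((lat, lon))
    return {
        key: (sum(p[0] for p in pts) / len(pts),
              sum(p[1] for p in pts) / len(pts))
        for key, pts in grouped.items()
    }


# Made-up (card, GPS) timestamp pairs for one location.
pairs = [
    (datetime(2014, 1, 12, 19, 30), datetime(2014, 1, 12, 7, 30)),
    (datetime(2014, 1, 13, 20, 0), datetime(2014, 1, 13, 8, 0)),
    (datetime(2014, 1, 14, 20, 45), datetime(2014, 1, 14, 8, 15)),
]
print(one_keystroke_apart(11.51, 51.51), median_offset_hours(pairs))

# Made-up duplicate GPS points for one car at one second.
fixes = [(1, "01/06/2014 07:15:00", 36.0, 24.0),
         (1, "01/06/2014 07:15:00", 36.5, 24.5)]
print(collapse_duplicate_fixes(fixes))
```

A median offset close to a whole number of hours for one location, as in point 3, would suggest a systematic clock or recording problem rather than individual errors.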

 

 

 

 
