Michael Steptoe, VADER Lab, Arizona State University, firstname.lastname@example.org PRIMARY
Robert Krueger, VIS, University of Stuttgart, email@example.com
Yifan Zhang, VADER Lab, Arizona State University, firstname.lastname@example.org
Xing Liang, VADER Lab, Arizona State University, email@example.com
Rolando Garcia, VADER Lab, Arizona State University,
Sagarika Kadambi, VADER Lab, Arizona State University, firstname.lastname@example.org
Wei Luo, VADER Lab, Arizona State University, email@example.com
Thomas Ertl, VIS, University of Stuttgart, Thomas.firstname.lastname@example.org
Ross Maciejewski, VADER Lab, Arizona State University, email@example.com
Student Team: YES
Approximately how many hours were spent working on this submission in total?
~500 hours between all participants
May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete? Yes
For each of the following questions, consider both the movement and communications data.
GC.1 – Scott is not a paying customer and does not have an ID. Describe Scott Jones’ activities in the park during the three-day weekend. Who does he spend most of his time with? When does he arrive? When does he leave? What route does he follow?
Limit your response to no more than 10 images and 1000 words.
We have developed a visual analytics interface for exploring the spatiotemporal communications data in Dinofun World during the weekend of Scott Jones’ visit. Our system consists of three views: the Exploratory View, the Report View, and the Communications View. The Exploratory View (Figure 1) has five primary components:
1. Analytics Interface:
a. ID selection: A user can input a list of visitor IDs and view their trajectories on the map (2). When the user presses play, the trajectories are animated and the communications data is visualized as points on the map that appear at the time of the call and then fade.
b. Visual query: A user can select time and location intervals on the calendar view (3) and create a visual query with logic operators (AND/OR/NOT). This query will return, for example, the IDs of all patrons that were at attraction 38 at 4PM and at attraction 45 at 9PM on Friday. This is our primary feature for finding visitors that were at locations of interest at particular times. IDs and trajectories returned from the query are plotted in the trajectory view.
c. Cluster: All visitor trajectories can be clustered using a Levenshtein distance function and hierarchical clustering. If a tolerance of 0 is selected, the resultant clusters consist of the IDs with identical trajectories (in terms of locations visited at the same time). Increasing the tolerance provides fuzzier clusters (i.e., the IDs have visited ‘mostly’ the same locations at the same time during their stay). The groups found are plotted in the trajectory view, where each group is shown by its most representative trajectory.
d. Outliers: The larger a trajectory's smallest Levenshtein distance to any other trajectory, the more unique it is. This slider returns the top n IDs with the largest such distance. The IDs and trajectories are plotted in the trajectory view.
e. Calendar aggregation: This controls how the rows in the calendar view are sorted (by region, attraction or ride type) as well as the data plotted in the cells of the calendar view (data can be the number of visitors at a ride, the number of sent/received/external/unique calls sent from a ride at time t).
2. Map View: This view shows the trajectories of selected IDs and also animates their movements over time showing communications during animation. This view is also linked to the trajectory view, thus when brushing over a section of the trajectory, that movement segment is plotted on the map. A heat map view coloring each pixel by the number of times a visitor stepped there can also be displayed.
3. Calendar View: Each row represents an attraction in the park and each cell is colored based on an aggregation chosen from control 1e. Data can be viewed for each day, or all three days are aggregated in the ‘any day’ view. The ‘every day’ view shows the counts of IDs that were at the same place at the same time every day. Each cell represents 30 minutes of time.
4. Trajectory View: This is a pixel based representation. Each cell is a 5 minute time interval that is colored based on the location a user is at in the park.
5. Distribution View: This provides a histogram view of the number of sent/received/external/unique calls made during a time period. The y-axis is the number of IDs and the x-axis is the number of communications made (a histogram of call distribution by ID). Users can click a bin to see all the IDs in that bin; in this way we can find the IDs with unusually large numbers of communications.
Figure 1: Exploratory View
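The clustering and outlier controls (1c and 1d) can be sketched as follows. This is a minimal illustration and not the system's actual implementation. It assumes each trajectory is a list of attraction numbers with one entry per time interval, and it assumes single-linkage hierarchical clustering cut at the tolerance, since the linkage is not specified above. The function names and the toy trajectories are hypothetical.

```python
from typing import Dict, List, Sequence


def levenshtein(a: Sequence[int], b: Sequence[int]) -> int:
    """Edit distance between two location sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x from a
                            curr[j - 1] + 1,      # insert y into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def cluster(trajs: Dict[str, List[int]], tolerance: int) -> List[List[str]]:
    """Single-linkage clusters cut at `tolerance` (assumed linkage).

    Cutting a single-linkage dendrogram at a threshold is equivalent to
    taking connected components of the graph that links every pair of
    trajectories whose distance is <= tolerance.
    """
    ids = sorted(trajs)
    parent = {i: i for i in ids}

    def find(i: str) -> str:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for n, a in enumerate(ids):
        for b in ids[n + 1:]:
            if levenshtein(trajs[a], trajs[b]) <= tolerance:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def outliers(trajs: Dict[str, List[int]], n: int) -> List[str]:
    """Top-n IDs ranked by distance to their nearest neighbour."""
    ids = sorted(trajs)
    nearest = {a: min(levenshtein(trajs[a], trajs[b]) for b in ids if b != a)
               for a in ids}
    return sorted(ids, key=lambda i: -nearest[i])[:n]


# Hypothetical toy trajectories (attraction number per time interval).
toy = {"a": [1, 1, 2, 2], "b": [1, 1, 2, 2], "c": [1, 1, 2, 3], "d": [7, 8, 9, 9]}
print(cluster(toy, 0))
print(cluster(toy, 1))
print(outliers(toy, 1))
```

With a tolerance of 0 only the identical trajectories are grouped; raising the tolerance to 1 also pulls in the trajectory that differs in a single interval.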
Once a user identifies IDs of interest, they can use the Report View to create side-by-side comparisons of ID trajectories and communication networks, or to explore clusters of IDs (created by Figure 1 – 1b). Feature vectors (such as the number of thrill rides visited), communication metrics (centrality) and trajectories are all shown. Figure 2 shows several of the possible views that can be explored. By double clicking an ID in an image, the user can also retrieve all other IDs that sent communications to or received communications from that ID, allowing quick exploration of the communication network.
Figure 2: Report View
Our first goal is to determine a time and location at which Scott is known to be present. We were told that there are two shows a day in the park, and the calendar view reveals that these occur at the Grinosaurus Stage. The first show starts around 9:30AM and finishes around 10AM (Figure 3-1). The second show starts around 2:30PM and ends around 3PM (Figure 3-2). We quickly see that on the last day the second show does not take place. We hypothesize that the vandalism must have been discovered after the first show on Sunday, resulting in the cancellation of the second show.
Figure 3: Finding Scott's shows
Since we know when and where Scott was at certain times from Friday to Sunday, we can create a visual query that requests all IDs that were at the Grinosaurus Stage during the Friday, Saturday and Sunday showtimes. This query (Figure 4 – 1,2) reveals a set of 8 IDs. These IDs follow identical paths through the park (Figure 4 – 3,4) from the hotel to the stage and back. We hypothesize that the soccer star spends his time with these staff members, who accompany him from the hotel to the stage a few minutes before each show starts and back again afterwards.
Figure 4: Scott's entourage only goes to the hotel and stage.
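The logic behind such a visual query can be sketched with set operations. This is an illustrative sketch under stated assumptions, not the system's implementation: it assumes a precomputed index from (day, attraction, half-hour slot) to the set of IDs present, and the helper `at`, the index layout and all visitor IDs below are hypothetical.

```python
from typing import Dict, Set, Tuple

Slot = Tuple[str, str, str]  # (day, attraction, start of half-hour slot)


def at(index: Dict[Slot, Set[int]], day: str, attraction: str, slot: str) -> Set[int]:
    """IDs present at `attraction` during the given day and half-hour slot."""
    return index.get((day, attraction, slot), set())


# Hypothetical toy index; the visitor IDs are invented for illustration.
index: Dict[Slot, Set[int]] = {
    ("Fri", "Grinosaurus Stage", "09:30"): {101, 102, 103},
    ("Sat", "Grinosaurus Stage", "09:30"): {101, 102, 200},
    ("Sun", "Grinosaurus Stage", "09:30"): {101, 102, 300},
    ("Sun", "Creighton Pavilion", "10:00"): {102, 300},
}

stage = "Grinosaurus Stage"
fri = at(index, "Fri", stage, "09:30")
sat = at(index, "Sat", stage, "09:30")
sun = at(index, "Sun", stage, "09:30")

every_show = fri & sat & sun   # AND: at the stage for every morning show
any_show = fri | sat | sun     # OR: at the stage for at least one show
# NOT: at every show but not in the pavilion on Sunday at 10:00
stage_not_pavilion = every_show - at(index, "Sun", "Creighton Pavilion", "10:00")

print(sorted(every_show), sorted(any_show), sorted(stage_not_pavilion))
```

Intersecting the three showtime sets is what narrows a park-wide population down to the handful of IDs that attend every show.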
We also explore the communication data for these 8 IDs and find that none of these IDs sent any communications during the weekend. This is strange as we would expect his handlers to be informed of the vandalism; however, it also seems to indicate that Scott does not arrange to meet any friends in the park either.
Figure 5: Scott's entourage doesn't talk to anyone
GC.2 – Identify up to 8 issues with park operations during the three-day weekend. Provide a rationale for your answers.
Limit your response to no more than 8 images and 800 words.
1. From the calendar view, we can quickly see that a variety of rides are closed at various times during the weekend. For example, Galactosaurus Rage is closed Friday from 19:30-20:00, Stone Cups is closed Friday from 20:00-20:30, and the Flying TyrAndrienkos also closes. There are several other closures like these, but they are likely routine issues and seem to be resolved rather quickly.
Figure 6: Rides are broken, kiddie land is sad.
2. On Friday between 20:00 and 23:30, there is a huge growth in the number of sent messages (684) compared to the previous 30 minutes (59). We hypothesize that something may have occurred at the ride.
Figure 7: What's enchanting in kiddie land?
3. On Friday between 1:30PM and 2:30PM, many people go to the Ligament Fix-Me-Up stand. Perhaps something happened and people were injured: we see individuals coming from attractions 1, 3, 7 and 8. However, these are large groups that travel together, so it is unlikely that all of the individuals are hurt (perhaps just one or two), but it should be investigated.
Figure 8: Injuries or just overly concerned families?
4. Some users have missing recordings. For example, sometimes there is no check-in information even if a visitor stays at a ride for hours. Sometimes movement information is missing for a while, and then only the last couple of movements are recorded before a visitor leaves the park. We hypothesize that the app is sometimes unreliable. The image below shows such a case: the visitor enters the Tyrannosaurus Rest bathroom and no other movement data or check-ins are recorded until 8:40PM.
Figure 9: You were in there for how long?
5. In our exploratory view, the trajectory view can be replaced with a probability view. For each attraction, we calculate the probability of each attraction that its visitors may go to next. The arc diagram shows the most likely next destination. Arcs on the top read from left to right; arcs on the bottom read from right to left. Here we can see that Whitley’s Plushadactyl stand (attraction 43) is not the most likely next destination from any attraction. We also see that people who visit souvenir shops (attractions 40, 41, 44-48) are most likely to visit a thrill ride next. These rides should offer storage to encourage shopping and riding. However, what this really shows is that people are not using the app to check in to restaurants or stores. It is currently not possible to determine which groups of people are making purchases in the park without inferring check-ins from movements. Thus the park cannot reliably determine turnover rates for the stores, or how much time (on average) a paying visitor spends at a store vs. a non-paying visitor. The theme park has a financial interest in identifying its highest paying customers. We are not sure if no one goes to the Plushadactyl stand, or if sales there are quick enough to result in waits of less than 5 minutes (our inferred check-in threshold).
Figure 10: Buy then ride? Probability plots showing where you're likely to go next.
Figure 11: No one wants a plushadactyl
6. If a venue's proximity to attractions is what determines its visitor count (as opposed to the product being sold), then we would expect more visitors at Paleo Shreckwiches (36), because it is nearest to the most popular attractions (thrill rides 1, 2 and 8); however, people seem to prefer going from Smoky Wood BBQ (53) to thrill rides. Because venue 36 is closer to the thrill rides, it is positioned to cater to more customers. From the arc view, however, we know that people who like BBQ also like thrill rides, whereas people who like sandwiches tend to go to the beer garden (34) and Rides for everyone (30). Selling BBQ at venue 36 could therefore increase its visitor count.
Figure 12: Sandwich and beer, bbq and rollercoaster?
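The next-attraction probabilities behind the arc diagrams in issues 5 and 6 can be sketched as a first-order transition count. This is a minimal sketch, assuming each visitor is reduced to an ordered list of attractions with a (hard or inferred) check-in; the function names and the toy visit sequences are hypothetical and are not taken from the challenge data.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set


def transition_probabilities(visits: List[List[int]]) -> Dict[int, Dict[int, float]]:
    """Estimate P(next attraction | current attraction) from visit sequences."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for seq in visits:
        for here, nxt in zip(seq, seq[1:]):
            if here != nxt:  # ignore repeated check-ins at the same place
                counts[here][nxt] += 1
    return {here: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
            for here, ctr in counts.items()}


def most_likely_next(probs: Dict[int, Dict[int, float]]) -> Dict[int, int]:
    """For each attraction, the single most probable next attraction."""
    return {here: max(p, key=p.get) for here, p in probs.items()}


def never_top_destination(visits: List[List[int]]) -> Set[int]:
    """Attractions that are not the most likely next stop from anywhere."""
    seen = {a for seq in visits for a in seq}
    top = set(most_likely_next(transition_probabilities(visits)).values())
    return seen - top


# Hypothetical toy sequences of attraction numbers.
toy_visits = [[40, 1, 34], [40, 1], [40, 2], [36, 34]]
print(most_likely_next(transition_probabilities(toy_visits)))
print(sorted(never_top_destination(toy_visits)))
```

An attraction appearing in `never_top_destination` is only evidence that it is rarely the top next stop among recorded check-ins; as noted in issue 5, it may simply be visited for less time than the inferred check-in threshold.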
GC.3 – For the crime, describe the following, and provide your rationale:
a. When did the crime occur?
b. Where did the crime take place?
c. Who are the most likely suspects in the crime?
Limit your response to no more than 5 images and 500 words.
We hypothesize that the crime occurred between 9:45AM and 11:30AM on Sunday at the Creighton Pavilion based on visitor patterns from previous days.
Figure 13: Using the calendar view to narrow in on the crime time.
To identify suspects, we create a visual query that returns all IDs that were in the pavilion between 9:45AM and 11:30AM for more than 5 minutes (the threshold for an inferred check-in). ID:1502920 also has a hard check-in (recorded by the park) at 9:30. Figure 14 shows the calendar view for hard check-ins, and the IDs with both soft and hard check-ins during this time.
Figure 14: Suspect trajectories after the visual query.
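Inferred (soft) check-ins of this kind can be sketched as a dwell-time filter over movement records. This is a sketch under assumptions, not the system's implementation: it assumes each movement sample has already been mapped to an attraction (or to `None` when the visitor is on a walkway), that samples are sorted by time in seconds, and that dwell is measured from the first to the last sample at the same place. The function name and toy samples are hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, Optional[str]]  # (time in seconds, attraction or None)
Stay = Tuple[str, int, int]         # (attraction, first seen, last seen)


def inferred_checkins(samples: List[Sample], min_dwell: int = 300) -> List[Stay]:
    """Return stays of at least `min_dwell` seconds at a single attraction."""
    stays: List[Stay] = []
    current: Optional[str] = None
    start = last = 0
    for t, place in samples:
        if place != current:
            # The visitor moved: close the previous stay if it was long enough.
            if current is not None and last - start >= min_dwell:
                stays.append((current, start, last))
            current, start = place, t
        last = t
    # Close the final stay at the end of the record.
    if current is not None and last - start >= min_dwell:
        stays.append((current, start, last))
    return stays


# Hypothetical toy samples: a 6-minute stay and a stay that is too short.
toy_samples: List[Sample] = [
    (0, None), (60, "Creighton Pavilion"), (200, "Creighton Pavilion"),
    (420, "Creighton Pavilion"), (480, None),
    (500, "Grinosaurus Stage"), (600, "Grinosaurus Stage"), (660, None),
]
print(inferred_checkins(toy_samples))
```

With the 5-minute threshold, visits shorter than 300 seconds produce no soft check-in at all.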
Exploring ID: 1502920, we look at the communication network, Figure 15, and discover that this ID communicates with 6 other visitors. We visualize their movement sequences and discover that ID:461004 and ID:416790 have the exact same sequence as the initial visitor but do not have hard check-ins to the pavilion even though their movements put them there at 9:30.
Figure 15: Communication networks identify more interesting suspects.
We use the Communications View (Figure 16), which takes a list of IDs as input. These IDs are represented as circles, the size of which represents the number of IDs that are at the same location. The x-axis is time and the y-axis corresponds to attractions. If a group of IDs moves to a different attraction, a green line is drawn. IDs may leave or join groups, forming new groups. Communications for these IDs are shown as yellow lines for communications between two groups, red hashes for external communications, and blue hashes for within-group communications. The slope represents the number of communications. Figure 16 explores the 7 IDs. We see the seven visitors enter the park together. After entering the park they split into three groups: G1 (1502920, 461004, 416790), G2 (1123214, 1350546), and G3 (1000279, 1187909). We see G3 waiting at attraction 8 for Scott to pass. Then G3 goes to attraction 5 and waits until Scott enters the stage (~9:30). During this time, G2 leaves the pavilion and waits at attraction 7. G3 and G1 communicate with each other around 10AM, and then G3 joins G2 at attraction 7 around 10:45. G1 communicates with the merged G2 and G3 several times from 10:55 to 11:10. We hypothesize that the crime takes place between 10AM and 10:55AM. G1 and G2 then meet at attraction 6 after the pavilion has been re-opened to the public at 11:30AM.
Figure 16: We track how the suspects who are communicating travel together.
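The grouping that drives the circles and green lines in the Communications View can be sketched as follows. This is an illustrative sketch, assuming the input is one snapshot per time step mapping each selected ID to its current attraction; the function names, the snapshot layout and the toy IDs and attraction numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def colocated_groups(snapshot: Dict[int, int]) -> Dict[int, List[int]]:
    """Group visitor IDs by attraction for one time step.

    The length of each list corresponds to the circle size in the view.
    """
    groups: Dict[int, List[int]] = defaultdict(list)
    for visitor, place in snapshot.items():
        groups[place].append(visitor)
    return {place: sorted(ids) for place, ids in groups.items()}


def moves(before: Dict[int, int], after: Dict[int, int]) -> Dict[int, Tuple[int, int]]:
    """IDs that changed attraction between two steps (drawn as a line)."""
    return {v: (before[v], after[v])
            for v in before
            if v in after and before[v] != after[v]}


# Hypothetical toy snapshots: four visitors enter together, then split up.
step_a = {1: 0, 2: 0, 3: 0, 4: 0}
step_b = {1: 32, 2: 32, 3: 7, 4: 0}
print(colocated_groups(step_b))
print(moves(step_a, step_b))
```

Comparing the groups in consecutive steps shows when IDs leave one group and join another, which is how the splits and merges of the suspects would be traced.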
The above story is plausible, as we know Scott has three local friends and three of the suspects travel together. Another suspect is ID:1983765. On the day of the crime, 1983765 visits the pavilion and leaves during normal hours of operation. This person then goes to the Scholtz Express (the train circling the park) and rides it for two hours, whereas all other IDs that board the train at this time of day ride for 20 minutes. It is possible that this person left their tracking device on the train, went to the pavilion, stole Scott’s medals, and then retrieved their device and left. This person has no communication data even after spending three days at the park.
Figure 17: Train-man
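The comparison of ride durations that singles out this ID can be sketched as a simple check against the typical duration. This is a minimal sketch, not the method used in our system: the median baseline, the factor of 3 and the toy IDs and durations are all assumptions chosen for illustration.

```python
from statistics import median
from typing import Dict, List


def unusually_long(durations: Dict[str, float], factor: float = 3.0) -> List[str]:
    """IDs whose ride duration exceeds `factor` times the median duration."""
    typical = median(durations.values())
    return sorted(i for i, d in durations.items() if d > factor * typical)


# Hypothetical ride durations in minutes for IDs boarding at the same time.
toy_durations = {"id_a": 20, "id_b": 20, "id_c": 22, "id_d": 120}
print(unusually_long(toy_durations))
```

The median is used as the baseline so that a single very long ride does not inflate the notion of a typical duration.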