Entry Name: "UKON-Jentner-GC"

VAST Challenge 2019
Grand Challenge



Team Members:

Wolfgang Jentner, University of Konstanz, jentner@dbvis.inf.uni-konstanz.de PRIMARY
Juri Buchmueller, University of Konstanz, buchmueller@dbvis.inf.uni-konstanz.de
Hanna Schaefer, University of Konstanz, schaefer@dbvis.inf.uni-konstanz.de
Thilo Spinner, University of Konstanz, spinner@dbvis.inf.uni-konstanz.de
Rita Sevastjanova, University of Konstanz, sevastjanova@dbvis.inf.uni-konstanz.de
Fabian Sperrle, University of Konstanz, sperrle@dbvis.inf.uni-konstanz.de
Dirk Streeb, University of Konstanz, streeb@dbvis.inf.uni-konstanz.de
Udo Schlegel, University of Konstanz, schlegel@dbvis.inf.uni-konstanz.de

Student Team: NO


Tools Used:

N.E.A.T. - Novel Emergency Analysis Tool, developed by the University of Konstanz
Demo available to try out at: vcgc19.dbvis.de


Login Name: dbvis
Password: beschte

Code available at https://github.com/dbvis-ukon/neat


Approximately how many hours were spent working on this submission in total?



May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2019 is complete? YES







1. Generate a master timeline of events and trends during the emergency response. Indicate where it is uncertain and which data underlies that uncertainty. Limit your response to 1000 words and 12 images.

Our general approach to building such a timeline is to apply the streamgraphs and episode plots of the NEAT tool described in task 4. We identified interesting points in time from the volume of Rumble messages per category and district. We also used the streamgraphs to compare the statistical deviation of the measured radiation from normal levels, and applied the episode plots to determine people's reactions to the events. An episode plot represents a set of n-grams (from Y*INT messages) that occur with significant density and exceed a given threshold, and are therefore probably important (e.g., by number of retweets). Besides the analysis with NEAT and the annotations created in situ, we also composed a written timeline of events:
Time and Date Event
14:05 - 06.04. Power outage or server failure of the Rumble app - observable in the Rumble app data volumes (MC1) - in contrast, the official channel in the messages reports a power outage, but at 13:39 (MC3) - high temporal uncertainty between the MC data sets
14:33 - 06.04. Small earthquake hits the city - official message on the message channel (MC3) - people talk about it in the messages, but it was only a small one without damage (MC3)
18:08 - 07.04. Preparation course for disasters in Neighborhood 1 (MC3)
08:36 - 08.04. Large earthquake hits the city - large increase in Rumble app traffic (MC1) - major sensor outages (MC2) - increase in messages (MC3)
09:11 - 08.04. Fire at the nuclear plant (MC3); fence down and main building damaged (MC3)
09:15 - 08.04. Bridges closed until inspection (MC3) - a few days earlier, people had complained about poor bridge inspectors - unsafe bridges
09:49 - 08.04. Fire alert to evacuate buildings (MC3); actual fires in many areas (MC3)
10:00 - 08.04. Brick buildings collapse in Old Town (MC3) - hit very hard by the earthquake (MC1)
13:00 - 08.04. Water contamination and sewer damage (MC3) - emergency repair teams are sent out to help (MC3)
13:39 - 08.04. Power outage (MC3) - not observable in the Rumble app (MC1) or the sensors (MC2)
13:56 - 08.04. Heavy hospital damage (MC3) - only one fully functional hospital (MC3)
14:30 - 08.04. Famous singer missing (MC3) - her apartment collapsed and she could be under the damaged structures (MC3)
16:40 - 08.04. Fake? contamination - high values, but by location they lie in Wilson Forest - either there is some contamination there or it is a sensor failure (MC2) - this is bad, as there is a large shelter in Wilson Forest (MC3)
07:00 - 09.04. Power restored in some parts of the city (MC3)
09:01 - 09.04. Sewer system damaged more heavily than expected; repair takes longer than expected (MC3)
09:30 - 09.04. Free concert by some famous singers, including the missing singer (MC3) - Infrastructure Damage
14:36 - 09.04. Another moderate earthquake (MC3) - can also be seen in the Rumble app data (MC1)
15:28 - 09.04. High school collapsed (MC3)
09:30 - 10.04. Schools closed in Old Town, Scenic Vista, Broadview, Chapparal, Easton, and Oak Willow (MC3)
11:59 - 10.04. Last shake ends the messages (MC3): no further data, and the last time a shake is reported by the Rumble app (MC1)
Hospitals are being closed (MC3)
Fatality rumors rise to more than 500 but also drop back to 50 (MC3)
Chemical conspiracy theories arise (MC3)
The Rumble app is used more often during earthquake events (MC1)
Contamination sensors fail more and more over time (MC2)
People care for each other and offer free food and space (MC3)
People start to feel more and more alone (MC3)
Libraries become new shelters (MC3)
People don't want to move to shelters because they already have a spot they like (MC3)
Dark jokes about at least having no tsunamis (MC3)
Some grocery stores sell only limited amounts of supplies (MC3)
Mobs start roaming around town (MC3)
Shelters get too crowded (MC3)

Fig 1. Major events in the city extracted via burst analysis of MC3 Message data
Fig 2. Major events corresponding with the timeline above in NEAT
Fig 3. Overcrowded shelters in NEAT
Fig 4. Heavy water problems in NEAT
Fig 5. Helpful people in NEAT
Fig 6. Contradicting numbers of fatalities in NEAT
Fig 7. Locked hospitals and shelter problems in NEAT
Fig 8. Building inspections started in NEAT
Fig 9. Only limited resources available to buy in stores in NEAT
Annotations have been created in-situ in our tool and are shared between all registered users in real-time.

2. Identify and explain cases where data from multiple mini-challenges help to resolve uncertainty, and identify cases where data from multiple mini-challenges introduces more uncertainty. Present up to 10 examples. If you find more examples, prioritize those examples that you deem most relevant to emergency response. Limit your response to 1000 words and 12 images.

Uncertainty is a very broad concept and is present along the whole data analysis pipeline, from the quality and trustworthiness of the data sources to visualizations that aggregate data points. For this year's VAST Challenge, we identified two main sources of uncertainty. The first is positional uncertainty: some districts have more inhabitants than others, which creates uncertainty for the less populated areas. The second is data quality: Do we interpret people's messages correctly? Are there sensors with quality issues? For the radiation measurements, we abstracted the cpm values into semantic categories ranging from low to dangerous levels of radiation. We also introduced "too high" and "too low" categories, which capture much of the uncertainty here (e.g., some values are below 0, which makes no sense for counts per minute). Examples of the uncertainties we found:
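The abstraction of cpm values into semantic categories could look roughly like the following sketch. The threshold values and the intermediate category names are hypothetical; the submission does not state the bounds NEAT actually uses.

```python
from bisect import bisect_right

# Hypothetical lower bounds (in cpm) of each category after "too low".
# These numbers are illustrative assumptions, not the values used in NEAT.
BOUNDS = [0, 50, 100, 1000, 10000]
LABELS = ["too low", "low", "elevated", "high", "dangerous", "too high"]


def categorize_cpm(cpm: float) -> str:
    """Map a counts-per-minute reading to a semantic category.

    Readings below 0 are physically implausible and land in "too low";
    readings at or above the last bound land in "too high".
    """
    return LABELS[bisect_right(BOUNDS, cpm)]


if __name__ == "__main__":
    for reading in (-3, 30, 500, 20000):
        print(reading, "->", categorize_cpm(reading))
```

Keeping implausible readings in their own categories, instead of dropping them, means sensor problems stay visible in the aggregated view.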

We see an almost identical increase in dangerous radiation measures and radiation measures from Wilson Forest. This might indicate some external causality, faulty data, or even the source of the radiation.

We see high entropy (one of our representative uncertainty measures) for the Old Town Rumble reports during the whole phase between the major earthquake peak and a second measurement peak shortly afterwards in Rumble. This might indicate that there is some problem with the Rumble reports during this phase, especially in Old Town.
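As an illustration of entropy as an uncertainty measure, the sketch below computes the Shannon entropy of the values reported within one time window. It assumes that entropy is taken over the discrete report values (e.g., shake intensities) of a district; the submission does not specify the exact formulation used in NEAT.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the distribution of discrete values.

    0.0 means all reports agree; higher values mean the reports are
    spread over more, and more evenly used, distinct values.
    """
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    print(shannon_entropy([5, 5, 5, 5]))    # all reports agree
    print(shannon_entropy([0, 10, 0, 10]))  # reports split between two extremes
```

Under this reading, a window in which half of the reports claim no shaking and half claim severe shaking scores higher than a window with unanimous reports, regardless of how many reports there are.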

We see that during the increase in reporting of dangerous radiation, there is no change in radiation reporting in Safe Town. This could indicate that the area is already evacuated.

We see almost no reports of damage during the first, smaller earthquake, but we see an increase in Y*INT reports. Additionally, our uncertainty measure (entropy) shows a peak at the same point in time in the Rumble dataset. This could indicate a large disagreement between reports.

During the second earthquake, the discrepancy is reversed. While Rumble clearly reports high shaking intensities, the Y*INT messages show only a slight peak. Later on, when the full consequences of the earthquake unfold, Y*INT also shows a high peak.

In Rumble, we see a third earthquake shortly after the second one. However, this is not reflected in the Y*INT dataset. Additionally, the sudden stop of Rumble reports after the second earthquake might indicate that there was some system downtime and that earlier reports came in late.

During day 9, we see continuous reports in Y*INT, but only two small spikes in Rumble and a slow decrease in the radiation volume. This creates a lot of uncertainty about whether another incident has happened, and about which of the three data sources reflects the actual situation.

3. Are there instances where a pattern emerges in one set of data before it presents itself in another? Could one data stream be used to predict events in the others? Provide examples you identify. Limit your response to 500 words and 8 images.

The strongest correlation between datasets is present between Rumble and Y*INT. We used the CrisisLex EMTerms for crisis tweets to categorize the Y*INT data stream. Some of these categories are in line with the damage reports offered by Rumble. For example, Rumble tracks damage to the "sewer and water" systems, while the EMTerms keep track of "Water, sanitation, and hygiene". When comparing the reports from both timelines, we see that the peak in reports for water sanitation appears a few hours later than the peak in reports for sewer and water damage.
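A delay like this can be estimated from two equally binned volume series. The sketch below uses a raw (unnormalized) cross-correlation over a range of shifts; this is an illustrative technique, not necessarily how the comparison was done in NEAT, and the sample series are invented.

```python
def best_lag(leader, follower, max_lag):
    """Return the shift (in bins) of `follower` relative to `leader`
    that maximizes the raw cross-correlation of the two series.

    A positive result means `follower` peaks later than `leader`.
    """
    def score(lag):
        return sum(
            leader[i] * follower[i + lag]
            for i in range(len(leader))
            if 0 <= i + lag < len(follower)
        )

    return max(range(-max_lag, max_lag + 1), key=score)


if __name__ == "__main__":
    # Invented hourly volumes: damage reports peak three bins before messages.
    sewer_and_water = [0, 1, 8, 2, 0, 0, 0, 0]
    water_sanitation = [0, 0, 0, 0, 1, 7, 2, 0]
    print(best_lag(sewer_and_water, water_sanitation, max_lag=4))
```

With hourly bins, a positive lag of three bins corresponds to the message peak trailing the damage reports by about three hours; a negative lag would indicate the reversed order.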

However, the order of causality is not the same for other categories. One example of a reversed dependency is the EMTerms category "YInt Caution and advice". We can see that alerts and warnings to be careful started in Y*INT before the first shake intensities were reported in Rumble.

Furthermore, these dependencies are not consistent. For Y*INT messages on injuries and medical reports in Rumble, only one instance shows simultaneous peaks. The other peaks are completely independent of each other.

Finally, even the absence of data can be used as an indicator. For example, the Y*INT messages suddenly stop shortly before Rumble notes another peak in shake intensity reports.

Concerning the radiation measurements, it is more complex to find relations with other datasets. One example is the influence of the shake intensity measurements from the Rumble app on the volume of radiation measurements in some districts, such as Scenic Vista. The volume of measurements in this district regularly decreases around 12 PM and then increases again. Once Rumble started reporting intense earthquakes, these low-volume phases were prolonged until the measurement volume became completely stable. However, this dependency does not hold for all locations in St. Himark.

One reason for the strength of the dependency between Rumble and the radiation measurements of each location could be the type of infrastructure that the district depends on. One very strong example of a similar dependency is the district Terrapin Springs. Here, a seemingly small peak in reports on roads and bridges seems to lead to a quick and almost complete loss of measurements for this district.

Finally, we only found one scenario that shows an event in Rumble reacting to the radiation measures. Shortly before the first reports on a larger earthquake were sent, the radiation sensors already showed an increase in measurement errors, such as negative numbers.

For the relation between Y*INT messages and radiation measurements, we could always use the relation of Rumble with both. One scenario that was not as clearly visible in Rumble, but relates Y*INT and the radiation data, is the beginning of dangerously high radiation. This event coincides with the end of Y*INT reports on safety and security. On closer inspection, the episode of reports that stopped in this instance concerns the closed Magritt Bridge.

4. The data for the individual mini-challenges can be analyzed either as a static collection or as a dynamic stream of data, as it would occur in a real emergency. Were you able to bring analysis on multiple data streams together for the grand challenge using the same analytic environment? Describe how having the data together in one environment, or not, affected your analysis of the grand challenge. Limit your response to 500 words and 10 images.

NEAT is able to load and display all available data of the VAST Challenge and allows users in a collaborative environment to analyze the data streams simultaneously, sharing insights even across remote locations. NEAT organizes users in different groups and provides two analysis environments plus an annotation summarization environment that displays the results. The data is synced in real time between the users.
NEAT provides three main views: a dashboard view, a master timeline, and an annotation view (Fig. 1).

For each view, we transformed the data from MC1-3 into two types: categorical and positional (e.g., damage categories and damage locations). For each type, we calculated temporal development features and uncertainty factors such as the standard deviation. For Y*INT messages, we computed 17 message categories relevant for disaster management (Fig. 2).
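Temporal features of this kind can be derived by grouping records into fixed time bins. The following is a minimal sketch under assumed conditions: records are (Unix timestamp, value) pairs and the bin width is one hour. Neither the record layout nor the bin width is taken from NEAT.

```python
from collections import defaultdict
from statistics import median, pstdev


def bin_features(records, bin_seconds=3600):
    """Aggregate (unix_timestamp, value) records into fixed-width time bins.

    Returns a dict mapping each bin start to its volume, median, and
    population standard deviation. The record layout is an assumption.
    """
    bins = defaultdict(list)
    for timestamp, value in records:
        bins[timestamp - timestamp % bin_seconds].append(value)
    return {
        start: {
            "volume": len(values),
            "median": median(values),
            "std": pstdev(values),
        }
        for start, values in sorted(bins.items())
    }


if __name__ == "__main__":
    sample = [(0, 2), (10, 4), (3599, 6), (3600, 5)]
    for start, features in bin_features(sample).items():
        print(start, features)
```

Each feature then yields one series per category or district, which is the shape of data a streamgraph layer needs.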

The master timeline view (Fig. 3) allows the user to load numerous timelines that display data either as a streamgraph (MC1/2/3) or as episode plots (MC3 only). The streamgraphs can represent various types of data, such as volume, median, standard deviation, and entropy. The user can filter the data for each timeline, reorder and group the timelines, and place annotations in each chart. A vertical line is displayed as a visual cue when hovering over a timeline; it is synced across all timelines as a reference. The user can add an annotation to any timeline with a double click and provide it with a title and additional details.

The annotations (Fig. 4) are synced, and a summary is displayed in the annotation environment. For each annotation, the selected time, the user, the chart, and the manually entered details are displayed.

The third layout, the dashboard view (Fig. 5), supports detailed investigation of an event. On the left, two maps are displayed, one for the Rumble data and one for the radiation data.

The multidimensional Rumble data is visualized in the form of glyphs (Fig. 6). Per district, one glyph summarizes the reported damage for each category, while color encodes the development over the last 30 minutes, where red is increasing and blue is decreasing. A horizon graph at the bottom shows the distribution of message volume over time. The radiation data is plotted at its provided geolocations. The static sensors are represented as slightly larger circles. The measured value is mapped to color.
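One way to derive such a trend color is to compare the mean reported damage in the most recent 30 minutes with the 30 minutes before that. The sketch below is an assumption about how this could work; the comparison rule, the neutral "grey" fallback, and the (Unix timestamp, damage value) record layout are hypothetical and not taken from NEAT.

```python
def trend_color(reports, now, window=30 * 60):
    """Return "red" if reported damage increased over the last window,
    "blue" if it decreased, and "grey" if unchanged or undecidable.

    `reports` is a list of (unix_timestamp, damage_value) pairs for one
    district and one damage category (hypothetical layout).
    """
    recent = [v for t, v in reports if now - window < t <= now]
    earlier = [v for t, v in reports if now - 2 * window < t <= now - window]
    if not recent or not earlier:
        return "grey"  # not enough data to compare the two windows
    delta = sum(recent) / len(recent) - sum(earlier) / len(earlier)
    if delta > 0:
        return "red"
    if delta < 0:
        return "blue"
    return "grey"


if __name__ == "__main__":
    reports = [(600, 3), (2400, 6), (3000, 8)]
    print(trend_color(reports, now=3600))
```

Returning a neutral color when either window is empty avoids suggesting a trend where reports are simply missing.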

At the top of this dashboard, a timeline like those in the master timeline view is displayed (Fig. 7). The user can select which data is represented in this timeline. Below the timeline, multiple episode plots are shown vertically for various categories. The timeline can be brushed, which updates the data in the other views. The maps display the data that falls within the brush. The episode plots always display all available episodes but grey out episodes outside of the brush. The episodes can be expanded by clicking on the title, which displays the episode text next to the episodes. Our prototype can be accessed for a demo at vcgc19.dbvis.de (user: dbvis, pass: beschte). Our code is available at https://github.com/dbvis-ukon/neat