Entry Name:  "PKU-Chen-MC3"

VAST Challenge 2019
Mini-Challenge 3

 

 

Team Members:

Shuai Chen, Peking University, shuai.chen@pku.edu.cn PRIMARY

Sihang Li, Peking University, lisihang@pku.edu.cn

Liwenhan Xie, Peking University, xieliwenhan@pku.edu.cn

Yi Zhong, Peking University, 1600017715@pku.edu.cn

Yun Han, Peking University, yunhan@pku.edu.cn
Xiaoru Yuan, Peking University, xiaoru.yuan@pku.edu.cn ADVISOR

Student Team: YES

 

Tools Used:

D3

Python

EarthquakeAware, developed by PKUVis Lab.

 

Approximately how many hours were spent working on this submission in total?

100 hours

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2019 is complete? YES

 

Video

https://www.youtube.com/watch?v=hxzyhGwyKD8&feature=youtu.be

 

 

 

Questions

The City has been using Y*INT to communicate with its citizens, even post-earthquake. However, City officials need additional information to determine the best way to allocate emergency resources across all neighborhoods of St. Himark. Your task, using your visual analytics on the community Y*INT data, is to determine the types of problems that are occurring across St. Himark. Then, advise the City on how to prioritize the distribution of resources.  Keep in mind that not all sources on Y*INT are reliable, and that priorities may change over time as the state of neighborhoods also changes.

Data Processing

When reading individual messages, we found that, apart from identifiable misspelled words, many posts are meaningless and ungrammatical and do not appear to be written by a real person. Typically, such posts use more than one subject in a sentence, for example "At the sadnlon my husbadnnd adnnd she something adnnd somethings adnnyone" (Branch16, Palace Hills, 8:18 April 6). We therefore filtered out a total of 4538 messages containing at least two neighboring subjects. We also found that in most cases, when people repost a message, the Y*INT app automatically adds the prefix "re:" and leaves the rest of the text unchanged. This lets us match each original message with its reposts and calculate its repost number. Because a repost can be made anywhere to show sympathy or concern, reposts do not necessarily indicate what happened at the location they were posted from. In the following analysis we therefore often filter out reposts and focus on the 14326 original messages. Finally, we use a pre-trained emotion classifier to categorize messages into four classes (positive, negative, compound, and neutral), which correspond to green, red, orange, and blue in the scatter plot view of our system.
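The repost matching described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the exact code of our system: the field names `account` and `message`, the helper `split_reposts`, and the sample messages are hypothetical, and a repost is assumed to be exactly the original text with a "re:" prefix.

```python
from collections import Counter

REPOST_PREFIX = "re:"  # prefix that Y*INT adds to reposts, as described above


def split_reposts(messages):
    """Keep original messages and attach a repost count to each of them.

    `messages` is a list of dicts with a hypothetical "message" field.
    """
    originals = []
    repost_counts = Counter()
    for msg in messages:
        text = msg["message"].strip()
        if text.lower().startswith(REPOST_PREFIX):
            # Strip the prefix so the repost can be matched to its original text.
            original_text = text[len(REPOST_PREFIX):].strip()
            repost_counts[original_text] += 1
        else:
            originals.append(msg)
    # Originals that were never reposted get a count of 0.
    return [
        {**msg, "reposts": repost_counts[msg["message"].strip()]}
        for msg in originals
    ]


# Made-up sample data, for illustration only.
sample = [
    {"account": "userA", "message": "The bridge is shaking"},
    {"account": "userB", "message": "re: The bridge is shaking"},
    {"account": "userC", "message": "re: The bridge is shaking"},
    {"account": "userD", "message": "Nice weather today"},
]
result = split_reposts(sample)
for row in result:
    print(row["account"], row["reposts"])
```

Matching on exact text is the simplest choice; reposts whose text was altered would need fuzzy matching, which this sketch does not attempt.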

1 Using visual analytics, characterize conditions across the city and recommend how resources should be allocated at 5 hours and 30 hours after the earthquake.  Include evidence from the data to support these recommendations.  Consider how to allocate resources such as road crews, sewer repair crews, power, and rescue teams. Limit your response to 1000 words and 12 images.

First, we search for the keyword "earthquake" to see when the event happened. After filtering out reposts and noise, three peaks can be observed in the histogram (which shows the total message count per hour). Going through the original messages around the peaks, we find two potentially credible reports. One is from @Earthquake Prediction Center at 14:33, April 6, saying that "We jsutust recorded a Northwest.Old Town mild earthquake jsutust northeast of St.Himark town. Did you feel it? Probably no damage #HateDidWonder." The other is from @EarthQuakeSeersMoreover at 08:36, April 8, saying that "ALERT: A 6.7 earthquake just occurred off the NE shore of the town of St. Himark. This could be severe. Expect heavy damage". The second event can also be deduced from the abrupt decrease in the total number of messages (see Figure 1-2).
Figure 1-1 Messages with the keyword "earthquake".
Therefore, we conclude that there were two earthquakes, at 14:33 on April 6 and at 08:36 on April 8.
Figure 1-2 Spatial-temporal heatmap of the total message sum every 15 minutes. A clear breakpoint between 08:45 and 10:15 indicates the occurrence of important events.
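The hourly histogram and the 15-minute heatmap both rest on counting messages in fixed-width time bins. The following Python sketch shows one way to do this; the field names `time` and `message`, the helpers `keyword_timestamps` and `bin_counts`, and the sample rows (including their year) are hypothetical, and the bin width is assumed to divide a day evenly.

```python
from collections import Counter
from datetime import datetime, timedelta


def keyword_timestamps(messages, keyword):
    """Parse the timestamps of all messages that contain `keyword`."""
    return [
        datetime.strptime(m["time"], "%Y-%m-%d %H:%M:%S")
        for m in messages
        if keyword.lower() in m["message"].lower()
    ]


def bin_counts(timestamps, minutes):
    """Count messages per bin of `minutes` minutes; returns {bin_start: count}."""
    counts = Counter()
    for ts in timestamps:
        # Floor the timestamp to the start of its bin within the day.
        minutes_into_day = ts.hour * 60 + ts.minute
        floored = minutes_into_day - minutes_into_day % minutes
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        counts[midnight + timedelta(minutes=floored)] += 1
    return dict(sorted(counts.items()))


# Made-up sample data, for illustration only.
sample = [
    {"time": "2020-04-06 14:33:00", "message": "mild earthquake recorded"},
    {"time": "2020-04-06 14:40:00", "message": "Was that an earthquake?"},
    {"time": "2020-04-06 14:50:00", "message": "Earthquake? Did you feel it"},
    {"time": "2020-04-06 15:05:00", "message": "lunch time"},
]
stamps = keyword_timestamps(sample, "earthquake")
hourly = bin_counts(stamps, 60)   # bins for the histogram
quarter = bin_counts(stamps, 15)  # bins for the heatmap
print(hourly)
print(quarter)
```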
The first earthquake (04-06 14:33:00)
From the histogram (see Figure 3-1) and the comparison of word clouds over time (see Figure 1-3), we can see that after the earthquake there is no obvious increase in messages, nor any notable increase in new topics. The intensity was probably rather low, since most people use a questioning tone, asking whether there was an earthquake (see Figure 1-1, bottom left). During the 5 hours, the high-frequency words are ordinary words that reveal little information. However, when we limit the corpus to messages containing the keyword "earthquake", more relevant words appear, including "building", "reinforce", and "brick". On examining the original messages, we find that people are raising concerns about the stability of houses in the city. @Emergency Manager announced that new funding has arrived for studying masonry buildings against earthquakes, and @FearsChristine_FleisherBird suggested that the funding should be used to bring the existing weak buildings up to code. This post led to further discussion, with 7 reposts. Although this problem is not directly caused by the earthquake, the message reflects the urgency for the government to put more effort into reinforcing dangerous old buildings.
Figure 1-3 An overview of the first five hours after the first earthquake. (a) High frequency words during the period. (b) High frequency words under keyword "earthquake" during the period. Potential clues are highlighted in red. (c) Details of the highlighted words.

Similarly, no abnormalities other than everyday topics are observed in the general corpus during the 30-hour period. Under the keyword "earthquake", the most frequent words are still about reinforcing dangerous buildings.
Figure 1-4 Word stream around 04-07 20:30.
The second earthquake (04-08 08:36:00)
Unlike the first earthquake, the second earthquake seemed to be more severe. In the word stream, we can see that after the earthquake, words related to it, such as “help” and “bridge”, started to appear (Figure 1-5).
After 5 hours, at around 04-08 13:30, the number of messages surged. In the word stream, many words began to appear in large numbers, some of which were closely related to resources, such as “water”, “power”, and “hospital” (Figure 1-5).
Figure 1-5 Word stream after the second earthquake.
We can see in the word stream that the most frequently used word is “water” (Figure 1-5). In the timeline, we can also see that the word bursts around this period (Figure 1-6(a)). By checking the messages, we find that the account BusyHCouch2001 posted a message at 04-08 13:00 saying that the water and sewer pipes in Old Town, Safe Town, Scenic Vista, Broadview, Chapparal and Easton were broken and that the water might be contaminated (Figure 1-6(b)). We also find that ObnoxiousOHouse posted a message at 04-08 11:46:00 encouraging people in Palace Hills to help the people in Broadview (Figure 1-6(c)), which is consistent with the situation in Broadview. Based on these observations, we judge that these regions, especially Broadview, needed sewer repair and water supply.
Figure 1-6 (a) Timeline of the word “water” and some related messages.
Like “water”, the word “power” was also posted a lot (Figure 1-5). The messages reported problems with the power supply. For example, Betty2004 saw a power line tilted over in Broadview at 04-08 12:07:00 (Figure 1-7(d)). Such messages could be seen all over the island (Figure 1-7(b)(c)), which means that the whole town was suffering from a shortage of power. AlwaysSafePowerCompany, the official account of the nuclear plant, promised at 04-08 13:39:00 to send volunteers (Figure 1-7(e)), but other people, such as ChloeJohnson in Scenic Vista, claimed that they did not see any volunteers (Figure 1-7(f)). The whole town therefore needed help restoring power.
Figure 1-7 Distributions of word “power” and some related messages.
Medical resources are another kind of resource. We use the word “hospital” to check them. The word mainly appears in Downtown and Old Town, and also in some southeast regions. In Old Town, SunnyDay said at 04-08 13:52:00 that the hospital was closed, and RunsJohn_GibbyBear posted a similar message at 04-08 14:04:00. Some messages from southeast regions such as Scenic Vista also showed that the hospitals there were closed. This may be due to damage to the hospitals, as LazyIcecreamHunter said at 04-08 13:19:00 that there was rubble everywhere in the hospital. In Downtown, HealthDept suggested at 04-08 14:25:00 that people with minor injuries should not go to the hospital. This shows that a shortage of medical resources had appeared all over the island, and that allocating medical resources was very urgent.
Figure 1-8 Distributions of word “hospital” and some related messages.
Bridges are an important type of public resource, and the word “bridge” can also be seen in the word stream. We find that Downtown had the most original messages (Figure 1-9(a)), which was unexpected, as there are no bridges in this region. After checking the messages, we find that DOT-StHimark, which seems to be an official account, posted a series of messages at 04-08 13:20:00 announcing that all bridges were closed (Figure 1-9(c)). RayDog1978, whose location was not clear, also advised people not to go to the bridges (Figure 1-9(b)). By checking earlier messages, we find that four of the five bridges had already been closed at 04-08 09:05:00 (Figure 1-9(d)). The Tranky Doo Bridge was closed at 04-08 10:46:00 (Figure 1-9(e)), but this bridge is not shown on the map. We can then deduce that the bridges were in poor condition and needed road crews to repair them in time.
Figure 1-9 Map that counts all original messages in different regions and some related messages.
Two other words related to people’s living conditions can also be seen in the word stream. One is “shelter”, which appears mainly in Southwest, Downtown, Weston and Scenic Vista (Figure 1-10(a)). People in these areas expressed their need for more shelters (Figure 1-10(b)), so shelters should be allocated to these areas. The other is “buildings”, which appears mainly in Old Town, Easton and Broadview (Figure 1-10(c)). The buildings in these areas were badly damaged and needed to be repaired (Figure 1-10(d)). We also find that Oconnor83 said at 04-08 11:30:00 that the fence of the nuclear plant was down and one of its buildings had partially collapsed (Figure 1-10(e)), which should also be taken into account.
Figure 1-10 Distributions of word “shelter” and “buildings” with some related messages. (a)(b) Shelter. (c)(d)(e) Buildings.
After 30 hours, at around 04-09 14:30, some resource-related words, such as “water” and “hospital”, decreased. However, other problems emerged.
The word “gas” was mentioned a lot in the messages. People were mostly talking about gas leaks (Figure 1-11(b)), mainly in Weston, Northwest and Southton (Figure 1-11(a)). After checking earlier messages, we find that at 04-09 09:24:00, ThesaurusGasCompany said that there might be a gas leak (Figure 1-11(c)). Even earlier, Mrs.Tillenbottom smelled gas at 04-08 08:47:00 (Figure 1-11(d)). These observations show that these areas needed repair crews.
Figure 1-11 Map that counts all original messages in different regions and some related messages.
In the messages that contain the word “people”, we find that CleverCCouch posted a message at 04-09 14:25:00 in Southton, saying “Taxis are all backed up. Lot of people trying to call them since the buses are taking so long”. This might be caused by the lack of public transport resources, so in this area, more taxis and buses were needed.

2 – Identify at least 3 times when conditions change in a way that warrants a re-allocation of city resources.  What were the conditions before and after the inflection point?  What locations were affected?  Which resources are involved? Limit your response to 1000 words and 10 images.

Figure 2-1. "Shelter" is a high frequency word in the wordstream.
After 04-08, ‘shelter’ appeared with high frequency in the wordstream view.
Figure 2-2. Comparison of the situation in Terrapin Springs before and after the high school opened as a shelter.
Between 12:00 and 18:00 on 04-08, the word ‘shelter’ appeared 176 times in the posts from Y*INT. Between 18:00 and 24:00 on 04-08, it appeared 207 times. To investigate the real needs of people in different places, we filtered out messages that only reposted others without adding extra comments, since these reposts did not show the situation where the users were. By checking the original messages containing ‘shelter’, we found that TerrapinSpringsSchools announced that the TerrapinSprings High School had opened a shelter (colored message in Figure 2-2e). Before this inflection point, people in Downtown, Terrapin Springs, and Scenic Vista posted more messages about ‘shelter’ (Figure 2-2b). We checked the messages in Terrapin Springs and found that people had difficulty finding a shelter. In the next 6 hours, people still posted messages about ‘shelter’ (Figure 2-2c). Although the high school had opened as a shelter, some people could not find it, and some discussed the abandoning of animals at the Terrapin Springs shelter. At 04-08 23:42, the high school posted information about the shelter. We could infer from this message that some people without shelter had arrived at the school.
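A before/after comparison of this kind can be sketched in Python by counting keyword mentions per neighborhood in two consecutive time windows. This is a minimal sketch under stated assumptions: the field names `time`, `location`, and `message`, the helper `keyword_by_location`, and the sample rows (including their year) are hypothetical, and reposts are assumed to have been filtered out already.

```python
from collections import Counter
from datetime import datetime


def keyword_by_location(messages, keyword, start, end):
    """Count messages mentioning `keyword` per location within [start, end)."""
    counts = Counter()
    for m in messages:
        if start <= m["time"] < end and keyword.lower() in m["message"].lower():
            counts[m["location"]] += 1
    return counts


# Made-up sample data, for illustration only.
sample = [
    {"time": datetime(2020, 4, 8, 13, 10), "location": "Terrapin Springs",
     "message": "Where is the nearest shelter?"},
    {"time": datetime(2020, 4, 8, 14, 20), "location": "Terrapin Springs",
     "message": "Can't find a shelter"},
    {"time": datetime(2020, 4, 8, 15, 0), "location": "Downtown",
     "message": "Need shelter"},
    {"time": datetime(2020, 4, 8, 19, 30), "location": "Terrapin Springs",
     "message": "The high school shelter is open"},
    {"time": datetime(2020, 4, 8, 20, 0), "location": "Scenic Vista",
     "message": "no power"},
]

# Two six-hour windows on either side of the candidate inflection point.
before = keyword_by_location(sample, "shelter",
                             datetime(2020, 4, 8, 12), datetime(2020, 4, 8, 18))
after = keyword_by_location(sample, "shelter",
                            datetime(2020, 4, 8, 18), datetime(2020, 4, 9, 0))

# Change per location; a Counter returns 0 for locations missing from a window.
change = {loc: after[loc] - before[loc] for loc in set(before) | set(after)}
print(change)
```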
Figure 2-3. Conditions change as a new flood happens
The infrastructure of the shelters is another condition that needs resources to be distributed or maintained, since many messages complain about power, heat, and similar problems and ask for them to be fixed. For example, around 04-10 many people say that the shelter is too hot to live in, which means that either the living conditions in the shelter should be improved or the shelter should be relocated. Animals also play an important role in the city, so shelters for them are needed as well. The messages show that citizens asked for animal shelters several times. Such requests first show up at 19:00 on 04-06. After the earthquake, at 14:00 on 04-08, many people express concern about the living conditions of animals and call for shelters for them. Six hours later, the first animal shelter appears in Old Town. However, its living conditions are poor, because the shelter loses power just one hour later. This leads to a re-allocation of shelters and basic resources.
Figure 2-4. Messages about bridges on 04-08 and 04-09. Most of them were posted from Downtown by DOT-StHimark.
On 04-08, DOT-StHimark posted many messages about the closure of bridges and repeated them on Y*INT. In addition to this official information, many messages sent by citizens show changes in road conditions. For example, people in Scenic Vista say that the bridge is shaking.
Figure 2-5. (b) Media messages with complaints about the bridge.
At around 14:00 on 04-08, people in many districts report that the sewers have broken down and need repair: "reporting sewer breakout in neighborhoods Old Town, Safe Town, Scenic Vista, Broadview. Take cover! (TVHostBrad, Downtown)". The sewer breakout not only calls for a re-allocation of sewer repair resources, but also suggests a road cleaning procedure to make sure that the breakout does not affect road conditions. Moreover, people in Scenic Vista complain about the poor road conditions: "Lot of schisms and holes in the road! Watch your step! (FastThiboutotIcecream, Scenic Vista)". This suggests a re-allocation of road maintenance crews. As an overview, when searching for the keywords “bridge”, “road”, “highway”, and "street" in all neighborhoods, a burst of messages can be observed at around 19:00 on April 8, with Downtown, Scenic Vista, Palace Hills, and Terrapin Springs seeing a major impact.
Figure 2-6. Overview of the messages about roads in different districts.
Figure 2-7. (a) On 04-08, DOT-StHimark reported the closure of different bridges. (b) On 04-09, some bridges were re-opened.
On 04-08, the messages posted from Downtown by DOT-StHimark were all about the closure of bridges (Figure 2-7a). On 04-09, some bridges were re-opened (Figure 2-7b). This indicates that bridge crews had been allocated to these re-opened bridges.
Figure 2-8. Messages about fire from 04-08 08:00 to 04-08 16:00 (a) and from 04-08 16:00 to 04-08 24:00 (b)
Figure 2-9. (a) Messages about fire from 04-08 08:00 to 04-08 16:00 and (b) from 04-08 16:00 to 04-08 24:00
After the earthquake, many places were on fire (Figure 2-8). Some people reported fires in different places (Figure 2-9a), and some reported on the firefighting situation (Figure 2-9b). The next day, people reported more about events around the fire station (Figure 2-9c). The fire department was short of firemen to deal with so many fires after the earthquake, so firemen were allocated to different regions. The north-west and south-east regions of St. Himark were affected.

3 Take the pulse of the community. How has the earthquake affected life in St. Himark? What is the community experiencing outside the realm of the first two questions? Show decision makers summary information and relevant/characteristic examples. Limit your response to 800 words and 8 images.

The earthquake has disrupted the pace of life in the community. Figure 3-1 shows that the hourly number of messages follows a regular pattern. It can be inferred that most people wake up or start working at 9 am, when messages first begin to flood in. The number of messages peaks at 2 pm and begins to drop at about 11 pm, indicating that citizens are going to sleep. This suggests that people living in this city enjoy a long, late nightlife. However, the earthquake has clearly affected the daily schedule of people in each neighborhood, as the total number of messages between 0:00 and 6:00 gradually increases, as shown in Figure 3-2.
Figure 3-1 An overview of the original posts. The upper half shows the overall distribution over time, where each dot represents a message and the y-axis indicates the repost number of this message, and the bottom view shows the distribution of each location.
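The shift toward night-time posting can be checked by counting, for each day, the messages posted between 0:00 and 6:00. The sketch below is one possible way to do this in Python; the helper `night_counts` and the sample timestamps (including their year) are hypothetical.

```python
from collections import Counter
from datetime import datetime


def night_counts(timestamps, start_hour=0, end_hour=6):
    """Count messages per calendar day with start_hour <= hour < end_hour."""
    counts = Counter()
    for ts in timestamps:
        if start_hour <= ts.hour < end_hour:
            counts[ts.date()] += 1
    return dict(sorted(counts.items()))


# Made-up sample timestamps, for illustration only.
sample = [
    datetime(2020, 4, 6, 2, 15),
    datetime(2020, 4, 6, 14, 0),   # daytime, ignored
    datetime(2020, 4, 9, 1, 5),
    datetime(2020, 4, 9, 3, 40),
    datetime(2020, 4, 9, 5, 59),
    datetime(2020, 4, 9, 6, 0),    # 6:00 is outside the window
]
per_day = night_counts(sample)
for day, count in per_day.items():
    print(day, count)
```

Comparing these daily counts against the daytime totals would show whether night-time activity grows in absolute terms or only as a share of all messages.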
As Figure 3-2 shows, people grow increasingly uneasy as the days pass. On April 9 in particular, most of what people talk about is related to the quake, such as nuclear, power, collapse, and rescue.

Figure 3-2 High-frequency words from April 6 to April 10, ranging from 0:00 to 6:00.
Before the earthquake, people tend to share the pleasures of everyday life on Y*INT, but the topic gradually moves to the earthquake. Figure 3-3, which samples the corpus from 12 pm to 6 pm each day, broadly reveals this shift in topics. A further search for the keywords "movie", "film", and "party" shows a declining trend in their total count. Downtown in particular has far fewer such messages. In contrast, "rescue", "bridge", and "road" show a burst after the second quake.
Figure 3-3 The process that HSS began to help people in the city.
Due to collapsed buildings, broken sewers with contaminated water, and the tremors of the earthquake, many roads and bridges suffer from deep cracks and scattered bricks and require repair. This greatly increases the burden on traffic and makes ordinary commuting difficult. 1259 posts mention "road" or "bridge" (see Figure 3-4). Messages with a high repost number are mostly from the official account @DOT-StHimark, which announces the closing and opening of bridges and roads. @BoredInStMark writes: "Bad enough this place is stupid, now we can't even leave #IHateBridges" (Southton, 09:09 April 8). A warning from @RacesAshley_WaynickOctopus says "Lot of schisms and holes in the road! Watch your step!" (Broadview, 17:35 April 8).
Figure 3-4 Distribution of messages with the keyword "bridge" or "road".
Since the earthquake, people have remained highly concerned about the exact number of fatalities and injuries, with a total of 484 posts (see Figure 3-5). In most cases, they cast doubt on the numbers given by the news based on what their friends told them. For example, "David says 500 fatalities. News says 300. What? #DavidCan" (@FleetLewisBreadm Scenic Vista, 04-09 09:52).
Figure 3-5 Distribution of messages with the keyword "fatalities" or "injuries".
Another clear effect of the earthquake is that panicked people are starting to stock up on food and other necessities, including diapers, medicine, and so on (see Figure 3-6). A typical post reads "Trying to get stocked up on water, advil and lettuce before they run out!" (@Connie1954B Northwest 04-08 16:13).
Figure 3-6 Spatial-temporal distribution of messages with the keyword "stock".
People are relying on a mobile app to call for support. "#Rumble" is the most popular hashtag during the five days, with 501 posts (see Figure 3-7). It refers to a local application for crowdsourcing damage reports, which has become a necessity of life. The City EOC Public Information Officer is appealing to citizens to use Rumble for all other, less life-threatening damage. More and more people are downloading this application and discussing its usefulness. As @KRAKTV points out, people are downloading the Rumble app at a record pace, and officials praise this willingness to help report damage. People are also satisfied with the app, saying that "The rumble app is marvelous! Definitely worth getting!" (@Kenneth_WagnerBird39, Broadview, 04-09 17:03).
Figure 3-7 Overall distribution of messages with the keyword "rumble" and relevant high-frequency words.
People are helping each other get through the disaster. Knowledgeable experts take on public responsibility by joining HSS. The word "HSS", which refers to the Himark Science Society, becomes a frequent word in the late stage (see Figure 3-8). According to messages from @DarkCandyWm_Wood and @CandidALight, HSS monitors radiation levels around the nuclear power plant and aids the full-time crew with security and clean-up. On April 9, almost 400 messages about HSS were posted, most of which are praise from the public. As @Alston1978 says, "Excited about the our city/HSS team-up." Celebrities are also supporting victims. Lacki Dasical, a famous singer, visited survivors at the disaster shelter and drew public attention to the disaster shelters.
Figure 3-8 Discoveries of people self-organizing standby services.

4 The data for this challenge can be analyzed either as a static collection or as a dynamic stream of data, as it would occur in a real emergency.  Describe how you analyzed the data - as a static collection or a stream.  How do you think this choice affected your analysis? Limit your response to 200 words and 3 images.

Figure 4-1. The interface of our visual analytics system. (a) Wordstream View, revealing the evolution of topics in posts by Y*INT users. (b) Timeline View, showing the distribution of keywords over time. (c) Map View, showing the distribution of keywords in different regions. (d) Word Cloud, displaying the keywords of a region. (e) Message View, listing messages containing the keyword of interest.
We treated the data as a static collection. This gives us a quick overview of the events on social media and removes the need to design for unexpected behaviors in streaming data. With the overview, users can explore the details of topics in different regions at different times. Treating the data as a dynamic stream would require the system to highlight abnormal behaviors as new messages appear. With some improvements, the views of our system could support streaming data.