Mini Challenge 1: Wiki Editors

Authors and Affiliations:

Jason Payne, Palantir Technologies, [PRIMARY contact]

Ravi Sankar, Palantir Technologies

Dinesh Shenoy, Palantir Technologies

Jake Solomon, Palantir Technologies

Student Team: NO


For the VAST competition, we performed the analyses primarily in the Palantir Government platform and, to a lesser extent, in Google Earth and the Palantir Finance platform. Both Palantir platforms are being developed by Palantir Technologies, based in Palo Alto, California. Palantir Technologies was founded in 2004 and works with customers across the Intelligence and Finance Communities.

The development team at Palantir decided early in the company’s history to build an analytic platform on a foundation of openness, a trait not often seen in the intelligence community. As old institutions transition into a world where information is increasingly a commodity, the archaic paradigms of locking down knowledge are giving way to an environment where analysis is the real power. Palantir Technologies liberates this power in four concrete ways. The first is Data Integration: whether the data is structured or unstructured, Palantir provides standard and extensible interfaces for bringing information into a common environment. The second is Search and Discovery, whereby these disparate data stores can be explored as though they were one. The third is Knowledge Management, in which all the knowledge that is discovered is treated like another data source so that no analysis is lost. The fourth is Collaboration, whereby many analysts working together can truly leverage their collective mind. Through our open APIs and numerous (and multiplying) extensibility points, Palantir has succeeded in creating a genuine platform for application development and information analysis.

Wiki-1: What are the factions represented in the edit pages and who are its members? In other words, describe the groups and their members based on their editing changes.

Clearly partisan in favor of the Paraisos: Amado, VictoriaV, RyogaNica, Socorro
Clearly partisan against the Paraisos: Agustin, Rm99, DailosTamanca, DavidMorón, Hinzel
Neutral / fair, perhaps with a slight intellectual bias against the Paraisos: Edemir, 66.66.125.x, Sara
Truly neutral (moderators, grammar-focused editors, bots, etc.): Adriano, Bakbot, Salvatora, Ricarda
Vandals (all anti-Paraiso): Cristofer, Alejo, Absalon, Alphanzo, Molotover, Alejandrosanchez, 67.55.3.x, 66.175.135.x, 86.151.194.x, 84.158.202.x, 68.60.74.x, 128.125.81.x, 69.14.85.x, 195.113.65.x, 131.174.244.x, 74.120.3.x, 209.155.27.x, 75.179.21.x, 204.52.215.x, 75.81.8.x, 24.168.142.x, 71.59.210.x, 201.226.51.x

Video link:

   Wiki Video

Detailed Answer:

            Wikipedia edits are a great example of the increasingly complex style of datasets that we are required to analyze today: they are extremely large, have a low signal-to-noise ratio, and rely primarily on human intuition to extract knowledge. In this environment, information is plentiful and analysis is scarce. Computers are facilitators, not agents, of analysis. Palantir, therefore, focuses on enabling humans to ask high-level questions about their datasets and having the computer responsively display the answers.

            To start this investigation, we imported the edits and discussions as events and documents, with each linked to a username and date. The Palantir Dynamic Ontology allows us to define all of the objects, events, properties, and links we need to model any data set, meaning that an end-user can modify Palantir’s ontology without code changes to accept almost any type of data.


Figure 1: Data Import
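As a rough illustration of the import step, the sketch below turns one line of the raw edit history into a record linked to a username and a date. It is not Palantir's importer: the `WikiEdit` type, the `parse_edit` function, and the regular expression are our own assumptions, based only on the line format of the edits quoted later in this report, and the date parsing assumes English month names.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class WikiEdit(NamedTuple):
    """One edit event (hypothetical record type, not a Palantir object)."""
    timestamp: datetime
    user: str
    minor: bool
    size_bytes: Optional[int]
    comment: str


# Assumed line shape:
# "# (cur) (last) HH:MM, D Month YYYY User (Talk | contribs) [m] [(N bytes)] [(comment)]"
EDIT_LINE = re.compile(
    r"\(cur\) \(last\) "
    r"(?P<time>\d{2}:\d{2}), (?P<date>\d{1,2} \w+ \d{4}) "
    r"(?P<user>\S+) \(Talk \| contribs\)"
    r"(?P<minor> m)?"
    r"(?: \((?P<size>[\d,]+) bytes\))?"
    r"(?: \((?P<comment>.*)\))?\s*$"
)


def parse_edit(line: str) -> Optional[WikiEdit]:
    """Return a WikiEdit for a history line, or None if the line does not match."""
    match = EDIT_LINE.search(line)
    if match is None:
        return None
    stamp = datetime.strptime(f"{match['date']} {match['time']}", "%d %B %Y %H:%M")
    size = match["size"]
    return WikiEdit(
        timestamp=stamp,
        user=match["user"],
        minor=match["minor"] is not None,
        size_bytes=int(size.replace(",", "")) if size else None,
        comment=match["comment"] or "",
    )
```

Usernames containing spaces would need a looser pattern than `\S+`; we kept the pattern strict so that malformed lines are rejected rather than silently misparsed.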


Below we see all of the edits on the Palantir Graph, with a histogram of wiki usernames on the right (ranked by number of edits performed) and a timeline of those events:


Figure 2: All the “Wiki Edits” gridded in our Graph Explorer
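The histogram ranking can be approximated outside the platform with a simple count. The sketch below is only an illustration of the idea: `rank_editors` and its `min_edits` parameter are hypothetical names, and the sample list is made up rather than taken from the dataset.

```python
from collections import Counter


def rank_editors(usernames, min_edits=1):
    """Rank users by number of edits, most active first.

    usernames: one entry per edit (the user who made it).
    min_edits: drop users with fewer edits than this (assumed cutoff).
    """
    counts = Counter(usernames)
    return [(user, n) for user, n in counts.most_common() if n >= min_edits]


# Made-up sample: one username per edit.
sample = ["VictoriaV", "Rm99", "VictoriaV", "Agustin", "VictoriaV", "Rm99"]
ranking = rank_editors(sample, min_edits=2)
```

Raising `min_edits` is one way to restrict the view to the most active editors.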


We analyzed the edits in two ways: time-centric and user-centric. We divided our team of four analysts into two ‘cells’ with different approaches. The time-centric cell noticed periodic spikes of activity, probably representing controversial edits to a particular section or “flame wars” (figure 3).

Figure 3: The Timeline
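We spotted the spikes visually on the timeline. A comparable check could be sketched in code by counting edits per day and flagging days well above a typical day. The function name, the `factor` threshold, and the choice of the median as a baseline are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date
from statistics import median


def find_activity_spikes(edit_dates, factor=3.0):
    """Return the days whose edit count exceeds `factor` times the median.

    edit_dates: one datetime.date per edit.
    The median is taken over days that have at least one edit, so
    completely quiet days do not pull the baseline toward zero.
    """
    per_day = Counter(edit_dates)
    if not per_day:
        return []
    baseline = median(per_day.values())
    return sorted(day for day, n in per_day.items() if n > factor * baseline)


# Made-up sample: a quiet start of the month and a burst on one day.
sample = (
    [date(2006, 9, 1)]
    + [date(2006, 9, 2)] * 2
    + [date(2006, 9, 3)]
    + [date(2006, 9, 19)] * 10
)
spikes = find_activity_spikes(sample)
```

A lower `factor` flags more days; the right value depends on how noisy the edit history is.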

We decided to zoom in on these segments of the discussion and disregard the quieter segments. We lined the edits up sequentially and used Palantir to link them by related entities, allowing us to easily see who was responsible for which posts (figure 4).


Figure 4: Rm99, Agustin, and VictoriaV battling it out


We then used the Browser view to examine the details of each post in order to identify potential factions. Although these were sometimes readily determinable based on the contents of the post, often the flame wars consisted of little more than an initial comment and multiple people posting “rv [revert]—vandalism” in an attempt to restore their favored version of the text (figure 5).



Figure 5: A revert war from the original file of edits
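Revert wars of this kind have a recognizable shape: several revert-style comments in quick succession from at least two users. The sketch below flags such runs. The comment pattern, the one-hour gap, and the minimum run length are assumptions chosen for illustration, and the sample comments are paraphrased rather than quoted from the dataset.

```python
import re
from datetime import datetime, timedelta

# Comments that look like reverts: "rv ...", "revert ...", "Undid revision ..."
REVERT = re.compile(r"^\s*(rv\b|revert|undid revision)", re.IGNORECASE)


def is_revert(comment):
    return REVERT.search(comment) is not None


def find_revert_wars(edits, max_gap=timedelta(hours=1), min_length=3):
    """Find runs of consecutive reverts involving at least two users.

    edits: iterable of (timestamp, user, comment) tuples.
    Returns one sorted list of usernames per detected run.
    """
    wars = []
    run = []  # (timestamp, user) pairs of the current run of reverts

    def flush():
        users = {user for _, user in run}
        if len(run) >= min_length and len(users) >= 2:
            wars.append(sorted(users))
        run.clear()

    for stamp, user, comment in sorted(edits):
        if not is_revert(comment):
            flush()  # a normal edit ends the current run
            continue
        if run and stamp - run[-1][0] > max_gap:
            flush()  # too long since the previous revert
        run.append((stamp, user))
    flush()
    return wars


start = datetime(2006, 9, 19, 2, 0)
sample = [
    (start, "Edemir", "Home Health Care - Dept of Health intervention refs"),
    (start + timedelta(minutes=4), "RyogaNica", "Undid revision by Edemir (talk) POV pushing"),
    (start + timedelta(minutes=10), "Edemir", "rv vandalism"),
    (start + timedelta(minutes=15), "RyogaNica", "rv - vandalism"),
    (start + timedelta(hours=5), "Sara", "Home Health Care"),
]
wars = find_revert_wars(sample)
```

The output names the participants of each run, which corresponds to the rivalries we read off the graph by eye.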


We thus had a visual representation of rivalries (such as that of Rm99 and VictoriaV), but we sometimes needed more context on the users involved to give those rivalries meaning.

            To find this context, we turned to our second cell, which had been operating simultaneously. Palantir’s collaboration capabilities allow multiple individuals or teams to freely manipulate a dataset knowing that they will not corrupt the original version, as each user operates in their own virtual private repository (“VPR”). At the same time, any insights that one group desires to share with other investigators can be easily published to the base repository available to all analysts (similar to a CVS or SVN model). This team was tasked with an alternate approach: a user-centric analysis of the edits. By selecting all edits in the graph, we quickly determined the top users from the Histogram (figure 6.1).
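The private-repository idea described above can be pictured as a copy-on-write overlay on a shared store. The toy class below is only our own illustration of that idea, not Palantir's implementation: `PrivateView` and its methods are hypothetical names, and a plain dictionary stands in for the base repository.

```python
class PrivateView:
    """Toy copy-on-write overlay over a shared base repository."""

    _REMOVED = object()  # tombstone marking a private removal

    def __init__(self, base):
        self._base = base   # shared dict, untouched until publish()
        self._changes = {}  # private additions, edits, and tombstones

    def put(self, key, value):
        self._changes[key] = value

    def remove(self, key):
        self._changes[key] = self._REMOVED

    def visible(self):
        """What this analyst sees: the base with private changes applied."""
        merged = {**self._base, **self._changes}
        return {k: v for k, v in merged.items() if v is not self._REMOVED}

    def publish(self):
        """Push private changes into the shared base repository."""
        for key, value in self._changes.items():
            if value is self._REMOVED:
                self._base.pop(key, None)
            else:
                self._base[key] = value
        self._changes.clear()


base = {"edit-1": "Rm99", "edit-2": "VictoriaV"}
view = PrivateView(base)
view.remove("edit-2")
view.put("note-1", "Rm99 and VictoriaV are rivals")
before_publish = dict(base)  # the shared base is still unchanged here
view.publish()
```

Until `publish()` is called, other analysts working against `base` see none of the private changes.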



Figure 6.1: The Histogram



Figure 6.2: Just click and Palantir will find any entities linked to the selected edits




Figure 6.3: Top users linked to their edits and discussions


We linked blocks of edits together by user and removed the rest from view (figure 6.2); removing objects from the graph in Palantir is also non-destructive, affecting only the user’s VPR until changes are published. We then added events of type “Wiki Discussion” to the graph and had Palantir link them to their owner (figure 6.3). The majority of talk page entries were tied to minor players in the discussion and were discarded. Viewing all posts of a given user made most of the faction divisions clear. Even when we couldn’t tell directly from the content, we combined the rivalries discovered by team one with the known entities from team two to infer the unknown faction allegiance. For example, we read Agustin saying “Catalano’s religion is… a hedonistic religion… of misogyny and greed” during our user-centric analysis and noticed RyogaNica getting into “revert-wars” with Agustin during our time-centric analysis. We also noticed RyogaNica getting into a fight with Edemir (an ambiguous character) and supporting Amado (a figure in favor of the movement). Thus, we can say with high confidence that RyogaNica supports the Paraiso movement.
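The inference in the RyogaNica example amounts to propagating labels over a graph of rivalries and alliances: a rival of a known opponent is probably a supporter, and an ally of a known supporter is probably a supporter. We did this by reading posts, not by running code; the sketch below only restates the reasoning. The function name, the two-faction simplification, and the "rival"/"ally" edge labels are our own assumptions.

```python
from collections import deque


def infer_factions(known, relations):
    """Spread "pro"/"anti" labels across rivalries and alliances.

    known: {user: "pro" or "anti"} for users identified from their posts.
    relations: iterable of (user_a, user_b, kind), kind is "rival" or "ally".
    Returns (labels, conflicts), where conflicts holds the pairs whose
    relation contradicts the labels already assigned.
    """
    opposite = {"pro": "anti", "anti": "pro"}
    neighbours = {}
    for a, b, kind in relations:
        neighbours.setdefault(a, []).append((b, kind))
        neighbours.setdefault(b, []).append((a, kind))

    labels = dict(known)
    conflicts = set()
    queue = deque(known)
    while queue:
        user = queue.popleft()
        for other, kind in neighbours.get(user, []):
            implied = labels[user] if kind == "ally" else opposite[labels[user]]
            if other not in labels:
                labels[other] = implied
                queue.append(other)
            elif labels[other] != implied:
                conflicts.add(frozenset((user, other)))
    return labels, conflicts


known = {"Agustin": "anti", "Amado": "pro"}
relations = [
    ("RyogaNica", "Agustin", "rival"),  # revert wars seen by the time-centric cell
    ("RyogaNica", "Amado", "ally"),     # support seen in the edits
]
labels, conflicts = infer_factions(known, relations)
```

A two-faction model cannot represent neutral or fair editors such as Edemir, so a clash with an ambiguous user is deliberately left out of `relations`; those cases still need a human reading of the posts.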

            Using this workflow model, we broke the top players (more than five edits) into several factions. The two obvious categories were strong supporters and strong opponents of the movement. We also found that several editors didn’t fit easily on either side, so we created a neutral category. We further subdivided the neutral category because several posters made comments on both sides of the issue or were posting fair information in opposition to the movement. We did not want to group a fair attempt to critically evaluate the Paraiso movement with people posting “Paraiso is B******T” or even with those showing consistent personal opposition to the religion.

            VictoriaV (pro) and Rm99 (anti) were the first two people assigned to factions because they were easy to identify. Although VictoriaV hides her partisanship behind Wikipedia rules (“NPOV pushing!”), we quickly learned to see behind this kind of mask. Agustin (anti) was the next assignment, based on his discussion posts. The remaining partisans were assigned largely based on their support for or opposition to those categorized before them. The anti-vandalism bot was clearly neutral, but we also placed the ambiguous cases (Edemir, 66.66.125.x, Salvatora, Sara, Adriano, and Ricarda) in our neutral bucket. Careful reading of posts led us to believe that Adriano was really focused on grammar and formatting. Salvatora is a moderator, and Ricarda never makes major changes. Edemir, 66.66.125.x, and Sara occasionally focused on grammar, but they delved into content more often. Because they occasionally clashed with both sides but sometimes posted critical content, we labeled them “fair,” meaning they may not support the movement but are not openly biased against it either.

            Finally, we pulled up all the posts by BakBOT, which automatically reverts mass deletions, to see whether fringe-radical opponents of the religion had anything interesting to say. We discovered one allegation that the movement had killed several health professionals, but most of the comments held little more than name-calling.

            The Palantir platform thus transformed a text file of Wikipedia edits and a Word document of discussions into a rich, interactive investigation that we analyzed relationally and temporally to determine the factions of the movement.

Wiki-2: Is the Paraiso movement involved in violent activities?

YES (with reservations)

List of wiki edits providing evidence

# (cur) (last) 09:52, 4 September 2006 Barfly2001 (Talk | contribs) (93,491 bytes) (→See also - {{wikinews|Belgian justice prosecutes Paraiso}})
# (cur) (last) 09:26, 4 September 2006 Angelgasperi (Talk | contribs) (93,439 bytes) (→Controversy and criticism - Belgium prosecuting, wikinews source)
# (cur) (last) 03:16, 19 September 2006 Alphanzo (Talk | contribs) m (moved Paraiso to GUNNED DOWN SIX DOCTORS AND NURSES IN COLD BLOOD)
# (cur) (last) 03:12, 19 September 2006 RyogaNica (Talk | contribs) (97,765 bytes) (→Home Health Care - POV pushing removed. again ridiculous.)
# (cur) (last) 03:09, 19 September 2006 Edemir (Talk | contribs) (97,530 bytes) (→Home Health Care - added confrontation of Paraiso members and Dept of Health)
# (cur) (last) 03:01, 19 September 2006 Sara (Talk | contribs) (97,765 bytes) (→Home Health Care)
# (cur) (last) 02:25, 19 September 2006 RyogaNica (Talk | contribs) (97,966 bytes) (Undid revision xxxxxxxxx by Edmir (talk) POV pushing)
# (cur) (last) 02:21, 19 September 2006 Edemir (Talk | contribs) (97,565 bytes) (→Home Health Care - Dept of Health intervention refs)
# (cur) (last) 11:58, 18 November 2006 Amado (Talk | contribs) (114,196 bytes) (Deleted false statement. A person can only be declared afther it has been proven in a Justicia Juicio that he commited a high crime. Intro to C. Ethics 1998 is no longer used refer to 2006 edition only)

Short Answer:

The evidence suggesting that the Paraiso movement is tied to violence is neither very reliable nor conclusive, but there are grounds for suspicion. The first hint is founder Ferdinando Catalano’s list of intellectual influences: Trotsky, Guevara, Jose Marti, and Pancho Villa. All of these individuals advocated the use of force to secure their own viewpoints, and it is hard to believe Catalano has overlooked this lesson. Moreover, several Wikipedia edits (which were quickly censored) imply criminal activities (“Belgium prosecuting”) or outright violence (“GUNNED DOWN”). Finally, the Wikipedia discussion page suggests harsh treatment of dissent as well (“Getting angry over criticism of a leader...”; “This is a group bent on... creating a theocracy in the US....”; “[Women] get... locked away if they refuse the males'...”). Since anyone can edit Wikipedia and the rest of our evidence is circumstantial, however, we can only conclude that the group may be violent.

