A history of the world in 100 seconds

2 February 2011

Many Wikipedia articles are tagged with geographic coordinates. Many have references to historic events. Cross-referencing these two subsets and plotting them year on year adds up to a dynamic visualization of Wikipedia's view of world history.

The 'spotlight' is an overlay on the video that tries to keep about 90% of the datapoints within the bright area. It takes a moving average of all the latitudes and longitudes over the past 50 or so years and centres on the mean coordinate. I love the way it opens up, first resembling medieval maps of "The World" which included only Europe and some of Asia, then encompassing "The New World" and finally resembling a modern map.
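For anyone curious how a spotlight like this could be computed, here is a minimal Python sketch. It is not the code used for the video: the function names, the `events_by_year` structure, the exact 50-year window and the nearest-rank way of picking a radius that covers 90% of the points are all assumptions made for illustration. It also takes a plain arithmetic mean of longitudes, which ignores wrap-around at ±180 degrees.

```python
import math
from collections import deque


def spotlight_centres(events_by_year, window=50):
    """Yield (year, (mean_lat, mean_lon)) over a trailing window of years.

    events_by_year maps a year to a list of (lat, lon) pairs (hypothetical
    structure). Only years that appear in the mapping are emitted.
    """
    recent = deque()  # (year, lat, lon), oldest first
    for year in sorted(events_by_year):
        for lat, lon in events_by_year[year]:
            recent.append((year, lat, lon))
        # Drop points that have fallen out of the trailing window.
        while recent and recent[0][0] <= year - window:
            recent.popleft()
        if not recent:
            continue
        mean_lat = sum(p[1] for p in recent) / len(recent)
        mean_lon = sum(p[2] for p in recent) / len(recent)
        yield year, (mean_lat, mean_lon)


def spotlight_radius(points, centre, fraction=0.9):
    """Return the smallest radius (in degrees) covering `fraction` of points.

    Distances are measured in flat lat/lon space, which is crude but matches
    a flat map projection well enough for a sketch.
    """
    dists = sorted(
        math.hypot(lat - centre[0], lon - centre[1]) for lat, lon in points
    )
    k = max(1, math.ceil(fraction * len(dists)))  # nearest-rank percentile
    return dists[k - 1]
```

Feeding the centre and radius for each year into the overlay would give a bright area that drifts and widens as the data spreads out.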

This is based on the thing that Tom Martin and I built at Matt Patterson's History Hackday. To make it, I built a Python SAX parser that sliced and diced an XML dump of all Wikipedia articles (30 GB) and pulled out 424,000 articles with coordinates and 35,000 references to events. We managed to pair up 14,238 events with locations, and Tom wrote some Java to fiddle about with the coordinates and output frames. I've hacked around some more to add animation, because, you know, why not?
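As a rough illustration of the SAX approach, here is a minimal sketch using Python's standard `xml.sax` module. It is not the parser from the hack day. It assumes the dump wraps each article in a `page` element containing `title` and `text` elements, and it only understands the simplest decimal form of the coordinate template, such as `{{coord|51.5|-0.12|display=title}}`. Degrees/minutes/seconds forms and every other variant are skipped. The names `CoordHandler` and `parse_decimal_coord` are made up for this example.

```python
import re
import xml.sax

# Grabs the body of the first {{coord|...}} template in a chunk of wikitext.
TEMPLATE_RE = re.compile(r"\{\{\s*coord\s*\|([^{}]*)\}\}", re.IGNORECASE)


def parse_decimal_coord(wikitext):
    """Return (lat, lon) for a plain decimal coord template, else None."""
    match = TEMPLATE_RE.search(wikitext)
    if not match:
        return None
    fields = [f.strip() for f in match.group(1).split("|")]
    numbers = []
    for field in fields:  # collect the leading numeric fields
        try:
            numbers.append(float(field))
        except ValueError:
            break
    if len(numbers) != 2:
        return None  # probably degrees/minutes/seconds; not handled here
    if len(fields) > 2 and fields[2].upper() in ("N", "S", "E", "W"):
        return None  # degrees and minutes plus a hemisphere; not handled
    return numbers[0], numbers[1]


class CoordHandler(xml.sax.ContentHandler):
    """Streams through pages, keeping (title, lat, lon) for tagged articles."""

    def __init__(self):
        super().__init__()
        self.found = []
        self._tag = None
        self._title = []
        self._text = []

    def startElement(self, name, attrs):
        if name == "page":
            self._title, self._text = [], []
        self._tag = name

    def characters(self, content):
        # SAX may deliver one element's text in several chunks.
        if self._tag == "title":
            self._title.append(content)
        elif self._tag == "text":
            self._text.append(content)

    def endElement(self, name):
        if name == "page":
            coord = parse_decimal_coord("".join(self._text))
            if coord is not None:
                self.found.append(("".join(self._title).strip(), *coord))
        self._tag = None
```

Because SAX streams the file, memory use stays flat however big the dump is. Something like `xml.sax.parse(path_or_file, CoordHandler())` would run it over a real dump, where `path_or_file` is whatever dump file you have to hand.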

The code is here.

Datasets:

I've uploaded the data into Google Fusion tables in case anyone else wants to have a look.

Here's every Wikipedia page tagged with a latitude and longitude, a total of 424,171 articles.

You can get a great idea of the static distribution of coords in this heat map plot.

Here are the 14,238 geo-tagged events dragged out of Wikipedia. Many of the events are attributed to a particular year or month, but the day is not specified. I defaulted them to 0 to indicate this. Hopefully it won't matter too much from the macro-perspective at which this data should be viewed.
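If you want to work with these partial dates in code, one way to model the "0 means unknown" convention is sketched below in Python. The `EventDate` type is hypothetical and not part of the published code, and using negative years for BC is an assumption of this sketch rather than something the dataset is known to do.

```python
from typing import NamedTuple


class EventDate(NamedTuple):
    year: int       # negative for BC (an assumption of this sketch)
    month: int = 0  # 0 means the month was not specified
    day: int = 0    # 0 means the day was not specified

    def precision(self):
        """Say how much of the date is actually known."""
        if self.month == 0:
            return "year"
        if self.day == 0:
            return "month"
        return "day"
```

Since tuples compare field by field, sorting a list of these puts a year-only event ahead of the dated events in the same year, which is harmless when frames are plotted year by year.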

And here are 5,765 invalid coordinates that cropped up. The majority of these are actually invalid, although some valid ones slipped through the net. If I come back to this project, I'll polish up the parser to handle the final few. In the meantime, anyone looking to improve Wikipedia's data could have a go at correcting the bad data referenced here.
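For anyone re-checking that list, a basic sanity test might look like the Python sketch below. It only checks that both values parse as numbers and fall within the usual latitude and longitude ranges. The rules the original parser applied are not described here, so treat `is_valid_coordinate` as an illustrative stand-in rather than a reproduction of them.

```python
def is_valid_coordinate(lat, lon):
    """Return True if lat/lon parse as numbers within the usual ranges."""
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):
        return False  # missing, empty or non-numeric values
    # NaN fails both comparisons, so it is rejected as well.
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0
```

A check like this catches out-of-range and empty values, but it cannot tell whether an in-range coordinate points at the right place.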

Comments

1 gc says...

this is shit

Posted at 10:25 p.m. on February 24, 2011

2 Lawrie says...

This is "the shit" - as in, really good.

Posted at 2:18 p.m. on March 17, 2011

3 Dario says...

Neat stuff, two comments:

  • how do you explain this extremely high density of articles in the English Wikipedia about Poland?

  • it took me a while to see that you have the year displayed in the bottom right corner, as it hides behind the video controllers.

Posted at 6:03 a.m. on March 19, 2011

4 Gareth says...

Hi Dario,

  • I have no idea! I can only assume that someone or some group of people in Poland has done a very good job gathering coordinate data for their country and taken the time to add it. Maybe there's a more interesting explanation, but my code just trawled the dumps for anything that looked like a geo-coordinate so I don't know the provenance of any particular bit of data.

  • Good point. Something to address in version 2 if there ever is one.

Posted at 10:20 a.m. on March 19, 2011

5 Daniel says...

Hi Gareth,

could it be that the event distribution is slightly biased due to the limitation to the English-language Wikipedia? The heatmap shows a strong focus on UK/US events which seems to support this. Maybe it is possible to adapt the algorithm and run it over all Wikipedias or even match coordinates across different languages?

Best

Daniel

Posted at 12:41 p.m. on March 21, 2011

6 Owen Blacker says...

I would imagine the good coverage of Polish history would be (at least in part) down to the size of the Polish diaspora in English-speaking countries and the quality of English-teaching in Central and Eastern Europe.

A good indication of the number of people who have at least some knowledge of or sympathy for Polish history would be the naming dispute about Gda?sk (see http://en.wikipedia.org/wiki/Talk:Gdansk/Vote), which was particularly well-argued by a lot of people and now forms the basis for resolving most English Wikipedia naming disputes regarding cities in Central Europe (most of which are known by more than one language in modern English).

Posted at 5:57 a.m. on March 22, 2011

7 Owen Blacker says...

Bah. Obviously that was meant to be Gdansk, but I'd put the acute-accent on the N and apparently high-bit characters get stripped here. Sorry about that :o)

Posted at 5:58 a.m. on March 22, 2011

8 tomh says...

On the why Poland has so many articles question; my guess would be on the automatic generation of articles from a database.

E.g. http://en.wikipedia.org/wiki/Morowo

Was created by one bot; then edited by 3 more. Systematic creation of articles can go a long way to increasing the density of geo-tagged locations very quickly.

Posted at 10:24 a.m. on March 22, 2011

9 thruflo says...

Is it just me or would this be much more interesting with a map layer in the video? By the end you can pretty much work out the projection but until then you're just guessing it's a 'standard' positioning and you have no idea exactly where those BC dots are - Europe? Middle East?

Posted at 10:33 a.m. on March 22, 2011

10 Willy says...

The video is the absolute worst piece of shit I've ever seen. Did you mean to put a map of the world in there, or was it just supposed to be pretty little lights going off in the upper third of the frame?

This is data visualization at its absolute worst. Visualization, not for the sake of conveying information, but for the sake of making pretty pictures. What a laughable piece of crap.

Posted at 7:26 p.m. on March 25, 2011

11 Markdown says...

If you are too stupid to know world geography then GTFO. It's most interesting to see how it changes over time. Especially 1700+ Good visualization.

Posted at 4:15 a.m. on March 26, 2011

12 Markdown says...

Or if you knew anything about recorded history you would know that around 500BC almost all the events are going to be centered in the Mediterranean.

Again, good visualization; don't cater to the lowest denominator and don't assume that everyone is an idiot.

Posted at 4:17 a.m. on March 26, 2011

13 eloriane says...

Well, 500BC is also right in the middle of Imperial China, for example. I'm pretty sure that counts as recorded history. It only seems obvious to have events in that time period clustered around the Mediterranean if you approach history with a Eurocentric point of view-- which Wikipedia, apparently, does.

Posted at 1:47 p.m. on March 26, 2011

14 NO N O says...

Wasted 10 seconds of my life.

Posted at 8:21 a.m. on March 27, 2011

15 Tagishsimon says...

The vast majority of your supposed invalid coordinates are articles on people with an "Infobox person" within which is a commented out and empty coordinate statement associated with the parameter: resting_place_coordinates. Which is to say that, here, the fault lies in your parser, not with wikipedia coordinates.

Posted at 5:48 a.m. on March 28, 2011

16 Gareth says...

@Tagishsimon Maybe "invalid" is the wrong word to use, but it is the case that data is missing, and should be supplied. I think that's a useful thing to point out.

Posted at 7:38 a.m. on March 28, 2011

17 hapa says...

with all the dataset's known problems & biases, i resent this being described as the history of the world. i actually got angry watching asian and pre-columbian western history play second fiddle to the growth of polish infrastructure. from the attention you got i thought you had something good to show.

i don't think it would have been hard to use sampling to correct for real population, and since you weren't ranking events by their influence it wouldn't have been wrong, either.

Posted at 1:40 p.m. on March 28, 2011

18 Gareth says...

@hapa There's a reason it's called "A history of the world", not "The history of the world". We took the data as it was and presented it in a straightforward way.

Please take the data and the code and make another history of the world. It's all out there for the taking.

Posted at 2:31 p.m. on March 28, 2011

19 David says...

Thanks for making this data available to the public. Here is my mashup - an attempt to make a timeline/map based on the data. http://www.boxshapedworld.com/timeline/protovistimeline.htm and the accompanying blog post http://www.boxshapedworld.com/Blog/post/121. There are still a few problems (between 400bc and 1000 ad) for some reason that I haven't figured out yet.

Thanks again for your efforts.

Posted at 4:36 p.m. on March 29, 2011

20 James Hardy says...

Do other language wikipedias use compatible templates? If so it would be fascinating to see this repeated for other languages, especially Chinese which is a fairly sizeable encyclopaedia but sorely under represented on this map.

Posted at 7:41 a.m. on March 30, 2011

21 Martin says...

This is really cool. I agree with other comments that it would be interesting, as a second step, to look at different language versions (Spanish might bring up more data from South America, maybe) - and maybe color-code it to see the different views of the languages.

Posted at 5:18 a.m. on April 1, 2011

22 artjhxhjt says...

What this video shows is that just because you can do something it doesn't mean you should bother.

Posted at 8:26 a.m. on April 8, 2011

23 Stephen Horner says...

Nice work Gareth and Tom, very surprised by some of the negativity in the comment thread here particularly given you've opensourced the code (which shows you've given thought to the localization of world views if people bothered to read) and provided the dataset!

There are so many ways this could be extended, was there any category data associated to the events in Wikipedia? If so adding a layer of colour coding to indicate conflicts, social change, natural disasters, scientific breakthroughs etc would be a great enhancement.

A hack day very well spent!

Posted at 5:17 a.m. on April 11, 2011

24 KZ says...

Neat stuff. Those that had negative comments simply missed the point of this video, 'nuff said.

But the point on 500BC being the middle of imperial China's true: annual annals existed some 300 years before that, and recorded history some 1000 years. Chinese do love to boast about their "5000-year old culture"; a culture that's fast fading away, that is.

Perhaps a more fitting title would be "en.wikipedia.org's geo-chronological data's history of the world in 100 seconds", haha.

Posted at 11:54 p.m. on April 21, 2011

25 Luke says...

Nice. It reminded me of http://pinktentacle.com/2010/08/animated-map-of-nuclear-e...

One thing for a later version, perhaps some audio (similar to above)? Perhaps a different tone for Europe, Africa, Asia etc.

A zoomed-in version of Europe could also be quite interesting.

Posted at 9:18 a.m. on May 23, 2011

26 fatma says...

awesome..well done job..thnx

Posted at 4:12 p.m. on July 14, 2011

Comments are closed.
