Dynamic Citation Graph

Almost a year ago I wrote a post about citation networks of interconnected communities. Simply put, I displayed the citation network (which paper cites which) of two related conferences. As conference pairs I picked GD/WG, ICALP/ESA, and SWAT/WADS. All these conferences had their proceedings published by Springer, which made it easier to get the data by crawling the SpringerLink portal.

Although the pictures were really nice (rendered in Gephi), I wanted a more dynamic visualization. This weekend I finally found the time to prepare something. I picked the biggest component (82 nodes) in the SWAT/WADS network, since it was the smallest interesting (sub)graph I found. My goal was to let the user filter the data by year and watch the citation graph grow over time.

There are many tools for preparing an interactive visualization on a webpage. I wanted to do it in JavaScript, but which weapon should I choose: d3.js, sigma.js, processing.js, cytoscape.js, or something else? Sigma.js and cytoscape.js are aimed specifically at displaying graphs. However, judging from the examples, I liked d3.js most (I also prefer SVG vector graphics over the rendered canvas you get from sigma.js). So I ended up picking the very powerful d3.js library (Data-Driven Documents), developed by Mike Bostock, who did his PhD in the Stanford Vis group and now works for the New York Times.

The power and elegance of d3.js do not come for free. The way objects on the webpage are linked to data was very different from everything I knew before, and the way the data is processed is rather unconventional. So it is rather rough to get into. On the positive side, there are many beautiful examples on the web, from which I could learn a lot. On the negative side, the documentation is rather sparse. So I started with a few examples (force-directed layout, modified force-directed layout, a slider in d3.js) and learned everything I needed along the way.

But first I had to get the data into the right form. The old data was stored in the Gephi format. To make it readable for JavaScript/d3.js I had to convert it to JSON. I also wanted to clean the data and attach some information. In particular, I wanted to add the year in which a node first has links. This is either its publication year, if it cites a paper in the big component, or the first year in which the paper gets cited by another paper from the big component. To get the job done I used the networkx package. This worked like a charm. I had not used networkx before and can recommend it. Superb design.
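To make the rule for the start year concrete, here is a minimal sketch in plain Python. It is not the script I actually ran (that one used networkx), and the paper ids, titles, years, field names, and the index-based link layout are all made up for illustration. It computes the start year of every paper, builds a nodes/links structure that can be dumped to JSON, and filters it by year.

```python
import json

def start_years(pub_year, citations):
    """Return the year in which each paper first has a link.

    pub_year: dict paper id -> publication year
    citations: list of (citing, cited) pairs inside the component
    """
    cites_something = {citing for citing, _ in citations}
    first_cited = {}
    for citing, cited in citations:
        year = pub_year[citing]
        first_cited[cited] = min(year, first_cited.get(cited, year))
    start = {}
    for paper, year in pub_year.items():
        if paper in cites_something:
            start[paper] = year                # it cites a paper: own year
        elif paper in first_cited:
            start[paper] = first_cited[paper]  # only cited: first citing year
    return start

def to_graph(pub_year, titles, citations):
    """Build a nodes/links dict; links refer to nodes by list index (assumed layout)."""
    start = start_years(pub_year, citations)
    ids = sorted(start)
    index = {paper: i for i, paper in enumerate(ids)}
    nodes = [{"id": p, "title": titles[p], "year": pub_year[p], "start": start[p]}
             for p in ids]
    # A link is shown once both of its endpoints are shown.
    links = [{"source": index[a], "target": index[b],
              "year": max(start[a], start[b])} for a, b in citations]
    return {"nodes": nodes, "links": links}

def visible_at(graph, year):
    """Return the nodes and links that exist up to and including the given year."""
    nodes = [n for n in graph["nodes"] if n["start"] <= year]
    links = [l for l in graph["links"] if l["year"] <= year]
    return nodes, links

# Hypothetical toy data: four papers, three citations.
pub_year = {"A": 1988, "B": 1990, "C": 1994, "D": 1996}
titles = {p: "Paper " + p for p in pub_year}
citations = [("B", "A"), ("C", "A"), ("D", "B")]

graph = to_graph(pub_year, titles, citations)
print(json.dumps(graph, indent=2))
```

In this toy data paper A cites nothing, so it only shows up in 1990, when B cites it. Moving a year slider then amounts to calling visible_at with the selected year and redrawing.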

After the data was prepared I started to modify the examples. The force-directed layout engine included in d3.js produces great results. It is also very interesting algorithmically: it uses a quadtree and the Barnes-Hut approximation to bring the runtime of each iteration down from O(n^2) to O(n \log n). This is something that I have not seen in other implementations of force-directed methods so far.
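The idea behind Barnes-Hut can be sketched in a few lines of Python. This is not the d3.js code, just a minimal illustration under my own assumptions: every point repels every other point with a force of magnitude 1/d, and a quadtree cell is replaced by its center of mass whenever its width divided by its distance is below a threshold theta (0.5 here, picked for the sketch). All names are made up.

```python
import math
import random

class Quad:
    """Square quadtree cell that tracks how many points it holds and their sum."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.count = 0
        self.sx = self.sy = 0.0   # coordinate sums, for the center of mass
        self.points = []          # only filled while the cell is a leaf
        self.children = None

    def _child(self, p):
        return self.children[2 * (p[0] >= self.cx) + (p[1] >= self.cy)]

    def insert(self, p):
        self.count += 1
        self.sx += p[0]
        self.sy += p[1]
        if self.children is not None:
            self._child(p).insert(p)
            return
        self.points.append(p)
        # Split a leaf with more than one point (the size guard stops runaway splits).
        if len(self.points) > 1 and self.half > 1e-9:
            h = self.half / 2
            self.children = [Quad(self.cx + dx * h, self.cy + dy * h, h)
                             for dx in (-1, 1) for dy in (-1, 1)]
            pending, self.points = self.points, []
            for q in pending:
                self._child(q).insert(q)

def build_tree(points):
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    half = max(max(xs) - min(xs), max(ys) - min(ys)) / 2 or 1.0
    root = Quad((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2, half)
    for p in points:
        root.insert(p)
    return root

def repulsion(cell, p, theta=0.5):
    """Approximate total repulsive force on p from the points stored in cell."""
    fx = fy = 0.0
    if cell.children is None:
        for q in cell.points:
            dx, dy = p[0] - q[0], p[1] - q[1]
            d2 = dx * dx + dy * dy
            if d2 > 0:            # skips p itself
                fx += dx / d2
                fy += dy / d2
        return fx, fy
    dx = p[0] - cell.sx / cell.count
    dy = p[1] - cell.sy / cell.count
    d2 = dx * dx + dy * dy
    if (2 * cell.half) ** 2 < theta * theta * d2:
        # Far away: treat the whole cell as one heavy point.
        return cell.count * dx / d2, cell.count * dy / d2
    for child in cell.children:
        cfx, cfy = repulsion(child, p, theta)
        fx += cfx
        fy += cfy
    return fx, fy

def repulsion_exact(points, p):
    """Quadratic-time reference: sum over all other points."""
    fx = fy = 0.0
    for q in points:
        dx, dy = p[0] - q[0], p[1] - q[1]
        d2 = dx * dx + dy * dy
        if d2 > 0:
            fx += dx / d2
            fy += dy / d2
    return fx, fy

random.seed(1)
points = [(random.random(), random.random()) for _ in range(200)]
tree = build_tree(points)
print(repulsion(tree, points[0]), repulsion_exact(points, points[0]))
```

With theta set to 0 no cell is ever approximated and the tree walk gives the same result as the quadratic loop; larger values trade accuracy for speed.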

Okay, so here is the end of the story. You can find the source code here and the data in this json file. Use this link for viewing outside of the blog post. You get the title and the publication year of a paper as a tooltip.

4 thoughts on “Dynamic Citation Graph”

  1. Graeme, December 16, 2016 at 12:14pm

    Love this! I'm trying to recreate it - would it be possible to see the json so I can look at how the input data are structured and reproduce the example?

    • Andre, December 16, 2016 at 12:25pm

      I added a link to the json file in the post.

  2. Phuc Do, April 23, 2017 at 1:52pm

    Could you please add your link to the source code "here"? I want to enjoy your achievement.

    • Andre, June 26, 2017 at 12:27pm

      The source code is already linked. Try right-clicking on the link (depending on the browser).
