Wednesday 25 April 2007

Blogophysics


I was browsing arXiv this week and came across the paper:

Cascading Behavior in Large Blog Graphs
Jure Leskovec, Mary McGlohon, Christos Faloutsos, Natalie Glance, Matthew Hurst
arXiv:0704.2803v1 [physics.soc-ph]

I remembered having seen a paper about blogs before, and I am almost sure Osame wrote about it on his blog, so I did a quick search on arXiv and found these, in reverse chronological order:

Social Information Processing in Social News Aggregation
Kristina Lerman
arXiv:cs/0703087v2 [cs.CY]

Information propagation and collective consensus in blogosphere: a game-theoretical approach
Lianghuan Liu, Feng Fu, Long Wang
arXiv:physics/0701316v1 [physics.soc-ph]

Social Networks and Social Information Filtering on Digg
Kristina Lerman
arXiv:cs/0612046v1 [cs.HC]

Social Browsing on Flickr
Kristina Lerman, Laurie Jones
arXiv:cs/0612047v1 [cs.HC]

The structure of self-organized blogosphere
Feng Fu, Lianghuan Liu, Kai Yang, Long Wang
arXiv:math/0607361v3 [math.ST]

Quantitive and sociological analysis of blog networks
Wiktor Bachnik, Stanislaw Szymczyk, Piotr Leszczynski, Rafal Podsiadlo, Ewa Rymszewicz, Lukasz Kurylo, Danuta Makowiec, Beata Bykowska
arXiv:physics/0506051v1 [physics.soc-ph]

It seems that Kristina Lerman, a physicist who became a computer scientist, is the most active researcher in this area. Take a look at her homepage.

Basically, blogs are treated as the vertices of a graph and the links between blogs as its edges. This kind of network has been studied in physics for some time now. There are many things you can study, but I will highlight just the most common one: the degree distribution N(r), i.e., the number of vertices with r = 1, 2, 3, ... edges. The interesting result, which appears in many different networks in nature, from the famous small-world networks to the chemical networks in cells, is that this distribution obeys what is called a power law:

$$N(r) \propto r^{-k}$$

where r is the number of edges of a vertex and k is a positive constant (the exponent). This kind of formula is important in physics because it has no characteristic scale: rescaling r by any factor λ gives $(\lambda r)^{-k} = \lambda^{-k} r^{-k}$, which only multiplies the distribution by a constant, so its shape is the same at every scale. Loosely speaking, this means that if we analyse 10, 100, 1000 or 10000 blogs, we should find the same distribution (discarding finite-size effects). We call such distributions scale-free. To give a feeling for the importance of power laws, they appear in second-order phase transitions and in self-organised criticality (the two are not unrelated).
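
To make the degree distribution and the exponent k concrete, here is a small Python sketch. It rests on my own assumptions, not on the papers above: links are treated as undirected, the "blogosphere" is a toy one grown with a Barabási-Albert-style preferential-attachment rule (new blogs tend to link to blogs that already have many links, one well-known mechanism that produces power laws), and k is estimated with the approximate maximum-likelihood formula for discrete data described by Clauset, Shalizi and Newman. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def vertex_degrees(edges):
    """Number of edges (links) at each vertex; links are treated as undirected."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def degree_distribution(edges):
    """Map r -> number of vertices with exactly r edges."""
    return Counter(vertex_degrees(edges).values())


def preferential_attachment_edges(n, m, seed=0):
    """Toy blogosphere of n blogs (Barabasi-Albert-style growth).

    Each new blog links to m distinct existing blogs, picked with probability
    proportional to how many links they already have.
    """
    rng = random.Random(seed)
    edges = []
    targets = list(range(m))  # the first new blog links to m seed blogs
    repeated = []             # every blog appears once per link end it has
    for source in range(m, n):
        edges.extend((source, t) for t in targets)
        repeated.extend(targets)
        repeated.extend([source] * m)
        chosen = set()
        while len(chosen) < m:  # a uniform pick from 'repeated' is degree-proportional
            chosen.add(rng.choice(repeated))
        targets = sorted(chosen)
    return edges


def power_law_exponent(degrees, r_min):
    """Approximate maximum-likelihood estimate of k in N(r) ~ r**(-k),
    where N(r) counts vertices with r edges, using only degrees r >= r_min."""
    tail = [r for r in degrees if r >= r_min]
    return 1 + len(tail) / sum(math.log(r / (r_min - 0.5)) for r in tail)


if __name__ == "__main__":
    edges = preferential_attachment_edges(n=20000, m=2, seed=1)
    dist = degree_distribution(edges)
    for r in (2, 4, 8, 16, 32, 64):
        print(f"vertices with {r:>2} edges: {dist.get(r, 0)}")
    k_hat = power_law_exponent(vertex_degrees(edges).values(), r_min=6)
    print(f"estimated exponent k = {k_hat:.2f}")
```

For preferential attachment the theory predicts k = 3 for large networks, so the estimate should land in the vicinity of 3, somewhat biased by small degrees and finite size. Fitting a straight line to a log-log histogram is also common, but it tends to give biased exponents, which is why the sketch uses a likelihood-based estimate instead.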

All this falls, once again, within the huge multidisciplinary field of complex systems, which is highly fashionable (especially if you want to apply for a research grant) but difficult to define. We could say that complex systems are systems composed of a large number of interacting units that can exhibit some kind of emergent behaviour, but maybe it would be simpler to say that complex systems are all those that are not simple.

Early in this blog, I wrote a post about avalanches. The system that produces them is considered a complex system, and its emergent behaviour is precisely the avalanches. Real avalanches do not follow a power law, but the simplified ones in computer models have a size distribution that does. These relationships are still not well understood, but they point to connections between many phenomena, and we hope they will be clarified during this century (well, at least I hope...).
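
As an illustration (not necessarily the model used in that earlier post), the classic computer model of this kind is the Bak-Tang-Wiesenfeld sandpile, and a minimal Python sketch of it fits in a few lines. Grains are dropped one by one on a grid; a site holding 4 or more grains topples onto its neighbours, which can set off a chain reaction. The function names, the grid size, the number of grains and the seed are my own choices for illustration.

```python
import random
from collections import Counter


def sandpile_avalanches(size=20, grains=10000, seed=0):
    """Bak-Tang-Wiesenfeld sandpile on a size x size grid with open edges.

    Grains are dropped one at a time on random sites. A site holding 4 or more
    grains topples: it loses 4 grains and gives one to each neighbour (grains
    pushed over the edge are lost). Returns, for each dropped grain, the
    avalanche size, i.e. the number of topplings it triggered.
    """
    rng = random.Random(seed)
    height = [[0] * size for _ in range(size)]
    sizes = []
    for _ in range(grains):
        i, j = rng.randrange(size), rng.randrange(size)
        height[i][j] += 1
        unstable = [(i, j)]
        topplings = 0
        while unstable:
            x, y = unstable.pop()
            if height[x][y] < 4:
                continue  # stable, or already relaxed by an earlier toppling
            height[x][y] -= 4
            topplings += 1
            if height[x][y] >= 4:
                unstable.append((x, y))
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if 0 <= nx < size and 0 <= ny < size:
                    height[nx][ny] += 1
                    if height[nx][ny] >= 4:
                        unstable.append((nx, ny))
        sizes.append(topplings)
    return sizes


def log_binned_counts(sizes):
    """Count nonzero avalanche sizes in bins [1, 2), [2, 4), [4, 8), ..."""
    bins = Counter()
    for s in sizes:
        if s > 0:
            bins[s.bit_length() - 1] += 1  # bin b holds 2**b <= s < 2**(b + 1)
    return bins


if __name__ == "__main__":
    sizes = sandpile_avalanches()
    steady = sizes[len(sizes) // 2:]  # drop the initial build-up of the pile
    for b, count in sorted(log_binned_counts(steady).items()):
        print(f"sizes {2**b:>5} to {2**(b + 1) - 1:>5}: {count}")
```

Discarding the first half of the run to skip the build-up of the pile is an arbitrary choice. After that, the counts in these doubling-width bins should fall off roughly as a straight line on a log-log plot, as expected for a power law, until a cutoff set by the size of the grid.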

Picture: from Data Mining. The original caption is:
This graph shows another view of the core. Rather than require reciprocal links, I have simply pulled out the largest connected component formed by any directional link between blogs. The obvious insight here is the relationship between LiveJournal (blue) and the rest of the core.
