Two-variables Zipf distribution

Harvard linguist George Kingsley Zipf identified a law describing how frequent words are in a corpus: a word's frequency falls off as a power of its rank. Most of you should be familiar with that law in the social realm: given a criterion for what counts as a “relation”, most social networks show a zipfian distribution of degree, the number of contacts a person has.

Looking into a database of mobile calls (keeping only frequent and reciprocal calls, to filter out weak ties; the database is confidential so far, but I have access to a hashed version), I came across a similar law, this time about the cross-distribution of the degrees at the two ends of each relation.

Formally:

  • let f (n) be the zipfian distribution of degrees:

f (n) = n^(- a) / z(a)

where

      • z(.) is Riemann’s zeta function; and
      • a is a parameter, generally between two and three.

  • let f (n, m) be the distribution of ties between users of degree n and m; it seems that:

log f (n, m) = – log f (n) . log f (m)

This is the simplest way to put it. Since log f (n) = - [log z(a) + a log n], the explicit formula for f (n, m) is therefore:

f (n, m) = exp (- [log z(a) + a log n] . [log z(a) + a log m] )
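To make the two formulas concrete, here is a minimal Python sketch that evaluates f (n) and the conjectured f (n, m) for a given a. The function names (`zeta`, `log_f`, `f`, `f_pair`) and the choice of a = 2.5 are illustrative assumptions, not part of the original analysis, and the zeta function is approximated by a truncated sum with an Euler-Maclaurin tail correction rather than taken from a library. Note that the sketch only evaluates the expression as written; it does not check that f (n, m) sums to one over all pairs.

```python
import math


def zeta(a, terms=1000):
    """Approximate Riemann's zeta function for a > 1.

    Sums the first terms - 1 values directly and estimates the rest
    with the leading Euler-Maclaurin correction terms.
    """
    if a <= 1:
        raise ValueError("zeta(a) diverges for a <= 1")
    head = sum(k ** -a for k in range(1, terms))
    tail = (terms ** (1 - a) / (a - 1)
            + 0.5 * terms ** -a
            + a * terms ** (-a - 1) / 12)
    return head + tail


def log_f(n, a):
    """log f(n) = -[log z(a) + a log n] for the zipfian degree law."""
    return -(math.log(zeta(a)) + a * math.log(n))


def f(n, a):
    """Zipfian share of nodes with degree n."""
    return math.exp(log_f(n, a))


def f_pair(n, m, a):
    """Conjectured weight of ties between degree-n and degree-m nodes:
    log f(n, m) = - log f(n) * log f(m)."""
    return math.exp(-log_f(n, a) * log_f(m, a))


if __name__ == "__main__":
    a = 2.5  # illustrative value, inside the "between two and three" range
    for n, m in [(1, 1), (1, 2), (2, 2), (2, 5)]:
        print(n, m, f_pair(n, m, a))
```

The expression is symmetric in n and m by construction, which matches the fact that a tie has no preferred end.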

Actually, f (1, 1) is much lower than expected (but still higher than most other cases).

What I need now is to check, or have specialists check, such a law on as many complex graphs as possible: social, assortative ones might show the same result. This result appears more precise than overall assortativity. I am assuming it wouldn’t be true on bipartite-based networks, as f (n, ñ), with n and ñ close, would then be higher.
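For anyone who wants to try this on another graph, the empirical side of the comparison can be tabulated from a plain edge list. The sketch below is one possible way to do it, under a few assumptions of mine: ties are undirected, duplicate ties and self-loops are dropped, and (n, m) and (m, n) are folded into a single key with n ≤ m. The function name `degree_pair_distribution` and the toy edge list are hypothetical.

```python
from collections import Counter


def degree_pair_distribution(edges):
    """Share of ties that join a degree-n node to a degree-m node.

    Returns a dict mapping (n, m), with n <= m, to the fraction of ties.
    """
    # Treat ties as undirected: drop self-loops and duplicate ties.
    ties = {frozenset(edge) for edge in edges if edge[0] != edge[1]}

    # Degree of a node = number of distinct ties it takes part in.
    degree = Counter()
    for tie in ties:
        for node in tie:
            degree[node] += 1

    # Count ties by the (sorted) degrees found at their two ends.
    counts = Counter()
    for tie in ties:
        u, v = tuple(tie)
        n, m = sorted((degree[u], degree[v]))
        counts[(n, m)] += 1

    total = sum(counts.values())
    return {pair: count / total for pair, count in counts.items()}


if __name__ == "__main__":
    # Toy graph: "a" knows "b", "c" and "d"; "b" and "c" know each other.
    toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("c", "b")]
    for pair, share in sorted(degree_pair_distribution(toy_edges).items()):
        print(pair, share)
```

The resulting table can then be set against the formula for f (n, m), for instance by plotting log of the empirical share against log f (n) . log f (m) for each observed pair.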

Comments are more than welcome.

About Bertil

I'm a PhD student in Digital Economics, and I love viennoiserie.

3 Responses to Two-variables Zipf distribution

  1. Jason says:

    I’m a bit confusticated. I guess it’s been too long since I’ve looked at this type of mathematics. 🙂

    Is this saying that in the data set you are seeing a zipfian distribution across all social connections? Where the majority of traffic is between two people in the entire dataset?

    Or is it that the strength of the social connection tends to be bidirectional and relatively equal? In other words, that you are likely to see the same amount of traffic in either direction of a social connection?

  2. Bertil says:

    Hi Jason
    Thanks for being the first one to comment on my blog! Now I know how WordPress handles it.
    What I am saying is that:
    1. Zipf's law is also true for the number of contacts, the social degree: some people are connected to many more people than others (politicians, journalists, etc.)
    2. Better connected people tend to know mostly well connected people; that is called “assortativity”.
    3. I have a formula to represent it, that works on close ties (mobile calls).

    1 is trivial; 2 has been known for several years; 3 is new (and, I have since realized, not entirely true for other data sets).

    One point that might help you: f(n,m) is a distribution of *relations*: how many connections there are between n-degree and m-degree people.

    Hope this makes things clearer.

  3. Pingback: A simpler formula « Two Croissants
