13 December, 2007

How to create a tag cloud? (With formula and sample calculation)

I googled how to create a tag cloud. I found a few articles, but I didn't like their approach because I think they did it improperly. That's why I wrote this post: now it's my turn to post something educational.

But before anything else, what is a tag cloud? Let me define it in my own words. Visually, it is a group of terms displayed in varying font sizes and packed together so that the group resembles a cumulus cloud. The terms are usually arranged alphabetically and center-aligned. Some tag clouds also vary the colors. In HTML, each tag is usually a hyperlink. Conceptually, a tag isn't a mere term; a tag in a tag cloud represents an idea, a concept, or anything else that can be weighted, so a bigger tag means a greater value or interest. (For example, the Flickr tag cloud: http://www.flickr.com/photos/tags/ )

Now the question is how. How are the tag sizes made to vary? Simple. In HTML, just use the CSS font-size property.

Example:
<a href="http://www.blogger.com/mylink" style="font-size: 20px;">tag item</a>

Look at the example above. If that looks strange to you, then stop reading right now and go away because you're not my target reader.

If you're still reading, then you know that that's an HTML tag for a link.

To have a tag cloud, you need many tags with varying font sizes. That's easy, isn't it? The hard part is generating those tags dynamically and computing the right size for each tag.
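
Generating one such link dynamically could look like the minimal Python sketch below. The function name `render_tag` and its parameters are hypothetical, chosen only for illustration; the size is assumed to be already computed.

```python
from html import escape

def render_tag(label: str, url: str, size_px: int) -> str:
    """Return one tag as an HTML link whose font size is set inline."""
    # escape() guards against quotes and angle brackets in the label or URL.
    return (
        f'<a href="{escape(url, quote=True)}" '
        f'style="font-size: {size_px}px;">{escape(label)}</a>'
    )

print(render_tag("tag item", "http://www.blogger.com/mylink", 20))
```

An inline style keeps the sketch short; a real page might prefer a handful of CSS classes instead.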

What you need is a database of tags. Query the database to get the list of tags and the number of occurrences of each. See the following table for an example.

tag          | occurrences
-------------|------------
birthday     | 144
christmas    | 108
valentines   | 211
thanksgiving | 168
liberation   | 88
halo ween    | 114
new year     | 140
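
A query that produces counts like these might look like the sketch below. It assumes a hypothetical table `customer_tags` with one row per tagged item, held in an in-memory SQLite database, and the four inserted rows are toy data only; the post does not say how the tags are actually stored.

```python
import sqlite3

# Hypothetical schema: one row per time a customer picked a holiday.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_tags (tag TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO customer_tags (tag) VALUES (?)",
    [("birthday",), ("christmas",), ("birthday",), ("new year",)],  # toy rows
)

def tag_counts(connection: sqlite3.Connection) -> dict[str, int]:
    """Return {tag: occurrences}, with the counting done by the database."""
    rows = connection.execute(
        "SELECT tag, COUNT(*) FROM customer_tags GROUP BY tag ORDER BY tag"
    )
    return dict(rows.fetchall())

print(tag_counts(conn))
```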

The above table is our sample data. Each tag represents your customers' favorite holiday. How can you present the tags as a tag cloud, with valentines as the biggest tag (50px font-size) and liberation as the smallest (12px font-size)?

We will use the following variables, namely:
a = the smallest count (or occurrence).
b = the count of the tag being computed.
c = the largest count.
w = the smallest font-size.
x = the font-size for the tag. It is the unknown.
y = the largest font-size.


Now let's substitute the given values for their respective variables, assuming that we are solving for the "thanksgiving" font-size.
a = 88
b = 168
c = 211
w = 12
x = ?
y = 50

And here's the formula:

    (b-a) * (y-w)
x = ------------- + w
        (c-a)

Or, to put it in a one-liner (using C-like syntax):

x = ( ((b-a) * (y-w)) / (c-a) ) + w
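
As a minimal Python sketch, the one-liner can be wrapped in a function that uses the same variable names as the list above. The guard for c == a is an addition that the post does not cover: when every tag has the same count, the formula would divide by zero, and falling back to the smallest size is just one possible choice.

```python
def font_size(a: float, b: float, c: float, w: float, y: float) -> float:
    """Linearly map count b from the range [a, c] to a size in [w, y]."""
    if c == a:
        # Assumption (not from the post): all counts are equal, so use
        # the smallest size instead of dividing by zero.
        return float(w)
    return ((b - a) * (y - w)) / (c - a) + w

# thanksgiving: a=88, b=168, c=211, w=12, y=50
print(round(font_size(88, 168, 211, 12, 50)))  # 37
```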

And that's it. That's the formula. You might be wondering where I got that formula. Well, it's hard to explain in words, but let me try anyway. Using "ratio and proportion" from Mathematics, the ratio of the distance from a to b to the distance from a to c is equated with the ratio of the distance from w to x to the distance from w to y. In other words, the tag's count sits at the same relative position between the smallest and largest counts as its font-size sits between the smallest and largest font-sizes.

Or to make it simple,

b-a     x-w
----- = -----
c-a     y-w
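
For readers who want the missing algebra, here is the same proportion solved for x step by step, written in LaTeX. It assumes c is not equal to a (otherwise the left side is undefined).

```latex
\[
\frac{b-a}{c-a} = \frac{x-w}{y-w}
\;\Longrightarrow\;
x - w = \frac{(b-a)(y-w)}{c-a}
\;\Longrightarrow\;
x = \frac{(b-a)(y-w)}{c-a} + w
\]
```

The first step multiplies both sides by (y-w); the second adds w to both sides, which gives the formula for x.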

Let's now continue computing the font-size for thanksgiving. Substituting the values into the formula for x, we have...

x = ( ((168-88) * (50-12)) / (211-88) ) + 12
x = ( (80 * 38) / 123 ) + 12
x = 36.715447 (approximately)
x = 37 (rounded to the nearest whole pixel)

The thanksgiving tag should have a 37px font-size in the tag cloud. Try computing the sizes for the rest of the tags. You will get:
birthday = 29px
christmas = 18px
valentines = 50px
thanksgiving = 37px
liberation = 12px
halo ween = 20px
new year = 28px
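
To check these numbers, the whole table can be run through the formula in a few lines of Python. This is only a sketch: `tag_sizes` is a hypothetical helper name, and Python's built-in round() is assumed to be an acceptable way to get whole pixels.

```python
counts = {
    "birthday": 144, "christmas": 108, "valentines": 211, "thanksgiving": 168,
    "liberation": 88, "halo ween": 114, "new year": 140,
}

def tag_sizes(counts: dict[str, int], w: int = 12, y: int = 50) -> dict[str, int]:
    """Map each tag's count to a whole-pixel font size between w and y."""
    a, c = min(counts.values()), max(counts.values())
    sizes = {}
    for tag, b in counts.items():
        # Same formula as in the post; fall back to w if all counts are equal.
        x = ((b - a) * (y - w)) / (c - a) + w if c != a else w
        sizes[tag] = round(x)
    return sizes

# Print the tags alphabetically, the way a tag cloud usually lists them.
for tag, size in sorted(tag_sizes(counts).items()):
    print(f"{tag} = {size}px")
```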

--End

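Tip: When using Java, do the computation with the float (or double) data type, not int. Integer division throws away the fractional part, so the result is truncated before you get the chance to round it.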
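
The effect of integer arithmetic can be seen in this small Python sketch, where floor division (//) stands in for Java's int division. The two behave the same for the positive values used here; the variable names on the left are made up for the illustration.

```python
a, b, c, w, y = 88, 168, 211, 12, 50

# Floor division (//) mimics Java's int division for these positive values.
divide_first = (b - a) // (c - a) * (y - w) + w    # 80 // 123 is 0
multiply_first = (b - a) * (y - w) // (c - a) + w  # 3040 // 123 is 24
floating = (b - a) * (y - w) / (c - a) + w         # about 36.7154

print(divide_first, multiply_first, round(floating))  # 12 36 37
```

With integers, dividing first collapses the tag to the minimum size, and even multiplying first gives 36 instead of 37.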

9 Comments:

At 31 July, 2008 12:11, Blogger Dev said...

Good one. I usually use this formula for tag clouds. It gives good control and a linear proportion between font-size and occurrence.

 
At 01 August, 2008 03:07, Blogger Kevin said...

Thanks for the article. It should be mentioned, however, that for most purposes a logarithmic (rather than linear) size relationship between the tags works better. Just google tag cloud logarithmic and you can see some examples.

 
At 01 August, 2008 12:04, Blogger Dev said...

Adding to Kevin's comment: logarithmic is a good one, and at the same time exponential gives the reverse effect of logarithmic.

 
At 07 August, 2008 01:02, Anonymous Anonymous said...

hmm.. stumbled by accident here...
but nevertheless it's a good formula. I use it for color instead of font size and not just shades of a base color.. i mean from green to red ^^

 
At 08 September, 2008 21:44, Anonymous HitTheSpoof said...

Fantastic, thank you.

 
At 08 January, 2009 20:47, Blogger lucky said...

That's great and simple, but you forgot the goal: how to put that label in a div.

 
At 11 December, 2009 20:26, Blogger Vivek Sharma said...

Thanks, it works for me. But I also want a formula to compute the max and min font size according to the total number of words.
E.g., if there are only 5 words, the screen should not look blank; if there are 100 words, the cloud should fill the screen as much as it can.

 
At 12 July, 2011 10:41, Blogger Solved said...

Thank you man

 
