28.09.2008

Tag Similarity

Finding an interesting topic to write about is sometimes easy and sometimes not. This post comes from a task I had for a video portal in Turkey.

In this post I'll discuss how to find related items from their tags. My graduation project was about web mining, so the implementation did not take much time.

Let's begin with database design.

In our db diagram you can easily see that there is no relationship between VideoTag and Video.
In the Video table, the tags of a video are saved as a single string like 'Platform,First Person Shooter'.
',' is the separator of the tags in the Video table.
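
Since the tags live in one comma-separated string, the first step is splitting that string back into individual tags. Here is a minimal sketch of that step in Python (not the original C# code of this post); the function name parse_tags is my own, and trimming whitespace and skipping empty entries are assumptions about how the data should be cleaned.

```python
def parse_tags(raw):
    """Split a comma-separated tag string into a set of trimmed, non-empty tags."""
    return {tag.strip() for tag in raw.split(",") if tag.strip()}


# The example value from the Video table above.
tags = parse_tags("Platform,First Person Shooter")
print(sorted(tags))  # ['First Person Shooter', 'Platform']
```

A set is used so that a tag typed twice for the same video only counts once.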


To find the similarity of each video to the others I used cosine similarity.
Cosine similarity says that the angle between two vectors, measured at the origin, shows how similar they are: the smaller the angle, the more similar the vectors. What does a vector mean here? A vector is the representation of a video by its tags. Each tag is one dimension of the space.



The human brain can picture at most three dimensions, so I used that picture to show what I'm going to do. Each video lives in a space whose number of dimensions is the number of rows in the VideoTag table.

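A tag vector has no negative components, so cos(θ) is always between 0 and 1, and multiplying cos(θ) by 100 gives the percentage of how related two videos are.
The formula is cos(θ) = (A · B) / (|A| |B|), where A · B is the dot product of the two vectors and |A|, |B| are their lengths.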
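
To make the formula concrete, here is a small Python sketch of it (again, not the original C# code). The name cosine_similarity and the choice to return 0 for a video without any tags are my assumptions; the two example vectors are made up.

```python
import math


def cosine_similarity(a, b):
    """Return cos(theta) between two equal-length vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: a video with no tags is treated as unrelated to everything.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two made-up videos in a three-tag space that share exactly one tag.
video_a = [1, 1, 0]
video_b = [1, 0, 1]
print(round(cosine_similarity(video_a, video_b) * 100))  # 50
```

Here A · B = 1 and |A| = |B| = √2, so cos(θ) = 1 / 2 and the two videos are 50% related.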

Any problems so far? OK, then let's have a look at my code.

I used Entity Framework as the ORM tool, mainly because I did not want to code the entities and their DAOs by hand. I also simply enjoy using LINQ.



cos(θ) function


Looking at the picture, you may ask yourself what videoTagMatrix is. videoTagMatrix represents all the videos in the same space, spanned by all the tags. Look at the picture below, it shows everything.


From the picture you can see that video1 has the tags Tag1, Tag2, Tag3, Tag6 and Tag8.
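
A matrix like this can be built with one row per video and one column per tag, putting 1 where the video has the tag and 0 where it does not. The Python sketch below is only an illustration, not the original C# code; the function name build_video_tag_matrix and the assumption that there are exactly eight tags (Tag1 to Tag8) are mine.

```python
def build_video_tag_matrix(video_tags, all_tags):
    """Return one 0/1 row per video, with one column per tag in all_tags."""
    column_of = {tag: i for i, tag in enumerate(all_tags)}
    matrix = []
    for tags in video_tags:
        row = [0] * len(all_tags)
        for tag in tags:
            # Tags that are missing from all_tags are ignored (an assumption).
            if tag in column_of:
                row[column_of[tag]] = 1
        matrix.append(row)
    return matrix


# Assumed tag space: Tag1 .. Tag8.
all_tags = [f"Tag{i}" for i in range(1, 9)]
video1 = {"Tag1", "Tag2", "Tag3", "Tag6", "Tag8"}
print(build_video_tag_matrix([video1], all_tags)[0])
# [1, 1, 1, 0, 0, 1, 0, 1]
```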


Here is the result




This piece of code asks for the videos related to the second video (C# arrays are zero-based, so index 1 means the second video).
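
The same lookup can be sketched in Python (this is not the original C# code). The function names, the rounding to one decimal and the tiny three-video matrix are all made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return cos(theta) between two equal-length vectors, or 0 if one is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_videos(matrix, video_index):
    """Return (index, percentage) pairs for every other video, most related first."""
    target = matrix[video_index]
    scores = [
        (i, round(cosine_similarity(target, row) * 100, 1))
        for i, row in enumerate(matrix)
        if i != video_index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up matrix: three videos in a four-tag space.
video_tag_matrix = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
# Index 1 is the second video, as in the C# call above.
print(related_videos(video_tag_matrix, 1))  # [(0, 81.6), (2, 0.0)]
```

The first video shares two of its tags with the second one, so it comes out about 82% related; the third video shares none and gets 0%.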

Download Source Code


If you have any questions about this post, you can mail me :)

Bizilye : )
