27 February 2018

Learning about set theory

I recently had a problem thinking up a solution for a tricky database query. I'll use the common movie database paradigm to demonstrate it.

A registered user is able to store their preferred genres, so when they search for a movie by preferred genre, they can filter out the movies that don't appear in their preferred genres. As you know, you can have more than one preferred genre, and movies can be multiple genres, for example, romantic comedy. The requirement here, for a user that likes action, comedy, horror, and sci-fi, is not to see romantic comedies in their results because although they like comedies, they haven't specified romantic films on their list. If they wanted to see romantic comedies on their list they would have to also add the genre romantic to the list.

Ok, so the requirements are done with. How do we implement it. Well I got quite stuck to be honest. I started searching, but found it difficult to find a pre-written query for just what I needed. After Googleing "how to think in SQL", I then went on to explore set theory, which I remember doing something with in school, but didn't pair with databases, as people never really even had computers in their home during this time. So I didn't even know what a database was.

Doing some searching, I found this great YouTube tutorial on set theory. During it, I realised I could perhaps get what I needed by subtracting one set from another. Let's look at the tables first though.

[movies]---<[movies_genres]>---[genres]---<[users_genres]>---[users]

In our normalised table structure, we are comparing the foreign key genres_id found in pivot tables movies_genres and users_genres.

Looking at figure 1, if we imagine one movie and one user, you can see we cannot subtract the set movies_genres.genres_id from users_genres.genres_id because the genres_id 3 is left over in movies_genres.genres_id. So movies_genres.genres_id is not a subset of users_genres.genres_id.

Example: Figure 1.

movies_genres.genres_id {1,2,3} 
users_genres.genres_id  {1,2}

In figure 2, we can subtract the set movies_genres.genres_id from users_genres.genres_id because all numbers appear in movies_genres.genres_id that appear in users_genres.genres_id. The 3 you can see in users_genres.genres_id is fine as we don't need to exactly match the query. Here movies_genres.genres_id is a subset of users_genres.genres_id because all numbers appear in the second set.

Example: Figure 2.

movies_genres.genres_id {1,2} 
users_genres.genres_id  {1,2,3}

So how do you test if one set is a subset of another set in MySQL? I've used the in() operator. I'll quote W3Schools for the explanation.

"The IN operator allows you to specify multiple values in a WHERE clause. The IN operator is a shorthand for multiple OR conditions."

To use the in() operator for subtraction, I negate it with not in(). This tests whether there are values left over when comparing genres_id between movies_genres and users_genres (see Figure 1.). This is essentially the subset check. When we can test to see which queries have failed the subset check, we can negate the results to select the movies we want.

So with that, this is how I do it in MySQL.

select * from movies 
where movies.id not in (
    select movies_id from movies_genres mg
    where mg.genres_id not in (
        select genres_id from users_genres ug 
            where ug.users_id = 2
    )
)

Conclusion

I hope this is correct. I am quite inexperienced with SQL and never really learned it in an academic environment. I hope to solve more difficult MySQL problems with set theory in the future.

No comments:

Post a Comment