Counting
For no particular reason at all, I decided to dig up and study counting from scratch again this week. It took me two full days of solving problems and connecting dots before I could start penning an article that strikes deep at the heart of combinatorics!
The Fundamental Principle of Counting
Let’s imagine that we need to make a pizza and have the following options. types of crust, kinds of cheese and 5 different toppings. In how many ways can we make a pizza if we were to choose one type of each ingredient? Since we’re required to choose all three ingredients, we start by mapping each crust to the 4 cheese variants ( ways) and then map each topping to these 12 crust + cheese combinations. This gives us a total of ways to make our pizza. But what if we must either choose a cheese or a topping with our crust and not both? The total number of possible pizza combinations now reduces to . Each crust is now only mapped to either the 4 cheese types or the 5 toppings and those choices are mutually exclusive.
Multiplication is applied while counting sequential events like choosing a crust, then a cheese and finally a topping while addition combines separate events that do not occur together but contribute to the solution space (total number of overall possible events).
Ways of sampling
The number of ways we can sample (pick) objects times is simply and is called sampling with replacement. It can be thought as filling slots in a bag from a pool of objects (constant) where each slot gets choices. When a die is rolled twice, there are a total of different outcomes. This is because for both rolls, we have possible outcomes (die rolls are independent). The combined outcomes for each event are multiplied according to the fundamental principle. The number of functions possible from a domain of size and a co-domain of size is again since each element in the domain can be mapped to 6 different elements in the co-domain and is then repeated 5 times. Another example of sampling with replacement would be to count the number of character passwords that can be generated using the letters of the english alphabet and digits ( to ), with no restrictions on repetition. Since there are choices for each of the spots, it leaves us with possible passwords.
The number of ways we can sample objects without repetition would be This is also the number of ways we can arrange (or permute) objects linearly. This value is known as the factorial and is represented by
This video has a cool explanation why equals .
If our password was now put under a constraint where each character could only be chosen once, we would have choices for the first character, for the second and so on till for the fifth and final character. The total number of possible passwords would now be . This result is equivalent to divided by We call this a permutation or an arrangement of objects where the order matters (ordered sampling without replacement). Simply put, it is the total number of possible arrangements of objects taken at a time, with being the case when all objects are to be arranged ( objects taken at a time).
Now, if we are asked to find the number of possible combinations of characters (out of 36)
that would result in a 5 character password, we have to reject the permutations of the same group of
characters. For example, abcde
and badce
are two different permutations but the same combination
of letters (a, b, c, d, e
) as
we disregard order and are only concerned about the chosen items (characters in our case) themselves.
For this, we divide the total number of permutations by the factorial of the number of
objects sampled at a time, which is
This is because every set of size elements chosen from has permutations and all of those permutations make up combination. This doesn’t seem very intuitive at first, so let’s demonstrate this with an example.
When looking for combinations, we are partitioning the objects into classes that contain groups
(permutations) of the same k-element set. If we have the letters abc
, we get ()
permutations of them.
abc
, acb
, cab
, cba
, bac
and bca
If our original problem was to count the number of possible combinations of letter groups
from a group of characters abcdefghij
, abc
would be a class (of size ) that has permutations of its own.
But since these permutations do not contribute to the total number of distinct combinations, we divide
the total number of permutations by which gives us
(for each such class).
Each class (of size ) has different permutations. Thus, dividing the total number of permutations by gives us the number of classes (or combinations) since each class corresponds to one combination. This is also known as unordered sampling without replacement.
Grouping
If we want to find the number of ways we can divide people into groups of sizes , each, we divide the number of permutations of (which is ) by the factorial of each group size. The intuition behind this is exactly similar to what we discussed earlier.
This is called the multinomial coefficient. The binomial coefficient (combination formula) is a special case of the multinomial where and hence directly leads us to the relation.
This means that every time we choose what’s in a subset (of items) from items, we are also choosing what’s not in the subset ( items). The coefficient tells us that out of objects, there are possible combinations containing elements each.
It is important to note that the order of elements within each group does not matter when calculating the number of groupings. For example, suppose we need to divide 8 people into groups of sizes 2, 3, and 3. If these groups are labeled (e.g., specific teams or roles), the number of ways to assign people is given by:
However, if the two groups of size 3 are indistinguishable, meaning that swapping them does not create a new arrangement, we are overcounting. In this case, we must divide further by and the number of ways to permute the two identical-sized indistinct groups become:
This adjustment ensures that we do not count the same arrangement multiple times. The key distinction lies not in the individuals within the groups, but in whether the groups themselves are considered distinct or interchangeable based on the context. For example, if the two groups of size 3 represent two named teams, they are considered distinct and no further division is needed. But if the groups are formed randomly and are not functionally different, they are considered indistinguishable and we must account for that by dividing by the factorial of the number of identical groups.
In essence, our multinomial partition is equivalent to first choosing people out of and then choosing people out of the remaining . The last people are automatically assigned the last group, which we then divide by to account for the overcounting of the two groups of .
It is evident that the presence of identical objects (sampling with repetition) reduce the number of ways we can arrange (or group) them. This leads us to an interesting observation.
When, three dice are rolled, there are 6 possible ways we can get a sum of 9.
and another 6 ways to get a sum of 10.
Even though there are only 6 possible ways for getting both sums, a sum of 10 is more likely than a sum of 9 because the presence of identical elements in an event reduces the total number of permutations.
In both sums, there are 3 events where all the numbers are unique. This gives a total of possible arrangements for each. In the rest of the three cases, the sum 9 has two events where two numbers are the same and one event where all three numbers are the same . This gives a combined total of possible arrangements. For a sum of 10, all the remaining three events have two similar numbers (3,3,4/2,4,4/2,2,6). This gives a combined total of possible arrangements. The presence of in 9 reduces the permutation space and makes all the difference. Since there is only one way it can be arranged, the probability of getting 3 in all three die rolls is three times less than an event with only two similar numbers ( ways) and six times less than the event where all three numbers are unique ( ways). The higher the number of possible permutations, the higher the probability.
Similarity and uniqueness
Let’s pose a question. What is the number of subsets possible from a set of distinct objects? We can think of it as having the binary choice of either choosing or discarding an object from our set for each of our slots. Since there are options and objects (or slots), it gives us a total of possible subsets including the null subset where no objects are chosen. If you have used bitmasking in digital electronics or programming, this should be something you’re already familiar with.
But what happens when all items are the identical? How many subsets can we form now? The fact that the objects are now similar changes everything. The total number of ways we can now make a selection from this set will range from to or ways (pick either items, item, … all items). The reason for this is when the objects are distinct, picking two objects (say and ) is not the same as picking two other objects ( and ). But when the objects are alike, it makes no difference in which objects we pick. We are only concerned about the number of objects chosen, not their quality.
So if there are 5 chocolates of one type and 3 of another, the total number of ways we can make a selection is
The Inclusion-Exclusion Principle
This is an extremely powerful technique that allows us to find the union of sets by avoiding double counting. Let’s understand what this means. Consider two sets and that represent the number of students who like the subjects mathematics and chemistry respectively. Say set has students and set has . In both of these sets, there may be students who like both mathematics and chemistry. Now, if we were to find the total number of students who like these two subjects (not the number of students who like both, which is an intersection of the sets), we would consider any student who likes either maths or chemistry. But we cannot simply add the number of students in both sets () as these sets are not necessarily disjoint or mutually exclusive and may contain students who like both the subjects. Hence, the number of students in the union set would certainly be less than . To avoid double counting the students who fall into both sets, we subtract the number of students in the intersection set (say ) from the total sum ().
This is the very idea behind inclusion-exclusion. We include and exclude combinations of sets to take care of the overlap during counting.
If we were to extend this idea to sets, we would first add all the single sets, then subtract all the combinations of the intersection of two sets and finally add all the combinations of the intersections of three sets (of which only exists). This is because in subtracting and , we double subtract the elements in set . The same can be said for while subtracting and , and C for and .
Below is the union of four sets using inclusion-exclusion. We alternate our signs (add and subtract) on each step till we reach the final intersection of all sets.
A great example of applying this knowledge can be found when we’re asked to count the number of onto functions (functions whose range co-domain, in other words every element in the co-domain must be mapped to an element in the domain) possible from a domain of 5 elements and a co-domain of 6. To begin with, we count the total number of functions possible from these two sets, which is (as discussed earlier). We now proceed to subtract the functions that are not onto. The number of functions where one element in the co-domain is left out can be calculated as since there are ways to choose that one element to leave out. This is equivalent to summing all single sets.
Similarly, the rest of the cases (functions where 2 elements are left out and so on) can be represented as
We add and subtract all of these terms alternatively from the total number of possible functions to get the right answer.
Derangement
This is a perfect time to talk about derangement, which is the peculiar arrangement where no element is at their original slot. If we have the set , then would be an example of a derangement whereas is not. Let’s consider a set as the permutation where the i-th element is fixed in its original position. Taking an example where we apply derangement to objects, we start by calculating the total number of possible arrangements () and apply inclusion-exclusion to this value. We first calculate the sum of the single sets where one object is fixed. There are ways to choose an item and ways to arrange the rest. So we get,
But notice that these also include the cases where or more items are at their correct positions (we are double-counting). To resolve this, we subtract the cases where two objects are at their fixed positions.
But again, this also includes subtracting the cases where or more items are at their correct positions multiple times (we are over-correcting).
So we add back the case where three sets intersect.
And finally, subtract the case where all four elements are at their original slot.
We subtract these invalid permutations from to get as the answer. (Same as alternate add/subtract from total possible arrangements)
Adding and subtracting resolves the double-counting and over-correcting introduced by the previous steps respectively.
Integral solutions
We now come to what’s arguably the most tricky sampling scenario. Let’s take the case where we have a set of () elements and we need to sample elements () such that each element can be picked multiple times (sampling with replacement) and the order does not matter. The elements are identical and the slots are distinct.
Clearly, there are possible ways of making a selection. To arrive at this solution mathematically, we can represent each of those solutions as a vector of size that sums up to where the element at position represents the number of times the th item of the set has been sampled.
This is equivalent to finding the number of positive integral solutions of the equation
In general, the number of integral solutions of the equation is given by
The intuition behind this is deceptively simple. Let’s imagine the situation where we need to distribute chocolates () to () kids. One possible solution could be:
For each such solution, we always place chocolates and bars to partition them into bins (children), which gives us a total of positions. We only need to choose positions for the bars and the rest of the positions will be assigned to the chocolates (items). This gives us .
This method is equivalent to arranging (or grouping) items using the multinomial coefficient where items are of one kind and are of another.
Adding constraints
Our distribution problem can be complicated by adding a constraint on one or more variables. If we take the equation,
this becomes a problem where chocolates must be distributed to kids where the first kid must be given at least chocolate and the second kid must have at least (Same as the number of ways we can get a sum of when dice are rolled). To solve this, we need to substitute the constraint variables with and where and . This is because the least possible values of and must be 1 and 2. So the equation now becomes:
If the constraint on was , then since is the smallest possible value can take.
Let’s now consider the equation with the following constraints:
Although it might seem intimidating, our knowledge of inclusion-exclusion will allow us to solve this fairly easily.
Let’s start by taking care of the lower bounds of each variable and shift our variables accordingly.
The number of solutions of this transformed equation is equal to
We now shift our attention to the upper bounds. Since we have shifted each variable by their minimum constraint, their range has now become the following:
Hence, the maximum possible value for each variable now becomes and respectively. To apply inclusion-exclusion, we first eliminate (subtract) all the possible solutions that contain the following values
one at a time as they are all invalid. This is analogous to the sum of all the single sets
whose sum equals Following this, we move to the second step where we now consider every pair of constraint variables.
The sum of these pairs equals . Finally, the combinations of triplets and quadruplets would result in zero possible solutions.
Applying alternate summation, we get solutions.
Identical Slots
The case of identical slots is rather cumbersome. Let’s take the scenario where we have distinct books that we want to place into identical boxes so that each box can get either or more books. If the boxes were distinct, we’d simply say that there are ways of distributing and be done with it. Putting a book into box and book into box is not the same as the opposite ( into and into ) and are counted separately. But when the boxes are identical, this isn’t true because the boxes are not labelled and hence their order does not matter anymore. Since there is no way to tell the boxes apart, the two cases “collapse” into one single case.
The former can be considered as mapping functions from a domain and co-domain while the latter is a case of integer partitioning.
To solve our problem, we start by considering the distinct partitions of the books into the 4 boxes like so.
If we use grouping (multinomial coefficient) for each case, we get:
Summing all the cases gives us a total of ways.
Finally, if both our books and boxes were taken to be identical, we would just have those 9 ways of integer partitioning for our answer.
Symmetry in arrangements
In a linear arrangement, we have a specified start and end to the sequence. For instance,
abcd
and cdba
are two permutations where the first and last elements of the arrangements are obvious
and concrete. But what happens when we arrange them in a circle? The
start or end of such an arrangement reduces to perspective. So what really matters
is the relative order of the objects since a rotation of the circle would not result in a new arrangement.
In order to break this symmetry, we first place 1 element at 1 spot (1 way) and then the rest
elements in ways by taking the first item as reference.
Thus, the number of circular permutations of objects become
An interesting case of circular arrangements arise from the arrangement of beads on a necklace. Apart from the usual rotational symmetry, the reflectional symmetry is also considered where the necklace is flipped along an axis. This means that half of the unique arrangements that we get from a normal circular arrangement are not distinct anymore. Due to this, we divide the total number of arrangements by .
Burnside’s Lemma
But the story doesn’t end there. In fact, it gets a whole lot complicated when we consider the possibility where some beads are not distinct. This means that there will be many arrangements that reduce or collapse to a single arrangement when a rotation or reflection is performed on them. Each group of such arrangements that represent that same rotation/reflection symmetry is called an orbit.
Burnside’s Lemma states that the number of such orbits
(equal to the number of truly distinct arrangement of necklaces)
is equal to the sum of the fixes
contributed by each transformation divided by
the number of rotations/flips performed on the necklace. Do not worry about the use of jargon
as we will clear it all up with a detailed example.
A fix
is simply an arrangement that stays unchanged (fixed) under a transformation
(rotation/reflection). Let’s take the example of a necklace with 4
beads and 2
colours
(represented by A
and B
) to really understand this.
The number of ways we can colour this necklace is .
AAAA
, AAAB
, AABA
, AABB
, ABAA
, ABAB
, ABBA
, ABBB
,
BAAA
, BAAB
, BABA
, BABB
, BBAA
, BBAB
, BBBA
, BBBB
Now, the total number of transformations we can perform on this necklace is (Rotations: ) + (Reflections: through opposite beads, through opposite gaps).
If we consider the rotation transformation of , there are only two colourings that
do not change. (AAAA
and BBBB
). Thus, the fix for this transformation (fix[]) is
. Similarly, if we take the transformation,
the colourings that stay the same are AAAA
, ABAB
, BABA
, BBBB
and its fix is . If we repeat
this process for all transformations and divide the sum of all the fixes by the number of
transformations, we get the number of orbits or in other words, the number of
distinct colourings that are completely asymmetric with each other.
This division (or averaging) accounts for the double counting that is present across all transformations. (e.g. and both have the same arrangements)
This lemma finds its use in unlabelled graph enumeration problems and even in chemistry to count isomers of a compound.
It can also be used to solve the problem from the Identical Slots section in a similar way!
Although it’s awfully difficult to get a grasp of this lemma without self-experimentation and visualization, I hope I did a decent job at explaining this without delving into group theory and theoretical shenanigans!
Geometry
Here are a few common ways combinatorics is used in geometry.
-
If we have a set of points, the maximum number of line segments that can be formed is simply as we are counting the number of possible groups of points. This idea can also be used to count the number of diagonals in an sided polygon. We first count the number of ways we can select points from and then subtract from it to avoid counting the sides of the polygon.
-
The number of ways shapes can intersect is times the maximum number of intersection points formed when two of those shapes intersect (), given that all intersections are pairwise i.e no point is common to more than two shapes. is a good upper bound if our assumptions are followed.
-
In general, the number of sided polygons we can form from a set of points is given by . But in the presence of collinear points, we have a case of overcounting. This is where our old friend inclusion-exclusion comes to aid. Let’s say there are collinear sets where set has collinear points. We then subtract the number of possible polygon pairs for each set from our naive solution as collinear points do not form a polygon. Let’s illustrate this with an example. Say we have points () and three sets of collinear points , , . The number of triangles we can form naively are . The first set currently contributes to “triangles” which are invalid. Similarly, the second contributes to “triangle” and the last set does not contribute to any. Our revised answer is . Even though there exists a shared point between and (), the overlap cannot affect our solution (Can you think why?). In case of line segments, each set contributes to only 1 line so we add the number of collinear sets back at the end.
Conclusion
If you made it this far, I must commend you for your grit. You might have realized by now that solving most combinatorics problems rely on understanding the nuances and constraints instead of a magical formula that can be applied naively. Learning and building the intuition behind applied mathematics is an incredibly fulfilling experience if carried out with diligence. This is why I like to believe that the factorial symbol was a consequence of the mathematicians’ astonishment.
Rescued by the haunting smile of mimigirl