Biting Code

Knapsack without repetition

Similarly to the previous problem, we have a knapsack with a specified capacity, a number of items with different weight and value, and we want to select a subset of them that would fit in the knapsack maximizing its value. The difference is that now we are not allowed to put in the knapsack copies of the same item. For this reason this problem is also known as 0/1 knapsack.

If we keep a dynamic programming approach, the solution won't be too different to the previous one, however we'll need to store the results for all subproblems in a bidimensional array and the process of generating a new subsolution is going to be a bit more complex.

The idea is that we have a row for each family of subproblems having up to the actual number of items we can choose from.

Zeroth row is there just to simplify the algorithm, being quite obvious that a knapsack containing zero or one element chosen from no item is going to always have a weight of zero.

First row is only marginally more interesting. We check the first item, as soon as its weight matches the capacity of a (sub)knapsack, we assign its value to that subsolution, an then al the other subsolutions on its right would have the same value, since we have no other items available.

It is from second row that we have a more clear view on the problem complexity, since for a generic position in it we have to ensure the capacity of the current knapsack in the subproblem would be optimally filled by the available items.

Let's see a test case to help us to figure out how to work it out.

public void testWithouRepetition() {
 int[] weights = new int[] { 2, 3, 4, 6 };
 int[] values = new int[] { 9, 14, 16, 30 };

 assertThat(Knapsack.withoutRepetition(values, weights, 10), is(46));
}

The table with all the subsolutions for our problem would be:

[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0]
[ 0,  0,  9,  9,  9,  9,  9,  9,  9,  9,  9]
[ 0,  0,  9, 14, 14, 23, 23, 23, 23, 23, 23]
[ 0,  0,  9, 14, 16, 23, 25, 30, 30, 39, 39]
[ 0,  0,  9, 14, 16, 23, 30, 30, 39, 44, 46]

Notice that, zeroth column is an all-zero because whichever is the number of items available, having zero as capacity of the current sub-knapsack will lead to a no-item-selected solution. First column is again filled with zeroes, but for a different reason, there is no item with a weight 1.
In the first row we use only the first item available, the one with weight 2 and value 9. From the second cell on, we can put its value in it.
In the second row, we can put only {2, 9} in cell (2, 2). Next to the right, (2, 3), we could put {2, 9} or {3, 14}. The latter wins. Since (2, 5) on, we can put both items, so the value stored in these cells is 23.
And so on.

Here is how I have modified the algorithm seen in the previous post to adapt it to this slightly different problem:

public static int withoutRepetition(int[] values, int[] weights, int weight) {
   assert (values.length == weights.length); // 1

   int[][] results = new int[values.length + 1][weight + 1]; // 2
   for(int i = 1; i < results.length; ++i) { // 3
      for(int j = 1; j < results[i].length; ++j) { // 4
         results[i][j] = results[i - 1][j]; // 5
         if(weights[i - 1] > j) // 6
            continue;
         int candidate = results[i - 1][j - weights[i - 1]] + values[i - 1]; // 7
         if(candidate > results[i][j])
            results[i][j] = candidate;
      }
   }

   return results[values.length][weight];
}

1. A minimal checking to ensure no funny stuff in input. For each value should be available the correlated weight.
2. Core of the dynamic programming algorithm. A table that is going to contain all the subsolutions is created. The following double for loop will fill it up and in the bottom right cell we'll find the solution.
3. Row 0 is not affected, since we know that it has to contain just a bunch of zeroes.
4. Column 0, for each row, is not affected since, again, we just want a zero in it.
5. The current element is initialized with the value contained in the cell in row above, same column.
6. If the weight of the item associated with the current row is bigger than the capacity for the current subsolution, there is not much to do, just skip to the next iteration.
7. Otherwise, let's check if considering the new element in the knapsack will bring to a better solution. We make room for it in the knapsack, considering the best solution in the row above that has enough room for it, and add its value.

On GitHub you will see that my Java class Knapsack now has also the method to calculate the "without repetition" variation of the problem. And a test case for it has been added to the associated JUnit class.

Go to the full post

Knapsack with repetitions

We are given a set of items, each of them having a weight and a value, and a knapsack with a specified capacity. Each item is available in as many elements as we like. We want to fill the knapsack in a way that it contains the higher possible value.

A JUnit test case should help to clarify the problem:

public void testWithRepetition() {
   int[] weights = new int[] { 2, 3, 4, 6 };
   int[] values = new int[] { 9, 14, 16, 30 };

   assertThat(Knapsack.withRepetition(values, weights, 10), is(48));
}

We have for different items. The lighter one has weight 2 and value 9, the heavier 6 and 30. We want to write a static method withRepetition() in the class Knapsack that, given in input values, weight and knapsack capacity (here is 10) it would give back the maximum value storable in that knapsack.

In our case the expected result is 48, coming from weights 6 + 2 + 2, being the value of this triad 30 + 9 + 9.

Here is my Dynamic Programming solution, based on the algorithm discussed in chapter 6 of the excellent book Algorithms by S. Dasgupta, C.H. Papadimitriou, and U.V. Vazirani, using Java as implementation language:

public static int withRepetition(int[] values, int[] weights, int weight) {
   assert(values.length == weights.length);

   int[] results = new int[weight + 1]; // 1
   for(int i = 1; i < results.length; ++i) { // 2
      for(int j = 0; j < weights.length; ++j) { // 3
         if(weights[j] > i) // 4
            continue;
         int candidate = results[i - weights[j]] + values[j]; // 5
         if(candidate > results[i])
            results[i] = candidate;
      }
   }
   return results[results.length - 1]; // 6
}

1. Here I am going to store all the results of the subproblems and, in the last element, the one to the actual problem. Notice that we need an element also for the trivial case, knapsack of size zero, that has the obvious solution of zero.
2. Looping on all the DP subproblems, from the first one, where the capacity is one, to calculate at last the actual problem.
3. For each subproblem I verify which is among all the items available the one that maximize the result.
4. If the current item has a weight bigger than the knapsack capacity, there is nothing to do.
5. Check what happens using the previous solution that allows to use the current item, storing the candidate result in a temporary variable. If it is a better result, store it in the array of subresults.
6. In the last element of the array we have the required value.

The only test case that I tried, is the one provided by the book that you have already seen above. Anyway, you could find it on GitHub, side by side the source Java code for the class Knapsack.

Go to the full post

Levenshtein distance

Given two strings, return its Levenshtein distance, i.e. the minimum number of character changes needed to mutate the first string in the second one.

You can have a look at this Levenshtein demo to get a better idea of what is expected. Notice that the demo is more complicated than the required solution. We are interested only in the "plain" string type, and we always use weight 1 for indel (insertion/deletion) and substitution operations. Moreover, we don't consider swap.

The Levenshtein distance is often used to show how Dynamic Programming works in a more complicated case than the classic make change problem. Here I present my Java solution based on the Wagner–Fischer algorithm, that asks us to build a bidimensional table generating its values starting from the most basic cases up to the required solution.

The idea of the algorithm is to put on one table axis the first string and the second one on the other one, paying attention to give an extra position at the beginning of both string. So, for instance, if we want to calculate the distance between "one" and "bone", we'll 4 rows and 5 columns (or the other way round).

Then we have to populate the tables. The top-left cell is the easiest one, it represent the case I want to pass from an empty string to another empty one. No change is needed, so I put there a zero.

To understand what I should put in the first row I can think to the case I want to calculate the distance from an empty string to another one size n. Obviously, I would increase by one the value in each cell, representing the cost of inserting a new character.

Similarly, the first column will follow the same structure. Here the increment would represent the deletion of a character.

The generic case is a bit more complicated, since we have to evaluate four different cases. Let's think to the cell (1,1). To decide which value assign to it we need to compare the first elements of both string. If it is the same, we don't have to increase the distance, we just keep the value calculate for the top left case, namely cell (0,0). Otherwise, we need to consider all the three possible editing that would let us arrive there, inserting a new character, deleting or substituting the existing one. Since we want to find the minimum distance, we'll choose the smallest among these values.

Remembering how we filled the first row and column, we should see how to get the value for inserting and deleting, it is just a matter of increasing the value contained on the cell on the left and on the top, respectively. For substitution we have to refer to the top-left cell, as we have done for the equality case.

Once we have grasped the sense of it, building the table is a piece of cake, and what we have to do is just returning the value in the bottom-right cell.

For testing purpose, and for let me better understand how things work, instead of implementing the solution with a single static method, I used a class with a constructor, that does all the dirty job, and a couple of public methods, one that returns the calculated Levenshtein distance and another one that checks the table as generated by the constructor. You could see the full Java code on GitHub, both for the Levenshtein class and its associated test cases, here I publish and comment only the ctor:

private int[][] matrix; // 1

public Levenshtein(String from, String to) { // 2
    matrix = new int[from.length() + 1][to.length() + 1]; // 3
    for (int i = 1; i < matrix[0].length; ++i) { // 4
        matrix[0][i] = i;
    }
    for (int j = 1; j < matrix.length; ++j) {
        matrix[j][0] = j;
    }

    for (int i = 0; i < from.length(); i++) { // 5
        for (int j = 0; j < to.length(); j++) {
            if (from.charAt(i) == to.charAt(j)) { // 6
                matrix[i + 1][j + 1] = matrix[i][j];
            } else { // 7
                int insert = matrix[i][j + 1] + 1;
                int delete = matrix[i + 1][j] + 1;
                int substitute = matrix[i][j] + 1;
                matrix[i + 1][j + 1] = Math.min(Math.min(insert, delete), substitute);
            }
        }
    }
}

1. Here is where I am going to store all the partial solutions and the actual one.
2. I think the first string associated to the rows in matrix, the second one to the columns.
3. The matrix has one row plus one for each character in the first string and a column plus one for each character in the second one.
4. Initialize the zeroth row and column as discussed above. No need to initialize matrix[0][0], the default value 0 is exactly what we want.
5. Loop on all the characters in both strings.
6. It's a match! Put in the relative cell in the matrix the same value stored in the top left cell.
7. Otherwise calculate how much is going to cost to put there the right value by insertion, deletion, substitution, and put there the minimum value among these three ones.

Once built the matrix, the solution appears on the bottom right cell in the matrix.

Go to the full post

Minimum number of coins to make change

We should give a certain amount back, and we want minimize the number of coins involved in the operation, accordingly to the available denominations. So, for instance, we have only copper Euro cents (1, 2, 5) and the change should be 4. In this case the expect result is 2, a couple of 2c coins.

We could think to use a greedy procedure, and sometimes it works beautifully, as in example I showed above. However it does not work for all possible denominations. As a counterexample think what happens if we have to generate 40 from (1, 5, 10, 20, 25). The greedy solution is 3, 25c + 10c + 5c, while the optimal solution is 2, 20c + 20c.

Actually, this problem is commonly used to introduce dynamic programming, where the idea is solving all the subproblems originating from the required one, starting from the simplest ones, and using those partial solutions to build up the more complex ones.

Here is a possible Java solution:

public static int minNumCoins(int amount, List<Integer> coins) {
 assert (amount >= 0); // 1
 assert (isConsistent(coins)); // 2

 List<Integer> results = new ArrayList<>(amount + 1); // 3
 results.add(0); // 4
 for(int i = 1; i <= amount; ++i) { // 5
  results.add(Integer.MAX_VALUE); // 6
  for(int coin : coins) { // 7
   if(coin > i)
    break;

   int tentative = results.get(i - coin) + 1; // 8
   if(tentative < results.get(i)) {
    results.set(i, tentative);
   }
  }
 }

 return results.get(results.size() - 1); // 9
}

1. I assert that the required amount should be not negative.
2. I have delegate to an helper method the consistency check on the coin list. I assume the requisite are that it should not be empty, it has to start from 1, and to be sorted.
3. I'm going to store in this array-list the results for the subproblems and the problem itself. When a step would required a previously calculated subsolution, it would get the result from this list. This technique is known as memoization.
4. Base case. An amount of zero requires no coin.
5. Let's calculate all the subproblems before generating the result for the actual one.
6. Initialize the current solution to the worst possible value.
7. Check it against all available coins, assuming the list being sorted, see (2), we can stop looping as soon as we see a coin greater than the index of current element in the list.
8. A tentative solution is the number of tentatives for the previous intermediate step plus one. If this try is better than the current tentative solution, let's use it as candidate.
9. Now we just have to return the last item in the result array-list.

Full Java code is on github. There you will find, beside the code you have already seen above, in the same class CoinChanger also a greedy implementation, that is going to fail in some cases. Also a JUnit suite of test cases is there.

Go to the full post

Greatest Common Divisor

Given two integers, find their greatest common divisor.

If you don't know the Euclid algorithm, you should get to a piece of code like this:

static public int naive(int a, int b) {
    int result = 1;
    for (int i = 2; i <= Math.min(a, b); ++i) {
        if (a % i == 0 && b % i == 0) {
            result = i;
        }
    }
    return result;
}

This raw idea could be improved considering that it doesn't make any sense to check numbers bigger than the square root of the smaller one (beside itself), as well as the multiple of the already discarded numbers - if 2 is not a good candidate, there is no use in checking any even number.

However, Euclid devised such an elegant and fast converging algorithm that you should have a really good reason to explore alternatives.

Knowing that the gcd of a and b is the same as the gcd of a and a % b, we can write:

public static int euclid(int a, int b) {
    while (b != 0) {
        int temp = b;
        b = a % b;
        a = temp;
    }
    return a;
}

To ensure I didn't make any silly mistake when writing my Java code, I have written a couple of simple JUnit test cases. Both files are now on GitHub.

Go to the full post

Last digit of a Fibonacci number

The Fibonacci sequence leads rapidly to such huge values that it gets difficult even to think to a monster like Fibonacci of one million. However, let's say that we have a less mind-boggling problem, calculating just the last digit of it, even if for large input, like one hundred million. This is not a big issue, in the worst case it is to spot a linear time complexity solution, and if we check closely the properties of the sequence, we could devise a costant time one.

Firstly, I write a few test case. I am using JUnit 4, and the assertThat Hamcrest-style assertion:

public class FibonacciTest {
    static final Fibonacci fib = new Fibonacci();

    // ...

    @Test
    public void testLastDigit40() {
        assertThat(fib.lastDigitNaive(40), is(5));
    }

    @Test
    public void testLastDigit327305() {
        assertThat(fib.lastDigit(327_305), is(5));
    }

    @Test
    public void testLastDigit100mio() {
        assertThat(fib.lastDigit(100_000_000), is(5));
    }

It is reasonably easy to calculate Fibonacci(40) = 102_334_155 and, trust me, both Fibonacci(327_305) and Fibonacci(100_000_000) have also a five as their last digit.

Then I write a first implementation for lastDigit() method in my class Fibonacci:

public int lastDigit(int n) {
    if (n < 2) // 1
        return n;

    int previous = 0; // 2
    int current = 1;

    for (int i = 1; i < n; ++i) { // 3
        int older = previous;
        previous = current;
        current = (older + previous) % 10; // 4
    }

    return current;
}

1. Accordingly to the modern definition of this sequence, Fibonacci(0) is 0 and Fibonacci(1) is 1.
2. To calculate the current Fibonacci number we need to keep track of the latest two ones.
3. Let's loop until we reach our target. The previous value will be discarded, current one will become previous, and the current one is calculated.
4. We are lucky, we don't need to store the actual Fibonacci number, it's last digit is enough.

Not a bad solution, however we could do better. We just have to notice that after 60 numbers, the final number in the sequence enter in a loop and repeat the first value. The is an interesting thread on math stackexchange that gives more details on this curious feature.

Let's exploit it to get a better solution:

public int lastDigit(int n) {
    int[] fibLast = new int[60]; // 1
    fibLast[0] = 0;
    fibLast[1] = 1;
    for (int i = 2; i < 60; ++i) { // 2
        fibLast[i] = (fibLast[i - 1] + fibLast[i - 2]) % 10;
    }

    return fibLast[n%60]; // 3
}

1. I am going to do the same job as in the first implementation, however I cache the first sixty values in a raw array.
2. Notice the despicable use of the magic number 60 here and in (1). It should be a well commented constant instead!
3. Thanks to the looping nature of the last cypher in Fibonacci numbers, I just have to apply to modulo operator to extract the correct index in my cached values.

As an obvious performance improvement we should initialize the cache only the first time the method is called. In any case, I think the point is made, so I commit the Fibonacci Java class and its JUnit tester to github.

Go to the full post

Add testing to a Maven project

This post was not so obsolete as the previous two ones, however it referred to JUnit 4, while the most interesting point in the argument, I feel, is in the fact that we should use a recent Surefire Maven plugin if we want the test run on build work with JUnit 5.

So, again, so long my outdated post. If interested, you could get its new version on my other blog: https://thisthread.blogspot.com/2020/05/test-on-build-with-maven.html.

Go to the full post