Tuesday, June 28, 2011

Check If an Integer's Bit Representation Is a Palindrome

Question: how can you determine if a positive integer is a palindrome. For example, 5 is a palindrome because 5 = 101. Also, 9 is a palindrome because 9 = 1001.

Solution: an integer is palindrome when its bit representation is the same reading from left to right or from right to left. Thus, in order to know if it is a palindrome or not, we just have to check if the number's value is still the same when we read the number's bits backward (right to left). For example, 5 is 101 and when we read its bits from right to left, which is 101, the result is still 5. However, 4 is 100 and when read backward it becomes 001 which is 1, so 4 is not a palindrome.

Pseudocode: here is the pseudocode for the algorithm:

integer nResultNum = 0
integer nInputNumCopy = nInputNum

while nInputNumCopy > 0
  nResultNum = nResult << 1;
  if nInputNumCopy & 1 == 1
    nResultNum = nResultNum | 1;
  nResultNumCopy = nResultNumCopy >> 1;
end while

if nResultNumCopy == nInputNum 
  return true

return false

Here is the implementation of the pseudocode using JavaScript:

function bIsIntPalindrome(nInputNum)
{
   if (nInputNum === 'undefined' || nInputNum === null || nInputNum < 0)
      return false;

   var nInputNumCopy = nInputNum;
   var nResultNum = 0;

   while (nInputNumCopy > 0)
   {
      nResultNum <<= 1;

      if (nInputNumCopy & 1)
      {
         nResultNum |=1;
      }

      nInputNumCopy >>= 1;
   }

   if (nInputNum == nResultNum)
      return true;

   return false;
}

Explanation: the general plan is to generate a number whose bit representation is the reverse of that of the input number. After that, we check to see if the two numbers equal. If they do, the input number is palindrome as explained above:

  1. Check for invalid inputs.
  2. nResultNum is the number whose bit representation is the reverse of the input number. nInputNumCopy is the copy of the input number. Since we'll destroy the value in the process, we must make copy of it so that we can compare the result to the original input later on.
  3. The while loop runs until nInputNumCopy is 0, meaning that we have copied all of its bits. In other words, this loop copies the bits from nInputNumCopy to nResultNum from right to left. At each iteration:
    • First, shift nResultNum's bits to the left by 1
    • Second, check if the current right most bit in nInputNumCopy is 1 by using & operator. If the right most bit is 1, turn the right most bit of nResultNum to 1 using | operator.
    • Third, shift all bits of nInputNumCopy to the right by 1.
  4. When finishing copying the bits from right to left, we check for equality between nInputNum (the original input value) and nResultNum (containing the same number of bits but in reverse order). If the two equal, then input number is palindrome, otherwise not.

Example: let's check to see if 5 is a palindrome. At the start of while loop, variables' values are nResultNum = 0 and nInputNumCopy = 5 (101). Entering the while loop:

  1. First iteration, nResultNum <<= 1 still equals 0 because 000 << 1 = 0. nInputNumCopy & 1 = 1 because 101 & 001 = 001, so nResultNum |= 1 = 000 | 1 = 001. Then, nInputNumCopy >>= 1 = 2 because 101 >> 1 = 010 which is 2. Because nInputNumCopy > 0, loop continues.
  2. Second iteration, nResultNum <<= 1 makes nResultNum = 010 because 001 << 1 = 010. Since nInputNumCopy & 1 = 0 (010 & 001 = 000), nothing added to nResultNum in this iteration. nInputNumCopy >>= 1 = 001 = 1. Since nInputNumCopy is still greater than 0, the loop continues.
  3. Third iteration, (nResultNum <<= 1) = (010 << 1) = 100. (nInputNumCopy & 1) = 001 & 001 = 001, so a 1 bit is added to nResultNum such that (nResultNum |= 1) = (100 | 1) = 101. Shift nInputNumCopy to right by 1, making nInputNumCopy = 000. Since nInputNumCopy is now 0, the loop ends.

After the while loop, nResultNum = 101 which is 5. Thus, nResultNum equals nInputNum. The method returns true. 5 is a palindrome. We're done.

I hope the explanation was clear. If you have any question, please feel free to ask by emailing or posting in the comment section. Until next time!

Saturday, June 25, 2011

Search for Intersection Between Two Sorted Arrays of Integers

Question: there are two sorted arrays of integers such as array(1, 3, 6, 9, 10) and array(-2, 0, 4, 6, 12). Search for the intersection between these arrays. The intersection is 6 at index 2 and index 3 in the example arrays.

Solution: a naive solution would be to compare each element in the first array to each element in the second array, resulting in an O(N * M) algorithm where N and M are the numbers of elements in first and second array respectively. A better solution is using binary search for each element in the first array in the second array. That is an O(N LogM) algorithm. However, we can even do better using two array iterators at a time. Take a look at the pseudocode below:

it1, it2
while it1 < size1 and it2 < size2
  if (array1[it1] == array2[it2])
    print(it1 and it2)
    return
  else if (array1[it1] > array2[it2])
    it2++
  else
    it1++

We simply move the iterators, one for each array, whenever an element of one array is less that the element of the other array. We do this until we either find the common element (intersection) or we reach the end of any array. This works because both arrays are sorted in the ascending order. Thus, we don't have to compare each and every element to all other elements to know where the intersection is. Here is the implementation of the pseudocode in JavaScript :)

function vIntersectionInSortedArrays(cIntArray1, cIntArray2)
{
   if (cIntArray1 === null || cIntArray1 === undefined 
      || cIntArray2 === null || cIntArray2 === undefined)
   {
      document.write("Empty arrays!"); 
      return;
   }

   var nArray1It = 0;
   var nArray2It = 0;
   
   while (nArray1It < cIntArray1.length 
         && nArray2It < cIntArray2.length)
   {
      if (cIntArray1[nArray1It] == cIntArray2[nArray2It])
      {
         document.write("Intersect at index" + nArray1It 
                        + " and " + nArray2It);
         return;
      }
      else if (cIntArray1[nArray1It] > cIntArray2[nArray2It])
         nArray2It++;
      else
         nArray1It++;
   }

   document.write("No intersection!");
}

I know it's unusual to use JavaScript but a break from C++ and Java is nice. As we expect, this algorithm is more efficient. In the worst case, it takes only O(M + N) time to return the answer!

As always, thanks for following and if you have any suggestion, please don't hesitate to let me know in the comment section.

Thursday, June 23, 2011

Convert integers to roman numbers

Question: write an algorithm to convert an integer into roman number. For example, 1 -> I, 2 -> II, or 4 -> IV.

Solution: in order to solve this problem, we must know the rules of writing roman numerals. For example, I is 1 and V is 5 but IV is not 6 but 4 (5 -1). Moreover, there is no 0 number in roman numeral system. Here is the link to an article about roman numerals if you are unfamiliar with the system.

As you may notice, the roman numeral system consists of several fundamental and unique numbers. They are used in conjunction with rules to create other numbers. Therefore, we just have to cache the unique numbers and apply the rules in order to generate any roman number we want. Let's take a look at the implementation below in C++

#include<iostream>
#include<map>
#include<string>
using namespace std;

string int2RomanNum(int intVal)
{
   if (intVal <= 0)
   {
      cout << "Roman numbers don't support 0 or negative! Return NULL" << endl;
      return ""; 
   }

   //build hash table of unique values 
   map valueMap;

   valueMap[1] = "I";
   valueMap[4] = "IV";
   valueMap[5] = "V";
   valueMap[9] = "IX";
   valueMap[10] = "X";
   valueMap[40] = "XL";
   valueMap[50] = "L";
   valueMap[90] = "XC";
   valueMap[100] = "C";
   valueMap[400] = "CD";
   valueMap[500] = "D";
   valueMap[900] = "CM";
   valueMap[1000] = "M";

   //the roman value
   string romanResult = "";

   //traverse the list in reverse order 
   map::reverse_iterator it;
   
   for (it = valueMap.rbegin(); it != valueMap.rend(); it++)
   {
      //if current number is greater than current key in list
      //add the value corresponded with key to result
      //then subtract the equivalent int value from current number
      while (intVal >= it->first)
      {
         romanResult = romanResult + it->second;
         intVal = intVal - it->first;
      }
   }

   return romanResult;
}

Explanation: our method accepts an integer as parameter and return a string that contains the roman number equivalent to that integer.

  1. First we check the parameter to see if it is equal or less than 0. Because there is no 0 or negative roman numbers, we return an empty string after printing out the warning.
  2. Next, we build a hash table of unique roman numbers which are then combined to create other numbers.
  3. The heart of this method consists of two loops. The "for" loop runs through the hash table from bottom to top (largest number to smallest number). At each number in the hash table, we run the "while" loop to construct the roman number. This while loop will run until the integer is less than the current number in the hash table. And for every iteration in the while loop we add the current roman number to the returned string. For example, if our integer is 35, at the 10 or X position in the hash table, the while loop will kick in and add XXX into our string. And then the for loop continues at the 5 or V position, letting the while loop add V into our string.

Example: let's say we want to convert the integer 430 into its equivalent roman number. Here is how the method runs:

  1. First "for" loop's iteration, intVal = 430 and it->first = 1000. No while loop because intVal is less than it->first.
  2. Second iteration, intVal = 430 and it->first = 900. No while loop.
  3. Third iteration, intVal = 430 and it->first = 500. No while loop.
  4. Fourth iteration, intVal = 430 and it->first = 400. Enter while loop: romanResult = CD and intVal = 30. Because intVal is less than it->first after the first "while" iteration, the while loop exits.
  5. Fifth iteration, intVal = 30 and it->first = 100. No while loop.
  6. Sixth iteration, intVal = 30 and it->first = 90. No while loop.
  7. Seventh iteration, intVal = 30 and it->first = 50. No while loop.
  8. Ninth iteration, intVal = 30 and it->first = 40. No while loop.
  9. Tenth iteration, intVal = 30 and it->first = 10. Enter while loop: 1) intVal = 20 and romanResult = CDX, 2) intVal = 10 and romanResult = CDXX, and 3) intVal = 0 and romanResult = CDXXX. The while loop exits after that because intVal is less than it->first.
  10. Nothing happens in the last four iterations because intVal is 0. Thus, the final result is romanResult = CDXXX

Thank you for reading and until next time :)

Thursday, June 2, 2011

Least-Square Linear Regression of Data Using C++

Question: implement the least-square method to determine the linear function that best fits the data. This method also needs to find the coefficient of determination (R^2) and standard error of estimation (E). Input to this method is a collection of data points (x, y) and the collection's size, a.k.a. number of data points.

Solution: the answer is straight forward. We basically just have to apply the statistics formulas for finding the least-square linear function to the data. If you are not familiar with the formulas and where they come from here is the link for you. Now, let's take a look at the implementation below:

#include<iostream>
#include<cmath>
using namespace std;

struct Point
{
   double x;
   double y;
};

void leastSqrRegression(struct Point* xyCollection, int dataSize)
{
   if (xyCollection == NULL || dataSize == 0)
   {
      printf("Empty data set!\n");
      return;
   }

   double SUMx = 0;     //sum of x values
   double SUMy = 0;     //sum of y values
   double SUMxy = 0;    //sum of x * y
   double SUMxx = 0;    //sum of x^2
   double SUMres = 0;   //sum of squared residue
   double res = 0;      //residue squared
   double slope = 0;    //slope of regression line
   double y_intercept = 0; //y intercept of regression line
   double SUM_Yres = 0; //sum of squared of the discrepancies
   double AVGy = 0;     //mean of y
   double AVGx = 0;     //mean of x
   double Yres = 0;     //squared of the discrepancies
   double Rsqr = 0;     //coefficient of determination

   //calculate various sums 
   for (int i = 0; i < dataSize; i++)
   {
      //sum of x
      SUMx = SUMx + (xyCollection + i)->x;
      //sum of y
      SUMy = SUMy + (xyCollection + i)->y;
      //sum of squared x*y
      SUMxy = SUMxy + (xyCollection + i)->x * (xyCollection + i)->y;
      //sum of squared x
      SUMxx = SUMxx + (xyCollection + i)->x * (xyCollection + i)->x;
   }

   //calculate the means of x and y
   AVGy = SUMy / dataSize;
   AVGx = SUMx / dataSize;

   //slope or a1
   slope = (dataSize * SUMxy - SUMx * SUMy) / (dataSize * SUMxx - SUMx*SUMx);

   //y itercept or a0
   y_intercept = AVGy - slope * AVGx;
   
   printf("x mean(AVGx) = %0.5E\n", AVGx);
   printf("y mean(AVGy) = %0.5E\n", AVGy);

   printf ("\n");
   printf ("The linear equation that best fits the given data:\n");
   printf ("       y = %2.8lfx + %2.8f\n", slope, y_intercept);
   printf ("------------------------------------------------------------\n");
   printf ("   Original (x,y)   (y_i - y_avg)^2     (y_i - a_o - a_1*x_i)^2\n");
   printf ("------------------------------------------------------------\n");

   //calculate squared residues, their sum etc.
   for (int i = 0; i < dataSize; i++) 
   {
      //current (y_i - a0 - a1 * x_i)^2
      Yres = pow(((xyCollection + i)->y - y_intercept - (slope * (xyCollection + i)->x)), 2);

      //sum of (y_i - a0 - a1 * x_i)^2
      SUM_Yres += Yres;

      //current residue squared (y_i - AVGy)^2
      res = pow((xyCollection + i)->y - AVGy, 2);

      //sum of squared residues
      SUMres += res;
      
      printf ("   (%0.2f %0.2f)      %0.5E         %0.5E\n", 
       (xyCollection + i)->x, (xyCollection + i)->y, res, Yres);
   }

   //calculate r^2 coefficient of determination
   Rsqr = (SUMres - SUM_Yres) / SUMres;
   
   printf("--------------------------------------------------\n");
   printf("Sum of (y_i - y_avg)^2 = %0.5E\t\n", SUMres);
   printf("Sum of (y_i - a_o - a_1*x_i)^2 = %0.5E\t\n", SUM_Yres);
   printf("Standard deviation(St) = %0.5E\n", sqrt(SUMres / (dataSize - 1)));
   printf("Standard error of the estimate(Sr) = %0.5E\t\n", sqrt(SUM_Yres / (dataSize-2)));
   printf("Coefficent of determination(r^2) = %0.5E\t\n", (SUMres - SUM_Yres)/SUMres);
   printf("Correlation coefficient(r) = %0.5E\t\n", sqrt(Rsqr));

}

Explanation: assuming that each data point is structured similar to our Point struct, then our method takes an array of Points and the array's size as its parameters.

To advance any further, we must first find these sums:

  • SUMx: the sum of all x values of the points
  • SUMy: the sum of all y values of the points
  • SUMxx: the sum of the square of x values, meaning that we square x values individually and add them togethers
  • SUMxy: for each point we multiply its x value with its y value, then add all the results together to find this sum.

The first for loop is used to calculate those sums simultaneously.

After that, we calculate the means of x and y values which equal sum of x values divided by the number of points and sum of y values divided by the number of points respectively.

slope and y_intercept of the best-fit function can then be determined using the means and the sums we found. Thus, the linear function is y = slope*x + y_intercept.

Finally, to calculate the coefficient of determination and standard error of estimation, we need to find the sum of squared standard deviation (SUM_Yres) of each point from the best-fit linear function and sum of the squared residues (SUMres). That's what the second for loop does.

Once, we know sum of squared residues and sum of squared standard deviation, we just apply formulas to find the coefficient and the standard error.

As you can see, the challenge is not writing the code to compute the least-square regression but being able to understand the logic behind that. Why slope, y_intercept, coefficient of determination and standard error of estimation are calculated that way? Where do the formulas come from? Etc..

Answering those questions is beyond the scope of this post, so I leave it to you. I actually took statistics in order to understand the concepts and translate the formulas into code :)

As always, any comment or suggestion is welcome! Bye for now.