Top 7 algorithms and data structures every programmer should know about
Coming from a background in Competitive Programming and Software Development, I have compiled a list of algorithms and data structures that every programmer should know about. We will see what they do and where they are used, with the simplest examples. This list was prepared keeping in mind their use in competitive programming and current development practices.
1. Sort Algorithms
Sorting is one of the most heavily studied concepts in Computer Science. The idea is to arrange the items of a list in a specific order. Though every major programming language has built-in sorting libraries, it comes in handy if you know how they work. Depending upon the requirement, you may want to use any of these.
- Merge Sort
- Quick Sort
- Bucket Sort
- Heap Sort
- Counting Sort
Applications:
- Sorting by price, popularity, etc. on e-commerce websites
- Sorting by score in HackerEarth contest leaderboard
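To give a feel for how one of the algorithms listed above works, here is a minimal sketch of merge sort in Python. The function name `merge_sort` is just an illustrative choice, and in real code you would normally reach for the built-in `sorted()` instead.

```python
def merge_sort(items):
    """Return a new sorted list using top-down merge sort (O(n log n))."""
    if len(items) <= 1:
        return list(items)

    # Split the list in half and sort each half recursively.
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])

    # Merge the two sorted halves into one sorted list.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # "<=" keeps equal items in their original order
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

For something like a leaderboard, the built-in is usually enough: `sorted(scores, reverse=True)` orders a list of scores from highest to lowest.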
2. Search Algorithms
Binary Search (in linear data structures)
Binary search is used to perform a very efficient search on a sorted dataset. The time complexity is O(log₂ N). The idea is to repeatedly halve the portion of the list that could contain the item, until we narrow it down to one possible item. Some applications are:
- When you search for a song by name in a sorted list of songs, the application can use binary search together with string matching to return the results quickly.
- Used to debug in git through git bisect
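The halving idea described above can be sketched in a few lines of Python. The function name and the sample song list are made up for illustration; Python's standard `bisect` module offers the same kind of search on sorted lists.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1   # target can only be in the right half
        else:
            hi = mid - 1   # target can only be in the left half
    return -1


# Illustrative data: the list must already be sorted.
songs = ["Bohemian Rhapsody", "Hey Jude", "Imagine", "Yesterday"]
position = binary_search(songs, "Imagine")
```

Each pass through the loop discards half of the remaining candidates, which is where the logarithmic running time comes from.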
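Depth/Breadth First Search (in graph data structures)
DFS and BFS are tree/graph traversal and search algorithms. We won't go deep into how DFS/BFS work, but the key difference is this: DFS follows one branch as far as it can before backtracking, while BFS visits all the neighbours at the current depth before moving one level deeper.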
Applications:
- Used by search engines for web-crawling
- Used in artificial intelligence to build bots, for instance a chess bot
- Finding the shortest path between two cities on a map (BFS does this directly when every road counts as one step) and many other such applications
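To see how the two traversals differ in practice, here is a small Python sketch that runs both on the same graph. The adjacency-list dictionary `graph` and the function names are assumptions made for this example.

```python
from collections import deque


def bfs(graph, start):
    """Visit nodes level by level, using a queue."""
    order, seen = [start], {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


def dfs(graph, start):
    """Follow one branch as deep as possible before backtracking."""
    order, seen = [], set()

    def visit(node):
        seen.add(node)
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in seen:
                visit(neighbor)

    visit(start)
    return order


# Illustrative graph: A has children B and C; B leads to D, C leads to E.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}
bfs_order = bfs(graph, "A")  # ['A', 'B', 'C', 'D', 'E']
dfs_order = dfs(graph, "A")  # ['A', 'B', 'D', 'C', 'E']
```

The only structural difference is the order in which pending nodes are handled: BFS uses a first-in-first-out queue, while DFS relies on the call stack.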
3. Hashing
Hash lookup is currently the most widely used technique for finding data by key or ID. We access data by its index; previously we relied on sorting plus binary search to find that index, whereas now we use hashing.
The data structure is referred to as a Hash-Map, Hash-Table or Dictionary, and it maps keys to values efficiently. We can look up values using keys. The idea is to use an appropriate hash function that turns each key into the index where its value is stored. Choosing a good hash function depends upon the scenario.
Applications:
- In routers, to store IP address -> Path pair for routing mechanisms
- To check whether a value already exists in a list. Linear search would be expensive. We can also use a Set data structure for this operation.
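As a rough illustration of the idea, here is a toy hash table in Python that resolves collisions by chaining. The class name, the bucket count and the sample routing entries are all invented for this sketch; in practice you would simply use Python's built-in `dict` and `set`.

```python
class ChainedHashTable:
    """A toy hash table: each bucket holds a list of (key, value) pairs."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # The hash function turns the key into a bucket index.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default


# Illustrative routing table: IP address -> outgoing interface.
routes = ChainedHashTable()
routes.put("10.0.0.1", "eth0")
routes.put("192.168.1.1", "eth1")
routes.put("10.0.0.1", "eth2")  # updates the earlier entry

# Membership checks with the built-in set avoid a linear scan.
seen_ids = {101, 205, 309}
already_seen = 205 in seen_ids
```

This sketch never resizes its bucket array, so it slows down as it fills up; real implementations grow the table to keep lookups fast.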
4. Dynamic Programming
Dynamic programming (DP) is a method for solving a complex problem by breaking it down into simpler subproblems. We solve the subproblems, remember their results, and use them to work our way up to the solution of the complex problem quickly.
I cannot help but quote this answer on Quora to explain DP in layman's terms.
*writes down “1+1+1+1+1+1+1+1 =” on a sheet of paper* What’s that equal to?
*counting* Eight!
*writes down another “1+” on the left* What about that?
*quickly* Nine!
How’d you know it was nine so fast?
You just added one more
So you didn’t need to recount because you remembered there were eight! Dynamic Programming is just a fancy way to say ‘remembering stuff to save time later’
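The "remembering stuff" idea can be sketched with the classic Fibonacci example in Python. The choice of Fibonacci is mine, for illustration only; `functools.lru_cache` from the standard library does the remembering.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    """Return the n-th Fibonacci number, with fib(0) = 0 and fib(1) = 1."""
    if n < 2:
        return n
    # Each subproblem is solved once; later calls reuse the cached result.
    return fib(n - 1) + fib(n - 2)


answer = fib(50)  # 12586269025
```

Without the cache, the same recursion recomputes the same subproblems over and over and takes exponential time; with it, each value from 0 to n is computed exactly once.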
Applications:
- There are many DP algorithms and applications, but I'll name just one that may surprise you: the Duckworth-Lewis method in cricket.
5. Exponentiation by squaring
Say you want to calculate 2^32. Normally we'd iterate 32 times to find the result. What if I told you it can be done in 5 iterations? We just keep squaring: 2 -> 2^2 -> 2^4 -> 2^8 -> 2^16 -> 2^32.
Exponentiation by squaring, or binary exponentiation, is a general method for fast computation of large positive integer powers of a number using O(log₂ N) multiplications, where N is the exponent. Not only this, the method is also used for computing powers of polynomials and square matrices.
Application:
- Calculation of large powers of a number is mostly required in RSA encryption. RSA also uses modular arithmetic along with binary exponentiation.
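Here is a minimal Python sketch of binary exponentiation with an optional modulus, as used in RSA-style computations. The function name `power` and its parameters are my own naming; Python's built-in `pow(base, exponent, modulus)` already provides this and is what you would use in practice.

```python
def power(base, exponent, modulus=None):
    """Compute base**exponent (optionally mod modulus) by repeated squaring."""
    if exponent < 0:
        raise ValueError("exponent must be non-negative")

    result = 1 if modulus is None else 1 % modulus
    while exponent > 0:
        if exponent & 1:  # this bit of the exponent is set
            result *= base
            if modulus is not None:
                result %= modulus
        base *= base      # square the base for the next bit
        if modulus is not None:
            base %= modulus
        exponent >>= 1    # move on to the next bit
    return result
```

The loop runs once per bit of the exponent, so the amount of work grows with the logarithm of the exponent rather than with the exponent itself.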
6. String Matching and Parsing
Pattern matching/searching is one of the most important problems in Computer Science. There has been a lot of research on the topic, but we'll list only two basic necessities for any programmer.
KMP Algorithm (String Matching)
Knuth-Morris-Pratt algorithm is used in cases where we have to match a short pattern in a long string. For instance, when we Ctrl+F a keyword in a document, we perform pattern matching in the whole document.
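A compact Python sketch of KMP is shown below. The function names are illustrative. The key ingredient is a precomputed table that tells us how far we can fall back in the pattern after a mismatch without re-reading any text.

```python
def build_failure_table(pattern):
    """table[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    table = build_failure_table(pattern)
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]  # fall back without moving backwards in text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1
```

Because the text index only ever moves forward, the search takes time proportional to the length of the text plus the length of the pattern.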
Regular Expression (String Parsing)
Often we have to validate a string by checking it against a predefined pattern. Regular expressions are heavily used in web development for URL parsing and matching.
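As a small illustration, here is how a URL path might be validated and parsed with Python's standard `re` module. The route shape (`/users/<id>/posts/<slug>`) and the function name `parse_route` are hypothetical and chosen only for this example.

```python
import re

# Hypothetical route: /users/<numeric id>/posts/<lowercase slug>
ROUTE = re.compile(r"/users/(?P<user_id>\d+)/posts/(?P<slug>[a-z0-9-]+)")


def parse_route(path):
    """Return the captured parts of the path, or None if it does not match."""
    match = ROUTE.fullmatch(path)
    return match.groupdict() if match else None


parts = parse_route("/users/42/posts/hello-world")
# parts == {'user_id': '42', 'slug': 'hello-world'}
```

Using `fullmatch` means the whole path has to fit the pattern, so strings with extra characters before or after are rejected.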
7. Primality Testing Algorithms
There are deterministic and probabilistic (nondeterministic) ways of determining whether a given number is prime. We'll look at both.
Sieve of Eratosthenes (deterministic)
If we need all the primes within a bounded range, say from 100 to 1000, then the Sieve is the way to go. The size of the range is a crucial factor, because we have to allocate memory in proportion to it.
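A straightforward Python sketch of the Sieve follows. The function name `primes_in_range` is illustrative, and for simplicity it sieves everything from 0 up to the upper bound rather than only the requested window.

```python
def primes_in_range(low, high):
    """Return all primes p with low <= p <= high."""
    if high < 2:
        return []

    # One flag per number from 0 to high: memory grows with the range.
    is_prime = [True] * (high + 1)
    is_prime[0] = is_prime[1] = False

    p = 2
    while p * p <= high:
        if is_prime[p]:
            # Cross out every multiple of p, starting at p * p.
            for multiple in range(p * p, high + 1, p):
                is_prime[multiple] = False
        p += 1

    return [n for n in range(max(low, 2), high + 1) if is_prime[n]]


primes = primes_in_range(100, 1000)  # starts at 101 and ends at 997
```

The list of flags is exactly the memory cost mentioned above, which is why the Sieve works well for ranges like this one but not for very large upper bounds.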
For any number n, incrementally testing up to sqrt(n) (deterministic)
In case you want to check a few numbers that are sparsely spread over a large range (say 1 to 10^12), a Sieve over the whole range would need far too much memory. Instead, you can check each number n individually by testing whether any candidate divisor up to sqrt(n) divides n.
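The divisibility check up to sqrt(n) might look like this in Python. The function name is illustrative, and `math.isqrt` (available from Python 3.8) gives an exact integer square root.

```python
import math


def is_prime(n):
    """Deterministic primality check by trial division up to sqrt(n)."""
    if n < 2:
        return False
    if n < 4:
        return True   # 2 and 3 are prime
    if n % 2 == 0:
        return False
    # Only odd candidates up to the integer square root need testing.
    for divisor in range(3, math.isqrt(n) + 1, 2):
        if n % divisor == 0:
            return False
    return True
```

If n has a divisor larger than sqrt(n), it must also have a matching one smaller than sqrt(n), so stopping at the square root loses nothing.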
Fermat primality test and Miller–Rabin primality test (both are nondeterministic)
Both of these are compositeness tests: if a number is proved to be composite, then it surely isn't a prime number, and if it passes, it is only probably prime. Miller-Rabin is more sophisticated than Fermat's test. In fact, Miller-Rabin also has a deterministic variant, but then it's a trade-off between the time complexity and the accuracy of the algorithm.
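Below is a sketch of the probabilistic Miller-Rabin test in Python. The function name, the default of 20 rounds and the short list of small primes used as a pre-filter are my own choices for this example, not part of the algorithm's definition.

```python
import random


def is_probable_prime(n, rounds=20):
    """Miller-Rabin: False means definitely composite, True means probably prime."""
    if n < 2:
        return False
    # Quick pre-filter; it also guarantees n >= 31 in the loop below.
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % p == 0:
            return n == p

    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1

    for _ in range(rounds):
        a = random.randrange(2, n - 1)  # random base in [2, n - 2]
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # the base a proves that n is composite
    return True
```

Each extra round makes a wrong "probably prime" answer less likely, which is the time-versus-accuracy trade-off in concrete form.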
Application:
- The single most important use of prime numbers is in cryptography. More precisely, they are used for encryption and decryption in the RSA algorithm, one of the earliest public-key cryptosystems.
- Another use is in Hash functions used in Hash Tables