Today two guys asked me about how to pass an interview for a Software Engineering position (as I was offered SE jobs from Palantir and Google, and failed at tons of others). My answer was simple: think hard of a good solution and speak out loud what you are thinking, try to code without bugs, know how to test it and know what you are talking about, especially different data structures.
What I mean by “know” is that you should be able to explain the concept in a simple way and to answer most of the questions related to the concept.
To give an example, in an interview question, you can be asked: “Given a stream of sales: (item_id, number_of_items), how would you recommend a top N best-selling items”. Then be prepared to expect the conversation to be like:
Possible Question: how do you solve the problem?
Possible Answer: I use a max heap to remain the order of items with most sales. For any query, I return the top N items out of the heap. It takes O(NlgK) where K is the number of items in the heap.
Possible Question: How do you update the heap when a new tuple is received?
Possible Answer: For each tuple received from the stream, I update the counter of item_id in the heap, it takes O(lgK).
Possible Question: How do you know which node in the heap to update?
Possible Answer: In order to know which node to update in the heap, I use a hashmap item_id->node_id to keep track of where the item is located in the heap.
Possible Question: How does a hashmap work?
Possbile Answer: It keeps a list of linked list of (key, value). For lookup/modification, it uses the hash function of the key to locate the linked link, then iterates through the linked list to search for the key or add the new item to the linked list.
Possible Question: Implement a hashmap?
Possible Answer:
class HashMap(object):
def put(k, v):
….
def get(k):
….
Possible Question: How to remain O(1) for get/put?
Possible Answer: Keep the hash function of the key to be uniformly distributed.
Possible Question: What happens when the number of items is larger than the size of the list.
Possible Answer: I double the size of the list. Then, redistribute the items to the new list based on the hash(key) % list.size
Possible Question: How do you design the hash function?
Possible Answer: Do s.t like this:f(x)=(((31+x.t1)∗31+x.t2)∗31+x.t3)… as suggested in Effective Java by Joshua Bloch.
Possible Question: Why do you choose 31?
Possible Answer: Because 31 is prime
Possible Question: Why do you choose a prime number?
Possible Answer: Assumingk=ab⇒f(x)=(((ab+t1)∗ab+t2)ab+t3…) , ⇒f(x)=(((abab+t1∗ab+t2)ab+t3…)=(((ababab+t1abab+t2ab+t3)…) . Then, (t1=a,t3=abab) collides with (t1=1,t3=aabab) . Hence, the less factors k has, the less collisions can happen.
Finally, interviewer will possibly say “cool”. Other possible questions include “how does the heap work?”, “implement a heap”, “what if I want top N-items in only last week”, “how to get top N from the heap in O(NlgN)”, “how does arraylist work?”, “implement LRU”,…
Make sure you know when interviewing for a grad level role:
Apart from technical performance, make sure you are polite, humble, and clear. Despite all of these effort, you still may fail for some unknown reasons. Not all interviewers are looking for the same thing or there will be other people who can do better than you. Just try your best and move on.
What I mean by “know” is that you should be able to explain the concept in a simple way and to answer most of the questions related to the concept.
To give an example, in an interview question, you can be asked: “Given a stream of sales: (item_id, number_of_items), how would you recommend a top N best-selling items”. Then be prepared to expect the conversation to be like:
Possible Question: how do you solve the problem?
Possible Answer: I use a max heap to remain the order of items with most sales. For any query, I return the top N items out of the heap. It takes O(NlgK) where K is the number of items in the heap.
Possible Question: How do you update the heap when a new tuple is received?
Possible Answer: For each tuple received from the stream, I update the counter of item_id in the heap, it takes O(lgK).
Possible Question: How do you know which node in the heap to update?
Possible Answer: In order to know which node to update in the heap, I use a hashmap item_id->node_id to keep track of where the item is located in the heap.
Possible Question: How does a hashmap work?
Possbile Answer: It keeps a list of linked list of (key, value). For lookup/modification, it uses the hash function of the key to locate the linked link, then iterates through the linked list to search for the key or add the new item to the linked list.
Possible Question: Implement a hashmap?
Possible Answer:
class HashMap(object):
def put(k, v):
….
def get(k):
….
Possible Question: How to remain O(1) for get/put?
Possible Answer: Keep the hash function of the key to be uniformly distributed.
Possible Question: What happens when the number of items is larger than the size of the list.
Possible Answer: I double the size of the list. Then, redistribute the items to the new list based on the hash(key) % list.size
Possible Question: How do you design the hash function?
Possible Answer: Do s.t like this:
Possible Question: Why do you choose 31?
Possible Answer: Because 31 is prime
Possible Question: Why do you choose a prime number?
Possible Answer: Assuming
Finally, interviewer will possibly say “cool”. Other possible questions include “how does the heap work?”, “implement a heap”, “what if I want top N-items in only last week”, “how to get top N from the heap in O(NlgN)”, “how does arraylist work?”, “implement LRU”,…
Make sure you know when interviewing for a grad level role:
- Data structures: array, vector, trees, queue, stack, heap, hashmap, treemap.
- Algorithms: sorts, recursive programming, dynamic programming, string matching, basic Graph algorithms (BFS, DFS, Dijkstra).
- Concurrency: forking, deadlock, mutex, semaphore, Map/Combiner/Reduce.
- Memory allocation and garbage collection.
- Sql, database transaction and different isolation levels.
- Bit operations: |, &, ^, <<, >>.
- How disk works (seek, read, write).
- Maintainability: different design patterns, git/svn, service-oriented architecture.
- Scalability: load balancers, cache, proxy, index, shards, queue, replication.
- Availability: share-nothing architecture (so SPOF), eventual consistency, RAIDs.
- Consistency: vector clock, CAP theorem, FLP theorem, different levels of consistency, 2 phase, 3 phase commit, names of some concensus algorithms, consistent hashing, gossip protocols.
- Basic concrete maths and probability.
- Zen of Python (my bonus ;)).
Apart from technical performance, make sure you are polite, humble, and clear. Despite all of these effort, you still may fail for some unknown reasons. Not all interviewers are looking for the same thing or there will be other people who can do better than you. Just try your best and move on.