It's a great question, although I feel like it means different things to different people, which is probably the reason for Rachel and Caio's disagreement.
First of all, data scientists absolutely must know about Lists, Tuples, Arrays, Dictionaries, Pandas objects, etc., but only to the extent that they can effectively use and manipulate these data structures. By this I mean an understanding of:
- The operations these data structures support
- The Big-O read/write/update times of those operations (compute resources)
- A rough idea of how much memory is needed at any given time (memory resources)

Do data scientists need to know exactly what's going on under the hood? About 99% of the time, no (which I suspect is Caio's point). But should they be able to use data structures effectively? Absolutely (which is Rachel's point).
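As a minimal sketch of that "usage-level" knowledge (my own illustration, not from the original discussion): it's enough to know that membership tests on a Python list are O(n) while set/dict lookups are amortized O(1), without knowing the hash-table internals.

```python
import timeit

# Same 100,000 items stored in two different data structures.
items = list(range(100_000))
as_list = items
as_set = set(items)

# Time a worst-case membership test: the element at the very end
# forces the list to scan all 100,000 entries, while the set hashes
# straight to it.
list_time = timeit.timeit(lambda: 99_999 in as_list, number=100)
set_time = timeit.timeit(lambda: 99_999 in as_set, number=100)

# The set lookup is dramatically faster, even though both lines of
# calling code look identical.
```

That is the level of understanding a data scientist needs: which structure to reach for and why, not how the hash buckets are resized.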
It's ironic, but computer scientists often overstate the importance of data structures. Unless you plan to go into a related field of research or work (or simply find the topic very interesting), you don't need to know much more than the typical data scientist does. Of course, you might not be using Python or R, so you may need to know data structures across a wider variety of languages; you may need to know structures specific to your domain, e.g. octrees for 3D computer graphics; and you may need to know finer details, e.g. the differences between hash maps and hash tables in Java. But you will very rarely need to implement (or debug) a data structure from scratch. For most common languages and data structures, someone else has almost certainly implemented and published the data structure already, perhaps even in the standard library. In short, data structures have been commoditized.
On the other hand, I would say algorithms are a whole different matter. Not only should you have a rough idea of the Big-O running times and memory requirements of other people's algorithms, you should also be able to implement your own! I'm not necessarily talking about a full implementation of XGBoost like Tianqi Chen's, but what about the machine learning scripts you write? Those are all implementations of your own custom algorithms.
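To make that concrete with a toy illustration of my own (not from the original answer): even a short ML script like this one-variable gradient descent is "your own algorithm", with running-time and convergence behavior you are responsible for understanding.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Data drawn from the line y = 2x + 1; the fit should recover w≈2, b≈1.
w, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

Nobody hands you this loop in a library with exactly your loss and constraints; knowing how to write and reason about it is the algorithmic skill the answer is arguing for.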
Algorithms form the basis of problem solving. Anecdotally, a friend of mine who was trying to automatically group thousands of comments once asked me, "Hey, you're from a computer science background, right? If you were implementing this, what data structure would you use?"
This is exactly the kind of question that an all-too-common overemphasis on data structures in teaching tends to produce (and computer science departments around the world probably encourage it, thanks to the few faculty members who actually do need to implement data structures themselves). I turned around and said, "Probably hash tables, but let me ask you this: what approach or algorithm are you going to use to solve the problem?"
The less time-consuming approaches might have been:
a. Performing unsupervised clustering of the comments using LDA;
b. Manually labeling a small subset of comments and training a supervised algorithm to label the rest; or
c. Identifying certain keywords in the free text and performing a rule-based regular-expression search with concrete grouping rules.
You should decide which algorithm to use before making any decisions about data structures, which exist only to ensure the algorithms don't consume too much processing power and/or memory.
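A minimal sketch of approach (c) above, with made-up keyword rules for illustration: note that the hash table (the dict of lists) only appears *after* the algorithm has been chosen, exactly as the paragraph argues.

```python
import re
from collections import defaultdict

# Hypothetical grouping rules; in practice these would come from
# inspecting the actual comments.
RULES = {
    "billing": re.compile(r"\b(refund|invoice|charge[ds]?)\b", re.I),
    "shipping": re.compile(r"\b(deliver\w*|shipping|package)\b", re.I),
}

def group_comments(comments):
    """Assign each comment to the first rule it matches, else 'other'."""
    groups = defaultdict(list)
    for comment in comments:
        for label, pattern in RULES.items():
            if pattern.search(comment):
                groups[label].append(comment)
                break
        else:
            groups["other"].append(comment)
    return dict(groups)

grouped = group_comments([
    "Please refund my last charge",
    "My package never arrived",
    "Love the new UI",
])
```

The dict is incidental plumbing; the real design decision was choosing rule-based matching over LDA or supervised labeling in the first place.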
So overall, I have to respectfully disagree with Rachel here. In my opinion, a deeper understanding of algorithms is needed, while the ability to use data structures is sufficient. The OP mentioned that companies "like Amazon" require "strong knowledge" of data structures. In that case, interview candidates are asked how to convert a binary tree into a doubly linked list (or even how to invert a binary tree :)) rather than about access times or the interfaces of popular data structures.
Is such a deep knowledge of data structures useful for a data scientist? Probably not for most people.