Big Data is deservedly gaining attention, not only because of the five- to six-figure salaries data scientists can expect, but also because it is at the heart of strategic endeavours in many companies. It’s therefore no surprise that many co-workers, even those without an IT background, have heard the two words “Big Data”.
Over the past months, people in my professional network, and sometimes even friends and family, have asked me which skills they need to learn Big Data. The questions come in various forms:
- What are the best big data technologies or programming languages to learn?
- Where should I start my learning path?
- Do I need to go back to university? Are free online courses worth it or should I go for paid courses? What about learning by myself?
First of all, I believe these are the wrong questions to ask. However, I do understand that it’s difficult to articulate your expectations, define a learning path and set goals when it comes to big data. The topic feels overwhelming, probably because it has been a buzzword (and may still be one).
My goal here is not to argue about what should be flagged “Big Data” or not, but to give you an overview of the big data landscape and some useful information in order to help you build your learning path.
Now before we start, what is “Big Data”? It’s very difficult to give a straightforward definition of the big data concept. This Wikipedia article is a good starting point to get a grasp of the topic. Big data is usually described by the following characteristics: volume, variety, velocity, variability and veracity. Have a look at the Wikipedia article for a detailed definition of each characteristic.
Your dataset may have additional characteristics such as velocity, variety and so on, each of which increases the level of complexity involved in acquiring and transforming the data and deriving meaningful insights from it.
That is the key challenge: how to turn your data into gold (whatever you want to call that data, big or not), meaning how to collect, transform and extract useful insights from it. It can be summarized in these three key steps:
- Data collection & Data integration: it covers all the actions required to acquire, store and make the data available in a readable format for data consumers.
- Data analysis and exploration: this is where you learn from the data and derive useful insights.
- Data visualization: last but not least, it covers all the actions required to transform abstract data into visual data.
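To make the three steps above more concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the sample data is invented, and the function names `collect`, `analyze` and `visualize` are hypothetical labels for the steps, not part of any big data tool. Real projects would typically rely on dedicated storage, processing and charting tools instead.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data, invented for this illustration.
RAW_CSV = """city,temperature
Paris,18
Paris,22
Oslo,9
Oslo,11
"""


def collect(raw_text):
    """Step 1 (collection & integration): parse raw CSV text into records."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [
        {"city": row["city"], "temperature": float(row["temperature"])}
        for row in reader
    ]


def analyze(records):
    """Step 2 (analysis & exploration): average temperature per city."""
    values_by_city = defaultdict(list)
    for record in records:
        values_by_city[record["city"]].append(record["temperature"])
    return {city: sum(v) / len(v) for city, v in values_by_city.items()}


def visualize(averages):
    """Step 3 (visualization): a text bar chart, one '#' per rounded degree."""
    lines = []
    for city, average in sorted(averages.items()):
        lines.append(f"{city:<6}{'#' * round(average)} {average:.1f}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(visualize(analyze(collect(RAW_CSV))))
```

Each function maps to one step, so you can see where a given skill (data engineering, analysis or visualization) fits in the overall chain.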
You can now structure your learning path around the core competencies required to tackle each step. Depending on your goal, you may want to put your focus on a given step or get an overall understanding of every step.
If you want to have an overall understanding of these topics without going deep into the details, you can start with free online courses. Major online course websites such as Coursera, Udemy or edX (to name just a few) offer a vast number of free introductory courses on big data.
If you plan to dive deep into any of the core competencies, you will certainly need more than a few hours of free online courses. My advice is to enrol in a graduate or certificate program. These programs are usually offered as paid courses and they have two advantages:
- Number 1 is commitment. You are likely to stay focused on your goal, as you don’t want to waste the money invested in your education.
- Number 2 is the award. Your diploma or certificate highlights your accomplishments and can be added to your resume or LinkedIn profile.
Where can you find these graduate or certificate programs? You have a few options here:
Online course websites
For example, Coursera offers specialization courses with a certificate at the end. Below are some top specializations you can find on their website:
I have listed all the courses available under “Data Science” and “Computer Science” as of mid-April 2017. Explore this interactive chart to see the results (click on the symbol at the bottom right of the chart to enter full-screen mode):
Software vendor training programs
Apache Hadoop is now a standard for massively parallel computation, and hence a standard for big data processing. Many software vendors have built an ecosystem around Hadoop for big data collection, storage, processing and visualization.
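To give a feel for the map-and-reduce style of computation that Hadoop is known for, here is a toy word count in plain Python. It is only a sketch running on a single machine: it does not use Hadoop or any of its APIs, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical names chosen for this illustration. In a real cluster, the map and reduce work would be spread across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, here the word."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(documents):
    """Run the three phases in sequence on a list of documents."""
    pairs = []
    for document in documents:
        # Each document could be mapped independently, which is what
        # makes this style of computation easy to parallelize.
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    sample_documents = ["big data is big", "data is everywhere"]
    print(word_count(sample_documents))
```

The key idea is that each document is processed on its own in the map phase, so the work can be split across as many workers as there are chunks of data.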
The three major Hadoop distributions are MapR, Cloudera and Hortonworks. They all offer training programs designed around their solutions. This is still a good option to shape your big data skills and earn certificates.
University and extension school programs
Learning Big Data is very simple when you have clearly defined goals and expectations. The large number of concepts, technologies and programming languages can undermine your motivation if you lack clear milestones. Your best chance to succeed is to define your target before you start your learning journey.
Happy learning and stay curious!