Big Data

In our previous blog post, we discussed the implementation of a framework built on top of Spark to enable agile and iterative data discovery between legacy systems and new data sources generated by IoT devices (smart city data set). We will now explore the components of this framework in detail. The framework is composed of […]

In this series of blog posts, we will outline and explain in detail the implementation of a framework built on top of Spark to enable agile and iterative data discovery between legacy systems and new data sources generated by IoT devices. The Internet of Things (IoT) is certainly bringing new challenges for data practitioners. It’s […]

This post is meant to help you take your first steps into data processing with Apache Spark using the Python API. In the age of Big Data processing, Hadoop MapReduce (an open-source implementation of Google’s MapReduce model) has laid the foundation for processing “embarrassingly parallel” operations on distributed machines. Sadly, it shows programmability limitations and degradation in […]

In the previous post, we discussed the data warehouse concept and emphasized key aspects such as data consistency, data history, the complexity of the data integration process as the number of data sources grows, and time to market. The enterprise data warehouse is built for analytical purposes and stores mainly structured data. It gives a competitive advantage […]

The leap to cloud computing is likely to happen in many companies – if it has not happened already – considering the flexibility it offers to improve IT efficiency. Cloud computing is simply resources available on demand. It enables companies to purchase large-scale IT infrastructure and resources as needed without massive upfront investments. This is […]

It’s widely acknowledged that data scientists spend most of their time on data integration (curation) and therefore focus less on their core activities. Let’s have a look at an appealing attempt to mitigate this effort. Over the past years, most companies have tended to integrate more and more data sources into their data warehouse. It has been […]
