PASS Summit 2014 – II

The following is a continuation of the previous blog post PASS Summit 2014

Microsoft is changing the way we need to think about data, and has given us three good tools to work with big volumes of it. They are

1. Azure Data Factory – Retrieve data from the past
2. Azure Stream Analytics – Analyze the data in real-time
3. Azure Machine Learning – Predict the future trends based on the current data

There is good news for those who are familiar with SSIS. Azure Data Factory enables you to process on-premises data such as SQL Server together with cloud data such as Azure SQL Database, Blobs, and Tables. It is like SSIS in the cloud (though not exactly). Microsoft has introduced SSIS-like control flows in Azure Data Factory, which make it easier for developers to work with the service. As of now, the control flow can only be authored through command-line coding. I hope a simpler RAD tool will become available in the future that lets users simply drag and drop controls to build the desired flow.

Stream Analytics is an event-processing engine that helps uncover real-time insights from devices, sensors, infrastructure, applications, and data. The Azure-based HDInsight service incorporates Apache Storm capabilities for streaming-data analysis. Stream Analytics is highly scalable and has no hardware or other up-front costs, and no installation or setup to do. It brings a unique perspective to the complex event processing and real-time processing space, with customer benefits such as low cost (pay only for what you use), faster developer productivity (a SQL-like query syntax), and the elasticity of the cloud (Azure). Specific real-time event-processing scenarios customers are tackling include real-time ingestion, processing, and archiving of data; real-time analytics; and connected devices (the Internet of Things).
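To make the SQL-like processing model concrete, here is a rough Python sketch of what a tumbling-window aggregation (the kind of query Stream Analytics expresses as `GROUP BY TumblingWindow(...)`) computes. The event data and sensor names are made up for illustration; this is a simplified stand-in, not the service's actual engine.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Group (timestamp, sensor_id) events into fixed, non-overlapping
    time windows and count events per sensor per window."""
    counts = defaultdict(int)
    for ts, sensor in events:
        # Each event belongs to exactly one window, identified by its start time.
        window_start = ts - (ts % window_seconds)
        counts[(window_start, sensor)] += 1
    return dict(counts)

# Hypothetical events: (unix_timestamp, sensor_id)
events = [(0, "a"), (3, "a"), (7, "b"), (12, "a")]
print(tumbling_window_counts(events, 10))
# Window [0, 10) sees two "a" events and one "b"; window [10, 20) sees one "a".
```

In the real service the same idea is written declaratively in the SQL-like query language, with the engine handling scale-out and late-arriving events.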

Azure Machine Learning is a fully managed cloud service for predictive analytics. It combines new analytics tools, powerful algorithms developed for Xbox and Bing, and years of Microsoft machine learning research into one simple, easy-to-use cloud service. It supports R, the popular open-source programming environment for statistics and data mining. Azure ML makes it possible for people without deep data science backgrounds to start mining data for predictions.

Various speakers presented on the 3rd and 4th days, covering topics mainly aligned to Power BI and SQL Server performance management. I felt as if the same topics were demonstrated several times, only in different ways. But I liked the sessions on “Power BI and R” and “7 Databases in 70 Minutes”. In the latter session, Lara covered some of the major open-source databases used in transactional and Big Data systems. Though it lasted only 70 minutes, the session covered how to interact with these databases, their strengths and weaknesses, and how to choose the ones that fit your needs. Lara also explained how NoSQL is incorporated in Microsoft Azure through DocumentDB.

The session also explored the five data models employed by these databases (relational, key/value, columnar, document, and graph) and which kinds of problems are best suited to each. If anyone is interested in knowing more about this in detail, read the book “Seven Databases in Seven Weeks” by Eric Redmond and Jim R. Wilson.

I also got to know about the CAP theorem and its implications for distributed data systems, and came to understand the trade-offs between consistency and availability and when to favor each.

CAP Theorem
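The consistency/availability trade-off can be sketched with a toy example: two replicas that cannot talk to each other during a network partition. A consistency-favoring (CP) system refuses writes it cannot replicate, while an availability-favoring (AP) system accepts them and lets the replicas diverge until the partition heals. All names here are made up; this is a deliberately simplified illustration, not a real replication protocol.

```python
class Replica:
    """A trivially simple replica holding a single value."""
    def __init__(self):
        self.value = None

def write(replicas, partitioned, value, prefer="consistency"):
    if partitioned and prefer == "consistency":
        return "rejected"            # CP: stay consistent, sacrifice availability
    replicas[0].value = value        # AP: stay available...
    if not partitioned:
        replicas[1].value = value    # ...replicating only when the network allows
    return "accepted"

a, b = Replica(), Replica()
print(write([a, b], partitioned=True, value=42, prefer="consistency"))   # rejected
print(write([a, b], partitioned=True, value=42, prefer="availability"))  # accepted
print(a.value, b.value)  # 42 None -> the replicas have diverged
```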

In one of the evening sessions, there was a demonstration showcasing how Microsoft plays a vital role in Big Data management. The presenters walked through a case study of a telemetry system built with Kinect (the motion-sensing device used mainly with Microsoft Xbox games): the devices detected the number of people visiting a retail store and which aisle(s) were visited most often, and the results were visualized with heat maps. With this approach, the store could provide discounts and suggestions for the most popular products based on frequent visits, and also market the brands and products that were less popular.

Since each Kinect sensor sends data every second, and devices were installed in every aisle, the stored data quickly grows to Big Data volumes. Customers of the retail shop were given a mobile app they could use to search for products, learn about offers and the most popular items being viewed or purchased, and get suggestions for related products based on their earlier purchases. The product catalog is kept in a document store, the price check is served from a key-value store, the sensor data are stored in HBase, and analytics and reporting on that data are done through Hadoop batch processing.
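The split between the two stores can be sketched with in-memory stand-ins: a key-value store gives O(1) price lookups by SKU, while a document store holds free-form product documents that can be queried on any field. The SKUs, products, and field names here are hypothetical, chosen only to illustrate the access patterns.

```python
price_store = {              # key-value stand-in: SKU -> price
    "sku-1001": 4.99,
    "sku-1002": 12.50,
}

catalog = [                  # document-store stand-in: flexible product documents
    {"sku": "sku-1001", "name": "Cereal", "aisle": 3, "tags": ["breakfast"]},
    {"sku": "sku-1002", "name": "Coffee", "aisle": 5, "tags": ["beverage"]},
]

def price_check(sku):
    # Key-value access: one exact-key lookup, nothing else.
    return price_store[sku]

def find_by_aisle(aisle):
    # Document access: filter on a field inside the documents.
    return [doc["name"] for doc in catalog if doc["aisle"] == aisle]

print(price_check("sku-1001"))   # 4.99
print(find_by_aisle(5))          # ['Coffee']
```

The design choice mirrors the session's architecture: the simplest store that answers each question wins, rather than forcing everything through one database.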

The End-to-End Power BI session was well worth attending to learn how Microsoft Excel can be used for analyzing big data with Power Query, Power Pivot, Power View, and Power Map, which come with Microsoft Excel (2013 and above) or as add-ons. Data from multiple sources can be discovered using Power Query and analyzed using Power Pivot. Finally, with the data assembled in Power Pivot, it can be visualized in easily understandable forms such as charts and graphs using Power View and Power Map. Power Query is a free plug-in for Excel that can be downloaded from the Microsoft site.

On 6th November, the session on analyzing tweets was really awesome and easy to understand. The presenters demonstrated how Microsoft Azure (HDInsight) can be used to access and analyze useful information from a Twitter account, parsing the tweets against a predefined sentiment dictionary. Once the tweets were stored in HDInsight, the data was pulled into Excel using Power Query's options for importing data from other sources, and visualized in the desired format using the Excel tools mentioned above. The use of Power Map was also demonstrated, showing on a Bing map the geographical areas the tweets were coming from.
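The dictionary-based scoring described above can be sketched in a few lines: each word carries a signed weight, and a tweet's sentiment is the sum of the weights of its words. The dictionary entries and sample tweets are invented for illustration; the session's actual dictionary and parsing pipeline ran on HDInsight.

```python
# Hypothetical sentiment dictionary: word -> signed weight.
sentiment_dict = {"awesome": 2, "good": 1, "bad": -1, "terrible": -2}

def score_tweet(text):
    """Sum the weights of known words; unknown words score 0."""
    return sum(sentiment_dict.get(word, 0) for word in text.lower().split())

tweets = ["PASS Summit was awesome", "terrible wifi but good sessions"]
for tweet in tweets:
    print(tweet, "->", score_tweet(tweet))
# PASS Summit was awesome -> 2
# terrible wifi but good sessions -> -1
```

A positive total marks a tweet as favorable and a negative one as unfavorable, which is what makes the results easy to aggregate and chart in Excel afterwards.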

The final session of the day was the BI Power Hour, which was very entertaining and at the same time gave a complete overview of how Microsoft Azure and Power BI can be leveraged for Big Data.

Tags: Microsoft BI