neCloud

Thursday, 23 February 2017

Microsoft Certification 70-475 : Designing and Implementing Big Data Analytics Solutions

Having recently sat and passed Microsoft’s exam 70-475, I thought I’d publish the list of references I built up whilst studying. This is still a relatively new exam, so study materials are hard to come by, just as for exam 70-473. As usual, I also made use of the Mindhub practice exam.

I found it difficult to pin down specific resources for some of the objective areas, so the list is by no means exhaustive, but it covers a good chunk of the exam content.

I also recommend having some prior knowledge of the MS SQL, Hadoop and Azure ecosystems before tackling this exam.

Hope this helps!

1. Design big data batch processing and interactive solutions

Ingest data for batch and interactive processing

https://docs.microsoft.com/en-us/azure/data-lake-store/
https://docs.microsoft.com/en-us/azure/data-factory/data-factory-copy-activity-performance
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-overview-load

Ingest from cloud-born or on-premises data,

https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-data-scenarios

store data in Microsoft Azure Data Lake,

https://docs.microsoft.com/en-us/azure/data-factory/data-factory-azure-datalake-connector#sample-copy-data-from-azure-blob-to-azure-data-lake-store

store data in Azure BLOB Storage,

https://docs.microsoft.com/en-us/azure/data-factory/data-factory-azure-datalake-connector#sample-copy-data-from-azure-data-lake-store-to-azure-blob

perform a one-time bulk data transfer,

https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-offline-bulk-data-upload

perform routine small writes on a continuous basis

Design and provision compute clusters

https://blogs.msdn.microsoft.com/cindygross/2015/02/26/create-hdinsight-cluster-in-azure-portal/

Select compute cluster type,

https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-introduction#a-nameoverviewaoverview-of-the-hadoop-ecosystem-in-hdinsight
https://www.blue-granite.com/blog/how-to-choose-the-right-hdinsight-cluster

estimate cluster size based on workload

https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-provision-clusters

Design for data security

Protect personally identifiable information (PII) data in Azure
encrypt and mask data,
implement role-based security

https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-secure-data

Design for batch processing

https://docs.microsoft.com/en-us/azure/batch/batch-technical-overview

Select appropriate language and tool,
identify formats,
define metadata,

Microsoft Azure Batch - slides 46-48

configure output

Design interactive queries for big data

https://docs.microsoft.com/en-gb/azure/hdinsight/hdinsight-apache-spark-overview

Provision Spark cluster,

https://docs.microsoft.com/en-gb/azure/hdinsight/hdinsight-apache-spark-jupyter-spark-sql

set the right resources in Spark cluster,

https://blogs.msdn.microsoft.com/bigdatasupport/2015/08/19/some-things-to-consider-for-your-spark-on-hdinsight-workload/

execute queries using Spark SQL,
select the right data format (Parquet),

http://parquet.apache.org/documentation/latest/

cache data in memory (make sure cluster is of the right size),
visualize using business intelligence (BI) tools (for example, Power BI, Tableau),

https://docs.microsoft.com/en-gb/azure/hdinsight/hdinsight-apache-spark-use-bi-tools
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-integrate-power-bi

select the right tool for business analysis

2. Design big data real-time processing solutions

Ingest data for real-time processing

https://docs.microsoft.com/en-gb/azure/stream-analytics/stream-analytics-introduction
http://download.microsoft.com/download/6/2/3/623924DE-B083-4561-9624-C1AB62B5F82B/real-time-event-processing-with-microsoft-azure-stream-analytics.pdf
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-storm-sensor-data-analysis - hands-on tutorial

Select data ingestion technology,

https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-what-is-event-hubs

design partitioning scheme,

https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-what-is-event-hubs#partitions
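As a rough illustration of what a partitioning scheme buys you: events that share a partition key land in the same partition, which keeps their relative order. Event Hubs does the key hashing itself, so the sketch below is only a stand-in to show the idea. The function name and the use of CRC32 are my own assumptions, not the service's actual algorithm.

```python
import zlib


def assign_partition(partition_key: str, partition_count: int) -> int:
    """Map a partition key to a partition index (illustrative stand-in only)."""
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")
    # A stable hash means the same key always maps to the same partition,
    # so all events for one device stay together and in order.
    return zlib.crc32(partition_key.encode("utf-8")) % partition_count


if __name__ == "__main__":
    for device in ("sensor-1", "sensor-2", "sensor-3"):
        print(device, "->", assign_partition(device, 4))
```

Choosing something like a device ID as the key spreads load across partitions while keeping each device's events together.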

design row key of event tables in HBase

https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hbase-overview
http://www.dummies.com/programming/big-data/hadoop/row-keys-in-the-hbase-data-model/
http://hbase.apache.org/0.94/book/rowkey.design.html
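To make the row key material above a little more concrete, here is a minimal sketch of one commonly described pattern for event tables: a salt prefix to avoid hotspotting, then the device ID, then a reversed timestamp so the newest events sort first. The key layout, bucket count, and `MAX_MILLIS` bound are assumptions of mine for illustration, not something the exam or the linked docs prescribe.

```python
import hashlib

# Hypothetical upper bound used to reverse millisecond timestamps.
MAX_MILLIS = 10**13


def make_row_key(device_id: str, event_millis: int, buckets: int = 16) -> str:
    """Build a row key shaped like <salt>|<device>|<reversed timestamp>."""
    # Salt is derived from the device ID, so one device's events share a
    # prefix while different devices spread across `buckets` key ranges.
    digest = hashlib.md5(device_id.encode("utf-8")).hexdigest()
    salt = int(digest, 16) % buckets
    # Reversed, zero-padded timestamp: newer events sort first
    # in HBase's lexicographic row ordering.
    reversed_ts = MAX_MILLIS - event_millis
    return f"{salt:02d}|{device_id}|{reversed_ts:013d}"


if __name__ == "__main__":
    print(make_row_key("sensor-1", 1487800000000))
    print(make_row_key("sensor-1", 1487800060000))
```

Because HBase stores rows sorted by key, a scan over one device's prefix would return its most recent readings first under this layout.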

Design and provision compute resources

Select streaming technology in Azure,

https://docs.microsoft.com/en-gb/azure/stream-analytics/stream-analytics-comparison-storm

select real-time event processing technology,

https://docs.microsoft.com/en-us/azure/iot-hub/iot-hub-compare-event-hubs

select real-time event storage technology,

https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-define-outputs

select streaming units,

https://azure.microsoft.com/en-us/pricing/details/stream-analytics/#
https://docs.microsoft.com/en-gb/azure/stream-analytics/stream-analytics-scale-jobs

configure cluster size,

https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-provision-clusters#basic-configuration-options
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-provision-clusters#cluster-types

assign appropriate resources for Spark clusters,

https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-spark-resource-manager#what-is-the-optimum-cluster-configuration-to-run-spark-applications
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-spark-resource-manager#how-do-i-know-if-i-am-running-out-of-resource

assign appropriate resources for HBase clusters,

https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hbase-tutorial-get-started#create-hbase-cluster

utilize Visual Studio to write and debug Storm topologies

https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-storm-develop-csharp-visual-studio-topology

Design for Lambda architecture

https://blogs.technet.microsoft.com/msuspartner/2016/01/27/azure-partner-community-big-data-advanced-analytics-and-lambda-architecture/
https://social.technet.microsoft.com/wiki/contents/articles/33626.lambda-architecture-implementation-using-microsoft-azure.aspx
http://lambda-architecture.net/

Identify application of Lambda architecture,
utilize streaming data to draw business insights in real time,
utilize streaming data to show trends in data in real time,
utilize streaming data and convert into batch data to get historical view,
design such that batch data doesn’t introduce latency,
utilize batch data for deeper data analysis
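The core idea behind the objectives above is that a query combines a precomputed batch view with a small real-time view covering whatever the last batch run has not processed yet. A minimal sketch, assuming both views are simple per-key event counts (the function and variable names are hypothetical):

```python
from collections import Counter


def merge_views(batch_view: dict, speed_view: dict) -> dict:
    """Combine batch-layer counts with speed-layer counts at query time."""
    merged = Counter(batch_view)
    # Counter.update adds counts rather than overwriting them, so recent
    # events from the speed layer top up the historical totals.
    merged.update(speed_view)
    return dict(merged)


if __name__ == "__main__":
    batch = {"page-a": 10, "page-b": 5}   # recomputed periodically from all data
    speed = {"page-a": 2, "page-c": 1}    # events since the last batch run
    print(merge_views(batch, speed))
```

In a real solution the batch view might come from Hive or Spark output and the speed view from a streaming job, but the merge step follows the same shape.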

Design for real-time processing

Real-Time Event & Stream Processing on MS Azure

Design for latency and throughput,
design reference data streams,
design business logic,
design visualization output

3. Design Machine Learning solutions

Create and manage experiments

https://docs.microsoft.com/en-gb/azure/machine-learning/machine-learning-create-experiment
https://docs.microsoft.com/en-gb/azure/machine-learning/machine-learning-studio-overview-diagram

Create, manage, and share workspaces;

https://docs.microsoft.com/en-gb/azure/machine-learning/machine-learning-walkthrough-1-create-ml-workspace
https://docs.microsoft.com/en-gb/azure/machine-learning/machine-learning-create-workspace

create training experiment;

https://docs.microsoft.com/en-gb/azure/machine-learning/machine-learning-walkthrough-3-create-new-experiment

select template experiment from Machine Learning gallery

https://docs.microsoft.com/en-gb/azure/machine-learning/machine-learning-sample-experiments

Determine when to pre-process or train inside Machine Learning Studio

Select model type based on desired algorithm,

https://docs.microsoft.com/en-gb/azure/machine-learning/machine-learning-algorithm-choice

select technique based on data size

https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-data-science-prepare-data

Select input/output types

Select appropriate SQL parameters,

https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-import-data-from-online-sources

select BLOB storage parameters,

https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-import-data-from-online-sources#supported-online-data-sources

identify data sources,

https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-data-science-import-data

select HiveQL queries

https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-data-science-create-features-hive

Apply custom processing steps with R and Python

https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-python-data-access
https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-extend-your-experiment-with-r
https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-custom-r-modules

Visualize custom graphs,

https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-custom-r-modules#elements-in-the-xml-definition-file
https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-execute-python-scripts#working-with-visualizations

estimate custom algorithms,

https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-algorithm-choice
http://download.microsoft.com/download/A/6/1/A613E11E-8F9C-424A-B99D-65344785C288/microsoft-machine-learning-algorithm-cheat-sheet-v6.pdf

select custom parameters,

https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-web-service-parameters
https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-execute-python-scripts#basic-usage-scenarios-in-machine-learning-for-python-scripts

interact with datasets through notebooks (Jupyter Notebook)

https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-gallery-jupyter-notebooks
https://gallery.cortanaintelligence.com/notebooks
https://gallery.cortanaintelligence.com/Notebook/Tutorial-on-Azure-Machine-Learning-Notebook-1

Publish web services

Operationalize Azure Machine Learning models,

https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-publish-a-machine-learning-web-service

operationalize Spark models using Azure Machine Learning,

https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-data-science-spark-overview
https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-data-science-spark-model-consumption#consume-spark-models-through-a-web-interface

operationalize custom models

https://docs.microsoft.com/en-us/azure/machine-learning/machine-learning-model-progression-experiment-to-web-service

4. Operationalize end-to-end cloud analytics solutions

Create a data factory

Identify data sources,

https://docs.microsoft.com/en-us/azure/data-factory/data-factory-introduction#data-movement-activities

identify and provision data processing infrastructure,

https://docs.microsoft.com/en-us/azure/data-factory/data-factory-introduction#data-transformation-activities

utilize Visual Studio to design and deploy pipelines

https://docs.microsoft.com/en-us/azure/data-factory/data-factory-copy-activity-tutorial-using-visual-studio
https://docs.microsoft.com/en-us/azure/data-factory/data-factory-build-your-first-pipeline-using-vsm
https://docs.microsoft.com/en-us/azure/data-factory/data-factory-build-your-first-pipeline

Orchestrate data processing activities in a data-driven workflow

Leverage data-slicing concepts,

https://docs.microsoft.com/en-us/azure/data-factory/data-factory-scheduling-and-execution#time-series-datasets-and-data-slices
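Data slices are easiest to picture as back-to-back time windows between a pipeline's start and end times, one per dataset availability interval. The sketch below only illustrates that windowing idea in plain Python; it is not Data Factory code, and the function name is my own.

```python
from datetime import datetime, timedelta


def data_slices(start: datetime, end: datetime, interval: timedelta) -> list:
    """Return consecutive (slice_start, slice_end) windows between start and end."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    slices = []
    cursor = start
    # Only emit complete windows that fit inside the active period.
    while cursor + interval <= end:
        slices.append((cursor, cursor + interval))
        cursor += interval
    return slices


if __name__ == "__main__":
    for s, e in data_slices(datetime(2017, 2, 23, 0), datetime(2017, 2, 23, 3),
                            timedelta(hours=1)):
        print(s.isoformat(), "->", e.isoformat())
```

Each activity run then processes exactly one such window, which is what makes rerunning a single failed slice possible.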

identify data dependencies and chaining multiple activities,

https://docs.microsoft.com/en-us/azure/data-factory/data-factory-scheduling-and-execution#run-activities-in-a-sequence

model complex schedules based on data dependencies,

https://docs.microsoft.com/en-us/azure/data-factory/data-factory-scheduling-and-execution#data-dependency-deep-dive

provision and run data pipelines

https://docs.microsoft.com/en-us/azure/data-factory/data-factory-create-pipelines#create-pipelines

Monitor and manage the data factory

Identify failures and root causes,

https://docs.microsoft.com/en-us/azure/data-factory/data-factory-monitor-manage-app
https://docs.microsoft.com/en-gb/azure/data-factory/data-factory-monitor-manage-pipelines

create alerts for specified conditions,

https://docs.microsoft.com/en-us/azure/data-factory/data-factory-monitor-manage-app#creating-alerts
https://docs.microsoft.com/en-us/azure/data-factory/data-factory-monitor-manage-pipelines#create-alerts

perform a restatement

Move, transform, and analyze data

Leverage Pig, Hive, MapReduce for data processing;

https://docs.microsoft.com/en-gb/azure/data-factory/data-factory-pig-activity
https://docs.microsoft.com/en-gb/azure/data-factory/data-factory-hive-activity
https://docs.microsoft.com/en-gb/azure/data-factory/data-factory-map-reduce

copy data between on-premises and cloud;

https://docs.microsoft.com/en-gb/azure/data-factory/data-factory-move-data-between-onprem-and-cloud
https://docs.microsoft.com/en-gb/azure/data-factory/data-factory-data-management-gateway

copy data between cloud data sources;

https://docs.microsoft.com/en-gb/azure/data-factory/data-factory-data-movement-activities

leverage stored procedures;

https://docs.microsoft.com/en-gb/azure/data-factory/data-factory-stored-proc-activity

leverage Machine Learning batch execution for scoring, retraining, and update resource;

https://docs.microsoft.com/en-gb/azure/data-factory/data-factory-azure-ml-batch-execution-activity

extend the data factory with custom processing steps;

https://docs.microsoft.com/en-gb/azure/data-factory/data-factory-use-custom-activities

load data into a relational store

https://docs.microsoft.com/en-gb/azure/data-factory/data-factory-azure-sql-connector

visualize using Power BI

https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-integrate-power-bi
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-get-started-visualize-with-power-bi

Design a deployment strategy for an end-to-end solution

Leverage PowerShell for deployment,

https://docs.microsoft.com/en-us/powershell/resourcemanager/azurerm.datafactories/v2.3.0/azurerm.datafactories

automate deployment programmatically

https://docs.microsoft.com/en-us/azure/data-factory/data-factory-create-data-factories-programmatically
https://msdn.microsoft.com/library/mt415893.aspx
https://msdn.microsoft.com/library/dn906738.aspx

Posted by Andrew at 21:33


Labels: 70-475, Azure, Certification, Designing and Implementing Big Data Analytics Solutions, exam, links, Microsoft, notes, reference, study
