Review: I completed Data Engineering on Google Cloud Platform Specialization (Coursera)


Update February 2022: Data Engineering, Big Data, and Machine Learning on GCP Specialization Review


Has this course been updated? Sort of... it is now called the Data Engineering, Big Data, and Machine Learning on GCP Specialization.
What is the price? Back in 2017 it was AUD $425 (roughly US $350). Pricing has since changed to a subscription model of $69 per month, which gives you access to "all of Coursera". If you just want to sit one course, you can do it "quickly" and then cancel your subscription to save money.
How different is it from the previous version?

New Course | Old Course | Difference
Google Cloud Big Data and Machine Learning Fundamentals | Google Cloud Platform Big Data and Machine Learning Fundamentals | Same
Modernizing Data Lakes and Data Warehouses with Google Cloud | Leveraging Unstructured Data with Cloud Dataproc on Google Cloud Platform | Minimal
Building Batch Data Pipelines on GCP | Serverless Data Analysis with Google BigQuery and Cloud Dataflow | Minimal
Smart Analytics, Machine Learning, and AI on GCP | Serverless Machine Learning with Tensorflow on Google Cloud Platform | Minimal
Building Resilient Streaming Analytics Systems on Google Cloud | Building Resilient Streaming Systems on Google Cloud Platform | Minimal
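
On the pricing question, here is a rough back-of-the-envelope sketch of how many months of subscription you could pay for before it costs more than the old one-off price. It assumes the $69 per month figure is in US dollars (the currency is not stated above) and compares it against the approximate US $350 from 2017. The function names are made up for illustration.

```python
import math

OLD_ONE_OFF_PRICE = 350    # approximate 2017 price in US dollars, as quoted above
MONTHLY_SUBSCRIPTION = 69  # 2022 price per month; ASSUMED to be US dollars


def subscription_cost(months: int, monthly: float = MONTHLY_SUBSCRIPTION) -> float:
    """Total paid after subscribing for a whole number of months."""
    return months * monthly


def max_months_within_budget(one_off: float = OLD_ONE_OFF_PRICE,
                             monthly: float = MONTHLY_SUBSCRIPTION) -> int:
    """Largest whole number of months whose total cost does not exceed the one-off price."""
    return math.floor(one_off / monthly)


if __name__ == "__main__":
    months = max_months_within_budget()
    print(f"Up to {months} months costs {subscription_cost(months)} "
          f"versus a one-off {OLD_ONE_OFF_PRICE}")
```

Under those assumptions, finishing in a month or two comes out well under the old price, which is the point of doing it "quickly".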

Summary February 2022:

The advice is the same as before: the machine learning bits at the end are the best. However, Google has advanced even further in terms of data pipelines and streaming, so there's lots of good stuff here too!

------------------------------------------------

Review: I completed Data Engineering on Google Cloud Platform Specialization (Coursera)


Recently I completed the Data Engineering on Google Cloud Platform Specialization through Coursera. Here is my review.

Summary Review:

It’s good, reasonably advanced, has plenty of code examples, and I recommend it for anyone working on GCP. The only problem was a couple of issues in the final labs of the course.

Detailed Review

The course is divided into 5 modules of increasing complexity:

  1. Google Cloud Platform Big Data and Machine Learning Fundamentals
  2. Leveraging Unstructured Data with Cloud Dataproc on Google Cloud Platform
  3. Serverless Data Analysis with Google BigQuery and Cloud Dataflow
  4. Serverless Machine Learning with Tensorflow on Google Cloud Platform
  5. Building Resilient Streaming Systems on Google Cloud Platform

You can take the modules out of order or complete them sequentially. It’s up to you, but I’d recommend keeping it at least roughly sequential. I went from 1 to 3, then went back to 2, then 4 and 5.

The courses are hosted by Valliappa Lakshmanan from Google. He does a pretty great job overall. Each module starts with slides and discussion, followed by labs run through Google Codelabs (https://codelabs.developers.google.com/), a free-to-use training platform for hands-on labs on the Google Cloud Platform. Highly, highly recommended!

Each module is slated to take 6 to 8 hours to complete. I found this to be roughly correct, although closer to 8 hours, especially on the later modules, where the lab content really ramps up and you run into inevitable code issues (Google Pub/Sub version 0.27.0, I’m looking at you!) which mean the labs take longer as you google for explanations…

Whilst I did complete 2 of the modules in one day (not recommended), they really are chock full of content, so I’d make sure to leave adequate time. If you had time off work, I’d say all 5 in 5 days is doable, although still a fairly packed schedule.

The labs were great with all of the code saved in Github for you to use. No complaints here, they all worked really well and fit into the rest of the course material nicely.

THERE ARE QUIZZES..

There are also a number of short quizzes throughout (roughly 5 per module, with 2 or 3 questions in each). They can be tricky, and you get 3 attempts in any 8-hour period in case you don’t pass the first time. Funnily enough, I didn’t pass the very first quiz… I did on the second attempt, and from then on I made sure not to repeat that first-up miss for the rest of the course (which I didn’t :-))

THE MACHINE LEARNING BIT IS THE BEST!

My favourite part of the course was Serverless Machine Learning with Tensorflow on Google Cloud Platform. TensorFlow and its associated pals (Dataflow, Cloud ML and GCS) are a very cool concept. It really seems like the entire Google Cloud has been set up to handle massive, at-scale machine learning. The task is complicated: it means farming out data cleaning, feature transformations, hyperparameter tuning and data ingestion across myriad machines on the GCP network.

Certainly the process has been greatly simplified by tools like Dataflow in particular, but the fact remains: machine learning (at scale, in real time, etc.) is a large and complex undertaking. Users should consider taking at least a couple of weeks to formulate a proper data structure and model. As many others have said, building the actual model is only a small percentage of the work. Getting the data, cleaning it and (as the course shows) understanding it will take up the bulk of your time.

The Tensorflow module focuses on building an ML neural network model that collates New York traffic data and attempts to estimate fares for taxi users given starting and ending locations. Variables such as time of day, day of week and Euclidean distance are included in the model. What was most interesting to me was that one of the biggest improvements in RMSE came at the end, when the full data set was used: more data = more accurate model… interesting.
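
To make the feature and metric names above a bit more concrete, here is a minimal plain-Python sketch (not the course's TensorFlow code) of how time of day, day of week and Euclidean distance could be derived from a single trip, and how RMSE is computed. The function names, the feature keys and the use of raw longitude/latitude differences are my own assumptions for illustration.

```python
import math
from datetime import datetime


def euclidean_distance(pickup_lon: float, pickup_lat: float,
                       dropoff_lon: float, dropoff_lat: float) -> float:
    """Straight-line distance in coordinate units (an assumed simplification)."""
    return math.hypot(dropoff_lon - pickup_lon, dropoff_lat - pickup_lat)


def make_features(pickup_time: datetime,
                  pickup_lon: float, pickup_lat: float,
                  dropoff_lon: float, dropoff_lat: float) -> dict:
    """Build the three kinds of inputs mentioned in the review for one trip."""
    return {
        "hour_of_day": pickup_time.hour,
        "day_of_week": pickup_time.weekday(),  # Monday is 0
        "euclidean": euclidean_distance(pickup_lon, pickup_lat,
                                        dropoff_lon, dropoff_lat),
    }


def rmse(predicted: list, actual: list) -> float:
    """Root mean squared error between predicted and actual fares."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A lower RMSE means the predicted fares sit closer to the real ones on average, which is the number that improved when the full data set was used.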

Minor Concerns with the Course:

Valliappa Lakshmanan contacted me on Twitter about these issues.


Thanks so much for responding Valliappa!! I will leave the original comments below here for a little while just in case someone else comes across a related issue and it might be helpful.

A couple of concerns I had with the course came up in the final module, Building Resilient Streaming Systems on Google Cloud Platform. The final few labs used a simulated streaming model through Google Pub/Sub that didn’t work on my Cloud Shell. The reason, as mentioned above, was a version conflict in Pub/Sub: the code in the lab relies on version 0.27.0 of Pub/Sub to work. To get around this issue, follow the steps here: https://github.com/GoogleCloudPlatform/training-data-analyst/tree/master/courses/streaming/publish
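
If you want to check which Pub/Sub client version your environment actually has before starting the lab, something like the sketch below might help. It is not the fix from the linked page, only a quick diagnostic. It assumes the client is installed from PyPI under the name google-cloud-pubsub and that an exact match with 0.27.0 is what the lab needs. The helper names are made up.

```python
from importlib import metadata
from typing import Optional

REQUIRED_VERSION = "0.27.0"          # version the lab code relied on, per the text above
PACKAGE_NAME = "google-cloud-pubsub"  # ASSUMED PyPI distribution name


def installed_version(package: str = PACKAGE_NAME) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed: Optional[str], required: str = REQUIRED_VERSION) -> bool:
    """True only when the installed version is exactly the required one."""
    return installed == required


if __name__ == "__main__":
    found = installed_version()
    if matches_pin(found):
        print(f"{PACKAGE_NAME} {found} matches the lab's expected version")
    else:
        print(f"{PACKAGE_NAME} is {found!r}, the lab expected {REQUIRED_VERSION}")
```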

The other issue I had (which was unfortunately quite annoying) was that because I am on the free GCP trial I didn’t have enough “quota” to run the final lab. See below:

...



This is despite having over $300 credit still in my free trial, as you can see above. From what I can tell, there is an arbitrary quota limit that sits outside the $ figure stated in your trial. This is disappointing, and I think Google should be clearer upfront with users about these quotas. Free data and resources are awesome! But finding out that some use cases can only really be tested on paid accounts is not. It is testing, after all!

OVERALL COMMENTS:

Despite the teething issues in the last module, overall I liked this course quite a bit, and I’m really interested in getting Valliappa’s book now, which is to be released on November 25th.

Thanks for Reading!

Scott Sunderland

Thanks for reading and I hope you have a wonderful day!