Google Cloud Run is a serverless compute platform that automatically scales applications in response to traffic. It is designed to run stateless containers, meaning that the instances of your application are ephemeral and can be spun up or down as needed. This design choice has implications for data storage, particularly when it comes to persistence.
The typical way to access Google compute instances from Cloud Run is usually done via the Serverless VPC Access. However, setting this up would mean that we are essentially create an instance that would be used as a proxy to send traffic from Cloud Run to the Google Compute instance.
Inspirations # While I was watching the following video of a talk by Richard Feldman: https://www.youtube.com/watch?v=zX-kazAtX0c&ab_channel=ChariotSolutions. He was covering a pretty interesting concept/topic of how would one “slowly” migrate codebases from one language to another. Let’s say the codebase for an application is pretty large - how would we safely move it over and change it without increasing the deployment targets? Let’s say we’re not in microservices land and it is difficult for us to do the whole deployment for a whole other server just to begin the migration of languages.
After coding in both Python and Golang, I now have a very strong preference for strongly typed languages. There is a certain charm and beauty in being able to have the IDE that I’m working in able to provide good autocomplete suggestions for the code - there is less for a need to keep moving files in the codebases just to ensure that the function spelling and params are correct etc. For smaller programs, dynamic types languages are still ok but they get very unwieldy once they go pass the hundreds of lines of code mark.
Python is a dynamically typed language - which provides a huge developer experience as compared to a statically typed language such as Golang. Python does serve as a nice introductory programming language for new developers but as time goes by, it’s pretty easy to see why static programming language is why nicer to work with as compared to dynamically typed language. Due to the nature of such languages, it is easy to be “loosey” about the types of the variables which inadvertably makes the code harder to follow as codebases grow larger and larger. With such large codebases - even type hints on IDE becomes harder to establish (either takes too long or the tooling just deems it impossible to do so)
This are some notes in the case where one wants to deploy a bunch of python “microservices” to a Google Kubernetes Engine cluster. These notes emphasize on the basics rather than the various nuances of running a “production” grade python application.
As one writes several python applications to be targeted on the Google Cloud Functions platform, it becomes increasingly obvious to pull out the more common bits of code out into its own library. Let’s have an example on the reason for this.
This is a continuation of previous blog post.
To summarize the previous related blog post.
Too painful to have people respond and react to report generation and compilation Too expensive to have machine lying around to pick up the slack and automate the reports; serverless solutions (pay on use) could be a useful model to use when running automated reports. Scenario presented for example purposes: 3 reports generated which are to be compiled to a single report. Previously mentioned 3 reports would be processed on the condition when the data files are dropped into the storage buckets. Event generated from it would automatically run the report Compilating reports # The next part of resolving our above mentioned situation (read previous blog post - part 1 for more details on this) is to compile the report. There are several ways to handle, each with their own advantages and drawbacks respectively. We would use the terms subreport to refer to reports for the initial set of reports that would then need to be compiled into a final report. These are just possible solutions; the combination of products that can be used to achieve the final goal of checking subreports and then compiling into the final report.
Seeing how functions change the way one looks at compute workloads in terms of products makes me wonder how one/companies can look at their analytics workloads and try to see if it was possible to change the costing model in that direction.
Data engineering work usually serves to be fundamentally one of the important bits when it comes to report generation in the business. The act of connecting of understanding the data that goes through the business and the need to maintain all the scripts that handle the pulling and merging all of such data makes the job way harder than one can expect. You are not expected to just be a script junkie; you are expected to be an expert at your domain, understanding the different nuances and assumption each line of script imposes on the processing of such data.
Meetup.com is a pretty nice site to setup meetups and sharings on technologies. The platform is pretty nice and easy to use when it comes to bookings but sometimes, the data provided by its web interface is not sufficient nor does it fit our use case. In this case, let’s say you are trying to understand the trend of the number of people attending a meetup. To an organizer, an important thing to him/her is to understand what kind of actions would lead to higher turnups/registrations for a meetup. So, by the end of this post, hopefully we would be able to have a pretty decently priced (free if possible) solution for an analytics solution which would only be called occasionally.
Following from the previous blog post: Using AWS Lambda for Data Science Projects and Automations - Part 1
Let’s deploy a serverless application!
Problem Statement:
The application we would be trying out this time will do the following:
A thought experiment # Let’s say there was this one day during your usual work hours where you are tasked to handle some data transformations between your data sources. The data source is csv file generated from backend systems and is provided on the hourly basis. These data sources are to be analyzed as soon as possible and the insights are to be relayed to the marketing and business intelligence teams. How should we handle this? (Of course we should aim for as cheap a solution as possible)
I’ve been learning plenty of Golang nowadays and one of the most common design patterns that I keep hearing about is the decorator pattern. It is often used when handling with web requests; where you would create a function that accepts a struct that implements the handler interface which would then return an struct that also implements the handler interface.
This is a suggestion piece and not a recommended way of using docker or anything.
Motivation # The question we would want to know here is how do we exactly run the full on/all the unit tests for our applications built via Docker. One way to do this is to rely on a build server like Jenkins to create the required environment which we would need for a build and then run the unit test needed. However, this would mean that there is need to bootstrap a environment to do so.
Over the weekend, I’ve been experimenting whether if its possible to set up screen recording on a linux server. This is partly just out of curiosity but also, a little a bit of frustration. Imagine if you were in a position where you aim to assist people in recording their training sessions over on Google Hangouts but in order to do so, you would need to be around and your computer needs to be “sacrificed” in order to do the recording.